WorldWideScience

Sample records for audio-visual speech cue

  1. Audio-visual speech cue combination.

    Directory of Open Access Journals (Sweden)

    Derek H Arnold

    Full Text Available BACKGROUND: Different sources of sensory information can interact, often shaping what we think we have seen or heard. This can enhance the precision of perceptual decisions relative to those made on the basis of a single source of information. From a computational perspective, there are multiple reasons why this might happen, and each predicts a different degree of enhanced precision. Relatively slight improvements can arise when perceptual decisions are made on the basis of multiple independent sensory estimates, as opposed to just one. These improvements can arise as a consequence of probability summation. Greater improvements can occur if two initially independent estimates are summated to form a single integrated code, especially if the summation is weighted in accordance with the variance associated with each independent estimate. This form of combination is often described as a Bayesian maximum likelihood estimate. Still greater improvements are possible if the two sources of information are encoded via a common physiological process. PRINCIPAL FINDINGS: Here we show that the provision of simultaneous audio and visual speech cues can result in substantial sensitivity improvements, relative to single sensory modality based decisions. The magnitude of the improvements is greater than can be predicted on the basis of either a Bayesian maximum likelihood estimate or a probability summation. CONCLUSION: Our data suggest that primary estimates of speech content are determined by a physiological process that takes input from both visual and auditory processing, resulting in greater sensitivity than would be possible if initially independent audio and visual estimates were formed and then subsequently combined.
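
    These computational benchmarks can be made concrete. The sketch below (Python; illustrative only, not the authors' analysis code) uses the common Minkowski-pooling shorthand, in which a pooling exponent of roughly 3-4 stands in for probability summation over independent decisions and an exponent of 2 corresponds to variance-weighted (maximum-likelihood) summation; the unimodal d' values are placeholders.

      import numpy as np

      def minkowski_pool(d_audio, d_visual, beta):
          """Predicted combined sensitivity (d') from unimodal d' values under
          Minkowski pooling: beta = 2 approximates optimal (inverse-variance /
          maximum-likelihood) integration; beta around 3-4 is a conventional
          stand-in for probability summation over independent decisions."""
          return (d_audio ** beta + d_visual ** beta) ** (1.0 / beta)

      d_a, d_v = 1.2, 0.9                                # illustrative unimodal sensitivities
      pred_prob_sum = minkowski_pool(d_a, d_v, beta=4)   # modest predicted gain
      pred_mle      = minkowski_pool(d_a, d_v, beta=2)   # larger predicted gain
      # The study's claim is that measured audio-visual d' exceeds even pred_mle.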

  2. Do gender differences in audio-visual benefit and visual influence in audio-visual speech perception emerge with age?

    Directory of Open Access Journals (Sweden)

    Magnus Alm

    2015-07-01

    Full Text Available Gender and age have been found to affect adults’ audio-visual (AV) speech perception. However, research on adult aging focuses on adults over 60 years, who have an increasing likelihood of cognitive and sensory decline, which may confound positive effects of age-related AV experience and its interaction with gender. Observed age and gender differences in AV speech perception may also depend on measurement sensitivity and AV task difficulty. Consequently, both AV benefit and visual influence were used to measure visual contribution for gender-balanced groups of young (20-30 years) and middle-aged (50-60 years) adults, with task difficulty varied using AV syllables from different talkers in alternative auditory backgrounds. Females had better speech-reading performance than males. Whereas no gender differences in AV benefit or visual influence were observed for young adults, visually influenced responses were significantly greater for middle-aged females than middle-aged males. That speech-reading performance did not influence AV benefit may be explained by visual speech extraction and AV integration constituting independent abilities. In contrast, the gender difference in visually influenced responses in middle adulthood may reflect an experience-related shift in females’ general AV perceptual strategy. Although young females’ speech-reading proficiency may not readily contribute to greater visual influence, between young and middle adulthood recurrent confirmation of the contribution of visual cues, induced by speech-reading proficiency, may gradually shift females’ AV perceptual strategy towards more visually dominated responses.

  3. Talker variability in audio-visual speech perception.

    Science.gov (United States)

    Heald, Shannon L M; Nusbaum, Howard C

    2014-01-01

    A change in talker is a change in the context for the phonetic interpretation of acoustic patterns of speech. Different talkers have different mappings between acoustic patterns and phonetic categories and listeners need to adapt to these differences. Despite this complexity, listeners are adept at comprehending speech in multiple-talker contexts, albeit at a slight but measurable performance cost (e.g., slower recognition). So far, this talker variability cost has been demonstrated only in audio-only speech. Other research in single-talker contexts has shown, however, that when listeners are able to see a talker's face, speech recognition is improved under adverse listening (e.g., noise or distortion) conditions that can increase uncertainty in the mapping between acoustic patterns and phonetic categories. Does seeing a talker's face reduce the cost of word recognition in multiple-talker contexts? We used a speeded word-monitoring task in which listeners make quick judgments about target word recognition in single- and multiple-talker contexts. Results show faster recognition performance in single-talker conditions compared to multiple-talker conditions for both audio-only and audio-visual speech. However, recognition time in a multiple-talker context was slower in the audio-visual condition compared to the audio-only condition. These results suggest that seeing a talker's face during speech perception may slow recognition by increasing the importance of talker identification, signaling to the listener that a change in talker has occurred. PMID:25076919

  4. Audio-Visual Speech Perception: A Developmental ERP Investigation

    Science.gov (United States)

    Knowland, Victoria C. P.; Mercure, Evelyne; Karmiloff-Smith, Annette; Dick, Fred; Thomas, Michael S. C.

    2014-01-01

    Being able to see a talking face confers a considerable advantage for speech perception in adulthood. However, behavioural data currently suggest that children fail to make full use of these available visual speech cues until age 8 or 9. This is particularly surprising given the potential utility of multiple informational cues during language…

  5. Audio-Visual Speech Intelligibility Benefits with Bilateral Cochlear Implants when Talker Location Varies

    OpenAIRE

    van Hoesel, Richard J. M.

    2015-01-01

    One of the key benefits of using cochlear implants (CIs) in both ears rather than just one is improved localization. It is likely that in complex listening scenes, improved localization allows bilateral CI users to orient toward talkers to improve signal-to-noise ratios and gain access to visual cues, but to date, that conjecture has not been tested. To obtain an objective measure of that benefit, seven bilateral CI users were assessed for both auditory-only and audio-visual speech intelligib...

  6. Intermodal timing relations and audio-visual speech recognition by normal-hearing adults.

    Science.gov (United States)

    McGrath, M; Summerfield, Q

    1985-02-01

    Audio-visual identification of sentences was measured as a function of audio delay in untrained observers with normal hearing; the soundtrack was replaced by rectangular pulses originally synchronized to the closing of the talker's vocal folds and then subjected to delay. When the soundtrack was delayed by 160 ms, identification scores were no better than when no acoustical information at all was provided. Delays of up to 80 ms had little effect on group-mean performance, but a separate analysis of a subgroup of better lipreaders showed a significant trend of reduced scores with increased delay in the range 0-80 ms. A second experiment tested the interpretation that, although the main disruptive effect of the delay occurred on a syllabic time scale, better lipreaders might be attempting to use intermodal timing cues at a phonemic level. Normal-hearing observers determined whether a 120-Hz complex tone started before or after the opening of a pair of lip-like Lissajous figures. Group-mean difference limens (70.7% correct DLs) were -79 ms (sound leading) and +138 ms (sound lagging), with no significant correlation between DLs and sentence lipreading scores. It was concluded that most observers, whether good lipreaders or not, possess insufficient sensitivity to intermodal timing cues in audio-visual speech for them to be used analogously to voice onset time in auditory speech perception. The results of both experiments imply that delays of up to about 40 ms introduced by signal-processing algorithms in aids to lipreading should not materially affect audio-visual speech understanding.

  7. Audio-visual speech perception: a developmental ERP investigation

    OpenAIRE

    Knowland, V.; Mercure, E.; Karmiloff-Smith, A; Dick, F; Thomas, M.

    2014-01-01

    Being able to see a talking face confers a considerable advantage for speech perception in adulthood. However, behavioural data currently suggest that children fail to make full use of these available visual speech cues until age 8 or 9. This is particularly surprising given the potential utility of multiple informational cues during language learning. We therefore explored this at the neural level. The event-related potential (ERP) technique has been used to assess the mechanisms of audio-vi...

  8. APPLICATION OF PARTIAL LEAST SQUARES REGRESSION FOR AUDIO-VISUAL SPEECH PROCESSING AND MODELING

    Directory of Open Access Journals (Sweden)

    A. L. Oleinik

    2015-09-01

    Full Text Available Subject of Research. The paper deals with the problem of reconstructing lip-region images from the speech signal by means of Partial Least Squares regression. Such problems arise in connection with the development of audio-visual speech processing methods. Audio-visual speech consists of acoustic and visual components (called modalities). Applications of audio-visual speech processing methods include joint modeling of voice and lip-movement dynamics, synchronization of audio and video streams, emotion recognition, and liveness detection. Method. Partial Least Squares regression was applied to solve the posed problem. This method extracts components of the initial data with high covariance, and these components are used to build a regression model. The advantage of this approach is that it achieves two goals: identification of latent interrelations between the initial data components (e.g., speech signal and lip-region image) and approximation of one data component as a function of the other. Main Results. Experimental research on the reconstruction of lip-region images from the speech signal was carried out on the VidTIMIT audio-visual speech database. The results showed that Partial Least Squares regression is capable of solving the reconstruction problem. Practical Significance. The findings indicate that Partial Least Squares regression is applicable to a wide variety of audio-visual speech processing problems, from synchronization of audio and video streams to liveness detection.
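
    As a hedged sketch of the kind of pipeline this abstract describes (not the authors' code), the snippet below uses scikit-learn's PLSRegression to map acoustic feature vectors (e.g., MFCC frames) to vectorized lip-region images; the feature dimensions, frame counts, and random stand-in data are assumptions for illustration.

      import numpy as np
      from sklearn.cross_decomposition import PLSRegression

      # Placeholder shapes: 5000 synchronized frames, 13-dim acoustic features,
      # 32x32 grey-scale lip-region images flattened to 1024-dim vectors.
      X_audio = np.random.randn(5000, 13)          # acoustic features per frame (stand-in)
      Y_lips  = np.random.randn(5000, 32 * 32)     # flattened lip-region images (stand-in)

      pls = PLSRegression(n_components=10)         # latent components with high covariance
      pls.fit(X_audio, Y_lips)

      Y_hat = pls.predict(X_audio[:1])             # reconstruct a lip image from one audio frame
      lip_image = Y_hat.reshape(32, 32)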

  9. An audio-visual corpus for multimodal speech recognition in Dutch language

    NARCIS (Netherlands)

    Wojdel, J.; Wiggers, P.; Rothkrantz, L.J.M.

    2002-01-01

    This paper describes the gathering and availability of an audio-visual speech corpus for the Dutch language. The corpus was prepared with multi-modal speech recognition in mind and is currently used in our research on lip-reading and bimodal speech recognition. It contains the prompts used also i

  10. Effects of Audio-Visual Information on the Intelligibility of Alaryngeal Speech

    Science.gov (United States)

    Evitts, Paul M.; Portugal, Lindsay; Van Dine, Ami; Holler, Aline

    2010-01-01

    Background: There is minimal research on the contribution of visual information on speech intelligibility for individuals with a laryngectomy (IWL). Aims: The purpose of this project was to determine the effects of mode of presentation (audio-only, audio-visual) on alaryngeal speech intelligibility. Method: Twenty-three naive listeners were…

  11. Recognition of Isolated Words using Zernike and MFCC features for Audio Visual Speech Recognition

    OpenAIRE

    Borde, Prashant; Varpe, Amarsinh; Manza, Ramesh; Yannawar, Pravin

    2014-01-01

    Automatic Speech Recognition (ASR) by machine is an attractive research topic in the signal processing domain and has attracted many researchers to contribute to this area. In recent years, there have been many advances in automatic speech-reading systems with the inclusion of audio and visual speech features to recognize words under noisy conditions. The objective of an audio-visual speech recognition system is to improve recognition accuracy. In this paper we computed visual features using Zernike m...

  12. Audio-visual integration of speech with time-varying sine wave speech replicas

    Science.gov (United States)

    Tuomainen, Jyrki; Andersen, Tobias; Tiippana, Kaisa; Sams, Mikko

    2002-11-01

    We tested whether listeners' knowledge about the nature of the auditory stimuli had an effect on audio-visual (AV) integration of speech. First, subjects were taught to categorize two sine-wave (sw) replicas of the real speech tokens /omso/ and /onso/ into two arbitrary nonspeech categories without knowledge of the speech-like nature of the sounds. A test with congruent and incongruent AV-stimulus conditions (together with auditory-only presentations of the sw stimuli) demonstrated no AV integration, but instead close to perfect categorization of stimuli in the two arbitrary categories according to the auditory presentation channel. Then, the same subjects (of which most were still under the impression that the sw stimuli were nonspeech sounds) were taught to categorize the sw stimuli as /omso/ and /onso/, and again tested with the same AV stimuli as used in the nonspeech sw condition. This time, subjects showed highly reliable AV integration similar to integration obtained with real speech stimuli in a separate test. We suggest that AV integration only occurs when subjects are in a so-called ''speech mode.''

  13. ANALYSIS OF MULTIMODAL FUSION TECHNIQUES FOR AUDIO-VISUAL SPEECH RECOGNITION

    Directory of Open Access Journals (Sweden)

    D.V. Ivanko

    2016-05-01

    Full Text Available The paper deals with an analytical review covering the latest achievements in the field of audio-visual (AV) fusion (integration) of multimodal information. We discuss the main challenges and report on approaches to address them. One of the most important tasks of AV integration is to understand how the modalities interact and influence each other. The paper addresses this problem in the context of AV speech processing and speech recognition. In the first part of the review we set out the basic principles of AV speech recognition and give the classification of audio and visual features of speech. Special attention is paid to the systematization of the existing techniques and the AV data fusion methods. In the second part we provide a consolidated list of tasks and applications that use AV fusion, based on our analysis of the research area. We also indicate the methods, techniques, and audio and video features used. We propose a classification of AV integration, and discuss the advantages and disadvantages of different approaches. We draw conclusions and offer our assessment of the future of the field of AV fusion. In further research we plan to implement a system of audio-visual Russian continuous speech recognition using advanced methods of multimodal fusion.

  14. Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images

    Directory of Open Access Journals (Sweden)

    Koji Iwano

    2007-03-01

    Full Text Available This paper proposes an audio-visual speech recognition method using lip information extracted from side-face images as an attempt to increase noise robustness in mobile environments. Our proposed method assumes that lip images can be captured using a small camera installed in a handset. Two different kinds of lip features, lip-contour geometric features and lip-motion velocity features, are used individually or jointly, in combination with audio features. Phoneme HMMs modeling the audio and visual features are built based on the multistream HMM technique. Experiments conducted using Japanese connected digit speech contaminated with white noise in various SNR conditions show effectiveness of the proposed method. Recognition accuracy is improved by using the visual information in all SNR conditions. These visual features were confirmed to be effective even when the audio HMM was adapted to noise by the MLLR method.
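
    The multistream HMM mentioned above combines per-stream emission likelihoods with stream weights. Below is a minimal, generic sketch of that state-level combination (not the paper's implementation), assuming diagonal-Gaussian emission densities and illustrative weights and feature dimensions.

      import numpy as np
      from scipy.stats import multivariate_normal

      def stream_weighted_loglik(o_audio, o_visual, audio_pdf, visual_pdf,
                                 w_audio=0.7, w_visual=0.3):
          """Log emission score of one HMM state for a synchronized audio/visual
          observation pair: log b(o) = w_a * log b_a(o_a) + w_v * log b_v(o_v)."""
          return (w_audio * audio_pdf.logpdf(o_audio)
                  + w_visual * visual_pdf.logpdf(o_visual))

      # Illustrative state emission models (stand-ins for trained HMM state densities).
      audio_state  = multivariate_normal(mean=np.zeros(13), cov=np.eye(13))
      visual_state = multivariate_normal(mean=np.zeros(6), cov=np.eye(6))
      score = stream_weighted_loglik(np.zeros(13), np.zeros(6), audio_state, visual_state)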

  15. Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images

    Directory of Open Access Journals (Sweden)

    Iwano Koji

    2007-01-01

    Full Text Available This paper proposes an audio-visual speech recognition method using lip information extracted from side-face images as an attempt to increase noise robustness in mobile environments. Our proposed method assumes that lip images can be captured using a small camera installed in a handset. Two different kinds of lip features, lip-contour geometric features and lip-motion velocity features, are used individually or jointly, in combination with audio features. Phoneme HMMs modeling the audio and visual features are built based on the multistream HMM technique. Experiments conducted using Japanese connected digit speech contaminated with white noise in various SNR conditions show effectiveness of the proposed method. Recognition accuracy is improved by using the visual information in all SNR conditions. These visual features were confirmed to be effective even when the audio HMM was adapted to noise by the MLLR method.

  16. The effect of combined sensory and semantic components on audio-visual speech perception in older adults

    Directory of Open Access Journals (Sweden)

    Corrina Maguinness

    2011-12-01

    Full Text Available Previous studies have found that perception in older people benefits from multisensory over uni-sensory information. As normal speech recognition is affected by both the auditory input and the visual lip movements of the speaker, we investigated the efficiency of audio and visual integration in an older population by manipulating the relative reliability of the auditory and visual information in speech. We also investigated the role of the semantic context of the sentence to assess whether audio-visual integration is affected by top-down semantic processing. We presented participants with audio-visual sentences in which the visual component was either blurred or not blurred. We found that there was a greater cost in recall performance for semantically meaningless speech in the audio-visual blur condition compared to the audio-visual no-blur condition, and this effect was specific to the older group. Our findings have implications for understanding how aging affects efficient multisensory integration for the perception of speech, and suggest that multisensory inputs may benefit speech perception in older adults when the semantic content of the speech is unpredictable.

  17. Audio-visual speech timing sensitivity is enhanced in cluttered conditions.

    Directory of Open Access Journals (Sweden)

    Warrick Roseboom

    Full Text Available Events encoded in separate sensory modalities, such as audition and vision, can seem to be synchronous across a relatively broad range of physical timing differences. This may suggest that the precision of audio-visual timing judgments is inherently poor. Here we show that this is not necessarily true. We contrast timing sensitivity for isolated streams of audio and visual speech, and for streams of audio and visual speech accompanied by additional, temporally offset, visual speech streams. We find that the precision with which synchronous streams of audio and visual speech are identified is enhanced by the presence of additional streams of asynchronous visual speech. Our data suggest that timing perception is shaped by selective grouping processes, which can result in enhanced precision in temporally cluttered environments. The imprecision suggested by previous studies might therefore be a consequence of examining isolated pairs of audio and visual events. We argue that when an isolated pair of cross-modal events is presented, they tend to group perceptually and to seem synchronous as a consequence. We have revealed greater precision by providing multiple visual signals, possibly allowing a single auditory speech stream to group selectively with the most synchronous visual candidate. The grouping processes we have identified might be important in daily life, such as when we attempt to follow a conversation in a crowded room.

  18. Audio-visual perception of compressed speech by profoundly hearing-impaired subjects.

    Science.gov (United States)

    Drullman, R; Smoorenburg, G F

    1997-01-01

    For many people with profound hearing loss conventional hearing aids give only little support in speechreading. This study aims at optimizing the presentation of speech signals in the severely reduced dynamic range of the profoundly hearing impaired by means of multichannel compression and multichannel amplification. The speech signal in each of six 1-octave channels (125-4000 Hz) was compressed instantaneously, using compression ratios of 1, 2, 3, or 5, and a compression threshold of 35 dB below peak level. A total of eight conditions were composed in which the compression ratio varied per channel. Sentences were presented audio-visually to 16 profoundly hearing-impaired subjects and syllable intelligibility was measured. Results show that all auditory signals are valuable supplements to speechreading. No clear overall preference is found for any of the compression conditions, but relatively high compression ratios (> 3-5) have a significantly detrimental effect. Inspection of the individual results reveals that compression may be beneficial for one subject.

  19. Neural entrainment to rhythmically-presented auditory, visual and audio-visual speech in children

    Directory of Open Access Journals (Sweden)

    Alan James Power

    2012-07-01

    Full Text Available Auditory cortical oscillations have been proposed to play an important role in speech perception. It is suggested that the brain may take temporal ‘samples’ of information from the speech stream at different rates, phase-resetting ongoing oscillations so that they are aligned with similar frequency bands in the input (‘phase locking’). Information from these frequency bands is then bound together for speech perception. To date, there are no explorations of neural phase-locking and entrainment to speech input in children. However, it is clear from studies of language acquisition that infants use both visual speech information and auditory speech information in learning. In order to study neural entrainment to speech in typically-developing children, we use a rhythmic entrainment paradigm (underlying 2 Hz or delta rate) based on repetition of the syllable ba, presented in either the auditory modality alone, the visual modality alone, or as auditory-visual speech (via a talking head). To ensure attention to the task, children aged 13 years were asked to press a button as fast as possible when the ba stimulus violated the rhythm for each stream type. Rhythmic violation depended on delaying the occurrence of a ba in the isochronous stream. Neural entrainment was demonstrated for all stream types, and individual differences in standardized measures of language processing were related to auditory entrainment at the theta rate. Further, there was significant modulation of the preferred phase of auditory entrainment in the theta band when visual speech cues were present, indicating cross-modal phase resetting. The rhythmic entrainment paradigm developed here offers a method for exploring individual differences in oscillatory phase locking during development. In particular, a method for assessing neural entrainment and cross-modal phase resetting would be useful for exploring developmental learning difficulties thought to involve temporal sampling
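
    Entrainment analyses of this kind typically quantify phase locking across trials in a narrow frequency band. Below is a minimal, generic sketch (not the study's pipeline): band-pass filter single-trial EEG around the stimulation rate, extract instantaneous phase via the Hilbert transform, and compute inter-trial phase coherence; the sampling rate, band edges, and random stand-in data are assumptions.

      import numpy as np
      from scipy.signal import butter, filtfilt, hilbert

      def inter_trial_phase_coherence(trials, fs=500.0, band=(1.0, 3.0)):
          """trials: array (n_trials, n_samples) of EEG from one channel.
          Returns ITC over time: |mean over trials of exp(i*phase)| in a delta band;
          1.0 indicates perfect phase locking to the rhythmic stimulus."""
          b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
          filtered = filtfilt(b, a, trials, axis=1)
          phase = np.angle(hilbert(filtered, axis=1))
          return np.abs(np.exp(1j * phase).mean(axis=0))

      itc = inter_trial_phase_coherence(np.random.randn(60, 2000))   # placeholder data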

  20. Superior temporal activation in response to dynamic audio-visual emotional cues

    OpenAIRE

    Robins, Diana L.; Hunyadi, Elinora; Schultz, Robert T.

    2008-01-01

    Perception of emotion is critical for successful social interaction, yet the neural mechanisms underlying the perception of dynamic, audiovisual emotional cues are poorly understood. Evidence from language and sensory paradigms suggests that the superior temporal sulcus and gyrus (STS/STG) play a key role in the integration of auditory and visual cues. Emotion perception research has focused on static facial cues; however, dynamic audiovisual (AV) cues mimic real-world social cues more accura...

  1. Large Scale Functional Brain Networks Underlying Temporal Integration of Audio-Visual Speech Perception: An EEG Study

    Science.gov (United States)

    Kumar, G. Vinodh; Halder, Tamesh; Jaiswal, Amit K.; Mukherjee, Abhishek; Roy, Dipanjan; Banerjee, Arpan

    2016-01-01

    Observable lip movements of the speaker influence perception of auditory speech. A classical example of this influence is reported by listeners who perceive an illusory (cross-modal) speech sound (McGurk effect) when presented with incongruent audio-visual (AV) speech stimuli. Recent neuroimaging studies of AV speech perception accentuate the role of frontal, parietal, and integrative brain sites in the vicinity of the superior temporal sulcus (STS) for multisensory speech perception. However, whether and how the network across the whole brain participates in multisensory perceptual processing remains an open question. We posit that large-scale functional connectivity among neural populations situated in distributed brain sites may provide valuable insights into the processing and fusion of AV speech. Varying the psychophysical parameters in tandem with electroencephalogram (EEG) recordings, we exploited the trial-by-trial perceptual variability of incongruent audio-visual (AV) speech stimuli to identify the characteristics of the large-scale cortical network that facilitates multisensory perception during synchronous and asynchronous AV speech. We evaluated the spectral landscape of EEG signals during multisensory speech perception at varying AV lags. Functional connectivity dynamics for all sensor pairs were computed using the time-frequency global coherence, the vector sum of pairwise coherence changes over time. During synchronous AV speech, we observed enhanced global gamma-band coherence and decreased alpha and beta-band coherence underlying cross-modal (illusory) perception compared to unisensory perception around a temporal window of 300–600 ms following onset of stimuli. During asynchronous speech stimuli, a global broadband coherence was observed during cross-modal perception at earlier times along with pre-stimulus decreases of lower frequency power, e.g., alpha rhythms for positive AV lags and theta rhythms for negative AV lags. Thus
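
    The "time-frequency global coherence" described above aggregates pairwise coherence across all sensor pairs. One plausible reading is sketched below using short-time windows and magnitude-squared coherence from SciPy; the sensor count, window length, frequency band, and random stand-in data are assumptions, and this is not the authors' code.

      import numpy as np
      from itertools import combinations
      from scipy.signal import coherence

      def global_coherence(eeg, fs=250.0, win=1.0, band=(30.0, 45.0)):
          """eeg: array (n_sensors, n_samples). For each non-overlapping window,
          compute magnitude-squared coherence in `band` for every sensor pair,
          then aggregate (sum) across pairs to get one global value per window."""
          step = int(win * fs)
          out = []
          for start in range(0, eeg.shape[1] - step + 1, step):
              seg = eeg[:, start:start + step]
              pair_vals = []
              for i, j in combinations(range(eeg.shape[0]), 2):
                  f, cxy = coherence(seg[i], seg[j], fs=fs, nperseg=step // 2)
                  sel = (f >= band[0]) & (f <= band[1])
                  pair_vals.append(cxy[sel].mean())
              out.append(np.sum(pair_vals))
          return np.array(out)

      gc = global_coherence(np.random.randn(8, 5000))   # placeholder EEG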

  2. A Novel Algorithm for Acoustic and Visual Classifiers Decision Fusion in Audio-Visual Speech Recognition System

    Directory of Open Access Journals (Sweden)

    P.S. Sathidevi

    2010-03-01

    Full Text Available Audio-visual speech recognition (AVSR) using acoustic and visual signals of speech has received attention recently because of its robustness in noisy environments. Perceptual studies also support this approach by emphasizing the importance of visual information for speech recognition in humans. An important issue in decision-fusion-based AVSR systems is how to obtain the appropriate integration weight for the speech modalities, to ensure that the combined AVSR system performs better than both the audio-only and visual-only systems under various noise conditions. To solve this issue, we present a genetic algorithm (GA) based optimization scheme to obtain the appropriate integration weight from the relative reliability of each modality. The performance of the proposed GA-optimized reliability-ratio based weight estimation scheme is demonstrated via single-speaker, mobile-functions isolated word recognition experiments. The results show that the proposed scheme improves robust recognition accuracy over the conventional unimodal systems and the baseline reliability-ratio based AVSR system under various signal-to-noise ratio conditions.
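
    The core of decision fusion here is a single integration weight applied to the per-modality recognizer scores, with a reliability ratio giving a starting value that the genetic algorithm then refines. A minimal sketch of the weighted combination and a simple dispersion-based reliability proxy follows (illustrative only; the GA search itself is omitted, and the score values are placeholders).

      import numpy as np

      def reliability(log_likelihoods):
          """Crude reliability proxy: dispersion of a recognizer's log-likelihoods
          across candidate words (flat scores suggest an unreliable modality)."""
          return np.max(log_likelihoods) - np.mean(log_likelihoods)

      def fused_scores(audio_ll, visual_ll, weight=None):
          """audio_ll, visual_ll: per-candidate log-likelihoods from each unimodal
          recognizer. weight in [0, 1] scales the audio stream; the visual stream
          gets (1 - weight). If weight is None, use the reliability ratio."""
          if weight is None:
              r_a, r_v = reliability(audio_ll), reliability(visual_ll)
              weight = r_a / (r_a + r_v)
          return weight * audio_ll + (1.0 - weight) * visual_ll

      audio_ll  = np.array([-10.0, -14.0, -18.0])   # placeholder scores for 3 words
      visual_ll = np.array([-9.0, -9.5, -10.0])
      best_word = int(np.argmax(fused_scores(audio_ll, visual_ll)))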

  3. The Effect of Onset Asynchrony in Audio Visual Speech and the Uncanny Valley in Virtual Characters

    DEFF Research Database (Denmark)

    Tinwell, Angela; Grimshaw, Mark; Abdel Nabi, Deborah

    2015-01-01

    sensitive to the uncanny in characters when the audio stream preceded the visual stream than with asynchronous footage where the video stream preceded the audio stream. This paper considers possible psychological explanations as to why the magnitude and direction of an asynchrony of speech dictates...

  4. Audio-visual speech perception in infants and toddlers with Down syndrome, fragile X syndrome, and Williams syndrome.

    Science.gov (United States)

    D'Souza, Dean; D'Souza, Hana; Johnson, Mark H; Karmiloff-Smith, Annette

    2016-08-01

    Typically-developing (TD) infants can construct unified cross-modal percepts, such as a speaking face, by integrating auditory-visual (AV) information. This skill is a key building block upon which higher-level skills, such as word learning, are built. Because word learning is seriously delayed in most children with neurodevelopmental disorders, we assessed the hypothesis that this delay partly results from a deficit in integrating AV speech cues. AV speech integration has rarely been investigated in neurodevelopmental disorders, and never previously in infants. We probed for the McGurk effect, which occurs when the auditory component of one sound (/ba/) is paired with the visual component of another sound (/ga/), leading to the perception of an illusory third sound (/da/ or /tha/). We measured AV integration in 95 infants/toddlers with Down, fragile X, or Williams syndrome, whom we matched on Chronological and Mental Age to 25 TD infants. We also assessed a more basic AV perceptual ability: sensitivity to matching vs. mismatching AV speech stimuli. Infants with Williams syndrome failed to demonstrate a McGurk effect, indicating poor AV speech integration. Moreover, while the TD children discriminated between matching and mismatching AV stimuli, none of the other groups did, hinting at a basic deficit or delay in AV speech processing, which is likely to constrain subsequent language development. PMID:27498221

  5. Audio-visual gender recognition

    Science.gov (United States)

    Liu, Ming; Xu, Xun; Huang, Thomas S.

    2007-11-01

    Combining different modalities for pattern recognition tasks is a very promising field. Humans routinely fuse information from different modalities to recognize objects and perform inference. Audio-visual gender recognition is one of the most common tasks in human social communication: humans can identify gender by facial appearance, by speech, and also by body gait. Indeed, human gender recognition is a multi-modal data acquisition and processing procedure. However, computational multimodal gender recognition has not been extensively investigated in the literature. In this paper, speech and facial images are fused to perform multi-modal gender recognition, exploring the improvement gained by combining different modalities.

  6. Audio-Visual and Meaningful Semantic Context Enhancements in Older and Younger Adults.

    Science.gov (United States)

    Smayda, Kirsten E; Van Engen, Kristin J; Maddox, W Todd; Chandrasekaran, Bharath

    2016-01-01

    Speech perception is critical to everyday life. Oftentimes noise can degrade a speech signal; however, because of the cues available to the listener, such as visual and semantic cues, noise rarely prevents conversations from continuing. The interaction of visual and semantic cues in aiding speech perception has been studied in young adults, but the extent to which these two cues interact for older adults has not been studied. To investigate the effect of visual and semantic cues on speech perception in older and younger adults, we recruited forty-five young adults (ages 18-35) and thirty-three older adults (ages 60-90) to participate in a speech perception task. Participants were presented with semantically meaningful and anomalous sentences in audio-only and audio-visual conditions. We hypothesized that young adults would outperform older adults across SNRs, modalities, and semantic contexts. In addition, we hypothesized that both young and older adults would receive a greater benefit from a semantically meaningful context in the audio-visual relative to audio-only modality. We predicted that young adults would receive greater visual benefit in semantically meaningful contexts relative to anomalous contexts. However, we predicted that older adults could receive a greater visual benefit in either semantically meaningful or anomalous contexts. Results suggested that in the most supportive context, that is, semantically meaningful sentences presented in the audiovisual modality, older adults performed similarly to young adults. In addition, both groups received the same amount of visual and meaningful benefit. Lastly, across groups, a semantically meaningful context provided more benefit in the audio-visual modality relative to the audio-only modality, and the presence of visual cues provided more benefit in semantically meaningful contexts relative to anomalous contexts. These results suggest that older adults can perceive speech as well as younger adults when both

  7. Audio-Visual Tibetan Speech Recognition Based on a Deep Dynamic Bayesian Network for Natural Human Robot Interaction

    Directory of Open Access Journals (Sweden)

    Yue Zhao

    2012-12-01

    Full Text Available Audio‐visual speech recognition is a natural and robust approach to improving human-robot interaction in noisy environments. Although multi‐stream Dynamic Bayesian Networks and coupled HMMs are widely used for audio‐visual speech recognition, they fail to learn the shared features between modalities and ignore the dependency of features among the frames within each discrete state. In this paper, we propose a Deep Dynamic Bayesian Network (DDBN) to perform unsupervised extraction of spatial‐temporal multimodal features from Tibetan audio‐visual speech data and build an accurate audio‐visual speech recognition model under a no frame‐independency assumption. The experimental results on Tibetan speech data from real‐world environments show that the proposed DDBN outperforms state‐of‐the‐art methods in word recognition accuracy.

  8. Basic to Applied Research: The Benefits of Audio-Visual Speech Perception Research in Teaching Foreign Languages

    Science.gov (United States)

    Erdener, Dogu

    2016-01-01

    Traditionally, second language (L2) instruction has emphasised auditory-based instruction methods. However, this approach is restrictive in the sense that speech perception by humans is not just an auditory phenomenon but a multimodal one, and specifically, a visual one as well. In the past decade, experimental studies have shown that the…

  9. Audio-Visual Aids: Historians in Blunderland.

    Science.gov (United States)

    Decarie, Graeme

    1988-01-01

    A history professor relates his experiences producing and using audio-visual material and warns teachers not to rely on audio-visual aids for classroom presentations. Includes examples of popular audio-visual aids on Canada that communicate unintended, inaccurate, or unclear ideas. Urges teachers to exercise caution in the selection and use of…

  10. [Audio-visual aids and tropical medicine].

    Science.gov (United States)

    Morand, J J

    1989-01-01

    The author presents a list of the audio-visual productions about Tropical Medicine, as well as of their main characteristics. He thinks that the audio-visual educational productions are often dissociated from their promotion; therefore, he invites the future creator to forward his work to the Audio-Visual Health Committee.

  11. Alfasecuencialización: la enseñanza del cine en la era del audiovisual (Sequential literacy: the teaching of cinema in the age of audio-visual discourse)

    Directory of Open Access Journals (Sweden)

    José Antonio Palao Errando

    2007-10-01

    Full Text Available In the so-called «information society», film studies have been diluted into a pragmatic and technological approach to audio-visual discourse, just as the enjoyment of cinema itself has been caught in the net of the DVD and hypertext. Cinema itself reacts to this through complex narrative structures that distance it from standard audio-visual discourse. The function of film studies and of their university teaching should be the reintroduction of the subject rejected by informative knowledge, by means of the interpretation of the film text.

  12. Audio-Visual Aids in Universities

    Science.gov (United States)

    Douglas, Jackie

    1970-01-01

    A report on the proceedings and ideas expressed at a one day seminar on "Audio-Visual Equipment--Its Uses and Applications for Teaching and Research in Universities." The seminar was organized by England's National Committee for Audio-Visual Aids in Education in conjunction with the British Universities Film Council. (LS)

  13. Temporal structure and complexity affect audio-visual correspondence detection

    Directory of Open Access Journals (Sweden)

    Rachel N Denison

    2013-01-01

    Full Text Available Synchrony between events in different senses has long been considered the critical temporal cue for multisensory integration. Here, using rapid streams of auditory and visual events, we demonstrate how humans can use temporal structure (rather than mere temporal coincidence) to detect multisensory relatedness. We find psychophysically that participants can detect matching auditory and visual streams via shared temporal structure for crossmodal lags of up to 200 ms. Performance on this task reproduced features of past findings based on explicit timing judgments but did not show any special advantage for perfectly synchronous streams. Importantly, the complexity of temporal patterns influences sensitivity to correspondence. Stochastic, irregular streams – with richer temporal pattern information – led to higher audio-visual matching sensitivity than predictable, rhythmic streams. Our results reveal that temporal structure and its complexity are key determinants for human detection of audio-visual correspondence. The distinctive emphasis of our new paradigms on temporal patterning could be useful for studying special populations with suspected abnormalities in audio-visual temporal perception and multisensory integration.

  14. Joint evaluation of communication quality and user experience in an audio-visual virtual reality meeting

    DEFF Research Database (Denmark)

    Møller, Anders Kalsgaard; Hoffmann, Pablo F.; Carrozzino, Marcello;

    2013-01-01

    The state-of-the-art speech intelligibility tests are created with the purpose of evaluating acoustic communication devices and not for evaluating audio-visual virtual reality systems. This paper presents a novel method to evaluate a communication situation based on both the speech intelligibility

  15. Crossmodal and incremental perception of audiovisual cues to emotional speech

    NARCIS (Netherlands)

    Barkhuysen, Pashiera; Krahmer, E.J.; Swerts, M.G.J.

    2010-01-01

    In this article we report on two experiments about the perception of audiovisual cues to emotional speech. The article addresses two questions: (1) how do visual cues from a speaker's face to emotion relate to auditory cues, and (2) what is the recognition speed for various facial cues to emotion? B

  16. Crossmodal and Incremental Perception of Audiovisual Cues to Emotional Speech

    Science.gov (United States)

    Barkhuysen, Pashiera; Krahmer, Emiel; Swerts, Marc

    2010-01-01

    In this article we report on two experiments about the perception of audiovisual cues to emotional speech. The article addresses two questions: (1) how do visual cues from a speaker's face to emotion relate to auditory cues, and (2) what is the recognition speed for various facial cues to emotion? Both experiments reported below are based on tests…

  17. Audio-visual classification video browser

    OpenAIRE

    Scott, David; Zhang, ZhenXing; Albatal, Rami; McGuinness, Kevin; Acar, Esra; Hopfgartner, Frank; Gurrin, Cathal; O'Connor, Noel; Smeaton, Alan

    2014-01-01

    This paper presents our third participation in the Video Browser Showdown. Building on the experience that we gained while participating in this event, we compete in the 2014 showdown with a more advanced browsing system based on incorporating several audio-visual retrieval techniques. This paper provides a short overview of the features and functionality of our new system.

  18. Stream Weight Training Based on MCE for Audio-Visual LVCSR

    Institute of Scientific and Technical Information of China (English)

    LIU Peng; WANG Zuoying

    2005-01-01

    In this paper we address the problem of audio-visual speech recognition in the framework of the multi-stream hidden Markov model. Stream weight training based on minimum classification error criterion is discussed for use in large vocabulary continuous speech recognition (LVCSR). We present the lattice re-scoring and Viterbi approaches for calculating the loss function of continuous speech. The experimental results show that in the case of clean audio, the system performance can be improved by 36.1% in relative word error rate reduction when using state-based stream weights trained by a Viterbi approach, compared to an audio only speech recognition system. Further experimental results demonstrate that our audio-visual LVCSR system provides significant enhancement of robustness in noisy environments.

  19. Segmentation cues in conversational speech: Robust semantics and fragile phonotactics

    Directory of Open Access Journals (Sweden)

    Laurence White

    2012-10-01

    Full Text Available Multiple cues influence listeners’ segmentation of connected speech into words, but most previous studies have used stimuli elicited in careful readings rather than natural conversation. Discerning word boundaries in conversational speech may differ from the laboratory setting. In particular, a speaker’s articulatory effort – hyperarticulation vs hypoarticulation (H&H) – may vary according to communicative demands, suggesting a compensatory relationship whereby acoustic-phonetic cues are attenuated when other information sources strongly guide segmentation. We examined how listeners’ interpretation of segmentation cues is affected by speech style (spontaneous conversation vs read), using cross-modal identity priming. To elicit spontaneous stimuli, we used a map task in which speakers discussed routes around stylised landmarks. These landmarks were two-word phrases in which the strength of potential segmentation cues – semantic likelihood and cross-boundary diphone phonotactics – was systematically varied. Landmark-carrying utterances were transcribed and later re-recorded as read speech. Independent of speech style, we found an interaction between cue valence (favourable/unfavourable) and cue type (phonotactics/semantics). Thus, there was an effect of semantic plausibility, but no effect of cross-boundary phonotactics, indicating that the importance of phonotactic segmentation may have been overstated in studies where lexical information was artificially suppressed. These patterns were unaffected by whether the stimuli were elicited in a spontaneous or read context, even though the difference in speech styles was evident in a main effect. Durational analyses suggested speaker-driven cue trade-offs congruent with an H&H account, but these modulations did not impact on listener behaviour. We conclude that previous research exploiting read speech is reliable in indicating the primacy of lexically-based cues in the segmentation of natural

  20. Audio-visual affective expression recognition

    Science.gov (United States)

    Huang, Thomas S.; Zeng, Zhihong

    2007-11-01

    Automatic affective expression recognition has attracted more and more attention of researchers from different disciplines, which will significantly contribute to a new paradigm for human computer interaction (affect-sensitive interfaces, socially intelligent environments) and advance the research in the affect-related fields including psychology, psychiatry, and education. Multimodal information integration is a process that enables humans to assess affective states robustly and flexibly. In order to understand the richness and subtleness of human emotion behavior, the computer should be able to integrate information from multiple sensors. We introduce in this paper our efforts toward machine understanding of audio-visual affective behavior, based on both deliberate and spontaneous displays. Some promising methods are presented to integrate information from both audio and visual modalities. Our experiments show the advantage of audio-visual fusion in affective expression recognition over audio-only or visual-only approaches.

  1. Learning bimodal structure in audio-visual data

    OpenAIRE

    Monaci, Gianluca; Vandergheynst, Pierre; Sommer, Friederich T.

    2009-01-01

    A novel model is presented to learn bimodally informative structures from audio-visual signals. The signal is represented as a sparse sum of audio-visual kernels. Each kernel is a bimodal function consisting of synchronous snippets of an audio waveform and a spatio-temporal visual basis function. To represent an audio-visual signal, the kernels can be positioned independently and arbitrarily in space and time. The proposed algorithm uses unsupervised learning to form dicti...

  2. The contribution of dynamic visual cues to audiovisual speech perception.

    Science.gov (United States)

    Jaekl, Philip; Pesquita, Ana; Alsius, Agnes; Munhall, Kevin; Soto-Faraco, Salvador

    2015-08-01

    Seeing a speaker's facial gestures can significantly improve speech comprehension, especially in noisy environments. However, the nature of the visual information from the speaker's facial movements that is relevant for this enhancement is still unclear. Like auditory speech signals, visual speech signals unfold over time and contain both dynamic configural information and luminance-defined local motion cues; two information sources that are thought to engage anatomically and functionally separate visual systems. Whereas some past studies have highlighted the importance of local, luminance-defined motion cues in audiovisual speech perception, the contribution of dynamic configural information signalling changes in form over time has not yet been assessed. We therefore attempted to single out the contribution of dynamic configural information to audiovisual speech processing. To this aim, we measured word identification performance in noise using unimodal auditory stimuli, and with audiovisual stimuli. In the audiovisual condition, speaking faces were presented as point light displays achieved via motion capture of the original talker. Point light displays could be isoluminant, to minimise the contribution of effective luminance-defined local motion information, or with added luminance contrast, allowing the combined effect of dynamic configural cues and local motion cues. Audiovisual enhancement was found in both the isoluminant and contrast-based luminance conditions compared to an auditory-only condition, demonstrating, for the first time, the specific contribution of dynamic configural cues to audiovisual speech improvement. These findings imply that globally processed changes in a speaker's facial shape contribute significantly towards the perception of articulatory gestures and the analysis of audiovisual speech.

  3. Cues That Language Users Exploit to Segment Speech

    Institute of Scientific and Technical Information of China (English)

    陈冰茹

    2015-01-01

    The capability to segment words from fluent speech is an important step in learning and acquiring a language (Jusczyk, 1999). Therefore, a number of studies have focused on the various cues that language learners exploit to locate word boundaries. Over the past half century, it has been argued that there are mainly four crucial cues that listeners can use to segment words in speech. In particular, they are: (1) prosody (Echols et al., 1997; Jusczyk et al., 1996); (2) statistical and distributional regularities (Brent et al., 1996; Saffran et al., 1996); (3) phonotactics (Brent et al., 1996; Myers et al., 1996);

  4. The Practical Audio-Visual Handbook for Teachers.

    Science.gov (United States)

    Scuorzo, Herbert E.

    The use of audio/visual media as an aid to instruction is a common practice in today's classroom. Most teachers, however, have little or no formal training in this field and rarely a knowledgeable coordinator to help them. "The Practical Audio-Visual Handbook for Teachers" discusses the types and mechanics of many of these media forms and proposes…

  5. Audio visual information materials for risk communication

    International Nuclear Information System (INIS)

    Japan Nuclear Cycle Development Institute (JNC), Tokai Works set up the Risk Communication Study Team in January, 2001 to promote mutual understanding between the local residents and JNC. The Team has studied risk communication from various viewpoints and developed new methods of public relations which are useful for the local residents' risk perception toward nuclear issues. We aim to develop more effective risk communication which promotes a better mutual understanding of the local residents, by providing the risk information of the nuclear fuel facilities such as a Reprocessing Plant and other research and development facilities. We explain the development process of audio-visual information materials which describe our actual activities and devices for risk management in nuclear fuel facilities, and our discussion through the effectiveness measurement. (author)

  6. Audio-visual perception system for a humanoid robotic head.

    Science.gov (United States)

    Viciana-Abad, Raquel; Marfil, Rebeca; Perez-Lorenzo, Jose M; Bandera, Juan P; Romero-Garces, Adrian; Reche-Lopez, Pedro

    2014-01-01

    One of the main issues within the field of social robotics is to endow robots with the ability to direct attention to people with whom they are interacting. Different approaches follow bio-inspired mechanisms, merging audio and visual cues to localize a person using multiple sensors. However, most of these fusion mechanisms have been used in fixed systems, such as those used in video-conference rooms, and thus, they may incur difficulties when constrained to the sensors with which a robot can be equipped. Besides, within the scope of interactive autonomous robots, there is a lack in terms of evaluating the benefits of audio-visual attention mechanisms, compared to only audio or visual approaches, in real scenarios. Most of the tests conducted have been within controlled environments, at short distances and/or with off-line performance measurements. With the goal of demonstrating the benefit of fusing sensory information with a Bayes inference for interactive robotics, this paper presents a system for localizing a person by processing visual and audio data. Moreover, the performance of this system is evaluated and compared via considering the technical limitations of unimodal systems. The experiments show the promise of the proposed approach for the proactive detection and tracking of speakers in a human-robot interactive framework. PMID:24878593
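
    A minimal sketch of the kind of Bayesian fusion described above (not the authors' system): per-frame likelihoods over candidate azimuths from an audio localizer and a visual face detector are multiplied with a prior (the previous posterior), so that either modality can dominate when the other is uninformative. The azimuth grid, Gaussian likelihood shapes, noise widths, and observation values are assumptions for illustration.

      import numpy as np

      azimuths = np.linspace(-90, 90, 181)          # candidate directions (degrees)

      def gaussian_likelihood(measured_deg, sigma_deg):
          """Likelihood over azimuth bins for one noisy direction measurement."""
          lik = np.exp(-0.5 * ((azimuths - measured_deg) / sigma_deg) ** 2)
          return lik / lik.sum()

      posterior = np.full_like(azimuths, 1.0 / len(azimuths))       # flat prior
      for audio_obs, visual_obs in [(18.0, 22.0), (20.0, 21.0)]:    # placeholder frames
          likelihood = (gaussian_likelihood(audio_obs, sigma_deg=12.0)     # broad audio cue
                        * gaussian_likelihood(visual_obs, sigma_deg=4.0))  # sharper visual cue
          posterior = posterior * likelihood
          posterior /= posterior.sum()              # recursive Bayes update per frame

      estimated_azimuth = azimuths[np.argmax(posterior)]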

  7. Audio-Visual Perception System for a Humanoid Robotic Head

    Directory of Open Access Journals (Sweden)

    Raquel Viciana-Abad

    2014-05-01

    Full Text Available One of the main issues within the field of social robotics is to endow robots with the ability to direct attention to people with whom they are interacting. Different approaches follow bio-inspired mechanisms, merging audio and visual cues to localize a person using multiple sensors. However, most of these fusion mechanisms have been used in fixed systems, such as those used in video-conference rooms, and thus, they may incur difficulties when constrained to the sensors with which a robot can be equipped. Besides, within the scope of interactive autonomous robots, there is a lack in terms of evaluating the benefits of audio-visual attention mechanisms, compared to only audio or visual approaches, in real scenarios. Most of the tests conducted have been within controlled environments, at short distances and/or with off-line performance measurements. With the goal of demonstrating the benefit of fusing sensory information with a Bayes inference for interactive robotics, this paper presents a system for localizing a person by processing visual and audio data. Moreover, the performance of this system is evaluated and compared via considering the technical limitations of unimodal systems. The experiments show the promise of the proposed approach for the proactive detection and tracking of speakers in a human-robot interactive framework.

  8. Audio-visual perception system for a humanoid robotic head.

    Science.gov (United States)

    Viciana-Abad, Raquel; Marfil, Rebeca; Perez-Lorenzo, Jose M; Bandera, Juan P; Romero-Garces, Adrian; Reche-Lopez, Pedro

    2014-01-01

    One of the main issues within the field of social robotics is to endow robots with the ability to direct attention to people with whom they are interacting. Different approaches follow bio-inspired mechanisms, merging audio and visual cues to localize a person using multiple sensors. However, most of these fusion mechanisms have been used in fixed systems, such as those used in video-conference rooms, and thus, they may incur difficulties when constrained to the sensors with which a robot can be equipped. Besides, within the scope of interactive autonomous robots, there is a lack in terms of evaluating the benefits of audio-visual attention mechanisms, compared to only audio or visual approaches, in real scenarios. Most of the tests conducted have been within controlled environments, at short distances and/or with off-line performance measurements. With the goal of demonstrating the benefit of fusing sensory information with a Bayes inference for interactive robotics, this paper presents a system for localizing a person by processing visual and audio data. Moreover, the performance of this system is evaluated and compared via considering the technical limitations of unimodal systems. The experiments show the promise of the proposed approach for the proactive detection and tracking of speakers in a human-robot interactive framework.

  9. HIFI-AV: An Audio-visual Corpus for Spoken Language Human-Machine Dialogue Research in Spanish

    OpenAIRE

    Fernández Martínez, Fernando; Lucas Cuesta, Juan Manuel; Barra Chicote, Roberto; Ferreiros López, Javier; Macías Guarasa, Javier

    2010-01-01

    In this paper, we describe a new multi-purpose audio-visual database on the context of speech interfaces for controlling household electronic devices. The database comprises speech and video recordings of 19 speakers interacting with a HIFI audio box by means of a spoken dialogue system. Dialogue management is based on Bayesian Networks and the system is provided with contextual information handling strategies. Each speaker was requested to fulfil different sets of specific goals following pred...

  10. Semantic Framing of Speech : Emotional and Topical Cues in Perception of Poorly Specified Speech

    OpenAIRE

    Lidestam, Björn

    2003-01-01

    The general aim of this thesis was to test the effects of paralinguistic (emotional) and prior contextual (topical) cues on perception of poorly specified visual, auditory, and audiovisual speech. The specific purposes were to (1) examine if facially displayed emotions can facilitate speechreading performance; (2) to study the mechanism for such facilitation; (3) to map information-processing factors that are involved in processing of poorly specified speech; and (4) to present a comprehensiv...

  11. Proper Use of Audio-Visual Aids: Essential for Educators.

    Science.gov (United States)

    Dejardin, Conrad

    1989-01-01

    Criticizes educators as the worst users of audio-visual aids and among the worst public speakers. Offers guidelines for the proper use of an overhead projector and the development of transparencies. (DMM)

  12. CAVA (human Communication: an Audio-Visual Archive)

    OpenAIRE

    Mahon, M. S.

    2009-01-01

    In order to investigate human communication and interaction, researchers need hours of audio-visual data, sometimes recorded over periods of months or years. The process of collecting, cataloguing and transcribing such valuable data is time-consuming and expensive. Once it is collected and ready to use, it makes sense to get the maximum value from it by reusing it and sharing it among the research community. But unlike highly-controlled experimental data, natural audio-visual data tends t...

  13. The Fungible Audio-Visual Mapping and its Experience

    Directory of Open Access Journals (Sweden)

    Adriana Sa

    2014-12-01

    Full Text Available This article takes a perceptual approach to audio-visual mapping. Clearly perceivable cause-and-effect relationships can be problematic if one wants the audience to experience the music: perception tends to favour those sonic qualities that fit prior concepts of causation, subordinating other sonic qualities, which may form the relations between the sounds themselves. The question is: how can an audio-visual mapping produce a sense of causation while simultaneously confounding the actual cause-effect relationships? We call this a fungible audio-visual mapping, and our aim here is to characterise its constitution and appearance. We report a study which draws upon methods from experimental psychology to inform audio-visual instrument design and composition. Participants are shown several audio-visual mapping prototypes, after which we pose quantitative and qualitative questions regarding their sense of causation and their sense of understanding the cause-effect relationships. The study shows that a fungible mapping requires both synchronized and seemingly non-related components – sufficient complexity to be confusing. As the specific cause-effect concepts remain inconclusive, the sense of causation embraces the whole.

  14. The unity assumption facilitates cross-modal binding of musical, non-speech stimuli: The role of spectral and amplitude envelope cues.

    Science.gov (United States)

    Chuen, Lorraine; Schutz, Michael

    2016-07-01

    An observer's inference that multimodal signals originate from a common underlying source facilitates cross-modal binding. This 'unity assumption' causes asynchronous auditory and visual speech streams to seem simultaneous (Vatakis & Spence, Perception & Psychophysics, 69(5), 744-756, 2007). Subsequent tests of non-speech stimuli such as musical and impact events found no evidence for the unity assumption, suggesting the effect is speech-specific (Vatakis & Spence, Acta Psychologica, 127(1), 12-23, 2008). However, the role of amplitude envelope (the changes in energy of a sound over time) was not previously appreciated within this paradigm. Here, we explore whether previous findings suggesting speech-specificity of the unity assumption were confounded by similarities in the amplitude envelopes of the contrasted auditory stimuli. Experiment 1 used natural events with clearly differentiated envelopes: single notes played on either a cello (bowing motion) or marimba (striking motion). Participants performed an un-speeded temporal order judgment task: they viewed audio-visually matched (e.g., marimba auditory with marimba video) and mismatched (e.g., cello auditory with marimba video) versions of the stimuli at various stimulus onset asynchronies and indicated which modality was presented first. As predicted, participants were less sensitive to temporal order in matched conditions, demonstrating that the unity assumption can facilitate the perception of synchrony outside of speech stimuli. Results from Experiments 2 and 3 revealed that when spectral information was removed from the original auditory stimuli, amplitude envelope alone could not facilitate the influence of audiovisual unity. We propose that both amplitude envelope and spectral acoustic cues affect the percept of audiovisual unity, working in concert to help an observer determine when to integrate across modalities.

  16. An Audio-Visual Lecture Course in Russian Culture

    Science.gov (United States)

    Leighton, Lauren G.

    1977-01-01

    An audio-visual course in Russian culture is given at Northern Illinois University. A collection of 4,000-5,000 color slides is the basis for the course, with lectures focused on literature, philosophy, religion, politics, art and crafts. Acquisition, classification, storage and presentation of slides, and organization of lectures are discussed. (CHK)

  17. Voice activity detection using audio-visual information

    DEFF Research Database (Denmark)

    Petsatodis, Theodore; Pnevmatikakis, Aristodemos; Boukis, Christos

    2009-01-01

    An audio-visual voice activity detector that uses sensors positioned distantly from the speaker is presented. Its constituting unimodal detectors are based on the modeling of the temporal variation of audio and visual features using Hidden Markov Models; their outcomes are fused using a post...

  18. Audio-Visual Aid in Teaching "Fatty Liver"

    Science.gov (United States)

    Dash, Sambit; Kamath, Ullas; Rao, Guruprasad; Prakash, Jay; Mishra, Snigdha

    2016-01-01

    Use of audio-visual tools to aid medical education is ever on the rise. Our study intends to find the efficacy of a video prepared on "fatty liver," a topic that is often a challenge for pre-clinical teachers, in enhancing cognitive processing and ultimately learning. We prepared a video presentation of 11:36 min, incorporating various…

  19. Market potential for interactive audio-visual media

    NARCIS (Netherlands)

    Leurdijk, A.; Limonard, S.

    2005-01-01

    NM2 (New Media for a New Millennium) develops tools for interactive, personalised and non-linear audio-visual content that will be tested in seven pilot productions. This paper looks at the market potential for these productions from a technological, a business and a users' perspective. It shows tha

  20. Audio/Visual Aids: A Study of the Effect of Audio/Visual Aids on the Comprehension Recall of Students.

    Science.gov (United States)

    Bavaro, Sandra

    A study investigated whether the use of audio/visual aids had an effect upon comprehension recall. Thirty fourth-grade students from an urban public school were randomly divided into two equal samples of 15. One group was given a story to read (print only), while the other group viewed a filmstrip of the same story, thereby utilizing audio/visual…

  1. Audio-visual feedback improves the BCI performance in the navigational control of a humanoid robot.

    Science.gov (United States)

    Tidoni, Emmanuele; Gergondet, Pierre; Kheddar, Abderrahmane; Aglioti, Salvatore M

    2014-01-01

    Advances in brain-computer interface (BCI) technology allow people to interact actively with the world through surrogates. Controlling real humanoid robots via a BCI as intuitively as we control our own bodies is a challenge for current research in robotics and neuroscience. In order to interact successfully with the environment, the brain integrates multiple sensory cues to form a coherent representation of the world. Cognitive neuroscience studies demonstrate that multisensory integration may confer a gain relative to a single modality and ultimately improve overall sensorimotor performance. For example, reactivity to simultaneous visual and auditory stimuli may be higher than to the sum of the same stimuli delivered in isolation or in temporal sequence. Yet little is known about whether audio-visual integration can improve the control of a surrogate. To explore this issue, we provided human footstep sounds as auditory feedback to BCI users while they controlled a humanoid robot. Participants were asked to steer their robot surrogate and perform a pick-and-place task through BCI-SSVEPs. We found that audio-visual synchrony between the footstep sounds and the humanoid's actual walk reduced the time required to steer the robot. Thus, auditory feedback congruent with the humanoid's actions may improve the BCI user's motor decisions and support the feeling of control over the robot. Our results shed light on the possibility of increasing control over a robot by combining multisensory feedback for the BCI user. PMID:24987350

  2. Asynchrony adaptation reveals neural population code for audio-visual timing.

    Science.gov (United States)

    Roach, Neil W; Heron, James; Whitaker, David; McGraw, Paul V

    2011-05-01

    The relative timing of auditory and visual stimuli is a critical cue for determining whether sensory signals relate to a common source and for making inferences about causality. However, the way in which the brain represents temporal relationships remains poorly understood. Recent studies indicate that our perception of multisensory timing is flexible--adaptation to a regular inter-modal delay alters the point at which subsequent stimuli are judged to be simultaneous. Here, we measure the effect of audio-visual asynchrony adaptation on the perception of a wide range of sub-second temporal relationships. We find distinctive patterns of induced biases that are inconsistent with the previous explanations based on changes in perceptual latency. Instead, our results can be well accounted for by a neural population coding model in which: (i) relative audio-visual timing is represented by the distributed activity across a relatively small number of neurons tuned to different delays; (ii) the algorithm for reading out this population code is efficient, but subject to biases owing to under-sampling; and (iii) the effect of adaptation is to modify neuronal response gain. These results suggest that multisensory timing information is represented by a dedicated population code and that shifts in perceived simultaneity following asynchrony adaptation arise from analogous neural processes to well-known perceptual after-effects.
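
    A toy version of the population-coding account summarised in points (i)-(iii) can be written in a few lines: a small bank of delay-tuned channels, a simple centroid read-out, and adaptation implemented as a local reduction of response gain. All parameters below are illustrative assumptions, not the authors' model fits.

```python
# Illustrative population-code sketch: a small bank of neurons tuned to
# audio-visual delays; adaptation scales response gain near the adapted delay,
# which biases a simple centroid read-out of perceived asynchrony.
import numpy as np

preferred_delays = np.linspace(-300, 300, 9)   # ms; a small number of delay-tuned channels
tuning_sigma = 120.0                           # ms; tuning width (assumed)

def population_response(delay_ms, gain):
    return gain * np.exp(-0.5 * ((delay_ms - preferred_delays) / tuning_sigma) ** 2)

def readout(response):
    # efficient but potentially biased read-out: response-weighted centroid
    return np.sum(response * preferred_delays) / np.sum(response)

gain_baseline = np.ones_like(preferred_delays)
# adaptation to a +200 ms (audio-lagging) delay reduces gain of nearby channels
gain_adapted = 1.0 - 0.5 * np.exp(-0.5 * ((preferred_delays - 200.0) / tuning_sigma) ** 2)

test_delay = 0.0   # physically synchronous test stimulus
print("perceived delay before adaptation: %+.1f ms" % readout(population_response(test_delay, gain_baseline)))
print("perceived delay after adaptation:  %+.1f ms" % readout(population_response(test_delay, gain_adapted)))
```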

  3. Audio-visual voice activity detection

    Institute of Scientific and Technical Information of China (English)

    LIU Peng; WANG Zuo-ying

    2006-01-01

    In speech signal processing systems, frame-energy based voice activity detection (VAD) can be disturbed by background noise and by the non-stationary character of the frame energy within voice segments. The purpose of this paper is to improve the performance and robustness of VAD by introducing visual information. A data-driven linear transformation is adopted for visual feature extraction, and a general statistical VAD model is designed. Using the general model and the two-stage fusion strategy presented in this paper, a concrete multimodal VAD system is built. Experiments show a 55.0% relative reduction in frame error rate and a 98.5% relative reduction in sentence-breaking error rate when using multimodal VAD, compared to frame-energy based audio VAD. The results show that with the multimodal method sentence-breaking errors are almost eliminated and frame-detection performance is clearly improved, which demonstrates the effectiveness of the visual modality in VAD.
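
    For reference, the frame-energy baseline that the multimodal system is compared against can be sketched as follows; the frame sizes, threshold and toy input are placeholders rather than the paper's settings.

```python
# Minimal frame-energy VAD baseline of the kind used as the audio-only
# reference in the abstract; frame sizes and threshold are placeholders.
import numpy as np

def frame_energy_vad(signal, sample_rate, frame_ms=25, hop_ms=10, threshold_db=-40.0):
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    decisions = []
    for start in range(0, len(signal) - frame_len + 1, hop_len):
        frame = signal[start:start + frame_len]
        energy_db = 10.0 * np.log10(np.mean(frame ** 2) + 1e-12)
        decisions.append(energy_db > threshold_db)   # True = speech frame
    return np.array(decisions)

# toy usage: 1 s of low-level noise with a louder "speech" burst in the middle
fs = 16000
x = 0.005 * np.random.randn(fs)
x[6000:10000] += 0.2 * np.sin(2 * np.pi * 220 * np.arange(4000) / fs)
print("speech frames detected:", frame_energy_vad(x, fs).sum())
```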

  4. Acoustic Features and Perceptive Cues of Songs and Dialogues in Whistled Speech: Convergences with Sung Speech

    CERN Document Server

    Meyer, Julien

    2007-01-01

    Whistled speech is a little studied local use of language shaped by several cultures of the world either for distant dialogues or for rendering traditional songs. This practice consists of an emulation of the voice thanks to a simple modulated pitch. It is therefore the result of a transformation of the vocal signal that implies simplifications in the frequency domain. The whistlers adapt their productions to the way each language combines the qualities of height perceived simultaneously by the human ear in the complex frequency spectrum of the spoken or sung voice (pitch, timbre). As a consequence, this practice underlines key acoustic cues for the intelligibility of the concerned languages. The present study provides an analysis of the acoustic and phonetic features selected by whistled speech in several traditions either in purely oral whistles (Spanish, Turkish, Mazatec) or in whistles produced with an instrument like a leaf (Akha, Hmong). It underlines the convergences with the strategies of the singing ...

  5. Segmenting Words from Natural Speech: Subsegmental Variation in Segmental Cues

    Science.gov (United States)

    Rytting, C. Anton; Brew, Chris; Fosler-Lussier, Eric

    2010-01-01

    Most computational models of word segmentation are trained and tested on transcripts of speech, rather than the speech itself, and assume that speech is converted into a sequence of symbols prior to word segmentation. We present a way of representing speech corpora that avoids this assumption, and preserves acoustic variation present in speech. We…

  6. Audio-visual interactions in product sound design

    Science.gov (United States)

    Özcan, Elif; van Egmond, René

    2010-02-01

    Consistent product experience requires congruity between product properties such as visual appearance and sound. Therefore, when designing appropriate product sounds by manipulating their spectral-temporal structure, product sounds should preferably not be considered in isolation but as an integral part of the main product concept. Because the visual aspects of a product are considered to dominate the communication of the desired product concept, sound is usually expected to fit the visual character of a product. We argue that this can be accomplished successfully only on the basis of a thorough understanding of the impact of audio-visual interactions on product sounds. Two experimental studies are reviewed to show audio-visual interactions on both perceptual and cognitive levels influencing the way people encode, recall, and attribute meaning to product sounds. Implications for sound design are discussed, challenging the natural tendency of product designers to analyze the "sound problem" in isolation from other product properties.

  7. Effects of virtual speaker density and room reverberation on spatiotemporal thresholds of audio-visual motion coherence.

    Directory of Open Access Journals (Sweden)

    Narayan Sankaran

    Full Text Available The present study examined the effects of spatial sound-source density and reverberation on the spatiotemporal window for audio-visual motion coherence. Three different acoustic stimuli were generated in Virtual Auditory Space: two acoustically "dry" stimuli via the measurement of anechoic head-related impulse responses recorded at either 1° or 5° spatial intervals (Experiment 1), and a reverberant stimulus rendered from binaural room impulse responses recorded at 5° intervals in situ in order to capture reverberant acoustics in addition to head-related cues (Experiment 2). A moving visual stimulus with invariant localization cues was generated by sequentially activating LEDs along the same radial path as the virtual auditory motion. Stimuli were presented at 25°/s, 50°/s and 100°/s with a random spatial offset between audition and vision. In a 2AFC task, subjects judged the leading modality (auditory or visual). No significant differences were observed in the spatial threshold based on the point of subjective equivalence (PSE) or the slope of the psychometric functions (β) across the three acoustic conditions. Additionally, both the PSE and β did not significantly differ across velocity, suggesting a fixed spatial window of audio-visual separation. The findings suggest that there was no loss of spatial information accompanying the reduction in spatial cues and the reverberation levels tested, and establish a perceptual measure for assessing the veracity of motion generated from discrete locations and in echoic environments.
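
    The PSE and slope (β) reported here are standard parameters of a fitted psychometric function. The sketch below shows one way to recover them from 2AFC responses with a simple grid-search logistic fit; the response proportions are hypothetical, not the study's data.

```python
# Sketch: recovering the PSE and slope parameter (beta) of a psychometric
# function from 2AFC "visual led" judgements via a grid-search logistic fit.
import numpy as np

def logistic(soa, pse, beta):
    return 1.0 / (1.0 + np.exp(-(soa - pse) / beta))

# hypothetical data: audio-visual offsets (ms) and proportion of
# "visual led" responses at each offset
soas = np.array([-240, -120, -60, 0, 60, 120, 240], dtype=float)
p_visual_first = np.array([0.05, 0.15, 0.30, 0.55, 0.75, 0.90, 0.97])

pse_grid = np.linspace(-100, 100, 201)
beta_grid = np.linspace(10, 200, 191)
best = min(((np.sum((logistic(soas, p, b) - p_visual_first) ** 2), p, b)
            for p in pse_grid for b in beta_grid))
_, pse_hat, beta_hat = best
print("PSE = %.1f ms, slope parameter beta = %.1f ms" % (pse_hat, beta_hat))
```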

  8. Emotional speech processing: disentangling the effects of prosody and semantic cues.

    Science.gov (United States)

    Pell, Marc D; Jaywant, Abhishek; Monetta, Laura; Kotz, Sonja A

    2011-08-01

    To inform how emotions in speech are implicitly processed and registered in memory, we compared how emotional prosody, emotional semantics, and both cues in tandem prime decisions about conjoined emotional faces. Fifty-two participants rendered facial affect decisions (Pell, 2005a), indicating whether a target face represented an emotion (happiness or sadness) or not (a facial grimace), after passively listening to happy, sad, or neutral prime utterances. Emotional information from primes was conveyed by: (1) prosody only; (2) semantic cues only; or (3) combined prosody and semantic cues. Results indicated that prosody, semantics, and combined prosody-semantic cues facilitate emotional decisions about target faces in an emotion-congruent manner. However, the magnitude of priming did not vary across tasks. Our findings highlight that emotional meanings of prosody and semantic cues are systematically registered during speech processing, but with similar effects on associative knowledge about emotions, which is presumably shared by prosody, semantics, and faces.

  9. Perceiving speech in context: Compensation for contextual variability during acoustic cue encoding and categorization

    Science.gov (United States)

    Toscano, Joseph Christopher

    Several fundamental questions about speech perception concern how listeners understand spoken language despite considerable variability in speech sounds across different contexts (the problem of lack of invariance in speech). This contextual variability is caused by several factors, including differences between individual talkers' voices, variation in speaking rate, and effects of coarticulatory context. A number of models have been proposed to describe how the speech system handles differences across contexts. Critically, these models make different predictions about (1) whether contextual variability is handled at the level of acoustic cue encoding or categorization, (2) whether it is driven by feedback from category-level processes or interactions between cues, and (3) whether listeners discard fine-grained acoustic information to compensate for contextual variability. Separating the effects of cue- and category-level processing has been difficult because behavioral measures tap processes that occur well after initial cue encoding and are influenced by task demands and linguistic information. Recently, we have used the event-related brain potential (ERP) technique to examine cue encoding and online categorization. Specifically, we have looked at differences in the auditory N1 as a measure of acoustic cue encoding and the P3 as a measure of categorization. This allows us to examine multiple levels of processing during speech perception and can provide a useful tool for studying effects of contextual variability. Here, I apply this approach to determine the point in processing at which context has an effect on speech perception and to examine whether acoustic cues are encoded continuously. Several types of contextual variability (talker gender, speaking rate, and coarticulation), as well as several acoustic cues (voice onset time, formant frequencies, and bandwidths), are examined in a series of experiments. The results suggest that (1) at early stages of speech

  10. Information-Driven Active Audio-Visual Source Localization.

    Directory of Open Access Journals (Sweden)

    Niclas Schult

    Full Text Available We present a system for sensorimotor audio-visual source localization on a mobile robot. We utilize a particle filter for the combination of audio-visual information and for the temporal integration of consecutive measurements. Although the system only measures the current direction of the source, the position of the source can be estimated because the robot is able to move and can therefore obtain measurements from different directions. These actions by the robot successively reduce uncertainty about the source's position. An information gain mechanism is used for selecting the most informative actions in order to minimize the number of actions required to achieve accurate and precise position estimates in azimuth and distance. We show that this mechanism is an efficient solution to the action selection problem for source localization, and that it is able to produce precise position estimates despite simplified unisensory preprocessing. Because of the robot's mobility, this approach is suitable for use in complex and cluttered environments. We present qualitative and quantitative results of the system's performance and discuss possible areas of application.
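
    A stripped-down version of the particle-filter idea described above -- bearing-only measurements taken from different robot positions gradually pinning down distance as well as azimuth -- might look as follows. The noise models and parameters are assumptions, and the information-gain action selection is omitted.

```python
# Stripped-down particle filter over a 2-D source position, updated with a
# bearing-only (direction) measurement taken from the robot's current pose.
# Purely illustrative; parameters and noise models are assumptions.
import numpy as np

rng = np.random.default_rng(0)
particles = rng.uniform(low=[-5, -5], high=[5, 5], size=(2000, 2))   # candidate source positions (m)
weights = np.ones(len(particles)) / len(particles)

def update(robot_xy, measured_bearing_rad, bearing_sigma_rad=0.15):
    """Re-weight and resample particles given one direction-of-arrival measurement."""
    global particles, weights
    predicted = np.arctan2(particles[:, 1] - robot_xy[1], particles[:, 0] - robot_xy[0])
    error = np.angle(np.exp(1j * (predicted - measured_bearing_rad)))   # wrapped angular error
    weights *= np.exp(-0.5 * (error / bearing_sigma_rad) ** 2) + 1e-12
    weights /= weights.sum()
    idx = rng.choice(len(particles), size=len(particles), p=weights)    # simple resampling
    particles = particles[idx] + rng.normal(scale=0.05, size=particles.shape)
    weights = np.ones(len(particles)) / len(particles)

# two measurements from different robot positions disambiguate distance
update(robot_xy=np.array([0.0, 0.0]), measured_bearing_rad=np.arctan2(2.0, 3.0))
update(robot_xy=np.array([1.0, 0.0]), measured_bearing_rad=np.arctan2(2.0, 2.0))
print("estimated source position:", particles.mean(axis=0))
```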

  11. Aided and Unaided Speech Supplementation Strategies: Effect of Alphabet Cues and Iconic Hand Gestures on Dysarthric Speech

    Science.gov (United States)

    Hustad, Katherine C.; Garcia, Jane Mertz

    2005-01-01

    Purpose: This study compared the influence of speaker-implemented iconic hand gestures and alphabet cues on speech intelligibility scores and strategy helpfulness ratings for 3 adults with cerebral palsy and dysarthria who differed from one another in their overall motor abilities. Method: A total of 144 listeners (48 per speaker) orthographically…

  12. Audiovisual cues benefit recognition of accented speech in noise but not perceptual adaptation.

    Science.gov (United States)

    Banks, Briony; Gowen, Emma; Munro, Kevin J; Adank, Patti

    2015-01-01

    Perceptual adaptation allows humans to recognize different varieties of accented speech. We investigated whether perceptual adaptation to accented speech is facilitated if listeners can see a speaker's facial and mouth movements. In Study 1, participants listened to sentences in a novel accent and underwent a period of training with audiovisual or audio-only speech cues, presented in quiet or in background noise. A control group also underwent training with visual-only (speech-reading) cues. We observed no significant difference in perceptual adaptation between any of the groups. To address a number of remaining questions, we carried out a second study using a different accent, speaker and experimental design, in which participants listened to sentences in a non-native (Japanese) accent with audiovisual or audio-only cues, without separate training. Participants' eye gaze was recorded to verify that they looked at the speaker's face during audiovisual trials. Recognition accuracy was significantly better for audiovisual than for audio-only stimuli; however, no statistical difference in perceptual adaptation was observed between the two modalities. Furthermore, Bayesian analysis suggested that the data supported the null hypothesis. Our results suggest that although the availability of visual speech cues may be immediately beneficial for recognition of unfamiliar accented speech in noise, it does not improve perceptual adaptation.

  13. The sound of your lips: electrophysiological cross-modal interactions during hand-to-face and face-to-face speech perception

    Directory of Open Access Journals (Sweden)

    Avril Treille

    2014-05-01

    Full Text Available Recent magneto-encephalographic and electro-encephalographic studies provide evidence for cross-modal integration during audio-visual and audio-haptic speech perception, with speech gestures viewed or felt from manual tactile contact with the speaker's face. Given the temporal precedence of the haptic and visual signals over the acoustic signal in these studies, the observed modulation of N1/P2 auditory evoked responses during bimodal compared to unimodal speech perception suggests that relevant and predictive visual and haptic cues may facilitate auditory speech processing. To further investigate this hypothesis, auditory evoked potentials were here compared during auditory-only, audio-visual and audio-haptic speech perception in live dyadic interactions between a listener and a speaker. In line with previous studies, auditory evoked potentials were attenuated and speeded up during both audio-haptic and audio-visual compared to auditory speech perception. Importantly, the observed latency and amplitude reduction did not significantly depend on the degree of visual and haptic recognition of the speech targets. Altogether, these results further demonstrate cross-modal interactions between the auditory, visual and haptic speech signals. Although they do not contradict the hypothesis that visual and haptic sensory inputs convey predictive information with respect to the incoming auditory speech input, these results suggest that, at least in live conversational interactions, systematic conclusions on sensory predictability in bimodal speech integration have to be drawn with caution, with the extraction of predictive cues likely depending on the variability of the speech stimuli.

  14. Durational cues to word boundaries in clear speech

    OpenAIRE

    Cutler, A.; Butterfield, S.

    1990-01-01

    One of a listener's major tasks in understanding continuous speech is segmenting the speech signal into separate words. When listening conditions are difficult, speakers can help listeners by deliberately producing clear speech. We found that speakers do indeed attempt to mark word boundaries; moreover, they differentiate between word boundaries in a way which suggests they are sensitive to listener needs. Application of heuristic segmentation strategies makes word boundaries before strong syllables eas...

  15. A Management Review and Analysis of Purdue University Libraries and Audio-Visual Center.

    Science.gov (United States)

    Baaske, Jan; And Others

    A management review and analysis was conducted by the staff of the libraries and audio-visual center of Purdue University. Not only were the study team and the eight task forces drawn from all levels of the libraries and audio-visual center staff, but a systematic effort was sustained through inquiries, draft reports and open meetings to involve…

  16. Increasing observer objectivity with audio-visual technology: the Sphygmocorder.

    Science.gov (United States)

    Atkins; O'Brien; Wesseling; Guelen

    1997-10-01

    The most fallible component of blood pressure measurement is the human observer. The traditional technique of measuring blood pressure does not allow the result of the measurement to be checked by independent observers, thereby leaving the method open to bias. In the Sphygmocorder, several components used to measure blood pressure have been combined innovatively with audio-visual recording technology to produce a system consisting of a mercury sphygmomanometer, an occluding cuff, an automatic inflation-deflation source, a stethoscope, a microphone capable of detecting Korotkoff sounds, a camcorder and a display screen. The accuracy of the Sphygmocorder against the trained human observer has been confirmed previously using the protocol of the British Hypertension Society and in this article the updated system incorporating a number of innovations is described. PMID:10234128

  17. The audio-visual revolution: do we really need it?

    Science.gov (United States)

    Townsend, I

    1979-03-01

    In the United Kingdom, the audio-visual revolution has steadily gained converts in the nursing profession. Nurse tutor courses now contain information on the techniques of educational technology, and schools of nursing increasingly own (or wish to own) many of the sophisticated electronic aids to teaching that abound. This is taking place at a time of unprecedented crisis and change. Funds have been or are being made available to buy audio-visual equipment, but its purchase and use often rest on personal whim, prejudice or educational fashion, not on considerations of educational efficiency. In the rush of enthusiasm, the overwhelmed teacher (everywhere; the phenomenon is not confined to nursing) forgets to ask the searching, critical questions: 'Why should we use this aid?', 'How effective is it?', 'And at what?'. Influential writers in this profession have repeatedly called for a more responsible attitude towards published research from other fields. In an attempt to discover what is known about the answers to these questions, an eclectic look at media research is taken, and the widespread dissatisfaction existing among international educational technologists is noted. The paper isolates from the literature several causative factors responsible for the present state of affairs. Findings from the field of educational television are cited as representative of an aid which has had a considerable amount of time and research directed at it. The concluding part of the paper shows that the decisions involved in using or not using educational media are more complicated than might at first appear.

  18. Psychoacoustic cues to emotion in speech prosody and music.

    Science.gov (United States)

    Coutinho, Eduardo; Dibben, Nicola

    2013-01-01

    There is strong evidence of shared acoustic profiles common to the expression of emotions in music and speech, yet relatively limited understanding of the specific psychoacoustic features involved. This study combined a controlled experiment and computational modelling to investigate the perceptual codes associated with the expression of emotion in the acoustic domain. The empirical stage of the study provided continuous human ratings of emotions perceived in excerpts of film music and natural speech samples. The computational stage created a computer model that retrieves the relevant information from the acoustic stimuli and makes predictions about the emotional expressiveness of speech and music close to the responses of human subjects. We show that a significant part of the listeners' second-by-second reported emotions to music and speech prosody can be predicted from a set of seven psychoacoustic features: loudness, tempo/speech rate, melody/prosody contour, spectral centroid, spectral flux, sharpness, and roughness. The implications of these results are discussed in the context of cross-modal similarities in the communication of emotion in the acoustic domain.
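
    As an illustration of the feature-extraction stage implied by the abstract, the sketch below computes one of the seven listed features, the spectral centroid, frame by frame with numpy; the window and hop sizes and the toy signal are assumptions, not the authors' settings.

```python
# Sketch: frame-wise spectral centroid, one of the psychoacoustic features
# listed in the abstract; window/hop sizes and the test tone are placeholders.
import numpy as np

def spectral_centroid(signal, fs, frame_len=1024, hop=512):
    centroids = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * np.hanning(frame_len)
        mag = np.abs(np.fft.rfft(frame))
        freqs = np.fft.rfftfreq(frame_len, d=1 / fs)
        centroids.append(np.sum(freqs * mag) / (np.sum(mag) + 1e-12))
    return np.array(centroids)

fs = 22050
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 2200 * t)
print("mean spectral centroid: %.1f Hz" % spectral_centroid(tone, fs).mean())
```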

  19. Something for Everyone? An Evaluation of the Use of Audio-Visual Resources in Geographical Learning in the UK.

    Science.gov (United States)

    McKendrick, John H.; Bowden, Annabel

    1999-01-01

    Reports from a survey of geographers that canvassed experiences using audio-visual resources to support teaching. Suggests that geographical learning has embraced audio-visual resources and that they are employed effectively. Concludes that integration of audio-visual resources into mainstream curriculum is essential to ensure effective and…

  20. Paragraph-based Prosodic Cues for Speech Synthesis Applications

    OpenAIRE

    Farrús, Mireia; Lai, Catherine; Moore, Johanna

    2016-01-01

    Speech synthesis has improved in both expressiveness and voice quality in recent years. However, obtaining full expressiveness when dealing with large multi-sentential synthesized discourse is still a challenge, since speech synthesizers do not take into account the prosodic differences that have been observed in discourse units such as paragraphs. The current study validates and extends previous work by analyzing the prosody of paragraph units in a large and diverse corpus of TED Talks using autom...

  1. Real-time decreased sensitivity to an audio-visual illusion during goal-directed reaching.

    Directory of Open Access Journals (Sweden)

    Luc Tremblay

    Full Text Available In humans, sensory afferences are combined and integrated by the central nervous system (Ernst MO, Bülthoff HH (2004) Trends Cogn. Sci. 8: 162-169) and appear to provide a holistic representation of the environment. Empirical studies have repeatedly shown that vision dominates the other senses, especially for tasks with spatial demands. In contrast, it has also been observed that sound can strongly alter the perception of visual events. For example, when presented with 2 flashes and 1 beep in a very brief period of time, humans often report seeing 1 flash (i.e. the fusion illusion; Andersen TS, Tiippana K, Sams M (2004) Brain Res. Cogn. Brain Res. 21: 301-308). However, it is not known how an unfolding movement modulates the contribution of vision to perception. Here, we used the audio-visual illusion to demonstrate that goal-directed movements can alter visual information processing in real-time. Specifically, the fusion illusion was linearly reduced as a function of limb velocity. These results suggest that cue combination and integration can be modulated in real-time by goal-directed behaviors, perhaps through sensory gating (Chapman CE, Beauchamp E (2006) J. Neurophysiol. 96: 1664-1675) and/or altered sensory noise (Ernst MO, Bülthoff HH (2004) Trends Cogn. Sci. 8: 162-169) during limb movements.

  2. Audio-Visual Integration Modifies Emotional Judgment in Music

    Directory of Open Access Journals (Sweden)

    Shen-Yuan Su

    2011-10-01

    Full Text Available The conventional view that perceived emotion in music is derived mainly from auditory signals has led to neglect of the contribution of visual image. In this study, we manipulated mode (major vs. minor and examined the influence of a video image on emotional judgment in music. Melodies in either major or minor mode were controlled for tempo and rhythm and played to the participants. We found that Taiwanese participants, like Westerners, judged major melodies as expressing positive, and minor melodies negative, emotions. The major or minor melodies were then paired with video images of the singers, which were either emotionally congruent or incongruent with their modes. Results showed that participants perceived stronger positive or negative emotions with congruent audio-visual stimuli. Compared to listening to music alone, stronger emotions were perceived when an emotionally congruent video image was added and weaker emotions were perceived when an incongruent image was added. We therefore demonstrate that mode is important to perceive the emotional valence in music and that treating musical art as a purely auditory event might lose the enhanced emotional strength perceived in music, since going to a concert may lead to stronger perceived emotion than listening to the CD at home.

  3. Audio-visual assistance in co-creating transition knowledge

    Science.gov (United States)

    Hezel, Bernd; Broschkowski, Ephraim; Kropp, Jürgen P.

    2013-04-01

    Earth system and climate impact research results point to the tremendous ecologic, economic and societal implications of climate change. Specifically, people will have to adopt lifestyles that are very different from those they currently strive for in order to mitigate severe changes of our known environment. It will most likely not suffice to transfer the scientific findings into international agreements and appropriate legislation. A transition rather relies on pioneers that define new role models, on change agents that mainstream the concept of sufficiency, and on narratives that make different futures appealing. In order for the research community to be able to provide viable and sustainable transition pathways, an integration of the physical constraints and the societal dynamics is needed. Hence the necessary transition knowledge is to be co-created by social science, natural science and society. To this end, the Climate Media Factory - in itself a massively transdisciplinary venture - strives to provide an audio-visual connection between the different scientific cultures and a bi-directional link to stakeholders and society. Since the methodology, terminology and knowledge level of those involved are not the same, we develop new entertaining formats on the basis of a "complexity on demand" approach. They present scientific information in an integrated and entertaining way with different levels of detail that provide entry points to users with different requirements. Two examples shall illustrate the advantages and restrictions of the approach.

  4. A Psychophysical Imaging Method Evidencing Auditory Cue Extraction during Speech Perception: A Group Analysis of Auditory Classification Images

    OpenAIRE

    Varnet, Léo; Knoblauch, Kenneth; Serniclaes, Willy; Meunier, Fanny; Hoen, Michel

    2015-01-01

    Although there is a large consensus regarding the involvement of specific acoustic cues in speech perception, the precise mechanisms underlying the transformation from continuous acoustical properties into discrete perceptual units remains undetermined. This gap in knowledge is partially due to the lack of a turnkey solution for isolating critical speech cues from natural stimuli. In this paper, we describe a psychoacoustic imaging method known as the Auditory Classification Image technique t...

  5. The contribution of visual information to the perception of speech in noise with and without informative temporal fine structure.

    Science.gov (United States)

    Stacey, Paula C; Kitterick, Pádraig T; Morris, Saffron D; Sumner, Christian J

    2016-06-01

    Understanding what is said in demanding listening situations is assisted greatly by looking at the face of a talker. Previous studies have observed that normal-hearing listeners can benefit from this visual information when a talker's voice is presented in background noise. These benefits have also been observed in quiet listening conditions in cochlear-implant users, whose device does not convey the informative temporal fine structure cues in speech, and when normal-hearing individuals listen to speech processed to remove these informative temporal fine structure cues. The current study (1) characterised the benefits of visual information when listening in background noise; and (2) used sine-wave vocoding to compare the size of the visual benefit when speech is presented with or without informative temporal fine structure. The accuracy with which normal-hearing individuals reported words in spoken sentences was assessed across three experiments. The availability of visual information and informative temporal fine structure cues was varied within and across the experiments. The results showed that visual benefit was observed using open- and closed-set tests of speech perception. The size of the benefit increased when informative temporal fine structure cues were removed. This finding suggests that visual information may play an important role in the ability of cochlear-implant users to understand speech in many everyday situations. Models of audio-visual integration were able to account for the additional benefit of visual information when speech was degraded and suggested that auditory and visual information was being integrated in a similar way in all conditions. The modelling results were consistent with the notion that audio-visual benefit is derived from the optimal combination of auditory and visual sensory cues. PMID:27085797
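
    The closing sentence refers to the optimal combination of auditory and visual cues; in its standard variance-weighted form this can be spelled out in a few lines. The numbers below are illustrative, not values from the study.

```python
# Standard variance-weighted (maximum-likelihood) combination of an auditory
# and a visual estimate of the same quantity; numbers are illustrative only.
sigma_a, sigma_v = 3.0, 1.5          # single-cue noise (arbitrary units)
est_a, est_v = 10.0, 12.0            # single-cue estimates

w_a = (1 / sigma_a**2) / (1 / sigma_a**2 + 1 / sigma_v**2)   # reliability weight for audio
w_v = 1.0 - w_a
est_av = w_a * est_a + w_v * est_v
sigma_av = (1 / (1 / sigma_a**2 + 1 / sigma_v**2)) ** 0.5

print(f"combined estimate = {est_av:.2f}, combined sigma = {sigma_av:.2f} "
      f"(never worse than the best single cue, sigma = {min(sigma_a, sigma_v):.2f})")
```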

  6. Normal-Hearing Listeners' and Cochlear Implant Users' Perception of Pitch Cues in Emotional Speech.

    Science.gov (United States)

    Gilbers, Steven; Fuller, Christina; Gilbers, Dicky; Broersma, Mirjam; Goudbeek, Martijn; Free, Rolien; Başkent, Deniz

    2015-10-01

    In cochlear implants (CIs), acoustic speech cues, especially for pitch, are delivered in a degraded form. This study's aim is to assess whether due to degraded pitch cues, normal-hearing listeners and CI users employ different perceptual strategies to recognize vocal emotions, and, if so, how these differ. Voice actors were recorded pronouncing a nonce word in four different emotions: anger, sadness, joy, and relief. These recordings' pitch cues were phonetically analyzed. The recordings were used to test 20 normal-hearing listeners' and 20 CI users' emotion recognition. In congruence with previous studies, high-arousal emotions had a higher mean pitch, wider pitch range, and more dominant pitches than low-arousal emotions. Regarding pitch, speakers did not differentiate emotions based on valence but on arousal. Normal-hearing listeners outperformed CI users in emotion recognition, even when presented with CI simulated stimuli. However, only normal-hearing listeners recognized one particular actor's emotions worse than the other actors'. The groups behaved differently when presented with similar input, showing that they had to employ differing strategies. Considering the respective speaker's deviating pronunciation, it appears that for normal-hearing listeners, mean pitch is a more salient cue than pitch range, whereas CI users are biased toward pitch range cues. PMID:27648210

  7. Pitch and spectral resolution: A systematic comparison of bottom-up cues for top-down repair of degraded speech.

    Science.gov (United States)

    Clarke, Jeanne; Başkent, Deniz; Gaudrain, Etienne

    2016-01-01

    The brain is capable of restoring missing parts of speech, a top-down repair mechanism that enhances speech understanding in noisy environments. This enhancement can be quantified using the phonemic restoration paradigm, i.e., the improvement in intelligibility when silent interruptions of interrupted speech are filled with noise. Benefit from top-down repair of speech differs between cochlear implant (CI) users and normal-hearing (NH) listeners. This difference could be due to poorer spectral resolution and/or weaker pitch cues inherent to CI transmitted speech. In CIs, those two degradations cannot be teased apart because spectral degradation leads to weaker pitch representation. A vocoding method was developed to evaluate independently the roles of pitch and spectral resolution for restoration in NH individuals. Sentences were resynthesized with different spectral resolutions and with either retaining the original pitch cues or discarding them all. The addition of pitch significantly improved restoration only at six-bands spectral resolution. However, overall intelligibility of interrupted speech was improved both with the addition of pitch and with the increase in spectral resolution. This improvement may be due to better discrimination of speech segments from the filler noise, better grouping of speech segments together, and/or better bottom-up cues available in the speech segments. PMID:26827034

  8. Relative Contributions of Spectral and Temporal Cues for Speech Recognition in Patients with Sensorineural Hearing Loss

    Institute of Scientific and Technical Information of China (English)

    XU Li; ZHOU Ning; Rebecca Brashears; Katherine Rife

    2008-01-01

    The present study was designed to examine speech recognition in patients with sensorineural hearing loss when the temporal and spectral information in the speech signals were co-varied. Four subjects with mild to moderate sensorineural hearing loss were recruited to participate in consonant and vowel recognition tests that used speech stimuli processed through a noise-excited vocoder. The number of channels was varied between 2 and 32, which defined the spectral information. The lowpass cutoff frequency of the temporal envelope extractor was varied from 1 to 512 Hz, which defined the temporal information. Results indicate that performance varied tremendously among the subjects with sensorineural hearing loss. For consonant recognition, the patterns of relative contributions of spectral and temporal information were similar to those in normal-hearing subjects. The utility of temporal envelope information appeared to be normal in the hearing-impaired listeners. For vowel recognition, which depended predominantly on spectral information, the performance plateau was reached with numbers of channels as high as 16-24, much higher than expected, given that frequency selectivity in patients with sensorineural hearing loss might be compromised. In order to understand how hearing-impaired listeners utilize spectral and temporal cues for speech recognition, future studies involving a large sample of patients with sensorineural hearing loss will be necessary to elucidate the relationship between frequency selectivity, central processing capability, and speech recognition performance using vocoded signals.
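
    As an illustration of the type of processing described -- a noise-excited channel vocoder whose channel count and envelope low-pass cutoff are the two manipulated parameters -- here is a minimal sketch; the band edges, filter orders and toy input are assumptions, not the study's exact implementation.

```python
# Sketch of a noise-excited channel vocoder with the two co-varied parameters
# (number of channels, envelope low-pass cutoff) exposed as arguments.
import numpy as np
from scipy.signal import butter, filtfilt

def noise_vocoder(speech, fs, n_channels=8, env_cutoff_hz=16.0,
                  f_lo=100.0, f_hi=6000.0):
    # logarithmically spaced analysis bands between f_lo and f_hi (assumed)
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)
    noise = np.random.randn(len(speech))
    out = np.zeros_like(speech, dtype=float)
    b_env, a_env = butter(2, env_cutoff_hz / (fs / 2), btype="low")
    for lo, hi in zip(edges[:-1], edges[1:]):
        b_bp, a_bp = butter(3, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        band = filtfilt(b_bp, a_bp, speech)
        envelope = filtfilt(b_env, a_env, np.abs(band))   # temporal envelope
        carrier = filtfilt(b_bp, a_bp, noise)             # band-limited noise carrier
        out += np.clip(envelope, 0, None) * carrier
    return out

# toy usage on a synthetic amplitude-modulated tone standing in for speech
fs = 16000
t = np.arange(fs) / fs
dummy = np.sin(2 * np.pi * 150 * t) * (0.5 + 0.5 * np.sin(2 * np.pi * 3 * t))
vocoded = noise_vocoder(dummy, fs, n_channels=4, env_cutoff_hz=8.0)
```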

  9. The development and use of audio-visual technology in terms of economy and socio-economic trends in society

    OpenAIRE

    Mikšík, Jan

    2014-01-01

    The aim of this work is to describe the history of audio-visual technology and to analyse the influence of digitalization. The text describes the history of cinematography and television, as well as the introduction of audio-visual technology into people's homes. It contains information on the present situation, on new trends, and on the influence of the Internet on audio-visual production. There is a comparison of past and present technologies. The new technologies are accessible even for amateur creators wh...

  10. Listeners' expectation of room acoustical parameters based on visual cues

    Science.gov (United States)

    Valente, Daniel L.

    Despite many studies investigating auditory spatial impressions in rooms, few have addressed the impact of simultaneous visual cues on localization and the perception of spaciousness. The current research presents an immersive audio-visual study, in which participants are instructed to make spatial congruency and quantity judgments in dynamic cross-modal environments. The results of these psychophysical tests suggest the importance of consilient audio-visual presentation to the legibility of an auditory scene. Several studies have looked into audio-visual interaction in room perception in recent years, but these studies rely on static images, speech signals, or photographs alone to represent the visual scene. Building on these studies, the aim is to propose a testing method that uses monochromatic compositing (blue-screen technique) to position a studio recording of a musical performance in a number of virtual acoustical environments and ask subjects to assess these environments. In the first experiment of the study, video footage was taken from five rooms varying in physical size from a small studio to a small performance hall. Participants were asked to perceptually align two distinct acoustical parameters---early-to-late reverberant energy ratio and reverberation time---of two solo musical performances in five contrasting visual environments according to their expectations of how the room should sound given its visual appearance. In the second experiment in the study, video footage shot from four different listening positions within a general-purpose space was coupled with sounds derived from measured binaural impulse responses (IRs). The relationship between the presented image, sound, and virtual receiver position was examined. It was found that many visual cues caused different perceived events of the acoustic environment. This included the visual attributes of the space in which the performance was located as well as the visual attributes of the performer

  11. Acoustic Features and Perceptive Cues of Songs and Dialogues in Whistled Speech: Convergences with Sung Speech

    OpenAIRE

    Meyer, Julien

    2007-01-01

    International audience Whistled speech is a little studied local use of language shaped by several cultures of the world either for distant dialogues or for rendering traditional songs. This practice consists of an emulation of the voice thanks to a simple modulated pitch. It is therefore the result of a transformation of the vocal signal that implies simplifications in the frequency domain. The whistlers adapt their productions to the way each language combines the qualities of height per...

  12. An Audio-Visual Resource Notebook for Adult Consumer Education. An Annotated Bibliography of Selected Audio-Visual Aids for Adult Consumer Education, with Special Emphasis on Materials for Elderly, Low-Income and Handicapped Consumers.

    Science.gov (United States)

    Virginia State Dept. of Agriculture and Consumer Services, Richmond, VA.

    This document is an annotated bibliography of audio-visual aids in the field of consumer education, intended especially for use among low-income, elderly, and handicapped consumers. It was developed to aid consumer education program planners in finding audio-visual resources to enhance their presentations. Materials listed include 293 resources…

  13. Temporal structure and complexity affect audio-visual correspondence detection

    OpenAIRE

    Denison, Rachel N.; Driver, Jon; Ruff, Christian C.

    2013-01-01

    Synchrony between events in different senses has long been considered the critical temporal cue for multisensory integration. Here, using rapid streams of auditory and visual events, we demonstrate how humans can use temporal structure (rather than mere temporal coincidence) to detect multisensory relatedness. We find psychophysically that participants can detect matching auditory and visual streams via shared temporal structure for crossmodal lags of up to 200 ms. Performance on this task re...

  16. Audio-visual training-aid for speechreading

    DEFF Research Database (Denmark)

    Bothe, Hans-Heinrich; Gebert, H.

    2011-01-01

    Training of speechreading skills may be seen as the acquisition of a new, visual language. In this spirit, the presented project demonstrates the conception and implementation of a language laboratory for speechreading that is intended to be employed as an effective audio‐visual complement and extension...... on the employment of computer‐based communication aids for hearing‐impaired, deaf and deaf‐blind people [6]. This paper presents the complete system that is composed of a 3D‐facial animation with synchronized speech synthesis, a natural language dialogue unit and a student‐teacher‐training module. Due to the very... of fundamental knowledge of other words. The present version of the training aid can be used for the training of speechreading in English, this as a consequence of the integrated English language models for facial animation and speech synthesis. Nevertheless, the training aid is prepared to handle all possible...

  17. Training the brain to weight speech cues differently: a study of Finnish second-language users of English.

    Science.gov (United States)

    Ylinen, Sari; Uther, Maria; Latvala, Antti; Vepsäläinen, Sara; Iverson, Paul; Akahane-Yamada, Reiko; Näätänen, Risto

    2010-06-01

    Foreign-language learning is a prime example of a task that entails perceptual learning. The correct comprehension of foreign-language speech requires the correct recognition of speech sounds. The most difficult speech-sound contrasts for foreign-language learners often are the ones that have multiple phonetic cues, especially if the cues are weighted differently in the foreign and native languages. The present study aimed to determine whether non-native-like cue weighting could be changed by using phonetic training. Before the training, we compared the use of spectral and duration cues of English /i/ and /I/ vowels (e.g., beat vs. bit) between native Finnish and English speakers. In Finnish, duration is used phonologically to separate short and long phonemes, and therefore Finns were expected to weight duration cues more than native English speakers. The cross-linguistic differences and training effects were investigated with behavioral and electrophysiological methods, in particular by measuring the MMN brain response that has been used to probe long-term memory representations for speech sounds. The behavioral results suggested that before the training, the Finns indeed relied more on duration in vowel recognition than the native English speakers did. After the training, however, the Finns were able to use the spectral cues of the vowels more reliably than before. Accordingly, the MMN brain responses revealed that the training had enhanced the Finns' ability to preattentively process the spectral cues of the English vowels. This suggests that as a result of training, plastic changes had occurred in the weighting of phonetic cues at early processing stages in the cortex. PMID:19445609
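
    The MMN mentioned here is conventionally quantified as the deviant-minus-standard difference wave averaged over a post-stimulus latency window. The sketch below illustrates that computation on synthetic epochs; the sampling rate, window and amplitudes are assumptions, not values from the study.

```python
# Minimal sketch: quantifying an MMN-like effect as the deviant-minus-standard
# difference wave, averaged over a 100-200 ms window. Epochs are synthetic.
import numpy as np

fs = 500                                      # Hz, assumed EEG sampling rate
times = np.arange(-0.1, 0.4, 1 / fs)          # epoch from -100 to +400 ms

def fake_epochs(n, mmn_amp):
    """Toy ERP epochs: a negativity of size mmn_amp peaking near 150 ms plus noise."""
    erp = -mmn_amp * np.exp(-0.5 * ((times - 0.15) / 0.03) ** 2)
    return erp + 2.0 * np.random.randn(n, times.size)

standards = fake_epochs(400, mmn_amp=0.0)
deviants = fake_epochs(80, mmn_amp=3.0)

difference = deviants.mean(axis=0) - standards.mean(axis=0)
window = (times >= 0.10) & (times <= 0.20)
print("mean MMN amplitude 100-200 ms: %.2f microvolts" % difference[window].mean())
```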

  18. Primary School Pupils' Response to Audio-Visual Learning Process in Port-Harcourt

    Science.gov (United States)

    Olube, Friday K.

    2015-01-01

    The purpose of this study is to examine primary school children's response on the use of audio-visual learning processes--a case study of Chokhmah International Academy, Port-Harcourt (owned by Salvation Ministries). It looked at the elements that enhance pupils' response to educational television programmes and their hindrances to these…

  19. Evaluation of Modular EFL Educational Program (Audio-Visual Materials Translation & Translation of Deeds & Documents)

    Science.gov (United States)

    Imani, Sahar Sadat Afshar

    2013-01-01

    Modular EFL Educational Program has managed to offer specialized language education in two specific fields: Audio-visual Materials Translation and Translation of Deeds and Documents. However, no explicit empirical studies can be traced on both internal and external validity measures as well as the extent of compatibility of both courses with the…

  20. Multimodal indexing of digital audio-visual documents: A case study for cultural heritage data

    NARCIS (Netherlands)

    J. Carmichael; M. Larson; J. Marlow; E. Newman; P. Clough; J. Oomen; S. Sav

    2008-01-01

    This paper describes a multimedia multimodal information access sub-system (MIAS) for digital audio-visual documents, typically presented in streaming media format. The system is designed to provide both professional and general users with entry points into video documents that are relevant to their

  1. Technical Considerations in the Delivery of Audio-Visual Course Content.

    Science.gov (United States)

    Lightfoot, Jay M.

    2002-01-01

    In an attempt to provide students with the benefit of the latest technology, some instructors include multimedia content on their class Web sites. This article introduces the basic terms and concepts needed to understand the multimedia domain. Provides a brief tutorial designed to help instructors create good, consistent audio-visual content. (AEF)

  2. The Use of Video as an Audio-visual Material in Foreign Language Teaching Classroom

    Science.gov (United States)

    Cakir, Ismail

    2006-01-01

    In recent years, a great tendency towards the use of technology and its integration into the curriculum has gained a great importance. Particularly, the use of video as an audio-visual material in foreign language teaching classrooms has grown rapidly because of the increasing emphasis on communicative techniques, and it is obvious that the use of…

  3. Acceptance of online audio-visual cultural heritage archive services: a study of the general public

    NARCIS (Netherlands)

    Ongena, G.; Wijngaert, van de L.A.L.; Huizer, E.

    2013-01-01

    Introduction. This study examines the antecedents of user acceptance of an audio-visual heritage archive for a wider audience (i.e., the general public) by extending the technology acceptance model with the concepts of perceived enjoyment, nostalgia proneness and personal innovativeness. Method. A W

  4. Audio-visual synchrony and feature-selective attention co-amplify early visual processing.

    Science.gov (United States)

    Keitel, Christian; Müller, Matthias M

    2016-05-01

    Our brain relies on neural mechanisms of selective attention and converging sensory processing to efficiently cope with rich and unceasing multisensory inputs. One prominent assumption holds that audio-visual synchrony can act as a strong attractor for spatial attention. Here, we tested for a similar effect of audio-visual synchrony on feature-selective attention. We presented two superimposed Gabor patches that differed in colour and orientation. On each trial, participants were cued to selectively attend to one of the two patches. Over time, spatial frequencies of both patches varied sinusoidally at distinct rates (3.14 and 3.63 Hz), giving rise to pulse-like percepts. A simultaneously presented pure tone carried a frequency modulation at the pulse rate of one of the two visual stimuli to introduce audio-visual synchrony. Pulsed stimulation elicited distinct time-locked oscillatory electrophysiological brain responses. These steady-state responses were quantified in the spectral domain to examine individual stimulus processing under conditions of synchronous versus asynchronous tone presentation and when respective stimuli were attended versus unattended. We found that both, attending to the colour of a stimulus and its synchrony with the tone, enhanced its processing. Moreover, both gain effects combined linearly for attended in-sync stimuli. Our results suggest that audio-visual synchrony can attract attention to specific stimulus features when stimuli overlap in space. PMID:26226930
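
    The spectral quantification of the frequency-tagged steady-state responses can be sketched as reading FFT amplitudes at the two stimulation rates. The simulated signal, sampling rate, and epoch length below are assumptions for illustration only, not the study's recording or analysis pipeline.

```python
import numpy as np

fs, dur = 256, 10.0                      # assumed sampling rate and epoch length
t = np.arange(0, dur, 1 / fs)
f1, f2 = 3.14, 3.63                      # stimulation rates from the study

# Simulated single-channel EEG: two steady-state responses plus noise.
eeg = (0.8 * np.sin(2 * np.pi * f1 * t)
       + 0.5 * np.sin(2 * np.pi * f2 * t)
       + np.random.randn(t.size))

spectrum = np.abs(np.fft.rfft(eeg)) / t.size * 2    # single-sided amplitude
freqs = np.fft.rfftfreq(t.size, 1 / fs)

for f in (f1, f2):
    amp = spectrum[np.argmin(np.abs(freqs - f))]    # nearest FFT bin
    print(f"amplitude near {f} Hz: {amp:.2f} (arbitrary units)")
```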

  5. A comparative study on automatic audio-visual fusion for aggression detection using meta-information

    NARCIS (Netherlands)

    Lefter, I.; Rothkrantz, L.J.M.; Burghouts, G.J.

    2013-01-01

    Multimodal fusion is a complex topic. For surveillance applications audio-visual fusion is very promising given the complementary nature of the two streams. However, drawing the correct conclusion from multi-sensor data is not straightforward. In previous work we have analysed a database with audio-

  6. Automatic audio-visual fusion for aggression detection using meta-information

    NARCIS (Netherlands)

    Lefter, I.; Burghouts, G.J.; Rothkrantz, L.J.M.

    2012-01-01

    We propose a new method for audio-visual sensor fusion and apply it to automatic aggression detection. While a variety of definitions of aggression exist, in this paper we see it as any kind of behavior that has a disturbing effect on others. We have collected multi- and unimodal assessments by huma

  7. Challenges of Using Audio-Visual Aids as Warm-Up Activity in Teaching Aviation English

    Science.gov (United States)

    Sahin, Mehmet; Sule, St.; Seçer, Y. E.

    2016-01-01

    This study aims to find out the challenges encountered in the use of video as audio-visual material as a warm-up activity in aviation English course at high school level. This study is based on a qualitative study in which focus group interview is used as the data collection procedure. The participants of focus group are four instructors teaching…

  8. Rehabilitation of balance-impaired stroke patients through audio-visual biofeedback

    DEFF Research Database (Denmark)

    Gheorghe, Cristina; Nissen, Thomas; Juul Rosengreen Christensen, Daniel;

    2015-01-01

    This study explored how audio-visual biofeedback influences physical balance of seven balance-impaired stroke patients, between 33–70 years-of-age. The setup included a bespoke balance board and a music rhythm game. The procedure was designed as follows: (1) a control group who performed a balance...

  9. Brain networks engaged in audiovisual integration during speech perception revealed by persistent homology-based network filtration.

    Science.gov (United States)

    Kim, Heejung; Hahm, Jarang; Lee, Hyekyoung; Kang, Eunjoo; Kang, Hyejin; Lee, Dong Soo

    2015-05-01

    The human brain naturally integrates audiovisual information to improve speech perception. However, in noisy environments, understanding speech is difficult and may require much effort. Although the brain network is supposed to be engaged in speech perception, it is unclear how speech-related brain regions are connected during natural bimodal audiovisual or unimodal speech perception with counterpart irrelevant noise. To investigate the topological changes of speech-related brain networks at all possible thresholds, we used a persistent homological framework through hierarchical clustering, such as single linkage distance, to analyze the connected component of the functional network during speech perception using functional magnetic resonance imaging. For speech perception, bimodal (audio-visual speech cue) or unimodal speech cues with counterpart irrelevant noise (auditory white-noise or visual gum-chewing) were delivered to 15 subjects. In terms of positive relationship, similar connected components were observed in bimodal and unimodal speech conditions during filtration. However, during speech perception by congruent audiovisual stimuli, the tighter couplings of left anterior temporal gyrus-anterior insula component and right premotor-visual components were observed than auditory or visual speech cue conditions, respectively. Interestingly, visual speech is perceived under white noise by tight negative coupling in the left inferior frontal region-right anterior cingulate, left anterior insula, and bilateral visual regions, including right middle temporal gyrus, right fusiform components. In conclusion, the speech brain network is tightly positively or negatively connected, and can reflect efficient or effortful processes during natural audiovisual integration or lip-reading, respectively, in speech perception.
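
    The filtration idea, tracking connected components of the functional network as the threshold varies, can be sketched with single-linkage clustering; here a toy 1 - correlation distance matrix and scipy's hierarchical clustering stand in for the authors' persistent-homology framework.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Toy "distance" between brain regions, e.g. 1 - correlation (assumed form).
rng = np.random.default_rng(1)
corr = rng.uniform(0.2, 0.9, size=(6, 6))
corr = (corr + corr.T) / 2
np.fill_diagonal(corr, 1.0)
dist = 1.0 - corr

# The single-linkage dendrogram encodes the entire filtration at once.
Z = linkage(squareform(dist, checks=False), method="single")

# Number of connected components at each threshold (the H0 "barcode").
for thr in np.linspace(0.1, 0.8, 8):
    labels = fcluster(Z, t=thr, criterion="distance")
    print(f"threshold {thr:.2f}: {labels.max()} connected component(s)")
```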

  10. Audio-visual stimulation improves oculomotor patterns in patients with hemianopia.

    Science.gov (United States)

    Passamonti, Claudia; Bertini, Caterina; Làdavas, Elisabetta

    2009-01-01

    Patients with visual field disorders often exhibit impairments in visual exploration and a typical defective oculomotor scanning behaviour. Recent evidence [Bolognini, N., Rasi, F., Coccia, M., & Làdavas, E. (2005b). Visual search improvement in hemianopic patients after audio-visual stimulation. Brain, 128, 2830-2842] suggests that systematic audio-visual stimulation of the blind hemifield can improve accuracy and search times in visual exploration, probably due to the stimulation of Superior Colliculus (SC), an important multisensory structure involved in both the initiation and execution of saccades. The aim of the present study is to verify this hypothesis by studying the effects of multisensory training on oculomotor scanning behaviour. Oculomotor responses during a visual search task and a reading task were studied before and after visual (control) or audio-visual (experimental) training, in a group of 12 patients with chronic visual field defects and 12 controls subjects. Eye movements were recorded using an infra-red technique which measured a range of spatial and temporal variables. Prior to treatment, patients' performance was significantly different from that of controls in relation to fixations and saccade parameters; after Audio-Visual Training, all patients reported an improvement in ocular exploration characterized by fewer fixations and refixations, quicker and larger saccades, and reduced scanpath length. Overall, these improvements led to a reduction of total exploration time. Similarly, reading parameters were significantly affected by the training, with respect to specific impairments observed in both left- and right-hemianopia readers. Our findings provide evidence that Audio-Visual Training, by stimulating the SC, may induce a more organized pattern of visual exploration due to an implementation of efficient oculomotor strategies. Interestingly, the improvement was found to be stable at a 1 year follow-up control session, indicating a long

  11. Comparison of Gated Audiovisual Speech Identification in Elderly Hearing Aid Users and Elderly Normal-Hearing Individuals: Effects of Adding Visual Cues to Auditory Speech Stimuli.

    Science.gov (United States)

    Moradi, Shahram; Lidestam, Björn; Rönnberg, Jerker

    2016-06-17

    The present study compared elderly hearing aid (EHA) users (n = 20) with elderly normal-hearing (ENH) listeners (n = 20) in terms of isolation points (IPs, the shortest time required for correct identification of a speech stimulus) and accuracy of audiovisual gated speech stimuli (consonants, words, and final words in highly and less predictable sentences) presented in silence. In addition, we compared the IPs of audiovisual speech stimuli from the present study with auditory ones extracted from a previous study, to determine the impact of the addition of visual cues. Both participant groups achieved ceiling levels in terms of accuracy in the audiovisual identification of gated speech stimuli; however, the EHA group needed longer IPs for the audiovisual identification of consonants and words. The benefit of adding visual cues to auditory speech stimuli was more evident in the EHA group, as audiovisual presentation significantly shortened the IPs for consonants, words, and final words in less predictable sentences; in the ENH group, audiovisual presentation only shortened the IPs for consonants and words. In conclusion, although the audiovisual benefit was greater for EHA group, this group had inferior performance compared with the ENH group in terms of IPs when supportive semantic context was lacking. Consequently, EHA users needed the initial part of the audiovisual speech signal to be longer than did their counterparts with normal hearing to reach the same level of accuracy in the absence of a semantic context.
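
    A minimal sketch of how an isolation point (IP) might be computed from gated presentations: the shortest gate at which the response is correct and remains correct at all longer gates. The stopping criterion and the example values are assumptions, not the study's exact scoring procedure.

```python
def isolation_point(gate_durations_ms, correct):
    """Shortest gate from which identification is correct and stays correct.

    gate_durations_ms : ascending gate durations for one stimulus
    correct           : booleans, response correctness at each gate
    Returns None if the criterion is never met.
    """
    for i, dur in enumerate(gate_durations_ms):
        if all(correct[i:]):           # correct here and at every longer gate
            return dur
    return None

gates = [40, 80, 120, 160, 200, 240]
responses = [False, False, True, False, True, True]
print(isolation_point(gates, responses))   # -> 200
```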

  12. Relative roles of consonants and vowels in perceiving phonetic versus talker cues in speech

    Science.gov (United States)

    Cardillo, Gina; Owren, Michael J.

    2002-05-01

    Perceptual experiments tested whether consonants and vowels differentially contribute to phonetic versus indexical cueing in speech. In 2 experiments, 62 total participants each heard 128 American-English word pairs recorded by 8 male and 8 female talkers. Half the pairs were synonyms, while half were nonsynonyms. Further, half the pairs were words from the same talker, and half from different, same-sex talkers. The first word heard was unaltered, while the second was edited by setting either all vowels (``Consonants-Only'') or all consonants (``Vowels-Only'') to silence. Each participant responded to half Consonants-Only and half Vowels-Only trials, always hearing the unaltered word once and the edited word twice. In experiment 1, participants judged whether the two words had the same or different meanings. Participants in experiment 2 indicated whether the word pairs were from the same or different talkers. Performance was measured as latencies and d values, and indicated significantly greater sensitivity to phonetic content when consonants rather than vowels were heard, but the converse when talker identity was judged. These outcomes suggest important functional differences in the roles played by consonants and vowels in normative speech.
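
    Sensitivity in such same/different judgements is typically summarised as d′ computed from hit and false-alarm rates; the sketch below assumes this is what the reported d values refer to, and the 0.5 log-linear correction and example counts are illustrative assumptions.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' from a same/different confusion table, with a log-linear correction
    to avoid infinite z-scores when a rate is 0 or 1 (assumed convention)."""
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

# e.g. 50 "same" trials and 50 "different" trials for one condition
print(round(d_prime(hits=42, misses=8, false_alarms=12, correct_rejections=38), 2))
```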

  13. Prioritized MPEG-4 Audio-Visual Objects Streaming over the DiffServ

    Institute of Scientific and Technical Information of China (English)

    HUANG Tian-yun; ZHENG Chan

    2005-01-01

    The object-based scalable coding in MPEG-4 is investigated, and a prioritized transmission scheme of MPEG-4 audio-visual objects (AVOs) over the DiffServ network with the QoS guarantee is proposed. MPEG-4 AVOs are extracted and classified into different groups according to their priority values and scalable layers (visual importance). These priority values are mapped to the IP DiffServ per hop behaviors (PHB). This scheme can selectively discard packets with low importance, in order to avoid the network congestion. Simulation results show that the quality of received video can gracefully adapt to network state, as compared with the 'best-effort' manner. Also, by allowing the content provider to define prioritization of each audio-visual object, the adaptive transmission of object-based scalable video can be customized based on the content.
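
    A minimal sketch of the kind of mapping the scheme describes, from AVO priority and scalable layer to DiffServ per-hop behaviours, with low-importance packets discarded first under congestion. The specific DSCP assignments, thresholds, and drop rule are illustrative assumptions, not the paper's exact algorithm.

```python
# Illustrative mapping from MPEG-4 audio-visual object (AVO) priority and
# scalable layer to DiffServ code points. EF/AF/BE are standard DSCP values,
# but the assignments and drop rule below are assumptions, not the paper's
# exact scheme.
DSCP = {"EF": 46, "AF41": 34, "AF42": 36, "BE": 0}

def mark_packet(object_priority: int, scalable_layer: int) -> int:
    """Higher-priority objects and base layers get a better per-hop behaviour."""
    if object_priority == 0 and scalable_layer == 0:
        return DSCP["EF"]         # e.g. audio and base-layer video of key objects
    if object_priority <= 1:
        return DSCP["AF41"] if scalable_layer == 0 else DSCP["AF42"]
    return DSCP["BE"]             # background objects and extra enhancement layers

def drop_policy(packets, congestion: float):
    """Keep everything at low load; drop best-effort packets first and, under
    heavy congestion, everything except EF-marked packets."""
    if congestion < 0.3:
        return packets
    threshold = DSCP["BE"] if congestion < 0.7 else DSCP["AF42"]
    return [p for p in packets if p["dscp"] > threshold]

packets = [{"id": i, "dscp": mark_packet(i % 3, i % 2)} for i in range(6)]
print([p["dscp"] for p in packets])
print([p["id"] for p in drop_policy(packets, congestion=0.8)])
```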

  14. El tratamiento documental del mensaje audiovisual Documentary treatment of the audio-visual message

    Directory of Open Access Journals (Sweden)

    Blanca Rodríguez Bravo

    2005-06-01

    Full Text Available Peculiarities of the audio-visual document and the treatment it undergoes in TV broadcasting stations are analyzed. The particular features of images condition their analysis and recovery; this paper establishes stages and procedures for the representation of audio-visual messages with a view to their re-usability. Also, some considerations about the automatic processing of video and the changes introduced by digital TV are made.

  15. Internet video telephony allows speech reading by deaf individuals and improves speech perception by cochlear implant users.

    Directory of Open Access Journals (Sweden)

    Georgios Mantokoudis

    Full Text Available OBJECTIVE: To analyze speech reading through Internet video calls by profoundly hearing-impaired individuals and cochlear implant (CI) users. METHODS: Speech reading skills of 14 deaf adults and 21 CI users were assessed using the Hochmair Schulz Moser (HSM) sentence test. We presented video simulations using different video resolutions (1280 × 720, 640 × 480, 320 × 240, 160 × 120 px), frame rates (30, 20, 10, 7, 5 frames per second (fps)), speech velocities (three different speakers), webcameras (Logitech Pro9000, C600 and C500) and image/sound delays (0-500 ms). All video simulations were presented with and without sound and in two screen sizes. Additionally, scores for live Skype™ video connection and live face-to-face communication were assessed. RESULTS: Higher frame rate (>7 fps), higher camera resolution (>640 × 480 px) and shorter picture/sound delay (<100 ms) were associated with increased speech perception scores. Scores were strongly dependent on the speaker but were not influenced by physical properties of the camera optics or the full screen mode. There is a significant median gain of +8.5%pts (p = 0.009) in speech perception for all 21 CI-users if visual cues are additionally shown. CI users with poor open set speech perception scores (n = 11) showed the greatest benefit under combined audio-visual presentation (median speech perception +11.8%pts, p = 0.032). CONCLUSION: Webcameras have the potential to improve telecommunication of hearing-impaired individuals.

  16. Cueing musical emotions: An empirical analysis of 24-piece sets by Bach and Chopin documents parallels with emotional speech

    Directory of Open Access Journals (Sweden)

    Matthew ePoon

    2015-11-01

    Full Text Available Acoustic cues such as pitch height and timing are effective at communicating emotion in both music and speech. Numerous experiments altering musical passages have shown that higher and faster melodies generally sound happier than lower and slower melodies, findings consistent with corpus analyses of emotional speech. However, equivalent corpus analyses of complex time-varying cues in music are less common, due in part to the challenges of assembling an appropriate corpus. Here we describe a novel, score-based exploration of the use of pitch height and timing in a set of balanced major and minor key compositions. Our corpus contained all 24 Preludes and 24 Fugues from Bach’s Well Tempered Clavier (book 1), as well as all 24 of Chopin’s Preludes for piano. These three sets are balanced with respect to both modality (major/minor) and key chroma (A, B, C, etc.). Consistent with predictions derived from speech, we found major-key (nominally happy) pieces to be two semitones higher in pitch height and 29% faster than minor-key (nominally sad) pieces. This demonstrates that our balanced corpus of major and minor key pieces uses low-level acoustic cues for emotion in a manner consistent with speech. A series of post-hoc analyses illustrate interesting trade-offs, with

  17. Cueing musical emotions: An empirical analysis of 24-piece sets by Bach and Chopin documents parallels with emotional speech.

    Science.gov (United States)

    Poon, Matthew; Schutz, Michael

    2015-01-01

    Acoustic cues such as pitch height and timing are effective at communicating emotion in both music and speech. Numerous experiments altering musical passages have shown that higher and faster melodies generally sound "happier" than lower and slower melodies, findings consistent with corpus analyses of emotional speech. However, equivalent corpus analyses of complex time-varying cues in music are less common, due in part to the challenges of assembling an appropriate corpus. Here, we describe a novel, score-based exploration of the use of pitch height and timing in a set of "balanced" major and minor key compositions. Our analysis included all 24 Preludes and 24 Fugues from Bach's Well-Tempered Clavier (book 1), as well as all 24 of Chopin's Preludes for piano. These three sets are balanced with respect to both modality (major/minor) and key chroma ("A," "B," "C," etc.). Consistent with predictions derived from speech, we found major-key (nominally "happy") pieces to be two semitones higher in pitch height and 29% faster than minor-key (nominally "sad") pieces. This demonstrates that our balanced corpus of major and minor key pieces uses low-level acoustic cues for emotion in a manner consistent with speech. A series of post hoc analyses illustrate interesting trade-offs, with sets featuring greater emphasis on timing distinctions between modalities exhibiting the least pitch distinction, and vice-versa. We discuss these findings in the broader context of speech-music research, as well as recent scholarship exploring the historical evolution of cue use in Western music.

  19. THE IMPROVEMENT OF AUDIO-VISUAL BASED DANCE APPRECIATION LEARNING AMONG PRIMARY TEACHER EDUCATION STUDENTS OF MAKASSAR STATE UNIVERSITY

    OpenAIRE

    Wahira

    2014-01-01

    This research aimed to improve the skill in appreciating dances owned by the students of Primary Teacher Education of Makassar State University, to improve the perception towards audio-visual based art appreciation, to increase the students’ interest in audio-visual based art education subject, and to increase the students’ responses to the subject. This research was classroom action research using the research design created by Kemmis & MC. Taggart, which was conducted to 42 students of Prim...

  20. Modern Foreign Language Audio Visual Education and Computer Technology

    Institute of Scientific and Technical Information of China (English)

    赵飒

    2012-01-01

    Computer-assisted foreign language teaching is an important and effective means of modern foreign language audio-visual education, and it has been continuously improving and developing along with computer software. In teaching, realizing output functions such as text, images, sound, video, animation, hypertext links and databases relies on computer-based teaching software for language testing and analysis, corpus construction, electronic dictionaries, machine translation, speech recognition and visual speech synthesis. This paper describes the basic content and purpose of modern foreign language audio-visual education as well as the corresponding main computer teaching software controls.

  1. The perception of speech modulation cues in lexical tones is guided by early language-specific experience

    Directory of Open Access Journals (Sweden)

    Laurianne eCabrera

    2015-08-01

    Full Text Available A number of studies showed that infants reorganize their perception of speech sounds according to their native language categories during their first year of life. Still, information is lacking about the contribution of basic auditory mechanisms to this process. This study aimed to evaluate when native language experience starts to noticeably affect the perceptual processing of basic acoustic cues (i.e., frequency-modulation (FM and amplitude-modulation (AM information known to be crucial for speech perception in adults. The discrimination of a lexical-tone contrast (rising versus low was assessed in 6- and 10-month-old infants learning either French or Mandarin using a visual habituation paradigm. The lexical tones were presented in two conditions designed to either keep intact or to severely degrade the FM and fine spectral cues needed to accurately perceive voice-pitch trajectory. A third condition was designed to assess the discrimination of the same voice-pitch trajectories using click trains containing only the FM cues related to the fundamental-frequency (F0 in French- and Mandarin-learning 10-month-old infants. Results showed that the younger infants of both language groups and the Mandarin-learning 10-month-olds discriminated the intact lexical-tone contrast while French-learning 10-month-olds failed. However, only the French 10-month-olds discriminated degraded lexical tones when FM, and thus voice-pitch cues were reduced. Moreover, Mandarin-learning 10-month-olds were found to discriminate the pitch trajectories as presented in click trains better than French infants. Altogether, these results reveal that the perceptual reorganization occurring during the first year of life for lexical tones is coupled with changes in the auditory ability to use speech modulation cues.

  2. PHYSIOLOGICAL MONITORING OF ACS OPERATORS IN AUDIO-VISUAL SIMULATION OF AN EMERGENCY

    Directory of Open Access Journals (Sweden)

    S. S. Aleksanin

    2016-01-01

    Full Text Available Using a ship-simulator automated control system (ACS), we investigated the information content of physiological monitoring of cardiac rhythm for assessing the reliability and noise immunity of operators of various specializations during audio-visual simulation of an emergency. In parallel, we studied the effectiveness of protection against the adverse effects of electromagnetic fields. Monitoring of cardiac rhythm during a virtual emergency makes it possible to differentiate, by specialization, the degree of strain on the regulatory systems of operators' body functions, and to note the positive effect of the use of means of protection against exposure to electromagnetic fields.

  3. Using Play Activities and Audio-Visual Aids to Develop Speaking Skills

    Directory of Open Access Journals (Sweden)

    Casallas Mutis Nidia

    2000-08-01

    Full Text Available A project was conducted in order to improve oral proficiency in English through the use of play activities and audio-visual aids, with students of first grade in a bilingual school, in la Calera. They were between 6 and 7 years old. As the sample for this study, the five students who had the lowest language oral proficiency were selected. According to the results, it is clear that the sample has improved their English oral proficiency a great deal. However, the process has to be continued because this skill needs constant practice in order to be developed.

  4. Attitude of medical students towards the use of audio visual aids during didactic lectures in pharmacology in a medical college of central India

    OpenAIRE

    Mehul Agrawal; Rajanish Kumar Sankdia

    2016-01-01

    Background: Students favour teaching methods employing audio visual aids over didactic lectures not using these aids. However, the optimum use of audio visual aids is essential for deriving their benefits. During a lecture, both the visual and auditory senses are used to absorb information. Different methods of lecture are – chalk and board, power point presentations (PPT) and mix of aids. This study was done to know the students' preference regarding the various audio visual aids, ...

  5. Effects of audio-visual aids on foreign language test anxiety, reading and listening comprehension, and retention in EFL learners.

    Science.gov (United States)

    Lee, Shu-Ping; Lee, Shin-Da; Liao, Yuan-Lin; Wang, An-Chi

    2015-04-01

    This study examined the effects of audio-visual aids on anxiety, comprehension test scores, and retention in reading and listening to short stories in English as a Foreign Language (EFL) classrooms. Reading and listening tests, general and test anxiety, and retention were measured in English-major college students in an experimental group with audio-visual aids (n=83) and a control group without audio-visual aids (n=94) with similar general English proficiency. Lower reading test anxiety, unchanged reading comprehension scores, and better reading short-term and long-term retention after four weeks were evident in the audiovisual group relative to the control group. In addition, lower listening test anxiety, higher listening comprehension scores, and unchanged short-term and long-term retention were found in the audiovisual group relative to the control group after the intervention. Audio-visual aids may help to reduce EFL learners' listening test anxiety and enhance their listening comprehension scores without facilitating retention of such materials. Although audio-visual aids did not increase reading comprehension scores, they helped reduce EFL learners' reading test anxiety and facilitated retention of reading materials. PMID:25914939

  7. Role of contextual cues on the perception of spectrally reduced interrupted speech.

    Science.gov (United States)

    Patro, Chhayakanta; Mendel, Lisa Lucks

    2016-08-01

    Understanding speech within an auditory scene is constantly challenged by interfering noise in suboptimal listening environments when noise hinders the continuity of the speech stream. In such instances, a typical auditory-cognitive system perceptually integrates available speech information and "fills in" missing information in the light of semantic context. However, individuals with cochlear implants (CIs) find it difficult and effortful to understand interrupted speech compared to their normal hearing counterparts. This inefficiency in perceptual integration of speech could be attributed to further degradations in the spectral-temporal domain imposed by CIs making it difficult to utilize the contextual evidence effectively. To address these issues, 20 normal hearing adults listened to speech that was spectrally reduced and spectrally reduced interrupted in a manner similar to CI processing. The Revised Speech Perception in Noise test, which includes contextually rich and contextually poor sentences, was used to evaluate the influence of semantic context on speech perception. Results indicated that listeners benefited more from semantic context when they listened to spectrally reduced speech alone. For the spectrally reduced interrupted speech, contextual information was not as helpful under significant spectral reductions, but became beneficial as the spectral resolution improved. These results suggest top-down processing facilitates speech perception up to a point, and it fails to facilitate speech understanding when the speech signals are significantly degraded. PMID:27586760
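
    The interruption manipulation can be sketched as periodic square-wave gating of the waveform; spectral reduction (e.g., noise vocoding to a small number of channels) would be applied in addition to the gating. The interruption rate, duty cycle, and stand-in signal below are assumptions, not the study's exact stimuli.

```python
import numpy as np

def interrupt(signal, fs, rate_hz=2.5, duty=0.5):
    """Periodically gate a waveform on and off (square-wave interruption).

    rate_hz (interruptions per second) and duty (fraction of each cycle kept)
    are illustrative values; spectral reduction such as noise vocoding would
    be applied in addition to this gating.
    """
    t = np.arange(signal.size) / fs
    gate = ((t * rate_hz) % 1.0) < duty
    return signal * gate

fs = 16000
speech = np.random.randn(2 * fs)           # stand-in for a 2 s speech waveform
gated = interrupt(speech, fs)
print("proportion of samples retained:", np.count_nonzero(gated) / gated.size)
```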

  8. Integration of audio-visual information for spatial decisions in children and adults.

    Science.gov (United States)

    Nardini, Marko; Bales, Jennifer; Mareschal, Denis

    2016-09-01

    In adults, decisions based on multisensory information can be faster and/or more accurate than those relying on a single sense. However, this finding varies significantly across development. Here we studied speeded responding to audio-visual targets, a key multisensory function whose development remains unclear. We found that when judging the locations of targets, children aged 4 to 12 years and adults had faster and less variable response times given auditory and visual information together compared with either alone. Comparison of response time distributions with model predictions indicated that children at all ages were integrating (pooling) sensory information to make decisions but that both the overall speed and the efficiency of sensory integration improved with age. The evidence for pooling comes from comparison with the predictions of Miller's seminal 'race model', as well as with a major recent extension of this model and a comparable 'pooling' (coactivation) model. The findings and analyses can reconcile results from previous audio-visual studies, in which infants showed speed gains exceeding race model predictions in a spatial orienting task (Neil et al., 2006) but children below 7 years did not in speeded reaction time tasks (e.g. Barutchu et al., 2009). Our results provide new evidence for early and sustained abilities to integrate visual and auditory signals for spatial localization from a young age. PMID:26190579
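
    The race-model comparison rests on Miller's inequality, P(RT_AV ≤ t) ≤ P(RT_A ≤ t) + P(RT_V ≤ t): violations indicate pooling (coactivation) rather than independent racing. The sketch below evaluates the inequality on simulated response times; the distributions are made up and the test omits refinements used in fuller analyses.

```python
import numpy as np

rng = np.random.default_rng(2)
rt_a = rng.normal(450, 60, 200)     # simulated auditory-only RTs (ms)
rt_v = rng.normal(470, 60, 200)     # simulated visual-only RTs (ms)
rt_av = rng.normal(390, 50, 200)    # simulated audio-visual RTs (ms)

def ecdf(samples, t):
    """Empirical cumulative distribution of `samples` evaluated at times t."""
    return np.mean(samples[:, None] <= t, axis=0)

# Miller's race-model inequality: P(AV <= t) <= P(A <= t) + P(V <= t).
# Positive values of `violation` at any t are evidence against a pure race.
t_grid = np.arange(250, 701, 10)
violation = ecdf(rt_av, t_grid) - np.minimum(1.0, ecdf(rt_a, t_grid) + ecdf(rt_v, t_grid))
print("max violation:", round(float(violation.max()), 3))
```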

  9. A new alley in Opinion Mining using Senti Audio Visual Algorithm

    Directory of Open Access Journals (Sweden)

    Mukesh Rawat,

    2016-02-01

    Full Text Available People share their views about products and services over social media, blogs, forums etc. If someone is willing to spend resources and money over these products and services will definitely learn about them from the past experiences of their peers. Opinion mining plays vital role in knowing increasing interests of a particular community, social and political events, making business strategies, marketing campaigns etc. This data is in unstructured form over internet but analyzed properly can be of great use. Sentiment analysis focuses on polarity detection of emotions like happy, sad or neutral. In this paper we proposed an algorithm i.e. Senti Audio Visual for examining Video as well as Audio sentiments. A review in the form of video/audio may contain several opinions/emotions, this algorithm will classify the reviews with the help of Baye’s Classifiers to three different classes i.e., positive, negative or neutral. The algorithm will use smiles, cries, gazes, pauses, pitch, and intensity as relevant Audio Visual features.

  10. Designing Promotion Strategy of Malang Raya’s Tourism Destination Branding through Audio Visual Media

    Directory of Open Access Journals (Sweden)

    Chanira Nuansa

    2014-04-01

    Full Text Available This study examines the suitability concept of destination branding with existing models of Malang tourism promotion. This research is qualitative by taking the data directly in the form of existing promotional models of Malang, namely: information portal sites, blogs, social networking, and video via the Internet. This study used SWOT analysis to find strengths, weaknesses, opportunities, and threats on existing models of the tourism promotion. The data is analyzed based on destination branding’s concept indicators. Results of analysis are used as a basis in designing solutions for Malang tourism promotion through a new integrated tourism advertising model. Through the analysis we found that video is the most suitable media that used to promote Malang tourism in the form of advertisements. Videos are able to show the objectivity of the fact that intact better through audio-visual form, making it easier to associate the viewer thoughts on the phenomenon of destination. Moreover, video creation of Malang tourism as well as conceptualized ad is still rare. This is an opportunity, because later models of audio-visual advertisements made of this study is expected to be an example for concerned parties to conceptualize the next Malang tourism advertising.Keywords: Advertise, SWOT Analysis, Malang City, tourism promotion

  11. GRAPE - GIS Repetition Using Audio-Visual Repetition Units and its Learning Effectiveness

    Science.gov (United States)

    Niederhuber, M.; Brugger, S.

    2011-09-01

    A new audio-visual learning medium has been developed at the Department of Environmental Sciences at ETH Zurich (Switzerland), for use in geographical information sciences (GIS) courses. This new medium, presented in the form of Repetition Units, allows students to review and consolidate the most important learning concepts on an individual basis. The new material consists of: a) a short enhanced podcast (recorded and spoken slide show) with a maximum duration of 5 minutes, which focuses on only one important aspect of a lecture's theme; b) one or two relevant exercises, covering different cognitive levels of learning, with a maximum duration of 10 minutes; and c), solutions for the exercises. During a pilot phase in 2010, six Repetition Units were produced by the lecturers. Twenty more Repetition Units will be produced by our students during the fall semester of 2011 and 2012. The project is accompanied by a 5-year study (2009 - 2013) that investigates learning success using the new material, focussing on the question, whether or not the new material help to consolidate and refresh basic GIS knowledge. It will be analysed based on longitudinal studies. Initial results indicate that the new medium helps to refresh knowledge as the test groups scored higher than the control group. These results are encouraging and suggest that the new material with its combination of short audio-visual podcasts and relevant exercises help to consolidate students' knowledge.

  12. THE IMPROVEMENT OF AUDIO-VISUAL BASED DANCE APPRECIATION LEARNING AMONG PRIMARY TEACHER EDUCATION STUDENTS OF MAKASSAR STATE UNIVERSITY

    Directory of Open Access Journals (Sweden)

    Wahira

    2014-06-01

    Full Text Available This research aimed to improve the skill in appreciating dances owned by the students of Primary Teacher Education of Makassar State University, to improve the perception towards audio-visual based art appreciation, to increase the students’ interest in audio-visual based art education subject, and to increase the students’ responses to the subject. This research was classroom action research using the research design created by Kemmis & MC. Taggart, which was conducted to 42 students of Primary Teacher Education of Makassar State University. The data collection was conducted using observation, questionnaire, and interview. The techniques of data analysis applied in this research were descriptive qualitative and quantitative. The results of this research were: (1) the students’ achievement in audio-visual based dance appreciation improved: precycle 33,33%, cycle I 42,85% and cycle II 83,33%, (2) the students’ perception towards the audio-visual based dance appreciation improved: cycle I 59,52%, and cycle II 71,42%. The students’ perception towards the subject obtained through structured interview in cycle I and II was 69,83% in a high category, (3) the interest of the students in the art education subject, especially audio-visual based dance appreciation, increased: cycle I 52,38% and cycle II 64,28%, and the students’ interest in the subject obtained through structured interview was 69,50 % in a high category. (4) the students’ response to audio-visual based dance appreciation increased: cycle I 54,76% and cycle II 69,04% in a good category.

  13. The application of manifold based visual speech units for visual speech recognition

    OpenAIRE

    Yu, Dahai

    2008-01-01

    This dissertation presents a new learning-based representation that is referred to as a Visual Speech Unit for visual speech recognition (VSR). The automated recognition of human speech using only features from the visual domain has become a significant research topic that plays an essential role in the development of many multimedia systems such as audio visual speech recognition(AVSR), mobile phone applications, human-computer interaction (HCI) and sign language recognition. The inclusio...

  14. Natural speech cues to word segmentation under difficult listening conditions

    OpenAIRE

    Cutler, A.; Butterfield, S.

    1989-01-01

    One of a listener's major tasks in understanding continuous speech is segmenting the speech signal into separate words. When listening conditions are difficult, speakers can help listeners by deliberately speaking more clearly. In three experiments, we examined how word boundaries are produced in deliberately clear speech. We found that speakers do indeed attempt to mark word boundaries; moreover, they differentiate between word boundaries in a way which suggests they are sensitive to listene...

  15. Integrating Audio-Visual Features and Text Information for Story Segmentation of News Video

    Institute of Scientific and Technical Information of China (English)

    Liu Hua-yong; Zhou Dong-ru

    2003-01-01

    Video data are composed of multimodal information streams including visual, auditory and textual streams, so an approach of story segmentation for news video using multimodal analysis is described in this paper. The proposed approach detects the topic-caption frames, and integrates them with silence clips detection results, as well as shot segmentation results to locate the news story boundaries. The integration of audio-visual features and text information overcomes the weakness of the approach using only image analysis techniques. On test data with 135 400 frames, when the boundaries between news stories are detected, the accuracy rate 85.8% and the recall rate 97.5% are obtained. The experimental results show the approach is valid and robust.
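
    A much-simplified sketch of the integration step: candidate shot boundaries are kept only when corroborated by a topic-caption frame or a silence clip, and the detected boundaries are scored against a reference to obtain accuracy (precision) and recall. The fusion rule, tolerances, and times are illustrative assumptions, not the paper's detectors.

```python
def fuse_boundaries(caption_frames, silence_clips, shot_cuts, tol=1.0):
    """Keep shot-cut times (seconds) that are corroborated by a topic-caption
    frame or a silence clip within +/- tol seconds (a simplified fusion rule)."""
    evidence = sorted(caption_frames + silence_clips)
    return [c for c in shot_cuts if any(abs(c - e) <= tol for e in evidence)]

def precision_recall(detected, reference, tol=2.0):
    """Score detected story boundaries against reference boundaries."""
    tp_det = sum(any(abs(d - r) <= tol for r in reference) for d in detected)
    tp_ref = sum(any(abs(d - r) <= tol for d in detected) for r in reference)
    precision = tp_det / len(detected) if detected else 0.0
    recall = tp_ref / len(reference) if reference else 0.0
    return precision, recall

detected = fuse_boundaries(caption_frames=[30.2, 95.0],
                           silence_clips=[29.8, 60.1, 94.5],
                           shot_cuts=[10.0, 30.0, 61.0, 95.3])
print(detected, precision_recall(detected, reference=[30.0, 95.0]))
```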

  17. Visual-Auditory Integration during Speech Imitation in Autism

    Science.gov (United States)

    Williams, Justin H. G.; Massaro, Dominic W.; Peel, Natalie J.; Bosseler, Alexis; Suddendorf, Thomas

    2004-01-01

    Children with autistic spectrum disorder (ASD) may have poor audio-visual integration, possibly reflecting dysfunctional "mirror neuron" systems which have been hypothesised to be at the core of the condition. In the present study, a computer program, utilizing speech synthesizer software and a "virtual" head (Baldi), delivered speech stimuli for…

  18. How is the McGurk effect modulated by Cued Speech in deaf and hearing adults?

    Science.gov (United States)

    Bayard, Clémence; Colin, Cécile; Leybaert, Jacqueline

    2014-01-01

    Speech perception for both hearing and deaf people involves an integrative process between auditory and lip-reading information. In order to disambiguate information from lips, manual cues from Cued Speech may be added. Cued Speech (CS) is a system of manual aids developed to help deaf people to clearly and completely understand speech visually (Cornett, 1967). Within this system, both labial and manual information, as lone input sources, remain ambiguous. Perceivers, therefore, have to combine both types of information in order to get one coherent percept. In this study, we examined how audio-visual (AV) integration is affected by the presence of manual cues and on which form of information (auditory, labial or manual) the CS receptors primarily rely. To address this issue, we designed a unique experiment that implemented the use of AV McGurk stimuli (audio /pa/ and lip-reading /ka/) which were produced with or without manual cues. The manual cue was congruent with either auditory information, lip information or the expected fusion. Participants were asked to repeat the perceived syllable aloud. Their responses were then classified into four categories: audio (when the response was /pa/), lip-reading (when the response was /ka/), fusion (when the response was /ta/) and other (when the response was something other than /pa/, /ka/ or /ta/). Data were collected from hearing impaired individuals who were experts in CS (all of which had either cochlear implants or binaural hearing aids; N = 8), hearing-individuals who were experts in CS (N = 14) and hearing-individuals who were completely naïve of CS (N = 15). Results confirmed that, like hearing-people, deaf people can merge auditory and lip-reading information into a single unified percept. Without manual cues, McGurk stimuli induced the same percentage of fusion responses in both groups. Results also suggest that manual cues can modify the AV integration and that their impact differs between hearing and deaf people

  19. How is the McGurk effect modulated by Cued Speech in deaf and hearing adults?

    Directory of Open Access Journals (Sweden)

    Clémence eBayard

    2014-05-01

    Full Text Available Speech perception for both hearing and deaf people involves an integrative process between auditory and lip-reading information. In order to disambiguate information from lips, manual cues from Cued Speech may be added. Cued Speech (CS) is a system of manual aids developed to help deaf people to clearly and completely understand speech visually (Cornett, 1967). Within this system, both labial and manual information, as lone input sources, remain ambiguous. Perceivers, therefore, have to combine both types of information in order to get one coherent percept. In this study, we examined how audio-visual (AV) integration is affected by the presence of manual cues and on which form of information (auditory, labial or manual) the CS receptors primarily rely. To address this issue, we designed a unique experiment that implemented the use of AV McGurk stimuli (audio /pa/ and lip-reading /ka/) which were produced with or without manual cues. The manual cue was congruent with either auditory information, lip information or the expected fusion. Participants were asked to repeat the perceived syllable aloud. Their responses were then classified into four categories: audio (when the response was /pa/), lip-reading (when the response was /ka/), fusion (when the response was /ta/) and other (when the response was something other than /pa/, /ka/ or /ta/). Data were collected from hearing impaired individuals who were experts in CS (all of which had either cochlear implants or binaural hearing aids; N=8), hearing-individuals who were experts in CS (N = 14) and hearing-individuals who were completely naïve of CS (N = 15). Results confirmed that, like hearing-people, deaf people can merge auditory and lip-reading information into a single unified percept. Without manual cues, McGurk stimuli induced the same percentage of fusion responses in both groups. Results also suggest that manual cues can modify the AV integration and that their impact differs between hearing and deaf

  1. Undifferentiated Facial Electromyography Responses to Dynamic, Audio-Visual Emotion Displays in Individuals with Autism Spectrum Disorders

    Science.gov (United States)

    Rozga, Agata; King, Tricia Z.; Vuduc, Richard W.; Robins, Diana L.

    2013-01-01

    We examined facial electromyography (fEMG) activity to dynamic, audio-visual emotional displays in individuals with autism spectrum disorders (ASD) and typically developing (TD) individuals. Participants viewed clips of happy, angry, and fearful displays that contained both facial expression and affective prosody while surface electrodes measured…

  2. Equipped for the 21st Century?: Audio-Visual Resource Standards and Product Demands from Geography Departments in the UK.

    Science.gov (United States)

    McKendrick, John H.; Bowden, Annabel

    2000-01-01

    Reports on a survey of United Kingdom geography departments where data were collected on the availability, use, and opinions about the role of audio visual resources (AVRs) in teaching and learning. Reveals that AVRs are seen positively, hardware is readily available, software provision is uneven, and AVR commitment varies. (CMK)

  3. Changes in the Management of Information in Audio-Visual Archives following Digitization: Current and Future Outlook

    Science.gov (United States)

    Caldera-Serrano, Jorge

    2008-01-01

    This article attempts to offer an overview of the current changes that are being experienced in the management of audio-visual documentation and those that can be forecast in the future as a result of the migration from analogue to digital information. For this purpose the documentary chain will be used as a basis to analyse individually the tasks…

  4. The role of vowel perceptual cues in compensatory responses to perturbations of speech auditory feedback

    OpenAIRE

    Reilly, Kevin J.; Dougherty, Kathleen E.

    2013-01-01

    The perturbation of acoustic features in a speaker's auditory feedback elicits rapid compensatory responses that demonstrate the importance of auditory feedback for control of speech output. The current study investigated whether responses to a perturbation of speech auditory feedback vary depending on the importance of the perturbed feature to perception of the vowel being produced. Auditory feedback of speakers' first formant frequency (F1) was shifted upward by 130 mels in randomly selecte...
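
    The 130-mel upward shift of F1 can be reproduced with the common mel mapping mel(f) = 2595 * log10(1 + f/700); whether the study used this exact variant of the mel scale, and the example F1 value, are assumptions made for illustration.

```python
import math

def hz_to_mel(f):       # O'Shaughnessy mel scale (assumed; other variants exist)
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

f1 = 500.0                                   # example F1 of a vowel, in Hz
f1_shifted = mel_to_hz(hz_to_mel(f1) + 130)  # perturbed feedback, +130 mels
print(round(f1_shifted, 1), "Hz")            # ~647 Hz for this example
```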

  5. PENERAPAN STRATEGI LSQ BERBANTUAN MEDIA AUDIO VISUAL UNTUK MENINGKATKAN HASIL BELAJAR EKONOMI (Application of the LSQ Strategy Assisted by Audio-Visual Media to Improve Economics Learning Outcomes)

    Directory of Open Access Journals (Sweden)

    Sholikhah Fakhratus

    2012-10-01

    Full Text Available The process which is being a constraint at High School 1 Kroya is the activities of students in the learning process still lacking, the students still feel scared and ashamed to ask if there isn't an encouragement from the teacher, the teacher is still lack in the development of teaching variation. The above is caused of needing to the use of appropriate and varied methods and media as a tool in teaching and learning, one of the alternatives by applying the learning strategies Learning Start With A Question (LSQ) assisted by audio visual media. The design of this study is an action research class with two cycles, each cycle includes planning, implementation, observation and reflection. The results on the cycle I shows the average of student learning outcomes is 71,5 with classical completeness 65.7%, 67.71% of the student activity in the high category, teacher's activity in the learning is 67.5% or high category. For the result on the cycle II showed an average of student learning outcomes 78,6 with classical completeness 85.7%, 76.57% of student activities or activities of the students in the high category, for teachers' activity is 87.5% with very high criteria.

  6. Effects of hearing loss on the subcortical representation of speech cues.

    Science.gov (United States)

    Anderson, Samira; Parbery-Clark, Alexandra; White-Schwoch, Travis; Drehobl, Sarah; Kraus, Nina

    2013-05-01

    Individuals with sensorineural hearing loss often report frustration with speech being loud but not clear, especially in background noise. Despite advanced digital technology, hearing aid users may resort to removing their hearing aids in noisy environments due to the perception of excessive loudness. In an animal model, sensorineural hearing loss results in greater auditory nerve coding of the stimulus envelope, leading to a relative deficit of stimulus fine structure. Based on the hypothesis that brainstem encoding of the temporal envelope is greater in humans with sensorineural hearing loss, speech-evoked brainstem responses were recorded in normal hearing and hearing impaired age-matched groups of older adults. In the hearing impaired group, there was a disruption in the balance of envelope-to-fine structure representation compared to that of the normal hearing group. This imbalance may underlie the difficulty experienced by individuals with sensorineural hearing loss when trying to understand speech in background noise. This finding advances the understanding of the effects of sensorineural hearing loss on central auditory processing of speech in humans. Moreover, this finding has clinical potential for developing new amplification or implantation technologies, and in developing new training regimens to address this relative deficit of fine structure representation. PMID:23654406
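
    The envelope versus temporal-fine-structure distinction referred to here is commonly operationalised with the Hilbert transform: the analytic signal's magnitude gives the envelope and its cosine phase the fine structure. The single-band toy signal below is an assumption; actual analyses do this within narrow frequency bands of speech.

```python
import numpy as np
from scipy.signal import hilbert

fs = 16000
t = np.arange(0, 0.05, 1 / fs)
# Stand-in for one narrow band of speech: a 1 kHz carrier with a slow envelope.
x = (1 + 0.8 * np.sin(2 * np.pi * 30 * t)) * np.sin(2 * np.pi * 1000 * t)

analytic = hilbert(x)
envelope = np.abs(analytic)                    # temporal envelope (ENV)
fine_structure = np.cos(np.angle(analytic))    # temporal fine structure (TFS)

# Within a band, the signal is (approximately) the product of the two cues;
# reconstructing it this way makes the decomposition explicit.
reconstruction = envelope * fine_structure
print(np.allclose(x, reconstruction, atol=1e-6))
```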

  7. Universal and language-specific sublexical cues in speech perception: a novel electroencephalography-lesion approach.

    Science.gov (United States)

    Obrig, Hellmuth; Mentzel, Julia; Rossi, Sonja

    2016-06-01

    See Cappa (doi:10.1093/brain/aww090) for a scientific commentary on this article. The phonological structure of speech supports the highly automatic mapping of sound to meaning. While it is uncontroversial that phonotactic knowledge acts upon lexical access, it is unclear at what stage these combinatorial rules, governing phonological well-formedness in a given language, shape speech comprehension. Moreover, few studies have investigated the neuronal network affording this important step in speech comprehension. We therefore asked 70 participants, half of whom suffered from a chronic left hemispheric lesion, to listen to 252 different monosyllabic pseudowords. The material models universal preferences of phonotactic well-formedness by including naturally spoken pseudowords and digitally reversed exemplars. The latter partially violate the phonological structure of all human speech and are rich in universally dispreferred phoneme sequences while preserving basic auditory parameters. Language-specific constraints were modelled in that half of the naturally spoken pseudowords complied with the phonotactics of the native language of the monolingual participants (German) while the other half did not. To ensure universal well-formedness and naturalness, the latter stimuli comply with Slovak phonotactics and all stimuli were produced by an early bilingual speaker. To maximally attenuate lexico-semantic influences, transparent pseudowords were avoided and participants had to detect immediate repetitions, a task orthogonal to the contrasts of interest. The results show that phonological 'well-formedness' modulates implicit processing of speech at different levels: universally dispreferred phonological structure elicits early, medium and late latency differences in the evoked potential. On the contrary, the language-specific phonotactic contrast selectively modulates a medium latency component of the event-related potentials around 400 ms. Using a novel event-related potential

  8. The effectiveness of mnemonic audio-visual aids in teaching content words to EFL students at a Turkish university

    OpenAIRE

    Kılınç, A Reha

    1996-01-01

    Ankara: Institute of Economics and Social Sciences, Bilkent University, 1996. Thesis (Master's) -- Bilkent University, 1996. Includes bibliographical references (leaves 63-67). This experimental study aimed at investigating the effects of mnemonic audio-visual aids on recognition and recall of vocabulary items in comparison to a dictionary-using control group. The study was conducted at Middle East Technical University Department of Basic English. The participants were 64 beginner and u...

  9. From vibration to perception: using Large Multi-Actuator Panels (LaMAPs) to create coherent audio-visual environments

    OpenAIRE

    Rébillat, Marc; Corteel, Etienne; Katz, Brian,; Boutillon, Xavier

    2012-01-01

    International audience. Virtual reality aims at providing users with audio-visual worlds where they will behave and learn as if they were in the real world. In this context, specific acoustic transducers are needed to fulfill simultaneous spatial requirements on visual and audio rendering in order to make them coherent. Large multi-actuator panels (LaMAPs) allow for the combined construction of a projection screen and loudspeaker array, and thus allow for the coherent creation of an audio ...

  10. Effects of hearing loss on the subcortical representation of speech cues

    OpenAIRE

    Anderson, Samira; Parbery-Clark, Alexandra; White-Schwoch, Travis; Drehobl, Sarah; Kraus, Nina

    2013-01-01

    Individuals with sensorineural hearing loss often report frustration with speech being loud but not clear, especially in background noise. Despite advanced digital technology, hearing aid users may resort to removing their hearing aids in noisy environments due to the perception of excessive loudness. In an animal model, sensorineural hearing loss results in greater auditory nerve coding of the stimulus envelope, leading to a relative deficit of stimulus fine structure. Based on the hypothesi...

  11. Speech recall and word recognition depending on prosodic and musical cues as well as voice pitch

    OpenAIRE

    Rozanovskaya, Anna; Sokolova, Taisia

    2011-01-01

    Within this study, speech perception in different conditions was examined. The aim of the research was to compare perception results based on stimuli mode (plain spoken, rhythmic spoken or rhythmic sung stimuli) and pitch (normal, lower and higher). In the study, an experiment was conducted on 44 participants who had been asked to listen to 9 recorded sentences in Russian language (unknown to them) and write them down using Latin letters. These 9 sentences were specially prepared using differ...

  12. Synchronized audio-visual transients drive efficient visual search for motion-in-depth.

    Directory of Open Access Journals (Sweden)

    Marina Zannoli

    Full Text Available In natural audio-visual environments, a change in depth is usually correlated with a change in loudness. In the present study, we investigated whether correlating changes in disparity and loudness would provide a functional advantage in binding disparity and sound amplitude in a visual search paradigm. To test this hypothesis, we used a method similar to that used by van der Burg et al. to show that non-spatial transient (square-wave) modulations of loudness can drastically improve spatial visual search for a correlated luminance modulation. We used dynamic random-dot stereogram displays to produce pure disparity modulations. Target and distractors were small disparity-defined squares (either 6 or 10 in total). Each square moved back and forth in depth in front of the background plane at different phases. The target's depth modulation was synchronized with an amplitude-modulated auditory tone. Visual and auditory modulations were always congruent (both sine-wave or both square-wave). In a speeded search task, five observers were asked to identify the target as quickly as possible. Results show a significant improvement in visual search times in the square-wave condition compared to the sine condition, suggesting that transient auditory information can efficiently drive visual search in the disparity domain. In a second experiment, participants performed the same task in the absence of sound and showed a clear set-size effect in both modulation conditions. In a third experiment, we correlated the sound with a distractor instead of the target. This produced longer search times, indicating that the correlation is not easily ignored.
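
    A minimal sketch of how congruent sine-wave versus square-wave (transient) modulations of target depth and tone amplitude could be generated is shown below; the sample rate, modulation rate and number of distractors are illustrative assumptions, not parameters taken from the study:

```python
import numpy as np

def modulation(t, freq_hz, phase, waveform="sine"):
    """Return a modulation signal in [0, 1] for depth or loudness."""
    s = np.sin(2 * np.pi * freq_hz * t + phase)
    if waveform == "square":
        s = np.sign(s)          # transient (square-wave) modulation
    return 0.5 * (s + 1.0)      # map [-1, 1] -> [0, 1]

fs = 1000.0                      # sample rate of the modulation envelopes (Hz)
t = np.arange(0, 2.0, 1.0 / fs)  # 2-second trial
mod_rate = 1.0                   # illustrative modulation rate (Hz)

# Target: depth and tone amplitude share the same waveform and phase (congruent).
target_depth   = modulation(t, mod_rate, phase=0.0, waveform="square")
tone_amplitude = modulation(t, mod_rate, phase=0.0, waveform="square")

# Distractors: same waveform, but at other phases, so only the target is
# synchronized with the sound.
distractor_phases = np.linspace(0.5, 2 * np.pi - 0.5, 5)
distractor_depths = [modulation(t, mod_rate, phase=p, waveform="square")
                     for p in distractor_phases]
```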

  13. Normal Gaze Cueing in Children with Autism Is Disrupted by Simultaneous Speech Utterances in “Live” Face-to-Face Interactions

    Directory of Open Access Journals (Sweden)

    Douglas D. Potter

    2011-01-01

    Full Text Available Gaze cueing was assessed in children with autism and in typically developing children, using a computer-controlled “live” face-to-face procedure. Sensitivity to gaze direction was assessed using a Posner cuing paradigm. Both static and dynamic directional gaze cues were used. Consistent with many previous studies, using photographic and cartoon faces, gaze cueing was present in children with autism and was not developmentally delayed. However, in the same children, gaze cueing was abolished when a mouth movement occurred at the same time as the gaze cue. In contrast, typical children were able to use gaze cues in all conditions. The findings indicate that gaze cueing develops successfully in some children with autism but that their attention is disrupted by speech utterances. Their ability to learn to read nonverbal emotional and intentional signals provided by the eyes may therefore be significantly impaired. This may indicate a problem with cross-modal attention control or an abnormal sensitivity to peripheral motion in general or the mouth region in particular.

  14. Attentional modulation of external speech attribution in patients with hallucinations and delusions.

    Science.gov (United States)

    Ilankovic, Lana Marija; Allen, Paul P; Engel, Rolf; Kambeitz, Joseph; Riedel, Michael; Müller, Norbert; Hennig-Fast, Kristina

    2011-04-01

    A range of psychological theories have been proposed to account for the experience of auditory hallucinations and delusions in patients with schizophrenia. The most influential theories are those implicating defective self-monitoring of inner speech. Some recent studies measured response bias independently of self-monitoring and found results that were inconsistent with the defective self-monitoring model but could be explained by an externalizing response bias. We aimed to investigate the role of attentional bias in external misattribution of source by modulating participants' endogenous expectancies. Comparisons were made between patients with paranoid schizophrenia (N=23) and matched healthy controls (N=23) who participated in two different versions of an audio-visual task, which differed in the level of cue predictiveness. The acoustic characteristics of the voice were altered in half of the trials by shifting the pitch (distortion). Participants passively listened to recordings of single adjectives spoken in their own and another person's (alien) voice, preceded by their own or another person's (alien) face, and made self/non-self judgments about the source. The patients showed increased error rates compared to controls when listening to the distorted self-spoken words, misidentifying their own speech as produced by others. Importantly, patients made significantly more errors across all the invalid cue conditions. This suggests not only the presence of a pathological misattribution bias, but also an inadequate balance between top-down and bottom-up attentional processes in the patients, which could be responsible for misattribution of the ambiguous sensory material. PMID:21241719

  15. Speech emotion recognition in emotional feedback for Human-Robot Interaction

    Directory of Open Access Journals (Sweden)

    Javier G. Rázuri

    2015-02-01

    Full Text Available For robots to plan their actions autonomously and interact with people, recognizing human emotions is crucial. For most humans, nonverbal cues such as pitch, loudness, spectrum and speech rate are efficient carriers of emotion. The features of the sound of a spoken voice probably contain crucial information on the emotional state of the speaker; within this framework, a machine might use such properties of sound to recognize emotions. This work evaluated six different kinds of classifiers for predicting six basic universal emotions from non-verbal features of human speech. The classification techniques used information from six audio files extracted from the eNTERFACE05 audio-visual emotion database. The information gain from a decision tree was also used to choose the most significant speech features from a set of acoustic features commonly extracted in emotion analysis. With this feature selection, each of the compared classifiers improved in global accuracy and recall. The best performance was obtained with Support Vector Machines and BayesNet.
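
    The pipeline described (decision-tree information gain for feature selection, followed by a comparison of classifiers on accuracy and recall) can be sketched with scikit-learn roughly as follows; the feature matrix is a random placeholder, and GaussianNB stands in for the BayesNet classifier named in the abstract:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# X: acoustic features per utterance (e.g., pitch, loudness, spectral statistics,
# speech rate); y: one of six basic emotion labels. Placeholder random data here.
rng = np.random.default_rng(0)
X = rng.normal(size=(240, 30))
y = rng.integers(0, 6, size=240)

# Rank features by information gain (entropy criterion of a decision tree)
# and keep the most informative ones.
tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
top = np.argsort(tree.feature_importances_)[::-1][:10]
X_sel = X[:, top]

classifiers = {
    "SVM": SVC(),
    "NaiveBayes": GaussianNB(),   # rough stand-in for the BayesNet used in the study
    "kNN": KNeighborsClassifier(),
    "RandomForest": RandomForestClassifier(random_state=0),
    "LogReg": LogisticRegression(max_iter=1000),
    "MLP": MLPClassifier(max_iter=1000, random_state=0),
}

for name, clf in classifiers.items():
    acc = cross_val_score(clf, X_sel, y, cv=5, scoring="accuracy").mean()
    rec = cross_val_score(clf, X_sel, y, cv=5, scoring="recall_macro").mean()
    print(f"{name:12s} accuracy={acc:.2f} macro-recall={rec:.2f}")
```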

  16. New method for mathematical modelling of human visual speech

    OpenAIRE

    Sadaghiani, Mohammad Hossein

    2015-01-01

    Audio-visual speech recognition and visual speech synthesisers are used as interfaces between humans and machines. Such interactions specifically rely on the analysis and synthesis of both audio and visual information, which humans use for face-to-face communication. Currently, there is no global standard to describe these interactions nor is there a standard mathematical tool to describe lip movements. Furthermore, the visual lip movement for each phoneme is considered in isolation rather th...

  17. Susceptibility to a multisensory speech illusion in older persons is driven by perceptual processes

    Directory of Open Access Journals (Sweden)

    Annalisa eSetti

    2013-09-01

    Full Text Available Recent studies suggest that multisensory integration is enhanced in older adults, but it is not known whether this enhancement is solely driven by perceptual processes or also affected by cognitive processes. Using the 'McGurk illusion', in Experiment 1 we found that audio-visual integration of incongruent audio-visual words was higher in older adults than in younger adults, although the recognition of either audio- or visual-only presented words was the same across groups. In Experiment 2 we tested recall of sentences within which an incongruent audio-visual speech word was embedded. The overall semantic meaning of the sentence was compatible with either one of the unisensory components of the target word and/or with the illusory percept. Older participants recalled more illusory audio-visual words in sentences than younger adults; however, there was no differential effect of word compatibility on recall for the two groups. Our findings suggest that the relatively high susceptibility to the audio-visual speech illusion in older participants is due more to perceptual than to cognitive processing.

  18. Hand gestures as visual prosody: BOLD responses to audio-visual alignment are modulated by the communicative nature of the stimuli.

    Science.gov (United States)

    Biau, Emmanuel; Morís Fernández, Luis; Holle, Henning; Avila, César; Soto-Faraco, Salvador

    2016-05-15

    During public addresses, speakers accompany their discourse with spontaneous hand gestures (beats) that are tightly synchronized with the prosodic contour of the discourse. It has been proposed that speech and beat gestures originate from a common underlying linguistic process whereby both speech prosody and beats serve to emphasize relevant information. We hypothesized that breaking the consistency between beats and prosody by temporal desynchronization would modulate activity of brain areas sensitive to speech-gesture integration. To this aim, we measured BOLD responses as participants watched a natural discourse where the speaker used beat gestures. In order to identify brain areas specifically involved in processing hand gestures with communicative intention, beat synchrony was evaluated against arbitrary visual cues bearing equivalent rhythmic and spatial properties to the gestures. Our results revealed that left MTG and IFG were specifically sensitive to speech synchronized with beats, compared to the arbitrary vision-speech pairing. Our results suggest that listeners assign beats a function of visual prosody, complementary to the prosodic structure of speech. We conclude that the emphasizing function of beat gestures in speech perception is instantiated through a specialized brain network sensitive to the communicative intent conveyed by a speaker with his/her hands. PMID:26892858

  19. Audiovisual Speech Perception in Infancy: The Influence of Vowel Identity and Infants' Productive Abilities on Sensitivity to (Mis)Matches between Auditory and Visual Speech Cues

    Science.gov (United States)

    Altvater-Mackensen, Nicole; Mani, Nivedita; Grossmann, Tobias

    2016-01-01

    Recent studies suggest that infants' audiovisual speech perception is influenced by articulatory experience (Mugitani et al., 2008; Yeung & Werker, 2013). The current study extends these findings by testing if infants' emerging ability to produce native sounds in babbling impacts their audiovisual speech perception. We tested 44 6-month-olds…

  20. Clever Use of Audio-Visual Media to Promote the Teaching of History

    Institute of Scientific and Technical Information of China (English)

    刘艳丽

    2012-01-01

    Introducing audio-visual media into classroom teaching is a relatively new form of teaching innovation, particularly when applied in the practice of history teaching. Using audio-visual media in the history classroom not only increases students' concrete perception of past history, but also strengthens their thinking through objective descriptions of historical facts. This article describes in detail the characteristics of audio-visual media teaching and discusses the author's own reflections on how to use audio-visual media to advance history teaching.

  1. Twenty-Fifth Annual Audio-Visual Aids Conference, Wednesday 9th to Friday 11th July 1975, Whitelands College, Putney SW15. Conference Preprints.

    Science.gov (United States)

    National Committee for Audio-Visual Aids in Education, London (England).

    Preprints of papers to be presented at the 25th annual Audio-Visual Aids Conference are collected along with the conference program. Papers include official messages, a review of the conference's history, and presentations on photography in education, using school broadcasts, flexibility in the use of television, the "communications generation,"…

  2. Relationship between Audio-Visual Materials and Environmental Factors on Students Academic Performance in Senior Secondary Schools in Borno State: Implications for Counselling

    Science.gov (United States)

    Bello, S.; Goni, Umar

    2016-01-01

    This is a survey study, designed to determine the relationship between audio-visual materials and environmental factors on students' academic performance in Senior Secondary Schools in Borno State: Implications for Counselling. The study set two research objectives, and tested two research hypotheses. The population of this study is 1,987 students…

  3. Attitude of medical students towards the use of audio visual aids during didactic lectures in pharmacology in a medical college of central India

    Directory of Open Access Journals (Sweden)

    Mehul Agrawal

    2016-04-01

    Conclusions: In our study we found that students preferred a mixture of audio-visual aids over other teaching methods. Teachers should consider the suggestions given by the students while preparing their lectures. [Int J Basic Clin Pharmacol 2016; 5(2): 416-422]

  4. Design Foundations for Content-Rich Acoustic Interfaces: Investigating Audemes as Referential Non-Speech Audio Cues

    Science.gov (United States)

    Ferati, Mexhid Adem

    2012-01-01

    To access interactive systems, blind and visually impaired users can leverage their auditory senses by using non-speech sounds. The current structure of non-speech sounds, however, is geared toward conveying user interface operations (e.g., opening a file) rather than large theme-based information (e.g., a history passage) and, thus, is ill-suited…

  5. Pitch and spectral resolution: A systematic comparison of bottom-up cues for top-down repair of degraded speech

    NARCIS (Netherlands)

    Clarke, Jeanne; Başkent, Deniz; Gaudrain, Etienne

    2016-01-01

    The brain is capable of restoring missing parts of speech, a top-down repair mechanism that enhances speech understanding in noisy environments. This enhancement can be quantified using the phonemic restoration paradigm, i.e., the improvement in intelligibility when silent interruptions of interrupt

  6. A scheme for racquet sports video analysis with the combination of audio-visual information

    Science.gov (United States)

    Xing, Liyuan; Ye, Qixiang; Zhang, Weigang; Huang, Qingming; Yu, Hua

    2005-07-01

    As a very important category of sports video, racquet sports video, e.g. table tennis, tennis and badminton, has received little attention in past years. Considering the characteristics of this kind of sports video, we propose a new scheme for structure indexing and highlight generation based on the combination of audio and visual information. Firstly, a supervised classification method is employed to detect important audio symbols including impacts (ball hits), audience cheers, commentator speech, etc. Meanwhile an unsupervised algorithm is proposed to group video shots into various clusters. Then, by taking advantage of the temporal relationship between audio and visual signals, we can assign the scene clusters semantic labels, including rally scenes and break scenes. Thirdly, a refinement procedure is developed to reduce false rally scenes by further audio analysis. Finally, an excitement model is proposed to rank the detected rally scenes, from which many exciting video clips such as game (match) points can be correctly retrieved. Experiments on two types of representative racquet sports video, table tennis video and tennis video, demonstrate encouraging results.
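
    The audio-visual fusion step, assigning rally/break labels to visual shot clusters from the temporal density of detected audio impact events, might look roughly like the sketch below; the data structures, threshold and timings are hypothetical, not taken from the paper:

```python
from dataclasses import dataclass

@dataclass
class Shot:
    start: float   # seconds
    end: float
    cluster: int   # visual cluster id from unsupervised shot grouping

def label_shots(shots, impact_times, min_impacts_per_sec=0.3):
    """Assign 'rally' / 'break' labels by the density of detected ball-hit sounds.

    Hypothetical fusion rule: a shot whose audio track contains a high enough
    rate of impact events is treated as a rally scene, otherwise as a break.
    """
    labels = []
    for shot in shots:
        duration = max(shot.end - shot.start, 1e-6)
        hits = sum(shot.start <= t <= shot.end for t in impact_times)
        labels.append("rally" if hits / duration >= min_impacts_per_sec else "break")
    return labels

shots = [Shot(0.0, 12.0, 0), Shot(12.0, 40.0, 1), Shot(40.0, 55.0, 0)]
impact_times = [1.2, 2.8, 4.1, 5.9, 7.4, 9.0, 41.0, 43.5, 45.2, 47.0, 50.3]
print(label_shots(shots, impact_times))   # ['rally', 'break', 'rally']
```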

  7. IMPROVING CLASS B STUDENTS' ABILITY IN LETTER (SCRIPT) RECOGNITION USING AUDIO-VISUAL MEDIA AT TK NEGERI PEMBINA 3 TARAKAN

    OpenAIRE

    Salbiah

    2011-01-01

    SALBIAH. 2011. Improving class B students' ability to recognize letters (script) by using audio-visual media in learning at TK Negeri Pembina 3 Tarakan. Thesis. Teacher Education Pedagogy, Faculty of Education, University of Tarakan. Main supervisor: Zulkifli; Assistant Supervisor: Wiwit Ike. Early childhood develops literacy in diverse ways. Children engage with various forms of communication and become familiar with forms of symbols long before they can read and write. Development of early literacy learning a...

  8. THE EFFECT OF USING AUDIO-VISUAL AIDS VERSUS PICTURES ON FOREIGN LANGUAGE VOCABULARY LEARNING OF INDIVIDUALS WITH MILD INTELLECTUAL DISABILITY

    Directory of Open Access Journals (Sweden)

    Zahra Sadat NOORI

    2016-04-01

    Full Text Available This study aimed to examine the effect of using audio-visual aids and pictures on foreign language vocabulary learning of individuals with mild intellectual disability. Method: To this end, a comparison group quasi-experimental study was conducted along with a pre-test and a post-test. The participants were 16 individuals with mild intellectual disability living in a center for mentally disabled individuals in Dezfoul, Iran. They were all male individuals with the age range of 20 to 30. Their mother tongue was Persian, and they did not have any English background. In order to ensure that all participants were within the same IQ level, a standard IQ test, i.e. Colored Progressive Matrices test, was run. Afterwards, the participants were randomly assigned to two experimental groups; one group received the instruction through audio-visual aids, while the other group was taught through pictures. The treatment lasted for four weeks, 20 sessions on aggregate. A total number of 60 English words selected from the English package named 'The Smart Child' were taught. After the treatment, the participants took the posttest in which the researchers randomly selected 40 words from among the 60 target words. Results: The results of Mann-Whitney U-test indicated that using audio-visual aids was more effective than pictures in foreign language vocabulary learning of individuals with mild intellectual disability. Conclusions: It can be concluded that the use of audio-visual aids can be more effective than pictures in foreign language vocabulary learning of individuals with mild intellectual disability.

  9. Children's Judgments of Emotion from Conflicting Cues in Speech: Why 6-Year-Olds Are So Inflexible

    Science.gov (United States)

    Waxer, Matthew; Morton, J. Bruce

    2011-01-01

    Six-year-old children can judge a speaker's feelings either from content or paralanguage but have difficulty switching the basis of their judgments when these cues conflict. This inflexibility may relate to a lexical bias in 6-year-olds' judgments. Two experiments tested this claim. In Experiment 1, 6-year-olds (n = 40) were as inflexible when…

  10. New Thinking on Multimedia Network-Based College English Audio-Visual Teaching

    Institute of Scientific and Technical Information of China (English)

    陈亚斐; 丰建泉

    2011-01-01

    The paper analyses the current situation of audio-visual teaching in college English, interprets its characteristics, and puts forward a new approach to audio-visual English teaching in a multimedia network environment. If multimedia network technology is integrated with audio-visual English teaching, a new teaching model will emerge in which students occupy the central position and teachers take a guiding role; as a result, students' comprehensive ability, especially their audio-visual ability, can be greatly improved.

  11. Research on Content Metadata for New Audio-Visual Media

    Institute of Scientific and Technical Information of China (English)

    刘俊宇

    2014-01-01

    The rapid rise of new audio-visual media services makes the way content metadata for new media is labelled extremely important. Appropriate labelling schemes and methods for new audio-visual media content directly affect the exchange, storage, positioning, retrieval, management and other related applications of new-media content, and have a large impact on the efficiency and sustainability of new audio-visual media.

  12. UNDERSTANDING PROSE THROUGH TASK ORIENTED AUDIO-VISUAL ACTIVITY: AN AMERICAN MODERN PROSE COURSE AT THE FACULTY OF LETTERS, PETRA CHRISTIAN UNIVERSITY

    Directory of Open Access Journals (Sweden)

    Sarah Prasasti

    2001-01-01

    Full Text Available The method presented here provides the basis for a course in American prose for EFL students. Understanding and appreciating American prose is a difficult task for the students because they come into contact with works that are full of cultural baggage and far apart from their own world. Audio-visual aids are one of the alternatives for sensitizing the students to the topic and the cultural background. Instead of providing ready-made audio-visual aids, teachers can involve students in a more task-oriented audio-visual project. Here, the teachers encourage their students to create their own audio-visual aids using colors, pictures, sound and gestures as a point of initiation for further discussion. The students can use color, which has become a strong element of fiction, to help them call up a forceful visual representation. Pictures can also stimulate the students to build their mental image. Sound and silence, which are part of the fabric of literature, may also help them to increase the emotional impact.

  13. Characterizing sensory and cognitive factors of human speech processing through eye movements

    OpenAIRE

    Wendt, Dorothea Christine

    2013-01-01

    The primary goal of this thesis is to gain a better insight into any impediments in speech processing that occur due to sensory and cognitive factors. To achieve this, a new audio-visual paradigm based on the analysis of eye-movements is developed here which allows for an online analysis of the speech understanding process with possible applications in the field of audiology. The proposed paradigm is used to investigate the influence of background noise and linguistic complexity on the proces...

  14. The effect of visual cues on top-down restoration of temporally interrupted speech, with and without further degradations

    NARCIS (Netherlands)

    Benard, Michel R.; Başkent, Deniz

    2015-01-01

    In complex listening situations, cognitive restoration mechanisms are commonly used to enhance perception of degraded speech with inaudible segments. Profoundly hearing-impaired people with a cochlear implant (CI) show less benefit from such mechanisms. However, both normal hearing (NH) listeners an

  15. Multisensory and Modality Specific Processing of Visual Speech in Different Regions of the Premotor Cortex

    Directory of Open Access Journals (Sweden)

    Daniel eCallan

    2014-05-01

    Full Text Available Behavioral and neuroimaging studies have demonstrated that brain regions involved with speech production also support speech perception, especially under degraded conditions. The premotor cortex has been shown to be active during both observation and execution of action ('mirror system' properties), and may facilitate speech perception by mapping unimodal and multimodal sensory features onto articulatory speech gestures. For this functional magnetic resonance imaging (fMRI) study, participants identified vowels produced by a speaker in audio-visual (saw the speaker's articulating face and heard her voice), visual only (only saw the speaker's articulating face), and audio only (only heard the speaker's voice) conditions with varying audio signal-to-noise ratios, in order to determine the regions of the premotor cortex involved with multisensory and modality-specific processing of visual speech gestures. The task was designed so that identification could be made with a high level of accuracy from visual only stimuli, to control for task difficulty and differences in intelligibility. The results of the fMRI analysis for visual only and audio-visual conditions showed overlapping activity in inferior frontal gyrus and premotor cortex. The left ventral inferior premotor cortex showed properties of multimodal (audio-visual) enhancement with a degraded auditory signal. The left inferior parietal lobule and right cerebellum also showed these properties. The left ventral superior and dorsal premotor cortex did not show this multisensory enhancement effect, but there was greater activity for the visual only over audio-visual conditions in these areas. The results suggest that the inferior regions of the ventral premotor cortex are involved with integrating multisensory information, whereas more superior and dorsal regions of the premotor cortex are involved with mapping unimodal (in this case visual) sensory features of the speech signal with

  16. Maternal depression and the learning-promoting effects of infant-directed speech: Roles of maternal sensitivity, depression diagnosis, and speech acoustic cues.

    Science.gov (United States)

    Kaplan, Peter S; Danko, Christina M; Cejka, Anna M; Everhart, Kevin D

    2015-11-01

    The hypothesis that the associative learning-promoting effects of infant-directed speech (IDS) depend on infants' social experience was tested in a conditioned-attention paradigm with a cumulative sample of 4- to 14-month-old infants. Following six forward pairings of a brief IDS segment and a photographic slide of a smiling female face, infants of clinically depressed mothers exhibited evidence of having acquired significantly weaker voice-face associations than infants of non-depressed mothers. Regression analyses revealed that maternal depression was significantly related to infant learning even after demographic correlates of depression, antidepressant medication use, and extent of pitch modulation in maternal IDS had been taken into account. However, after maternal depression had been accounted for, maternal emotional availability, coded by blind raters from separate play interactions, accounted for significant further increments in the proportion of variance accounted for in infant learning scores. Both maternal depression and maternal insensitivity negatively, and additively, predicted poor learning.

  17. Practice and Reflections on the Construction of Audio-Visual Teaching Materials

    Institute of Scientific and Technical Information of China (English)

    杨宝强; 刘守东; 王莹

    2013-01-01

    Audio-visual teaching materials have become an important means of improving teaching quality and fully implementing quality-oriented education, and an important way to cultivate students' innovative spirit and practical ability. This article systematically analyzes the theory and practice of audio-visual teaching material construction at the Air Force Engineering University and puts forward the following construction strategies: focus on training compound-type talents and improve the construction system; give play to the advantages of talent and technology to form a joint construction force; carry out fine-grained management of the whole process to ensure construction quality; and promote the co-construction and sharing of digital resources to enhance the benefits of use. The study offers a useful reference for colleges and universities exploring information-oriented talent training and the production of high-quality audio-visual teaching materials.

  18. A framework for event detection in field-sports video broadcasts based on SVM generated audio-visual feature model. Case-study: soccer video

    OpenAIRE

    Sadlier, David A.; O'Connor, Noel E.; Murphy, Noel; Marlow, Seán

    2004-01-01

    In this paper we propose a novel audio-visual feature-based framework, for event detection in field sports broadcast video. The system is evaluated via a case-study involving MPEG encoded soccer video. Specifically, the evidence gathered by various feature detectors is combined by means of a learning algorithm (a support vector machine), which infers the occurrence of an event, based on a model generated during a training phase, utilizing a corpus of 25 hours of content. The system is evaluat...
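
    A hedged sketch of the core idea, an SVM that combines the confidences of several independent audio-visual feature detectors into an event/no-event decision per shot, could look as follows (detector names and data are placeholders, not the paper's actual features):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Each row is one video shot; columns are confidences produced by independent
# audio-visual feature detectors (names are illustrative, not from the paper),
# e.g. crowd-cheer level, commentator excitement, on-screen graphics, motion.
rng = np.random.default_rng(1)
X_train = rng.random((500, 4))
# Synthetic labels: 1 = "event" shot, 0 = no event.
y_train = (X_train.sum(axis=1) + rng.normal(scale=0.3, size=500) > 2.4).astype(int)

model = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
model.fit(X_train, y_train)

new_shot = np.array([[0.9, 0.8, 0.7, 0.6]])
print("event probability:", model.predict_proba(new_shot)[0, 1])
```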

  19. THE EFFECT OF USING AUDIO-VISUAL AIDS VERSUS PICTURES ON FOREIGN LANGUAGE VOCABULARY LEARNING OF INDIVIDUALS WITH MILD INTELLECTUAL DISABILITY

    OpenAIRE

    Zahra Sadat NOORI; FARVARDIN Mohammad Taghi

    2016-01-01

    This study aimed to examine the effect of using audio-visual aids and pictures on foreign language vocabulary learning of individuals with mild intellectual disability. Method: To this end, a comparison group quasi-experimental study was conducted along with a pre-test and a post-test. The participants were 16 individuals with mild intellectual disability living in a center for mentally disabled individuals in Dezfoul, Iran. They were all male individuals with the age range of 20 to 30. Th...

  20. Chinese Audio-Visual Teaching Material Design Based on an Internet Platform

    Institute of Scientific and Technical Information of China (English)

    徐文婷

    2012-01-01

    In the past twenty years, teaching Chinese as a foreign language has developed rapidly, and the international promotion of Chinese has become one of the country's important strategies for peaceful development in the 21st century. Although research on the theory and practice of teaching Chinese as a foreign language has achieved a great deal, few scholars have paid close attention to the educational and pedagogical thinking behind audio-visual teaching materials, which is regrettable. The present paper therefore focuses on the design principles of web-based Chinese audio-visual teaching materials.

  1. Audio-visual speechreading in a group of hearing aid users. The effects of onset age, handicap age, and degree of hearing loss.

    Science.gov (United States)

    Tillberg, I; Rönnberg, J; Svärd, I; Ahlner, B

    1996-01-01

    Speechreading ability was investigated among hearing aid users with different time of onset and different degree of hearing loss. Audio-visual and visual-only performance were assessed. One group of subjects had been hearing-impaired for a large part of their lives, and the impairments appeared early in life. The other group of subjects had been impaired for a fewer number of years, and the impairments appeared later in life. Differences between the groups were obtained. There was no significant difference on the audio-visual test between the groups in spite of the fact that the early onset group scored very poorly auditorily. However, the early-onset group performed significantly better on the visual test. It was concluded that the visual information constituted the dominant coding strategy for the early onset group. An interpretation chiefly in terms of early onset may be the most appropriate, since dB loss variations as such are not related to speechreading skill. PMID:8976000

  2. The influence of previous environmental history on audio-visual binding occurs during visual-weighted but not auditory-weighted environments.

    Science.gov (United States)

    Wilbiks, Jonathan M P; Dyson, Benjamin J

    2013-01-01

    Although there is substantial evidence for the adjustment of audio-visual binding as a function of the distribution of audio-visual lag, it is not currently clear whether adjustment can take place as a function of task demands. To address this, participants took part in competitive binding paradigms whereby a temporally roving auditory stimulus was assigned to one of two visual anchors (visual-weighted; VAV), or, a temporally roving visual stimulus was assigned to one of two auditory anchors (auditory-weighted; AVA). Using a blocked design it was possible to assess the malleability of audiovisual binding as a function of both the repetition and change of paradigm. VAV performance showed sensitivity to preceding contexts, echoing previous 'repulsive' effects shown in recalibration literature. AVA performance showed no sensitivity to preceding contexts. Despite the use of identical equi-probable temporal distributions in both paradigms, data support the contention that visual contexts may be more sensitive than auditory contexts in being influenced by previous environmental history of temporal events.

  3. Audio-visual speechreading in a group of hearing aid users. The effects of onset age, handicap age, and degree of hearing loss.

    Science.gov (United States)

    Tillberg, I; Rönnberg, J; Svärd, I; Ahlner, B

    1996-01-01

    Speechreading ability was investigated among hearing aid users with different time of onset and different degree of hearing loss. Audio-visual and visual-only performance were assessed. One group of subjects had been hearing-impaired for a large part of their lives, and the impairments appeared early in life. The other group of subjects had been impaired for a fewer number of years, and the impairments appeared later in life. Differences between the groups were obtained. There was no significant difference on the audio-visual test between the groups in spite of the fact that the early onset group scored very poorly auditorily. However, the early-onset group performed significantly better on the visual test. It was concluded that the visual information constituted the dominant coding strategy for the early onset group. An interpretation chiefly in terms of early onset may be the most appropriate, since dB loss variations as such are not related to speechreading skill.

  4. On the Role of the Multimedia Network in English Audio-Visual Teaching

    Institute of Scientific and Technical Information of China (English)

    陈亚斐; 丰建泉

    2011-01-01

    The paper discusses the important roles of the multimedia network in audio-visual English teaching: it promotes the shift towards a student-centered, active mode of study, helps optimize audio-visual learning, and improves students' comprehensive ability, especially their audio-visual ability.

  5. Self-organizing maps for measuring similarity of audiovisual speech percepts

    DEFF Research Database (Denmark)

    Bothe, Hans-Heinrich

    The goal of this work is to find a way to measure similarity of audiovisual speech percepts. Phoneme-related self-organizing maps (SOM) with a rectangular basis are trained with data material from a (labeled) video film. For the training, a combination of auditory speech features and corresponding....... Dependent on the training data, these other units may also be contextually immediate neighboring units. The poster demonstrates the idea with text material spoken by one individual subject using a set of simple audio-visual features. The data material for the training process consists of 44 labeled...... sentences in German with a balanced phoneme repertoire. As a result it can be stated that (i) the SOM can be trained to map auditory and visual features in a topology-preserving way and (ii) they show strain due to the influence of other audio-visual units. The SOM can be used to measure similarity amongst...
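
    A minimal sketch of a rectangular SOM trained on joint audio-visual feature vectors, with percept similarity read off as the grid distance between best-matching units, is given below; the feature vectors are random placeholders rather than the labeled film material used in the poster:

```python
import numpy as np

def train_som(data, grid=(10, 10), n_iter=2000, lr0=0.5, sigma0=3.0, seed=0):
    """Train a rectangular self-organizing map on audio-visual feature vectors."""
    rng = np.random.default_rng(seed)
    h, w = grid
    weights = rng.random((h, w, data.shape[1]))
    # Grid coordinates of every unit, used by the neighborhood function.
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    for it in range(n_iter):
        x = data[rng.integers(len(data))]
        # Best-matching unit: the unit whose weight vector is closest to x.
        dist = np.linalg.norm(weights - x, axis=2)
        bi, bj = np.unravel_index(np.argmin(dist), dist.shape)
        # Decaying learning rate and neighborhood radius.
        lr = lr0 * np.exp(-it / n_iter)
        sigma = sigma0 * np.exp(-it / n_iter)
        neigh = np.exp(-((yy - bi) ** 2 + (xx - bj) ** 2) / (2 * sigma ** 2))
        weights += lr * neigh[..., None] * (x - weights)
    return weights

def bmu(som, x):
    """Grid position of the best-matching unit for one feature vector."""
    return np.unravel_index(np.argmin(np.linalg.norm(som - x, axis=2)), som.shape[:2])

# Placeholder "audio-visual" feature vectors (e.g., concatenated acoustic and
# lip-shape features per labeled frame); real features would come from the film.
features = np.random.default_rng(1).random((440, 16))
som = train_som(features)

# Similarity of two percepts can then be read off as the distance between
# their best-matching units on the map.
print(bmu(som, features[0]), bmu(som, features[1]))
```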

  6. Talker Variability in Audiovisual Speech Perception

    Directory of Open Access Journals (Sweden)

    Shannon eHeald

    2014-07-01

    Full Text Available A change in talker is a change in the context for the phonetic interpretation of acoustic patterns of speech. Different talkers have different mappings between acoustic patterns and phonetic categories and listeners need to adapt to these differences. Despite this complexity, listeners are adept at comprehending speech in multiple-talker contexts, albeit at a slight but measurable performance cost (e.g., slower recognition). So far, this talker-variability cost has been demonstrated only in audio-only speech. Other research in single-talker contexts has shown, however, that when listeners are able to see a talker's face, speech recognition is improved under adverse listening conditions (e.g., noise or distortion) that can increase uncertainty in the mapping between acoustic patterns and phonetic categories. Does seeing a talker's face reduce the cost of word recognition in multiple-talker contexts? We used a speeded word-monitoring task in which listeners make quick judgments about target-word recognition in single- and multiple-talker contexts. Results show faster recognition performance in single-talker conditions compared to multiple-talker conditions for both audio-only and audio-visual speech. However, recognition time in a multiple-talker context was slower in the audio-visual condition compared to the audio-only condition. These results suggest that seeing a talker's face during speech perception may slow recognition by increasing the importance of talker identification, signaling to the listener that a change in talker has occurred.

  7. Cardiac and pulmonary dose reduction for tangentially irradiated breast cancer, utilizing deep inspiration breath-hold with audio-visual guidance, without compromising target coverage

    International Nuclear Information System (INIS)

    Background and purpose. Cardiac disease and pulmonary complications are documented risk factors in tangential breast irradiation. Respiratory gating radiotherapy provides a possibility to substantially reduce cardiopulmonary doses. This CT planning study quantifies the reduction of radiation doses to the heart and lung, using deep inspiration breath-hold (DIBH). Patients and methods. Seventeen patients with early breast cancer, referred for adjuvant radiotherapy, were included. For each patient two CT scans were acquired; the first during free breathing (FB) and the second during DIBH. The scans were monitored by the Varian RPM respiratory gating system. Audio coaching and visual feedback (audio-visual guidance) were used. The treatment planning of the two CT studies was performed with conformal tangential fields, focusing on good coverage (V95>98%) of the planning target volume (PTV). Dose-volume histograms were calculated and compared. Doses to the heart, left anterior descending (LAD) coronary artery, ipsilateral lung and the contralateral breast were assessed. Results. Compared to FB, the DIBH-plans obtained lower cardiac and pulmonary doses, with equal coverage of PTV. The average mean heart dose was reduced from 3.7 to 1.7 Gy and the number of patients with >5% heart volume receiving 25 Gy or more was reduced from four to one of the 17 patients. With DIBH the heart was completely out of the beam portals for ten patients, with FB this could not be achieved for any of the 17 patients. The average mean dose to the LAD coronary artery was reduced from 18.1 to 6.4 Gy. The average ipsilateral lung volume receiving more than 20 Gy was reduced from 12.2 to 10.0%. Conclusion. Respiratory gating with DIBH, utilizing audio-visual guidance, reduces cardiac and pulmonary doses for tangentially treated left sided breast cancer patients without compromising the target coverage
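
    The dose-volume metrics reported here (mean heart dose, the volume receiving at least 20 or 25 Gy) are standard DVH quantities; a minimal sketch of how they are computed from a planning dose grid and a structure mask is shown below, using synthetic data rather than the study's treatment plans:

```python
import numpy as np

def dvh(dose, mask, bin_width=0.5):
    """Cumulative dose-volume histogram for one structure.

    dose: 3-D dose grid in Gy; mask: boolean array of the same shape marking
    the structure (e.g., heart, ipsilateral lung). Returns dose bins and the
    percent of the structure volume receiving at least that dose.
    """
    d = dose[mask]
    bins = np.arange(0.0, d.max() + bin_width, bin_width)
    volume_pct = np.array([(d >= b).mean() * 100.0 for b in bins])
    return bins, volume_pct

def mean_dose(dose, mask):
    return dose[mask].mean()

def v_at_least(dose, mask, threshold_gy):
    """V_x metric: percent of the structure receiving >= threshold_gy (e.g., V20)."""
    return (dose[mask] >= threshold_gy).mean() * 100.0

# Toy example with a random dose grid and a spherical "heart" mask.
rng = np.random.default_rng(0)
dose = rng.gamma(shape=2.0, scale=8.0, size=(60, 60, 60))
zz, yy, xx = np.indices(dose.shape)
heart = (zz - 30) ** 2 + (yy - 30) ** 2 + (xx - 30) ** 2 < 15 ** 2

bins, vol = dvh(dose, heart)
print("mean heart dose (Gy):", round(mean_dose(dose, heart), 2))
print("V25 (%):", round(v_at_least(dose, heart, 25.0), 2))
```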

  8. Audio-Visual Biofeedback Does Not Improve the Reliability of Target Delineation Using Maximum Intensity Projection in 4-Dimensional Computed Tomography Radiation Therapy Planning

    International Nuclear Information System (INIS)

    Purpose: To investigate whether coaching patients' breathing would improve the match between ITVMIP (internal target volume generated by contouring in the maximum intensity projection scan) and ITV10 (generated by combining the gross tumor volumes contoured in 10 phases of a 4-dimensional CT [4DCT] scan). Methods and Materials: Eight patients with a thoracic tumor and 5 patients with an abdominal tumor were included in an institutional review board-approved prospective study. Patients underwent 3 4DCT scans with: (1) free breathing (FB); (2) coaching using audio-visual (AV) biofeedback via the Real-Time Position Management system; and (3) coaching via a spirometer system (Active Breathing Coordinator or ABC). One physician contoured all scans to generate the ITV10 and ITVMIP. The match between ITVMIP and ITV10 was quantitatively assessed with volume ratio, centroid distance, root mean squared distance, and overlap/Dice coefficient. We investigated whether coaching (AV or ABC) or uniform expansions (1, 2, 3, or 5 mm) of ITVMIP improved the match. Results: Although both AV and ABC coaching techniques improved frequency reproducibility and ABC improved displacement regularity, neither improved the match between ITVMIP and ITV10 over FB. On average, ITVMIP underestimated ITV10 by 19%, 19%, and 21%, with centroid distance of 1.9, 2.3, and 1.7 mm and Dice coefficient of 0.87, 0.86, and 0.88 for FB, AV, and ABC, respectively. Separate analyses indicated a better match for lung cancers or tumors not adjacent to high-intensity tissues. Uniform expansions of ITVMIP did not correct for the mismatch between ITVMIP and ITV10. Conclusions: In this pilot study, audio-visual biofeedback did not improve the match between ITVMIP and ITV10. In general, ITVMIP should be limited to lung cancers, and modification of ITVMIP in each phase of the 4DCT data set is recommended
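
    The match metrics named in this abstract (volume ratio, centroid distance, Dice/overlap coefficient) can be computed from two binary target volumes roughly as in the sketch below; the toy ITVs and voxel spacing are illustrative assumptions:

```python
import numpy as np
from scipy import ndimage

def dice(a, b):
    """Dice/overlap coefficient between two binary volumes."""
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def volume_ratio(a, b):
    """Ratio of the two structure volumes (voxel counts)."""
    return a.sum() / b.sum()

def centroid_distance(a, b, spacing=(1.0, 1.0, 1.0)):
    """Euclidean distance (mm) between the centroids of two binary volumes."""
    ca = np.array(ndimage.center_of_mass(a)) * np.array(spacing)
    cb = np.array(ndimage.center_of_mass(b)) * np.array(spacing)
    return float(np.linalg.norm(ca - cb))

# Toy ITVs: ITV_MIP modeled as a slightly smaller, shifted version of ITV_10.
zz, yy, xx = np.indices((50, 50, 50))
itv10  = (zz - 25) ** 2 + (yy - 25) ** 2 + (xx - 25) ** 2 < 12 ** 2
itvmip = (zz - 26) ** 2 + (yy - 25) ** 2 + (xx - 24) ** 2 < 11 ** 2

print("volume ratio :", round(volume_ratio(itvmip, itv10), 2))
print("centroid dist:", round(centroid_distance(itvmip, itv10, spacing=(2.5, 1.0, 1.0)), 2), "mm")
print("Dice         :", round(dice(itvmip, itv10), 2))
```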

  9. Cardiac and pulmonary dose reduction for tangentially irradiated breast cancer, utilizing deep inspiration breath-hold with audio-visual guidance, without compromising target coverage

    Energy Technology Data Exchange (ETDEWEB)

    Vikstroem, Johan; Hjelstuen, Mari H.B.; Mjaaland, Ingvil; Dybvik, Kjell Ivar (Dept. of Radiotherapy, Stavanger Univ. Hospital, Stavanger (Norway)), e-mail: vijo@sus.no

    2011-01-15

    Background and purpose. Cardiac disease and pulmonary complications are documented risk factors in tangential breast irradiation. Respiratory gating radiotherapy provides a possibility to substantially reduce cardiopulmonary doses. This CT planning study quantifies the reduction of radiation doses to the heart and lung, using deep inspiration breath-hold (DIBH). Patients and methods. Seventeen patients with early breast cancer, referred for adjuvant radiotherapy, were included. For each patient two CT scans were acquired; the first during free breathing (FB) and the second during DIBH. The scans were monitored by the Varian RPM respiratory gating system. Audio coaching and visual feedback (audio-visual guidance) were used. The treatment planning of the two CT studies was performed with conformal tangential fields, focusing on good coverage (V95>98%) of the planning target volume (PTV). Dose-volume histograms were calculated and compared. Doses to the heart, left anterior descending (LAD) coronary artery, ipsilateral lung and the contralateral breast were assessed. Results. Compared to FB, the DIBH-plans obtained lower cardiac and pulmonary doses, with equal coverage of PTV. The average mean heart dose was reduced from 3.7 to 1.7 Gy and the number of patients with >5% heart volume receiving 25 Gy or more was reduced from four to one of the 17 patients. With DIBH the heart was completely out of the beam portals for ten patients, with FB this could not be achieved for any of the 17 patients. The average mean dose to the LAD coronary artery was reduced from 18.1 to 6.4 Gy. The average ipsilateral lung volume receiving more than 20 Gy was reduced from 12.2 to 10.0%. Conclusion. Respiratory gating with DIBH, utilizing audio-visual guidance, reduces cardiac and pulmonary doses for tangentially treated left sided breast cancer patients without compromising the target coverage

  10. Thoughts on Teaching the Basic-Stage Spanish Audio-Visual Course

    Institute of Scientific and Technical Information of China (English)

    杨洁

    2012-01-01

    As a compulsory course for Spanish majors, the foundation-stage Spanish audio-visual course supplements and expands the intensive reading course. By listening to recordings and news and watching DVDs, video and other material, this course exposes students to the different pronunciations and intonations of many Spanish-speaking countries. In studying this course the students also gain a better understanding of the social and cultural backgrounds and present development of these countries. It plays an important role not only in helping the students broaden their horizons, but also in improving their theoretical knowledge system. However, due to the limited foreign-language materials available, some problems still exist in Spanish audio-visual teaching. The writer discusses her thoughts about this course based on her own teaching experience over the years.

  11. The influence of changes in the publication quantities of audio-visual products on the construction of library audio-visual resources

    Institute of Scientific and Technical Information of China (English)

    宾锋

    2012-01-01

    Statistics and analyses of audio-visual product publications from 2005 to 2010 show that the total and annual numbers of newly published sound recordings and video products decreased markedly. The numbers of CDs, VCDs and other older carriers declined, while DVD-A, DVD-V and other new carriers increased. Publication quantities rose for education, language and other subjects on DVD-A, and for social sciences, education, comprehensive works, music and dance and other subjects on DVD-V. These changes affect the construction of library audio-visual resources. On the basis of changing demand, libraries should revise their procurement rules for audio-visual materials in time, increase procurement efforts, form collection emphases and special collections, optimize collection construction, strengthen the construction of audio-visual databases, expand procurement channels and methods, and try to establish a system of audio-visual acquisition librarians.

  12. Behavioral Signal Processing: Deriving Human Behavioral Informatics From Speech and Language: Computational techniques are presented to analyze and model expressed and perceived human behavior-variedly characterized as typical, atypical, distressed, and disordered-from speech and language cues and their applications in health, commerce, education, and beyond.

    Science.gov (United States)

    Narayanan, Shrikanth; Georgiou, Panayiotis G

    2013-02-01

    The expression and experience of human behavior are complex and multimodal and characterized by individual and contextual heterogeneity and variability. Speech and spoken language communication cues offer an important means for measuring and modeling human behavior. Observational research and practice across a variety of domains from commerce to healthcare rely on speech- and language-based informatics for crucial assessment and diagnostic information and for planning and tracking response to an intervention. In this paper, we describe some of the opportunities as well as emerging methodologies and applications of human behavioral signal processing (BSP) technology and algorithms for quantitatively understanding and modeling typical, atypical, and distressed human behavior with a specific focus on speech- and language-based communicative, affective, and social behavior. We describe the three important BSP components of acquiring behavioral data in an ecologically valid manner across laboratory to real-world settings, extracting and analyzing behavioral cues from measured data, and developing models offering predictive and decision-making support. We highlight both the foundational speech and language processing building blocks as well as the novel processing and modeling opportunities. Using examples drawn from specific real-world applications ranging from literacy assessment and autism diagnostics to psychotherapy for addiction and marital well being, we illustrate behavioral informatics applications of these signal processing techniques that contribute to quantifying higher level, often subjectively described, human behavior in a domain-sensitive fashion.

  13. Exploration of Teaching Reform for the Animation Audio-Visual Language Course

    Institute of Scientific and Technical Information of China (English)

    殷俊; 张慧

    2015-01-01

    Audio-visual language is a basic course for animation majors, but traditional, purely theoretical teaching can no longer meet the needs of today's society. This paper discusses teaching practice from several angles, including improving the professionalism of audio-visual language course materials, changing students' simplistic understanding of the audio-visual language course, and enriching the traditional theory course with practical operations, with the aim of improving teaching quality and enabling students to fully master audio-visual language knowledge.

  14. Audio-Visual Biofeedback Does Not Improve the Reliability of Target Delineation Using Maximum Intensity Projection in 4-Dimensional Computed Tomography Radiation Therapy Planning

    Energy Technology Data Exchange (ETDEWEB)

    Lu, Wei, E-mail: wlu@umm.edu [Department of Radiation Oncology, University of Maryland School of Medicine, Baltimore, Maryland (United States); Neuner, Geoffrey A.; George, Rohini; Wang, Zhendong; Sasor, Sarah [Department of Radiation Oncology, University of Maryland School of Medicine, Baltimore, Maryland (United States); Huang, Xuan [Research and Development, Care Management Department, Johns Hopkins HealthCare LLC, Glen Burnie, Maryland (United States); Regine, William F.; Feigenberg, Steven J.; D'Souza, Warren D. [Department of Radiation Oncology, University of Maryland School of Medicine, Baltimore, Maryland (United States)

    2014-01-01

    Purpose: To investigate whether coaching patients' breathing would improve the match between ITVMIP (internal target volume generated by contouring in the maximum intensity projection scan) and ITV10 (generated by combining the gross tumor volumes contoured in 10 phases of a 4-dimensional CT [4DCT] scan). Methods and Materials: Eight patients with a thoracic tumor and 5 patients with an abdominal tumor were included in an institutional review board-approved prospective study. Patients underwent 3 4DCT scans with: (1) free breathing (FB); (2) coaching using audio-visual (AV) biofeedback via the Real-Time Position Management system; and (3) coaching via a spirometer system (Active Breathing Coordinator or ABC). One physician contoured all scans to generate the ITV10 and ITVMIP. The match between ITVMIP and ITV10 was quantitatively assessed with volume ratio, centroid distance, root mean squared distance, and overlap/Dice coefficient. We investigated whether coaching (AV or ABC) or uniform expansions (1, 2, 3, or 5 mm) of ITVMIP improved the match. Results: Although both AV and ABC coaching techniques improved frequency reproducibility and ABC improved displacement regularity, neither improved the match between ITVMIP and ITV10 over FB. On average, ITVMIP underestimated ITV10 by 19%, 19%, and 21%, with centroid distance of 1.9, 2.3, and 1.7 mm and Dice coefficient of 0.87, 0.86, and 0.88 for FB, AV, and ABC, respectively. Separate analyses indicated a better match for lung cancers or tumors not adjacent to high-intensity tissues. Uniform expansions of ITVMIP did not correct for the mismatch between ITVMIP and ITV10. Conclusions: In this pilot study, audio-visual biofeedback did not improve the match between ITVMIP and ITV10. In general, ITVMIP should be limited to lung cancers, and modification of ITVMIP in each phase of the 4DCT data set is recommended.

  15. Using a three-dimension head mounted displayer in audio-visual sexual stimulation aids in differential diagnosis of psychogenic from organic erectile dysfunction.

    Science.gov (United States)

    Moon, K-H; Song, P-H; Park, T-C

    2005-01-01

    We designed this study to compare the efficacy of a three-dimension head mounted displayer (3-D HMD) and a conventional monitor in audio-visual sexual stimulation (AVSS) for the differential diagnosis of psychogenic from organic erectile dysfunction (ED). Three groups of subjects (psychogenic ED, organic ED, and healthy controls) were evaluated. The change of penile tumescence during AVSS was monitored with Nocturnal Electrobioimpedance Volumetric Assessment, and sexual arousal after AVSS was assessed by a simple question as being good, fair, or poor. Both the healthy control group and the psychogenic ED group demonstrated a significantly higher rate of normal response in penile tumescence (P<0.05) and a significantly higher level of sexual arousal (P<0.05) when stimulated with the 3-D HMD than with the conventional monitor. In the organic ED group, even using the 3-D HMD in AVSS did not give rise to a better response in either assessment. Therefore, we conclude that using a 3-D HMD in AVSS helps more to differentiate psychogenic from organic ED than a conventional monitor.

  16. The challenge of reducing scientific complexity for different target groups (without losing the essence) - experiences from interdisciplinary audio-visual media production

    Science.gov (United States)

    Hezel, Bernd; Broschkowski, Ephraim; Kropp, Jürgen

    2013-04-01

    The Climate Media Factory originates from an interdisciplinary media lab run by the Film and Television University "Konrad Wolf" Potsdam-Babelsberg (HFF) and the Potsdam Institute for Climate Impact Research (PIK). Climate scientists, authors, producers and media scholars work together to develop media products on climate change and sustainability. We strive towards communicating scientific content via different media platforms, reconciling the communication needs of scientists and the audience's need to understand the complexity of topics that are relevant in their everyday life. By presenting four audio-visual examples that have been designed for very different target groups, we show (i) the interdisciplinary challenges during the production process and the lessons learnt and (ii) possibilities to reach the required degree of simplification without dumbing down the content. "We know enough about climate change" is a short animated film that was produced for the German Agency for International Cooperation (GIZ) for training programs and conferences on adaptation in the target countries, including Indonesia, Tunisia and Mexico. "Earthbook" is a short animation produced for "The Year of Science" to raise awareness of sustainability among digital natives. "What is Climate Engineering?", produced for the Institute for Advanced Sustainability Studies (IASS), is meant for an informed and interested public. "Wimmelwelt Energie!" is a prototype of an iPad application for children from 4 to 6 years of age to help them learn about different forms of energy and related greenhouse gas emissions.

  17. 翻转课堂在艺术类高校英语视听说教学中的应用%The application of flipped classroom in English audio-visual courses for art colleges

    Institute of Scientific and Technical Information of China (English)

    王莹莹; 孟庆娟

    2016-01-01

    This paper discusses how to apply the flipped classroom efficiently to English audio-visual courses in art colleges. The author analyzes the features of flipped classroom theory and of art majors, and gives some examples from college English audio-visual courses to show the detailed application of the flipped classroom for art majors.%本文探讨在艺术类高校的英语教学中如何有效开展翻转课堂活动,笔者分析了翻转课堂的理念和艺术类大学生的特点,并以大学英语视听说课为例,提出翻转课堂在艺术类高校英语教学中的具体应用。

  18. The Present Situation of Teaching and Countermeasure Studies on Japanese Audio-visual-oral Course%日语视听说课程教学现状及对策研究

    Institute of Scientific and Technical Information of China (English)

    糜玲

    2012-01-01

    The Japanese Audio-visual-oral Course aims at promoting students' listening comprehension and oral Japanese abilities as well as their cross-cultural communicative competence. This paper introduces the present situation of teaching in the Japanese Audio-visual-oral Course, points out the problems of the course, and discusses how to improve it.%日语视听说课程的开设目的,在于提高学生听说能力和跨文化交际能力。本文主要围绕当前视听说课程教学现状展开,就其中存在的问题以及改善方法进行探讨。

  19. 运用电教手段优化竞技健美操专业教学%Improvement of Sports Aerobics Teaching by Electrical Audio-visual Aids

    Institute of Scientific and Technical Information of China (English)

    赵静

    2011-01-01

    This paper discusses how the use of electrical audio-visual aids in sports aerobics teaching helps to optimize teaching methods, the teaching process, teaching content, teaching objectives and teaching results, and it provides a scientific basis for the rational use of electrical audio-visual aids in sports aerobics teaching.%文章主要针对在竞技健美操专业课教学中运用电教手段,以达到优化教学方法,优化教学过程、优化教学内容、优化教学目的及优化教学效果等进行阐述,旨在为竞技健美操专业教学过程中合理运用电教手段提供科学依据.

  20. Collection of Digital Audio-visual Material Preservation and Backup Data Transfer%典藏音像资料保存与数字化备份转移

    Institute of Scientific and Technical Information of China (English)

    李浚

    2011-01-01

    Archived audio-visual materials are classified by carrier form and by the technical characteristics of their storage media, and corresponding preservation methods are proposed for each type. Given that the carriers of audio-visual materials cannot be preserved indefinitely, and that many playback devices for early recordings are about to be phased out, leaving many valuable audio-visual materials facing the reality of becoming unusable, the paper argues that the digitization of audio-visual materials is urgently needed. Finally, it provides detailed methods for the digital transfer of audio-visual data.%根据音像资料载体形式、存储媒介技术特点进行类型划分,针对不同类型的典藏音像资料提出各种相应的保存方法。在音像资料载体保存期不可能无限长的情况下,以及很多早期音像资料播放设备即将被淘汰,致使许多珍贵声像资料面临无法使用的现实,为此提出音像资料迫切需要数字化的观点。最后为音像资料怎样数字化转移提供了详细方法

  1. Speech misperception: speaking and seeing interfere differently with hearing.

    Directory of Open Access Journals (Sweden)

    Takemi Mochida

    Full Text Available Speech perception is thought to be linked to speech motor production. This linkage is considered to mediate multimodal aspects of speech perception, such as audio-visual and audio-tactile integration. However, direct coupling between articulatory movement and auditory perception has been little studied. The present study reveals a clear dissociation between the effects of a listener's own speech action and the effects of viewing another's speech movements on the perception of auditory phonemes. We assessed the intelligibility of the syllables [pa], [ta], and [ka] when listeners silently and simultaneously articulated syllables that were congruent/incongruent with the syllables they heard. The intelligibility was compared with a condition where the listeners simultaneously watched another's mouth producing congruent/incongruent syllables, but did not articulate. The intelligibility of [ta] and [ka] were degraded by articulating [ka] and [ta] respectively, which are associated with the same primary articulator (tongue) as the heard syllables. But they were not affected by articulating [pa], which is associated with a different primary articulator (lips) from the heard syllables. In contrast, the intelligibility of [ta] and [ka] was degraded by watching the production of [pa]. These results indicate that the articulatory-induced distortion of speech perception occurs in an articulator-specific manner while visually induced distortion does not. The articulator-specific nature of the auditory-motor interaction in speech perception suggests that speech motor processing directly contributes to our ability to hear speech.

  2. Speech misperception: speaking and seeing interfere differently with hearing.

    Science.gov (United States)

    Mochida, Takemi; Kimura, Toshitaka; Hiroya, Sadao; Kitagawa, Norimichi; Gomi, Hiroaki; Kondo, Tadahisa

    2013-01-01

    Speech perception is thought to be linked to speech motor production. This linkage is considered to mediate multimodal aspects of speech perception, such as audio-visual and audio-tactile integration. However, direct coupling between articulatory movement and auditory perception has been little studied. The present study reveals a clear dissociation between the effects of a listener's own speech action and the effects of viewing another's speech movements on the perception of auditory phonemes. We assessed the intelligibility of the syllables [pa], [ta], and [ka] when listeners silently and simultaneously articulated syllables that were congruent/incongruent with the syllables they heard. The intelligibility was compared with a condition where the listeners simultaneously watched another's mouth producing congruent/incongruent syllables, but did not articulate. The intelligibility of [ta] and [ka] were degraded by articulating [ka] and [ta] respectively, which are associated with the same primary articulator (tongue) as the heard syllables. But they were not affected by articulating [pa], which is associated with a different primary articulator (lips) from the heard syllables. In contrast, the intelligibility of [ta] and [ka] was degraded by watching the production of [pa]. These results indicate that the articulatory-induced distortion of speech perception occurs in an articulator-specific manner while visually induced distortion does not. The articulator-specific nature of the auditory-motor interaction in speech perception suggests that speech motor processing directly contributes to our ability to hear speech. PMID:23844227

  3. Learning Words' Sounds before Learning How Words Sound: 9-Month-Olds Use Distinct Objects as Cues to Categorize Speech Information

    Science.gov (United States)

    Yeung, H. Henny; Werker, Janet F.

    2009-01-01

    One of the central themes in the study of language acquisition is the gap between the linguistic knowledge that learners demonstrate, and the apparent inadequacy of linguistic input to support induction of this knowledge. One of the first linguistic abilities in the course of development to exemplify this problem is in speech perception:…

  4. Audio-visual perception of 3D cinematography: an fMRI study using condition-based and computation-based analyses.

    Directory of Open Access Journals (Sweden)

    Akitoshi Ogawa

    Full Text Available The use of naturalistic stimuli to probe sensory functions in the human brain is gaining increasing interest. Previous imaging studies examined brain activity associated with the processing of cinematographic material using both standard "condition-based" designs, as well as "computational" methods based on the extraction of time-varying features of the stimuli (e.g. motion). Here, we exploited both approaches to investigate the neural correlates of complex visual and auditory spatial signals in cinematography. In the first experiment, the participants watched a piece of a commercial movie presented in four blocked conditions: 3D vision with surround sounds (3D-Surround), 3D with monaural sound (3D-Mono), 2D-Surround, and 2D-Mono. In the second experiment, they watched two different segments of the movie both presented continuously in 3D-Surround. The blocked presentation served for standard condition-based analyses, while all datasets were submitted to computation-based analyses. The latter assessed where activity co-varied with visual disparity signals and the complexity of auditory multi-sources signals. The blocked analyses associated 3D viewing with the activation of the dorsal and lateral occipital cortex and superior parietal lobule, while the surround sounds activated the superior and middle temporal gyri (S/MTG). The computation-based analyses revealed the effects of absolute disparity in dorsal occipital and posterior parietal cortices and of disparity gradients in the posterior middle temporal gyrus plus the inferior frontal gyrus. The complexity of the surround sounds was associated with activity in specific sub-regions of S/MTG, even after accounting for changes of sound intensity. These results demonstrate that the processing of naturalistic audio-visual signals entails an extensive set of visual and auditory areas, and that computation-based analyses can track the contribution of complex spatial aspects characterizing such life-like stimuli.

  5. La regulación audiovisual: argumentos a favor y en contra The audio-visual regulation: the arguments for and against

    Directory of Open Access Journals (Sweden)

    Jordi Sopena Palomar

    2008-03-01

    Full Text Available El artículo analiza la efectividad de la regulación audiovisual y valora los diversos argumentos a favor y en contra de la existencia de consejos reguladores a nivel estatal. El debate sobre la necesidad de un organismo de este calado en España todavía persiste. La mayoría de los países comunitarios se han dotado de consejos competentes en esta materia, como es el caso del OFCOM en el Reino Unido o el CSA en Francia. En España, la regulación audiovisual se limita a organismos de alcance autonómico, como son el Consejo Audiovisual de Navarra, el de Andalucía y el Consell de l’Audiovisual de Catalunya (CAC), cuyo modelo también es abordado en este artículo. The article analyzes the effectiveness of audio-visual regulation and assesses the different arguments for and against the existence of broadcasting authorities at the state level. The debate on the necessity of a Spanish regulatory body is still active. Most of the European countries have created competent authorities, like the OFCOM in the United Kingdom and the CSA in France. In Spain, broadcasting regulation is carried out by regional bodies, like the Consejo Audiovisual de Navarra, the Consejo Audiovisual de Andalucía and the Consell de l’Audiovisual de Catalunya (CAC), whose case is also studied in this article.

  6. Audio-visual perception of 3D cinematography: an fMRI study using condition-based and computation-based analyses.

    Science.gov (United States)

    Ogawa, Akitoshi; Bordier, Cecile; Macaluso, Emiliano

    2013-01-01

    The use of naturalistic stimuli to probe sensory functions in the human brain is gaining increasing interest. Previous imaging studies examined brain activity associated with the processing of cinematographic material using both standard "condition-based" designs, as well as "computational" methods based on the extraction of time-varying features of the stimuli (e.g. motion). Here, we exploited both approaches to investigate the neural correlates of complex visual and auditory spatial signals in cinematography. In the first experiment, the participants watched a piece of a commercial movie presented in four blocked conditions: 3D vision with surround sounds (3D-Surround), 3D with monaural sound (3D-Mono), 2D-Surround, and 2D-Mono. In the second experiment, they watched two different segments of the movie both presented continuously in 3D-Surround. The blocked presentation served for standard condition-based analyses, while all datasets were submitted to computation-based analyses. The latter assessed where activity co-varied with visual disparity signals and the complexity of auditory multi-sources signals. The blocked analyses associated 3D viewing with the activation of the dorsal and lateral occipital cortex and superior parietal lobule, while the surround sounds activated the superior and middle temporal gyri (S/MTG). The computation-based analyses revealed the effects of absolute disparity in dorsal occipital and posterior parietal cortices and of disparity gradients in the posterior middle temporal gyrus plus the inferior frontal gyrus. The complexity of the surround sounds was associated with activity in specific sub-regions of S/MTG, even after accounting for changes of sound intensity. These results demonstrate that the processing of naturalistic audio-visual signals entails an extensive set of visual and auditory areas, and that computation-based analyses can track the contribution of complex spatial aspects characterizing such life-like stimuli. PMID

  7. 巧用电教媒体开拓课改的新渠道%Using Audio-Visual Media to Open Up New Channels for Curriculum Reform

    Institute of Scientific and Technical Information of China (English)

    冯力

    2011-01-01

    This paper discusses the role of audio-visual media in the new curriculum reform from four aspects: first, creating situations to stimulate interest and invite reading; second, drawing students into the scene so that they read carefully and think deeply; third, relying on situations to help students learn to accumulate language; and fourth, using scenarios to stimulate interest and guide speaking. Networks and multimedia are skilfully combined and optimized to create situations; under the teacher's guidance, students read, comprehend, practise and express themselves on their own; and information technology is used to open up channels for students' independent learning, thereby opening up new channels for the new curriculum reform.%全文主要从四个方面,来论述电教媒体在新课程改革中的作用:一、创设情境,激趣诱读;二、引人入境。熟读精思;三、凭借情境,学会积累;四、运用情景,激趣导说;巧妙地运用网络及多媒体优化组合,创设情景;在教师的点拨下,让学生自读、自悟、自练、自说;利用信息技术手段开拓学生自学的渠道,为新课程改革开拓新渠道。

  8. Respiratory motion management using audio-visual biofeedback for respiratory-gated radiotherapy of synchrotron-based pulsed heavy-ion beam delivery

    Energy Technology Data Exchange (ETDEWEB)

    He, Pengbo; Ma, Yuanyuan; Huang, Qiyan; Yan, Yuanlin [Institute of Modern Physics, Chinese Academy of Sciences, Lanzhou 730000 (China); Key Laboratory of Heavy Ion Radiation Biology and Medicine of Chinese Academy of Sciences, Lanzhou 730000 (China); School of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049 (China); Li, Qiang, E-mail: liqiang@impcas.ac.cn; Liu, Xinguo; Dai, Zhongying; Zhao, Ting; Fu, Tingyan; Shen, Guosheng [Institute of Modern Physics, Chinese Academy of Sciences, Lanzhou 730000 (China); Key Laboratory of Heavy Ion Radiation Biology and Medicine of Chinese Academy of Sciences, Lanzhou 730000 (China)

    2014-11-01

    Purpose: To efficiently deliver respiratory-gated radiation during synchrotron-based pulsed heavy-ion radiotherapy, a novel respiratory guidance method combining a personalized audio-visual biofeedback (BFB) system, breath hold (BH), and synchrotron-based gating was designed to help patients synchronize their respiratory patterns with synchrotron pulses and to overcome typical limitations such as low efficiency, residual motion, and discomfort. Methods: In-house software was developed to acquire body surface marker positions and display BFB, gating signals, and real-time beam profiles on a LED screen. Patients were prompted to perform short BHs or short deep breath holds (SDBH) with the aid of BFB following a personalized standard BH/SDBH (stBH/stSDBH) guiding curve or their own representative BH/SDBH (reBH/reSDBH) guiding curve. A practical simulation was performed for a group of 15 volunteers to evaluate the feasibility and effectiveness of this method. Effective dose rates (EDRs), mean absolute errors between the guiding curves and the measured curves, and mean absolute deviations of the measured curves were obtained within 10%–50% duty cycles (DCs) that were synchronized with the synchrotron’s flat-top phase. Results: All maneuvers for an individual volunteer took approximately half an hour, and no one experienced discomfort during the maneuvers. Using the respiratory guidance methods, the magnitude of residual motion was almost ten times less than during nongated irradiation, and increases in the average effective dose rate by factors of 2.39–4.65, 2.39–4.59, 1.73–3.50, and 1.73–3.55 for the stBH, reBH, stSDBH, and reSDBH guiding maneuvers, respectively, were observed in contrast with conventional free breathing-based gated irradiation, depending on the respiratory-gated duty cycle settings. Conclusions: The proposed respiratory guidance method with personalized BFB was confirmed to be feasible in a group of volunteers. Increased effective dose

  9. Phase Synchronization Analysis of EEG Signal During Audio-visual Stimulation%视听刺激脑电信号的相位同步分析

    Institute of Scientific and Technical Information of China (English)

    张立伟; 刘国忠; 罗倩; 徐炜君

    2012-01-01

    脑电(EEG)同步被认为是脑功能区域整合的表现.高级脑功能需要具有特定功能的多区域神经系统间进行不同层次的整合和协调来完成.本文提出了一种新的相位同步分析方法一互近似熵.采用分段频率,用同步指数、互信息熵与互近似熵方法对视听刺激EEG导联数据进行了相位同步的比较分析,三种分析得到了一致的结果,说明互近似熵方法也能很好反映出两导联的相位同步.文章同时通过相位同步分析结果进行了大脑反应区域的探索分析.此研究为脑机接口的设计奠定了基础.%EEG synchronization is regarded as a manifestation of the integration of functional brain areas. Higher brain functions require nervous systems with specific functions in multiple brain regions to achieve integration and coordination at different levels. This paper proposes a new method for phase synchronization analysis, mutual approximate entropy, and applies it, together with the synchronization index and mutual information entropy, to the different frequency bands of EEG lead data recorded during audio-visual stimulation. The three analyses yield consistent results, showing that mutual approximate entropy can also reflect the phase synchronization between two leads well. The paper further explores the brain response regions based on the results of the phase synchronization analysis. This work lays a foundation for the design of brain-computer interfaces.
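
    For readers unfamiliar with the comparison measures mentioned above, a common way to quantify phase synchronization between two EEG leads is the phase-locking value computed from Hilbert-transform phases. The sketch below illustrates that standard index only; the paper's mutual approximate entropy measure is its own method and is not reproduced here. Signal names, band limits and sampling rate are illustrative assumptions.

    import numpy as np
    from scipy.signal import butter, filtfilt, hilbert

    def band_phase(x, fs, lo, hi):
        """Instantaneous phase of x within the [lo, hi] Hz band."""
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        return np.angle(hilbert(filtfilt(b, a, x)))

    def phase_locking_value(x, y, fs, lo=8.0, hi=13.0):
        """PLV = |mean(exp(i*(phi_x - phi_y)))| within a frequency band (here, alpha)."""
        dphi = band_phase(x, fs, lo, hi) - band_phase(y, fs, lo, hi)
        return float(np.abs(np.mean(np.exp(1j * dphi))))

    # Toy example: two noisy leads sharing a 10 Hz component (hypothetical data)
    fs = 250
    t = np.arange(0, 10, 1 / fs)
    common = np.sin(2 * np.pi * 10 * t)
    lead1 = common + 0.5 * np.random.randn(t.size)
    lead2 = common + 0.5 * np.random.randn(t.size)
    print(phase_locking_value(lead1, lead2, fs))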

  10. 互动教学法在大学英语视听说教学中的应用%A Study of Application of Interactive Approach into College English Audio-visual and Speaking Teaching

    Institute of Scientific and Technical Information of China (English)

    李之松

    2012-01-01

    The interactive approach is a specific embodiment of the student-centered teaching concept in colleges, and it puts an emphasis on the interaction between teachers and students, among students, and between teachers, students and the teaching materials. The College English Audio-visual and Speaking Course pays close attention to students' English audio-visual and oral expression ability. By exploring three modes of applying the interactive approach in College English Audio-visual and Speaking classroom teaching, and by discussing the advantages of and precautions for applying the interactive approach in classroom teaching, with the students' real situations taken into account, the paper tries to find ways of improving the efficiency of classroom teaching so as to further promote students' English viewing, listening and speaking abilities.%互动教学法是高校教学"以学生为中心"理念的一种体现,该教法强调师生互动,生生互动,同时也强调师生与教学内容的互动。大学英语视听说课程重视学生的英语视听和口头表达能力。通过分析与探讨互动教学法中三种互动模式在大学英语视听说课堂教学中的应用,讨论互动教学法应用的优势与注意事项,努力探寻提高大学英语视听说课堂教学效率的方法,进而结合学生的实际有效提高大学生英语视、听、说能力。

  11. 传统电视与视听新媒体融合发展路径的选择与拓展%The Selection and Expansion of Traditional TV and Audio-Visual New Media Convergence Development Path

    Institute of Scientific and Technical Information of China (English)

    王长潇

    2011-01-01

    Audio-visual new media have diverted audience, advertising and personnel resources away from traditional television. In order to maintain its dominance in the future media landscape, traditional TV should integrate with audio-visual new media and foster audio-visual new media industries. Starting from the logic of network technology development and the law of media evolution, this paper not only reveals the inevitable trend of traditional TV's self-improvement, but also proposes a development path of vertical extension and horizontal integration for traditional TV.%当前,由于传统电视与视听新媒体融合发展状况因技术依托和业务倚重而呈现扩散性发展态势,新情况、新问题不断出现,这就造成两者融合发展路径不是简单的终端扩张,也不是纯粹的网点联结,更不是注册域名建个网站,而是系统、全面的转型,首先面临的是发展路径的选择与拓展。论文从网络技术发展逻辑和媒介演进规律出发,揭示了传统电视自我完善发展的必然趋势、内在规律以及核心问题,提出了传统电视与视听新媒体融合发展路径的纵向延伸与横向联合的理论观点。

  12. 浅谈电化教学对传统教学的继承与发展%On the Role of Audio-Visual Teaching in the Inheritance and Development of Traditional Teaching

    Institute of Scientific and Technical Information of China (English)

    汤卓凡

    2011-01-01

    随着信息技术的迅猛发展和教育改革的不断深入,以信息化带动教育的现代化,努力实现我国教育的跨越式发展已经成为必然趋势。作为信息和知识传递工具的电化教学具有交互性、应答式和可控制性等优势,它适应了现代学习者积极探索、终身学习的要求。但是,电化教学在本质上仍然是一种工具,作为一种辅助手段有其局限性,电化教学必须与传统教学优化整合,实现优势互补,才能取得最佳教学效果。%With the rapid development of information technology and the continuing deepening of education reform, it has become an inevitable trend to achieve leap-forward development of China's education through education modernization driven by informationization. As a carrier of information and knowledge, audio-visual teaching has advantages such as interactivity, responsiveness and controllability, which meet modern learners' requirements for active exploration and lifelong learning. However, audio-visual teaching is in essence still a tool, and as an auxiliary means it has its limitations. To achieve the best teaching effect, audio-visual teaching must be optimally integrated with traditional teaching so that the two complement each other.

  13. Colliding Cues in Word Segmentation: The Role of Cue Strength and General Cognitive Processes

    Science.gov (United States)

    Weiss, Daniel J.; Gerfen, Chip; Mitchel, Aaron D.

    2010-01-01

    The process of word segmentation is flexible, with many strategies potentially available to learners. This experiment explores how segmentation cues interact, and whether successful resolution of cue competition is related to general executive functioning. Participants listened to artificial speech streams that contained both statistical and…

  14. 以情境构建为主线的高职英语视听说课堂教学模式探究%A Higher Vocational English Audio-visual-speaking Classroom Teaching Mode with Situation Building as the Main Line

    Institute of Scientific and Technical Information of China (English)

    张蕾; 肖建云

    2015-01-01

    Based on constructivist situation theory and situated cognition theory, and drawing on teaching practice together with interviews and a questionnaire survey, this study examines higher vocational English audio-visual-speaking classroom teaching from three aspects: teaching content, the teaching process, and the achievement of teaching objectives. Situation building runs through the integrated training of viewing, listening and speaking skills, so that the expected goals of higher vocational English audio-visual-speaking classroom teaching are achieved and students' English communicative ability is effectively improved.%本课题研究以建构主义情境论和情境认知论为依据,通过教学实践,并结合访谈及问卷调查从教学内容,教学实施过程,教学目标的实现三个方面探讨了在高职英语视听说课堂教学中,使情境构建贯穿于视,听,说三位一体的技能训练中,从而达到高职英语视听说课堂教学的预期目标,并使学生的英语交际能力切实地得到提高。

  15. 网络资源辅助高职英语视听说教学的应用研究%Application and Research of Network Resources for the English Audio-visual Course Auxiliary Teaching in Higher Vocational Colleges

    Institute of Scientific and Technical Information of China (English)

    孙敏

    2016-01-01

    This paper explores the use of network resources to assist English audio-visual-speaking teaching in higher vocational colleges. By organically integrating network resources with the teaching process, reforming and improving the teaching mode, breaking through the time and space limits of teaching activities, emphasizing the cultivation of students' abilities in autonomous, cooperative and inquiry-based learning, and paying attention to the coordinated development of students' English audio-visual-speaking skills, teaching quality and efficiency can be improved.%探索应用网络资源辅助高职英语视听说教学,网络资源与教学过程有机融合,改善教学方式,突破教学活动时空限制,提升教学效率与质量,注重培养学生学习兴趣和自主学习能力,注重学生英语视听说技能的协调发展。

  16. The Application of Audio-visual Media in Junior High School English Teaching%关于初中英语教学中电教手段的应用

    Institute of Scientific and Technical Information of China (English)

    江介香

    2012-01-01

    In junior high school English teaching, applying audio-visual media can stimulate students' interest in learning English. As a teaching aid, the application of audio-visual media in the English classroom supplements and develops classroom teaching; it helps improve the overall effectiveness of classroom teaching and is of great significance for cultivating students' comprehensive ability to apply English.%在初中英语教学中,运用电教手段可以激发学生英语学习的兴趣。作为一种辅助教学手段,电教手段运用于英语课堂中,是对英语课堂教学的补充和发展,有利于提高整体课堂教学效率,对于培养学生英语综合应用能力有着十分重要的意义。

  17. The Schema Features and Aesthetic Functions of the Foreign Language Teaching with Electric Audio-visual Aids%外语电化教学的图式特征与美育功能

    Institute of Scientific and Technical Information of China (English)

    齐欣

    2015-01-01

    外语电化教学对传统外语教学模式提出挑战的同时,其自身也面临着诸多的挑战,需要更多的理论支撑和功能研究。基于图式理论和美育教育,对外语电化教学图式特征及其隐性、感性、个性三种美育功能的创新审视,进一步丰富了外语电化教学的理论基础,并强调了其美育功能实现的必要性。%While foreign language teaching with electric audio-visual aids brings about challenges to traditional language teaching, it is also faced with many challenges, and more studies on its theoretical basis and functions are encouraged. On the basis of Schema Theory and aesthetic education, this paper makes an innovative examination of the schema features of foreign language teaching with electric audio-visual aids and its implicit, emotional, and personalized aesthetic functions, further enriches its theoretical basis and emphasizes the necessity of achieving its aesthetic functions.

  18. 基于网络教学平台的日语视听课策略%Strategy of Japanese Audio-visual Lesson Based on Network Teaching Platform

    Institute of Scientific and Technical Information of China (English)

    梁暹

    2014-01-01

    现代科学技术的发展给语言课教学带来了革命性的变化。电脑的迅速发展、更新也导致其技术被用于现代语言教学中。网络的出现以及迅猛发展更是使教学发生了巨大的变化。基于这种形式下的高级日语视听课的教学策略也相应要适应形势的变化和发展。本文则探讨在网络环境下,如何应用网络教学平台改变日语视听课的教学策略。%The development of modern science and technology has brought revolutionary changes to language teaching. The rapid development and updating of computers has also led to computer technology being used in modern language teaching, and the emergence and rapid development of the Internet has changed teaching dramatically. Against this background, the teaching strategies of the advanced Japanese audio-visual lesson must also adapt to the changes and developments in the situation. This article explores how, in a network environment, the network teaching platform can be applied to change the teaching strategies of the Japanese audio-visual course.

  19. New developments in speech pattern element hearing aids for the profoundly deaf.

    Science.gov (United States)

    Faulkner, A; Walliker, J R; Howard, I S; Ball, V; Fourcin, A J

    1993-01-01

    Two new developments in speech pattern processing hearing aids will be described. The first development is the use of compound speech pattern coding. Speech information which is invisible to the lipreader was encoded in terms of three acoustic speech factors; the voice fundamental frequency pattern, coded as a sinusoid, the presence of aperiodic excitation, coded as a low-frequency noise, and the wide-band amplitude envelope, coded by amplitude modulation of the sinusoid and noise signals. Each element of the compound stimulus was individually matched in frequency and intensity to the listener's receptive range. Audio-visual speech receptive assessments in five profoundly hearing-impaired listeners were performed to examine the contributions of adding voiceless and amplitude information to the voice fundamental frequency pattern, and to compare these codings to amplified speech. In both consonant recognition and connected discourse tracking (CDT), all five subjects showed an advantage from the addition of amplitude information to the fundamental frequency pattern. In consonant identification, all five subjects showed further improvements in performance when voiceless speech excitation was additionally encoded together with amplitude information, but this effect was not found in CDT. The addition of voiceless information to voice fundamental frequency information did not improve performance in the absence of amplitude information. Three of the subjects performed significantly better in at least one of the compound speech pattern conditions than with amplified speech, while the other two performed similarly with amplified speech and the best compound speech pattern condition. The three speech pattern elements encoded here may represent a near-optimal basis for an acoustic aid to lipreading for this group of listeners. The second development is the use of a trained multi-layer-perceptron (MLP) pattern classification algorithm as the basis for a robust real-time voice

  20. Congruent and Incongruent Cues in Highly Familiar Audiovisual Action Sequences: An ERP Study

    Directory of Open Access Journals (Sweden)

    SM Wuerger

    2012-07-01

    Full Text Available In a previous fMRI study we found significant differences in BOLD responses for congruent and incongruent semantic audio-visual action sequences (whole-body actions and speech actions) in bilateral pSTS, left SMA, left IFG, and IPL (Meyer, Greenlee, & Wuerger, JOCN, 2011). Here, we present results from a 128-channel ERP study that examined the time-course of these interactions using a one-back task. ERPs in response to congruent and incongruent audio-visual actions were compared to identify regions and latencies of differences. Responses to congruent and incongruent stimuli differed between 240–280 ms, 340–420 ms, and 460–660 ms after stimulus onset. A dipole analysis revealed that the difference around 250 ms can be partly explained by a modulation of sources in the vicinity of the superior temporal area, while the responses after 400 ms are consistent with sources in inferior frontal areas. Our results are in line with a model that postulates early recognition of congruent audiovisual actions in the pSTS, perhaps as a sensory memory buffer, and a later role of the IFG, perhaps in a generative capacity, in reconciling incongruent signals.

  1. The effect of audio-visual segregation on the sleep disorders of patients admitted to the ICU%视听觉隔离对ICU患者睡眠障碍的疗效观察

    Institute of Scientific and Technical Information of China (English)

    解军丽; 刁井地; 马昭君; 冯伟龙; 冯伟生

    2011-01-01

    Objective To investigate the effect of audio-visual segregation, a simple nursing method, on the quantity, quality and structure of sleep in patients admitted to the ICU. Methods 75 selected patients were randomly divided into an audio-visual segregation group and a control group, and the observations were statistically analysed using three methods: muscle tension observation, the Pittsburgh Sleep Quality Index and EEG monitoring. Results Sleep time and sleep quality differed significantly between the two groups (P<0.01). The sleep structure of the audio-visual segregation group maintained normal stages, whereas in the control group REM sleep and NREM sleep stages 3-4 were significantly reduced, a significant difference from the audio-visual segregation group. Conclusion Physical segregation is effective in preserving the quality and normal structure of sleep in ICU patients and has considerable application value.%目的 研究单纯的视听隔离护理方法对ICU患者睡眠的量、质和结构的影响.方法 对75例入选患者随机分组后,通过肌张力观察法、匹兹堡睡眠质量指数量表和脑电图监测3种方法,对视听隔离组(视听隔离组)和对照组观察观察结果进行统计学处理.结果 两组间睡眠时间、睡眠质量有统计学差异(P<0.01),睡眠结构视听隔离组保持了正常的分期结构,对照组快波睡眠(REM)和慢波睡眠(NREM)3~4期有显著性减少,和视听隔离组有显著性差异.结论 物理隔离的方法在保证ICU患者睡眠的质量和结构方面效果显著,有着较大的应用价值.

  2. Word segmentation with universal prosodic cues.

    Science.gov (United States)

    Endress, Ansgar D; Hauser, Marc D

    2010-09-01

    When listening to speech from one's native language, words seem to be well separated from one another, like beads on a string. When listening to a foreign language, in contrast, words seem almost impossible to extract, as if there was only one bead on the same string. This contrast reveals that there are language-specific cues to segmentation. The puzzle, however, is that infants must be endowed with a language-independent mechanism for segmentation, as they ultimately solve the segmentation problem for any native language. Here, we approach the acquisition problem by asking whether there are language-independent cues to segmentation that might be available to even adult learners who have already acquired a native language. We show that adult learners recognize words in connected speech when only prosodic cues to word-boundaries are given from languages unfamiliar to the participants. In both artificial and natural speech, adult English speakers, with no prior exposure to the test languages, readily recognized words in natural languages with critically different prosodic patterns, including French, Turkish and Hungarian. We suggest that, even though languages differ in their sound structures, they carry universal prosodic characteristics. Further, these language-invariant prosodic cues provide a universally accessible mechanism for finding words in connected speech. These cues may enable infants to start acquiring words in any language even before they are fine-tuned to the sound structure of their native language.

  3. On the English Films and TV Programs in English Audio-visual Class%浅析英文影视材料在英语视听说教学中的应用

    Institute of Scientific and Technical Information of China (English)

    邓丽娟

    2011-01-01

    The application of original English films and TV programs in the English audio-visual class is a popular and effective method in ELT in China. This paper discusses the advantages of this method in audio-visual teaching and offers suggestions on the selection of films and on teaching design using English films.%采用一些原版英文影视材料进行视听说教学,是目前视听说课程中经常使用而且行之有效的一种方法。本文论述了这种方法在英语视听说教学中的优势地位,并重点对电影的选择与运用英文电影的教学设计提出了建议。

  4. 独立学院英语视听说课堂教学模式探索%An Inquiry into the Model of English Audio-Visual Classroom Teaching in Independent College

    Institute of Scientific and Technical Information of China (English)

    刘艳明; 张新坤

    2011-01-01

    本文基于独立学院英语专业学生的特点和英语视听说课程的教学现状,在人本主义和建构主义指导下,构建了一个多元化、个性化、协作化的英语视听说课堂教学模式,并通过具体的教案设计分析了其在教学中的实际应用。%Based on English majors' characteristics and the present situation of the English audio-visual lesson in independent colleges, this paper puts forward a diversified, personalized and collaborative classroom teaching model under the guidance of Humanism and Constructivism, and analyses its practical application through concrete teaching design.

  5. 鼻咽癌放疗患者康复视听教材的制作与应用%Development and application of audio-visual materials in radiotherapy patients with nasopharyngeal carcinoma

    Institute of Scientific and Technical Information of China (English)

    潘海卿; 席淑新; 吴沛霞; 叶向红; 王苏丹

    2015-01-01

    Objective To develop audio-visual materials and confirm their effect in patients with nasopharyngeal carcinoma receiving radiotherapy. Methods Audio-visual materials were produced based on relevant literature, professional demonstration of rehabilitation exercises, and digital video recording. A total of 84 patients with nasopharyngeal carcinoma were selected from Jinhua Hospital of Zhejiang University and divided into a control group (n=42) and an intervention group (n=42) according to admission time. The patients of the control group received the usual one-to-one healthcare education. The patients of the intervention group systematically watched the audio-visual materials and imitated the exercises under professional guidance. Compliance with rehabilitation exercise and patients' satisfaction with nursing service were compared between the two groups. Results The compliance scores in the intervention group at 1 month and at 3 months after discharge were higher than those of the control group (P<0.05). There was a significant difference between the intervention group and the control group in patients' satisfaction (P<0.01). Conclusions The self-made audio-visual materials are intuitive, vivid, easy to follow and imitate, and well accepted; they can effectively improve patients' compliance with rehabilitation exercise and satisfaction with nursing service.%目的:探讨康复视听教材的制作及其在鼻咽癌放疗患者中的应用效果。方法参考相关文献编写康复方案,经专人演示康复锻炼的动作和数码录像制作视听教材DV。选择在浙江大学金华医院放疗科行根治性放疗的84例鼻咽癌患者,按住院时间先后分为对照组和干预组各42例。对照组采用传统一对一口头健康宣教方法,干预组采用康复锻炼视听教材光盘的系统播放并指导患者模仿锻炼方法进行健康宣教,比较两种健康教育方法实施后两组患者康复锻炼依从性情况和患者对护理服务满意度。结果干预组出院后1,3个月依

  6. Semantic Indexing of Multimedia Content Using Visual, Audio, and Text Cues

    Directory of Open Access Journals (Sweden)

    W. H. Adams

    2003-02-01

    Full Text Available We present a learning-based approach to the semantic indexing of multimedia content using cues derived from audio, visual, and text features. We approach the problem by developing a set of statistical models for a predefined lexicon. Novel concepts are then mapped in terms of the concepts in the lexicon. To achieve robust detection of concepts, we exploit features from multiple modalities, namely, audio, video, and text. Concept representations are modeled using Gaussian mixture models (GMM), hidden Markov models (HMM), and support vector machines (SVM). Models such as Bayesian networks and SVMs are used in a late-fusion approach to model concepts that are not explicitly modeled in terms of features. Our experiments indicate promise in the proposed classification and fusion methodologies: our proposed fusion scheme achieves more than 10% relative improvement over the best unimodal concept detector.
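
    As a rough illustration of the late-fusion idea described above, the sketch below trains simple per-modality concept detectors and fuses their scores with an SVM. This is a minimal sketch using scikit-learn with synthetic features; the feature arrays, model sizes and fusion classifier are illustrative assumptions, not the authors' actual pipeline, and training and scoring on the same data is for demonstration only.

    import numpy as np
    from sklearn.mixture import GaussianMixture
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    n = 200
    labels = rng.integers(0, 2, n)                        # concept present / absent
    audio_feats = rng.normal(labels[:, None], 1.0, (n, 12))   # toy audio features
    visual_feats = rng.normal(labels[:, None], 1.0, (n, 20))  # toy visual features

    def gmm_concept_score(feats, labels):
        """Log-likelihood ratio from one GMM per class (a simple unimodal detector)."""
        pos = GaussianMixture(2, random_state=0).fit(feats[labels == 1])
        neg = GaussianMixture(2, random_state=0).fit(feats[labels == 0])
        return pos.score_samples(feats) - neg.score_samples(feats)

    # Late fusion: stack the unimodal scores and train a fusion classifier on them
    fused_inputs = np.column_stack([gmm_concept_score(audio_feats, labels),
                                    gmm_concept_score(visual_feats, labels)])
    fusion_svm = SVC(probability=True).fit(fused_inputs, labels)
    print(fusion_svm.score(fused_inputs, labels))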

  7. Automatic discrimination between laughter and speech

    NARCIS (Netherlands)

    Truong, K.; Leeuwen, D. van

    2007-01-01

    Emotions can be recognized by audible paralinguistic cues in speech. By detecting these paralinguistic cues that can consist of laughter, a trembling voice, coughs, changes in the intonation contour etc., information about the speaker’s state and emotion can be revealed. This paper describes the dev

  8. Real-time speech-driven animation of expressive talking faces

    Science.gov (United States)

    Liu, Jia; You, Mingyu; Chen, Chun; Song, Mingli

    2011-05-01

    In this paper, we present a real-time facial animation system in which speech drives mouth movements and facial expressions synchronously. Considering five basic emotions, a hierarchical structure with an upper layer of emotion classification is established. Based on the recognized emotion label, the lower-layer classification at the sub-phonemic level is modelled on the relationship between the acoustic features of frames and the audio labels within phonemes. Using certain constraints, the predicted emotion labels of speech are adjusted to obtain the facial expression labels, which are combined with the sub-phonemic labels. The combinations are mapped into facial action units (FAUs), and audio-visual synchronized animation with mouth movements and facial expressions is generated by morphing between FAUs. The experimental results demonstrate that the two-layer structure succeeds in both emotion and sub-phonemic classifications, and the synthesized facial sequences reach a comparatively convincing quality.

  9. A speech reception in noise test for preschool children (the Galker-test)

    DEFF Research Database (Denmark)

    Lauritsen, Maj-Britt Glenn; Kreiner, Svend; Söderström, Margareta;

    2015-01-01

    Purpose: This study evaluates initial validity and reliability of the “Galker test of speech reception in noise” developed for Danish preschool children suspected to have problems with hearing or understanding speech against strict psychometric standards and assesses acceptance by the children....... Methods:The Galker test is an audio-visual, computerised, word discrimination test in background noise, originally comprised of 50 word pairs. Three hundred and eighty eight children attending ordinary day care centres and aged 3–5 years were included. With multiple regression and the Rasch item response...... model it was examined whether the total score of the Galker test validly reflected item responses across subgroups defined by sex, age, bilingualism, tympanometry, audiometry and verbal comprehension. Results: A total of 370 children (95%) accepted testing and 339 (87%) completed all 50 items...

  10. Temporal visual cues aid speech recognition

    DEFF Research Database (Denmark)

    Zhou, Xiang; Ross, Lars; Lehn-Schiøler, Tue;

    2006-01-01

    BACKGROUND: It is well known that under noisy conditions, viewing a speaker's articulatory movement aids the recognition of spoken words. Conventionally it is thought that the visual input disambiguates otherwise confusing auditory input. HYPOTHESIS: In contrast we hypothesize that it is the temporal synchronicity of the visual input that aids parsing of the auditory stream. More specifically, we expected that purely temporal information, which does not convey information such as place of articulation, may facilitate word recognition. METHODS: To test this prediction we used temporal features of audio to generate an artificial talking-face video and measured word recognition performance on simple monosyllabic words. RESULTS: When presenting words together with the artificial video we find that word recognition is improved over purely auditory presentation. The effect is significant (p...
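
    A plausible way to obtain the kind of purely temporal audio feature used to drive such an artificial talking face is a smoothed amplitude envelope. The sketch below is a hedged illustration of that idea only, not the authors' stimulus-generation code; the file name, cutoff frequency and normalisation are assumptions.

    import numpy as np
    from scipy.io import wavfile
    from scipy.signal import butter, filtfilt, hilbert

    def mouth_opening_envelope(wav_path, cutoff_hz=8.0):
        """Return a normalised 0..1 envelope that could drive a mouth-opening parameter."""
        fs, audio = wavfile.read(wav_path)
        audio = audio.astype(float)
        if audio.ndim > 1:                        # mix stereo down to mono
            audio = audio.mean(axis=1)
        envelope = np.abs(hilbert(audio))         # instantaneous amplitude
        b, a = butter(2, cutoff_hz / (fs / 2))    # low-pass to articulation-rate fluctuations
        smooth = filtfilt(b, a, envelope)
        return smooth / smooth.max()

    # Hypothetical usage: env = mouth_opening_envelope("word.wav")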

  11. 构建互联网视听节目集中监管平台的探索和设想%Exploration and Conception of Building a Centralized Regulatory Platform for Internet Audio-visual Programs

    Institute of Scientific and Technical Information of China (English)

    钱卫; 朱磊

    2011-01-01

    With the development of Internet technology, the means of communication, content and formats of Internet audio-visual programs have changed dramatically. In building the monitoring system, special-purpose monitoring systems were set up in good time for each new means of communication, but multiple single-function monitoring systems cause much inconvenience in use and in the interconnection of information, resulting in low working efficiency and in information resources not being fully exploited. In view of the problems of the existing audio-visual program monitoring systems within the radio and television sector and the lack of a global, systematic solution, this paper proposes building an integrated, centralized monitoring platform for Internet audio-visual programs, consisting mainly of an "Internet audio-visual program information resource library" together with two systems, a "centralized monitoring data analysis and processing system" and a "centralized regulatory business processing system". The platform is intended to analyse and process the massive data of all kinds of audio-visual programs, to grasp more quickly and accurately the quantity, dynamics, range and impact of the network dissemination of various types of audio-visual programs, to analyse public opinion comprehensively, and to enhance the efficiency of supervision.%随着互联网技术的发展,互联网视听节目的传播手段、内容、形式也发生了巨大变化。在监管系统建设构成中针对不同的传播方式,适时建设相应的专项监管系统,多个功能专一的监管系统,给使用和信息的互联互通带来很多不便,造成工作效率低,信息资源不能充分发挥作用。针对广电系统内视听节目监管系统存在的问题以及缺乏全局的、系统化的解决方案的现状,提出构建互联网视听节目综合的集中监管平台,主要是建设“互联网视听节目信息资源库”,以及“监管数据集中分析处理系统”和“监管业务集中处理

  12. Application of audio visual education in health education for elderly patients with chronic hepatitis B%电化教育在老年慢性乙型肝炎病人健康教育中的应用

    Institute of Scientific and Technical Information of China (English)

    杨茜; 李雨昕; 黄艳芳; 陈燕华

    2016-01-01

    [目的]评价电化教育在老年慢性乙型肝炎病人健康教育中的应用效果。[方法]将112例老年慢性乙型肝炎病人按随机数字表法分为观察组和对照组,每组56例,对照组采取常规口头健康教育,观察组在此基础上开展电化教育,健康教育后分别对两组病人疾病知识掌握程度、肝功能及生活质量进行比较。[结果]健康教育后,两观察组病人疾病知识掌握程度高于对照组、肝功能优于对照组、生活质量评分显著高于对照组,经比较差异均有统计学意义(P <0.01或 P <0.05)。[结论]采用电化教育方式对老年乙型肝炎病人实施健康教育,能使病人主动参与学习,提高病人对疾病知识的掌握程度,提高病人治疗依从性,改善肝功能,提高老年乙型肝炎病人的生活质量。%Abstract Objective: To evaluate the application effect of audio-visual education in health education for elderly patients with chronic hepatitis B. Methods: A total of 112 elderly patients with chronic hepatitis B were randomly divided into an observation group and a control group based on a random number table, 56 cases in each. The patients in the control group were given routine oral health education, and the patients in the observation group received audio-visual education in addition to routine oral health education. After the health education, the mastery of disease knowledge, liver function and quality of life were compared between the two groups. Results: After health education, the mastery of disease knowledge in the observation group was higher than that in the control group, liver function was better than that in the control group, and the quality of life score was significantly higher than that in the control group; the differences were statistically significant (P<0.01 or P<0.05). Conclusion: Implementing health education with audio-visual education for elderly patients with hepatitis B could make the patients actively participate in learning

  13. 教师在基于网络的大学英语视听说教学中的角色定位%On Roles of Teachers in Web-Based College English Audio-Visual and Speaking Teaching

    Institute of Scientific and Technical Information of China (English)

    王辰晖; 杨贤玉

    2012-01-01

    The importance of listening and speaking skills was highlighted in the 2007 version of "College English Curriculum Requirements" released by the Ministry of Education, and a teaching mode based on computer and web was also required to be employed by all universities and colleges. For most college English teachers, the teaching mode of the web-based college English audio-visual speaking course is the realization of an emerging teaching philosophy. Teachers are supposed to understand their roles in this teaching mode correctly. At the beginning of the college English teaching reform, many English teachers had a misunderstanding of the roles of the teacher. Some stuck to the traditional roles, some denied the function of teaching in this new mode totally. In the teaching mode of the web-based college English audio-visual speaking course, the roles of teachers are more functional and up-to-date. Teachers are the administrator of the teaching network, the designer of teaching content, the study collaborator of students, the participant in the evaluation and the trainer of language skills. The new roles of teachers also require a higher quality of future college English teachers. A correct understanding of the roles of teachers in web-based college English audio-visual and speaking teaching will help to serve the teaching activities and improve the outcomes of teaching in a better and practical way.%2007年教育部发布的《大学英语课程教学要求》突出强调了听说能力的重要性,并要求各高校采用基于计算机和网络的教学模式。在基于网络的大学英语视听说课程的教学模式中,教师的角色与功能更具有时代性与功能性,教师是教学网络的管理者,教学内容的设计者,学生学习的合作者,教学评估的参与者和语言技能的培养者。新的教师角色定位,同时对未来的大学英语教师所应具备的素质也提出了更高的要求。只有正确看待教师在基于网络的大学英语视听说教学

  14. Speech Development

    Science.gov (United States)


  15. Research on the application of audio-visual-oral cognitive paradigm under net connectivism%网络连接主义视阈下的视听说认知范式

    Institute of Scientific and Technical Information of China (English)

    石小娟

    2011-01-01

    Connectivism offers a new theoretical perspective and constructive framework for studying the audio-visual-oral cognitive model. Based on the principle that nodes and connections are the key factors in forming a knowledge network, this paper explores the construction of an interactive input-output information flow and a multi-dimensional network of static and dynamic node resources. Studies show that connectivism can effectively help learners connect various internal cognitive nodes and external resources to form an integral cognitive network. The application of web technologies also gives solid support to multiple assessments of the learning process. The audio-visual-oral cognitive approach under connectivism, featuring autonomous, interactive, connective and communicative language learning, will provide an applicable language cognition paradigm for the information age.%网络连接主义为英语视听说认知模式研究提供了新的理论阐释视角和构建框架。基于节点和连接是形成知识网络要素的观点,探讨了语言输入一输出互动信息流系统和静态一动态多维节点资源网络系统的构建。研究表明,连接主义能有效地促进学习者连接内部认知节点和外部资源形成整体认知网络,同时网络技术应用为多元化评价学习过程提供了有力支撑,连接主义下的视听说认知模式体现了语言学习自主、互动、连接、交流的特征,将为信息时代语言认知提供一个适用范式。

  16. Promoting smoke-free homes: a novel behavioral intervention using real-time audio-visual feedback on airborne particle levels.

    Directory of Open Access Journals (Sweden)

    Neil E Klepeis

    Full Text Available Interventions are needed to protect the health of children who live with smokers. We pilot-tested a real-time intervention for promoting behavior change in homes that reduces second hand tobacco smoke (SHS) levels. The intervention uses a monitor and feedback system to provide immediate auditory and visual signals triggered at defined thresholds of fine particle concentration. Dynamic graphs of real-time particle levels are also shown on a computer screen. We experimentally evaluated the system, field-tested it in homes with smokers, and conducted focus groups to obtain general opinions. Laboratory tests of the monitor demonstrated SHS sensitivity, stability, precision equivalent to at least 1 µg/m(3), and low noise. A linear relationship (R(2) = 0.98) was observed between the monitor and average SHS mass concentrations up to 150 µg/m(3). Focus groups and interviews with intervention participants showed in-home use to be acceptable and feasible. The intervention was evaluated in 3 homes with combined baseline and intervention periods lasting 9 to 15 full days. Two families modified their behavior by opening windows or doors, smoking outdoors, or smoking less. We observed evidence of lower SHS levels in these homes. The remaining household voiced reluctance to changing their smoking activity and did not exhibit lower SHS levels in main smoking areas or clear behavior change; however, family members expressed receptivity to smoking outdoors. This study established the feasibility of the real-time intervention, laying the groundwork for controlled trials with larger sample sizes. Visual and auditory cues may prompt family members to take immediate action to reduce SHS levels. Dynamic graphs of SHS levels may help families make decisions about specific mitigation approaches.
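
    The core of such a feedback system is a loop that reads the particle monitor and triggers audio-visual alerts when configured thresholds are crossed. The sketch below illustrates that logic only; the read_particle_level() stand-in, the threshold values and the printed alerts are illustrative assumptions, not the study's hardware interface or its actual trigger levels.

    import random
    import time

    THRESHOLDS_UG_M3 = [(25.0, "caution"), (100.0, "alarm")]  # assumed example levels

    def read_particle_level():
        """Stand-in for a real-time fine-particle monitor read, in µg/m^3."""
        return random.uniform(0.0, 150.0)

    def alert_level(concentration):
        """Return the highest alert name whose threshold is exceeded, or None."""
        level = None
        for threshold, name in THRESHOLDS_UG_M3:
            if concentration >= threshold:
                level = name
        return level

    def feedback_loop(n_samples=10, interval_s=1.0):
        for _ in range(n_samples):
            c = read_particle_level()
            level = alert_level(c)
            # A real system would also append (time, c) to the on-screen dynamic graph
            # and play the auditory signal associated with `level`.
            print(f"{c:6.1f} µg/m^3 -> {level or 'ok'}")
            time.sleep(interval_s)

    feedback_loop(3, 0.1)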

  17. Perception and the temporal properties of speech

    Science.gov (United States)

    Gordon, Peter C.

    1991-11-01

    Four experiments addressing the role of attention in phonetic perception are reported. The first experiment shows that the relative importance of two cues to the voicing distinction changes when subjects must perform an arithmetic distractor task at the same time as identifying a speech stimulus. The voice onset time cue loses phonetic significance when subjects are distracted, while the F0 onset frequency cue does not. The second experiment shows a similar pattern for two cues to the distinction between the vowels /i/ (as in 'beat') and /I/ (as in 'bit'). Together these experiments indicate that careful attention to speech perception is necessary for strong acoustic cues to achieve their full phonetic impact, while weaker acoustic cues achieve their full phonetic impact without close attention. Experiment 3 shows that this pattern is obtained when the distractor task places little demand on verbal short term memory. Experiment 4 provides a large data set for testing formal models of the role of attention in speech perception. Attention is shown to influence the signal to noise ratio in phonetic encoding. This principle is instantiated in a network model in which the role of attention is to reduce noise in the phonetic encoding of acoustic cues. Implications of this work for understanding speech perception and general theories of the role of attention in perception are discussed.
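
    The principle described in Experiment 4, that attention improves the signal-to-noise ratio of phonetic encoding and that strong cues depend on it more than weak ones, can be illustrated with a toy simulation. The sketch below is an assumption-laden illustration of that idea only; the cue strengths, noise levels and variance-weighted decision rule are invented, not taken from the paper's network model.

    import numpy as np

    rng = np.random.default_rng(1)

    def identify(voiced, attention, n_trials=10_000):
        """Proportion of correct voiced/voiceless decisions for one stimulus class."""
        vot_signal = 1.0 if voiced else -1.0        # strong cue (e.g. voice onset time)
        f0_signal = 0.3 if voiced else -0.3         # weaker cue (e.g. F0 onset frequency)
        vot_noise = 0.8 / attention                 # attention sharpens encoding of the strong cue
        f0_noise = 0.8                              # weak cue assumed unaffected by distraction
        vot = rng.normal(vot_signal, vot_noise, n_trials)
        f0 = rng.normal(f0_signal, f0_noise, n_trials)
        evidence = vot / vot_noise**2 + f0 / f0_noise**2   # variance-weighted combination
        return np.mean((evidence > 0) == voiced)

    for attention in (1.0, 0.25):                   # full attention vs. distracted
        acc = (identify(True, attention) + identify(False, attention)) / 2
        print(f"attention={attention}: accuracy={acc:.3f}")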

  18. Speech synthesis, speech simulation and speech science

    OpenAIRE

    Huckvale, M.

    2002-01-01

    Speech synthesis research has been transformed in recent years through the exploitation of speech corpora – both for statistical modelling and as a source of signals for concatenative synthesis. This revolution in methodology and the new techniques it brings calls into question the received wisdom that better computer voice output will come from a better understanding of how humans produce speech. This paper discusses the relationship between this new technology of simulated speech and the tr...

  19. The Empirical Research on the Results of Cultural Teaching in the Perspective of Listening and Audio-visual Reform%大学英语视听说教改视域下的文化教学研究

    Institute of Scientific and Technical Information of China (English)

    王一鸣

    2014-01-01

    无论是培养学生的实际运用能力,还是增进学生跨文化交际素质,视听说作为培养学生口语交际能力的一门课程,都具有重要的作用;该课题旨在研究文化教学在视听说课程改革中的教学效果,通过与无文化教学的传统视听说教学方式对比,研究发现融合文化因素的教学方式对学生的口语能力有更大的促进作用,促进作用主要体现在学生口语交际的流利性,准确度和地道性三个方面。%As a course for cultivating students' oral communicative competence, the audio-visual-speaking course plays an important role both in developing students' practical language ability and in promoting their cross-cultural communicative competence. This study investigates the effect of cultural teaching in the reform of the audio-visual-speaking course. Compared with traditional audio-visual teaching without cultural content, the teaching method that integrates cultural factors is found to promote students' oral English to a greater degree, mainly in terms of the fluency, accuracy and idiomaticity of their spoken communication.

  20. 高职英语视听说教材建设的研究与实践%Research and Practice on the Construction of Teaching Materials for Higher Vocational English Audio-Visual-Oral Course

    Institute of Scientific and Technical Information of China (English)

    毕春意

    2014-01-01

    The effectiveness of teaching depends first of all on what is taught, that is, on the teaching materials. Teaching material construction is an important part of the basic development of higher vocational education, and high-quality materials are the foundation for continuously raising the teaching level and safeguarding teaching quality. At present, many of the audio-visual-oral textbooks on the market are not suitable for higher vocational students, which hinders teaching; we must therefore develop genuinely suitable, targeted teaching materials in order to effectively improve higher vocational students' applied and communicative competence in English.

  1. Discussion on the Necessity of Applying Audio-visual Teaching to Physical Education%谈谈学生在体育教学中运用电化教学的必要性

    Institute of Scientific and Technical Information of China (English)

    黄建成

    2013-01-01

    The comprehensive implementation of quality-oriented education has provided an excellent opportunity for the development of school physical education. Physical education is now regarded not only as a major component of quality-oriented education but also as an important means of achieving it. The reasonable use of audio-visual teaching methods in physical education classes helps achieve the teaching objectives of physical education.

  2. Means Application and Meaning of Audio-visual Education Programme in the Education of Party School%浅谈电教手段在党校教育中的应用与意义

    Institute of Scientific and Technical Information of China (English)

    钱丽萍

    2009-01-01

    Audio-visual education uses the achievements of modern science and technology to develop a variety of media that can store and transmit audio-visual educational information, adopts advanced teaching methods, and controls the information of the teaching process in order to obtain the best teaching results. Given the particular, contemporary and practical character of the audience that Party school education faces, audio-visual teaching methods also have their own special applications and significance there.

  3. Literature Review of the Functions of Body Language in Audio-visual Comprehension%视听理解中体态语功用的文献综述

    Institute of Scientific and Technical Information of China (English)

    高翔

    2015-01-01

    The value of video in English listening instruction is now widely acknowledged, but how to help learners understand and make better use of video material remains an urgent problem. By analysing a range of video material, this article examines in more detail the types of visual signals involved in audio-visual comprehension, and in particular the functions of body language in understanding video, in order to offer learners a more complete route to listening comprehension.

  4. 基于网络平台的学生英语听说能力训练%Research on the Enhancement of Students' English Listening and Speaking Abilities via the Audio-Visual-Speaking System

    Institute of Scientific and Technical Information of China (English)

    戴圣虹

    2012-01-01

    At present, students are eager to improve their ability to use English, especially their listening and speaking. Their listening and speaking skills were trained with the College English Learning System, an audio-visual-speaking platform. This paper reports a study of non-English majors at Hefei University; analysis of questionnaire data, listening and speaking tests, and online learning records shows that this training can improve students' English listening and speaking abilities and also develops their autonomous learning ability.

  5. Exploring the Role of Brain Oscillations in Speech Perception in Noise: Intelligibility of Isochronously Retimed Speech

    Science.gov (United States)

    Aubanel, Vincent; Davis, Chris; Kim, Jeesun

    2016-01-01

    A growing body of evidence shows that brain oscillations track speech. This mechanism is thought to maximize processing efficiency by allocating resources to important speech information, effectively parsing speech into units of appropriate granularity for further decoding. However, some aspects of this mechanism remain unclear. First, while periodicity is an intrinsic property of this physiological mechanism, speech is only quasi-periodic, so it is not clear whether periodicity would present an advantage in processing. Second, it is still a matter of debate which aspect of speech triggers or maintains cortical entrainment, from bottom-up cues such as fluctuations of the amplitude envelope of speech to higher level linguistic cues such as syntactic structure. We present data from a behavioral experiment assessing the effect of isochronous retiming of speech on speech perception in noise. Two types of anchor points were defined for retiming speech, namely syllable onsets and amplitude envelope peaks. For each anchor point type, retiming was implemented at two hierarchical levels, a slow time scale around 2.5 Hz and a fast time scale around 4 Hz. Results show that while any temporal distortion resulted in reduced speech intelligibility, isochronous speech anchored to P-centers (approximated by stressed syllable vowel onsets) was significantly more intelligible than a matched anisochronous retiming, suggesting a facilitative role of periodicity defined on linguistically motivated units in processing speech in noise.

  6. Exploring the Role of Brain Oscillations in Speech Perception in Noise: Intelligibility of Isochronously Retimed Speech.

    Science.gov (United States)

    Aubanel, Vincent; Davis, Chris; Kim, Jeesun

    2016-01-01

    A growing body of evidence shows that brain oscillations track speech. This mechanism is thought to maximize processing efficiency by allocating resources to important speech information, effectively parsing speech into units of appropriate granularity for further decoding. However, some aspects of this mechanism remain unclear. First, while periodicity is an intrinsic property of this physiological mechanism, speech is only quasi-periodic, so it is not clear whether periodicity would present an advantage in processing. Second, it is still a matter of debate which aspect of speech triggers or maintains cortical entrainment, from bottom-up cues such as fluctuations of the amplitude envelope of speech to higher level linguistic cues such as syntactic structure. We present data from a behavioral experiment assessing the effect of isochronous retiming of speech on speech perception in noise. Two types of anchor points were defined for retiming speech, namely syllable onsets and amplitude envelope peaks. For each anchor point type, retiming was implemented at two hierarchical levels, a slow time scale around 2.5 Hz and a fast time scale around 4 Hz. Results show that while any temporal distortion resulted in reduced speech intelligibility, isochronous speech anchored to P-centers (approximated by stressed syllable vowel onsets) was significantly more intelligible than a matched anisochronous retiming, suggesting a facilitative role of periodicity defined on linguistically motivated units in processing speech in noise. PMID:27630552
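
    The isochronous retiming manipulation described in the two records above can be approximated by warping each segment between consecutive anchor points (e.g., syllable onsets) to a common duration. The sketch below assumes the anchor times are already known and uses plain Fourier resampling; it illustrates the idea rather than the authors' pipeline, and a perceptual study would instead use a pitch-preserving time-scale modification (e.g., WSOLA or PSOLA) between the same anchors.

        import numpy as np
        from scipy.signal import resample

        def isochronous_retime(signal, sr, anchor_times_s):
            """Warp every inter-anchor segment to the mean segment duration."""
            anchors = (np.asarray(anchor_times_s) * sr).astype(int)
            target_len = int(np.mean(np.diff(anchors)))          # uniform segment length
            segments = [resample(signal[a:b], target_len)        # stretch or compress
                        for a, b in zip(anchors[:-1], anchors[1:])]
            return np.concatenate(segments)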

  7. 基于慕课的高职英语翻转课堂模式探索--以高职英语视听说教学为例%Exploration of Flipped Classroom Model in Higher Vocational College English Teaching based on MOOCs: Take English Audio-Visual and Speaking Teaching in Higher Vocational College for Example

    Institute of Scientific and Technical Information of China (English)

    李传瑞

    2015-01-01

    Based on the practice of English audio-visual and speaking teaching in higher vocational colleges, the author explores the application of a flipped classroom model based on MOOCs, with the aim of improving the way higher vocational students learn English, raising their interest in learning, and increasing the efficiency of audio-visual and speaking classroom teaching.

  8. Visual Cues, Verbal Cues and Child Development

    Science.gov (United States)

    Valentini, Nadia

    2004-01-01

    In this article, the author discusses two strategies--visual cues (modeling) and verbal cues (short, accurate phrases) which are related to teaching motor skills in maximizing learning in physical education classes. Both visual and verbal cues are strong influences in facilitating and promoting day-to-day learning. Both strategies reinforce…

  9. A Brief Discussion on How to Effectively Carry out Oral English Activities in Public English Audio-Visual-Oral Classroom in Higher Vocational Colleges%浅谈如何有效开展高职公共英语视听说课堂口语活动

    Institute of Scientific and Technical Information of China (English)

    郑筱筠

    2015-01-01

    The higher vocational public English audio-visual-oral course aims to improve students' English communication ability. Given the characteristics and needs of higher vocational students, how to use the limited class time to carry out practical, effective and workable oral English activities is a key research question in audio-visual-oral teaching. Drawing on constructivist learning theory, this paper discusses the principles and steps that should guide the design of oral English activities in the higher vocational public English audio-visual-oral classroom.

  10. 一种稳健的基于Visemic LDA的口形动态特征及听视觉语音识别%A Robust Dynamic Mouth Feature Based on Visemic LDA for Audio Visual Speech Recognition

    Institute of Scientific and Technical Information of China (English)

    谢磊; 付中华; 蒋冬梅; 赵荣椿; Wernet Verhelst; Hichem Sahli; Jan Conlenis

    2005-01-01

    Visual feature extraction is a central problem in audio-visual speech recognition research. This paper introduces a robust dynamic mouth feature based on visemic LDA, which takes full account of the changes in mouth contour during articulation and of the visual viseme classes. The paper also proposes a method for automatically labelling the LDA training data from speech recognition results, which removes the heavy burden of manual annotation and avoids labelling errors. Experiments show that introducing the visemic LDA visual feature into audio-visual speech recognition greatly improves recognition rates under noisy conditions; after combining this visual feature with a multi-stream HMM, recognition rates above 80% are still obtained under strong noise at a signal-to-noise ratio of 10 dB.
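
    A minimal sketch of the LDA step described above, assuming frame-level mouth-contour feature vectors and viseme labels are already available (random placeholders below); in the paper the labels come from automatic alignment of speech recognition output, and the projected features feed a multi-stream HMM, neither of which is shown here.

        import numpy as np
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

        # Placeholder data: 1000 frames of 20-dimensional mouth-contour features,
        # each labelled with one of 10 viseme classes.
        rng = np.random.default_rng(0)
        X = rng.normal(size=(1000, 20))
        y = rng.integers(0, 10, size=1000)

        lda = LinearDiscriminantAnalysis(n_components=9)   # at most (classes - 1) dimensions
        visual_features = lda.fit(X, y).transform(X)       # discriminative visual stream
        print(visual_features.shape)                       # (1000, 9)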

  11. Can you hear me yet? An intracranial investigation of speech and non-speech audiovisual interactions in human cortex

    Science.gov (United States)

    Rhone, Ariane E.; Nourski, Kirill V.; Oya, Hiroyuki; Kawasaki, Hiroto; Howard, Matthew A.; McMurray, Bob

    2016-01-01

    In everyday conversation, viewing a talker's face can provide information about the timing and content of an upcoming speech signal, resulting in improved intelligibility. Using electrocorticography, we tested whether human auditory cortex in Heschl's gyrus (HG) and on superior temporal gyrus (STG) and motor cortex on precentral gyrus (PreC) were responsive to visual/gestural information prior to the onset of sound and whether early stages of auditory processing were sensitive to the visual content (speech syllable versus non-speech motion). Event-related band power (ERBP) in the high gamma band was content-specific prior to acoustic onset on STG and PreC, and ERBP in the beta band differed in all three areas. Following sound onset, we found no evidence for content-specificity in HG, evidence for visual specificity in PreC, and specificity for both modalities in STG. These results support models of audio-visual processing in which sensory information is integrated in non-primary cortical areas. PMID:27182530

  12. New Requirements of Online CET4 for English Audio-Visual-Oral Course%英语四级网考对英语视听说课程的新要求

    Institute of Scientific and Technical Information of China (English)

    王红艳

    2014-01-01

    Required by the "Teaching Requirements" and "Re-form Plan for CET4 and CET6 (trial)"promulgated by the Min-istry of Education, the writer, starting with college E nglish au-dio-visual-oral course, adjusted teaching tasks and made teach-ing reforms to adapt to online CET4, so as to cultivate compre-hensive talents with relatively strong ability of oral expression. The currently new emerging network and multimedia teaching platform can be used to strengthen students' ability of online au-tonomous learning, thus integrating constructivism into computer information technology and constructing a new model of au-tonomous audio-visual-oral learning.%在国家教育部颁布的教学《要求》及《全国大学生四、六级考试改革方案(试行)》的要求下,作者从大学英语视听说课程的角度出发,调整教学任务,为适应四级网考对视听说课程做出教学改革以培养具有较强口语表达能力的综合性人才。利用当下新兴的网络及多媒体教学平台,加强学生的网络自主学习能力,使构建主义和计算机信息技术相结合,建立新的视听说自主学习模式。

  13. Prediction and constraint in audiovisual speech perception.

    Science.gov (United States)

    Peelle, Jonathan E; Sommers, Mitchell S

    2015-07-01

    During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing the precision of prediction. Electrophysiological studies demonstrate that oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to acoustic information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration

  14. Speech & Language Therapy for Children and Adolescents with Down Syndrome

    Science.gov (United States)

    ... some children, the written word can provide helpful cues when using expressive language. What Are the Speech ... as providing the student with written rather than verbal instructions or including fewer items on a class ...

  15. Melodic and Rhythmic Contrasts in Emotional Speech and Music

    OpenAIRE

    Quinto, Lena; Thompson, William Forde; Keating, Felicity Louise

    2013-01-01

    Many cues convey emotion similarly in speech and music. Researchers have established that acoustic cues such as pitch height, tempo, and intensity carry important emotional information in both domains. In this investigation, we examined the emotional significance of melodic and rhythmic contrasts between successive syllables or tones in speech and music, referred to as Melodic Interval Variability (MIV) and the normalized Pairwise Variability Index (nPVI). The spoken stimuli were 96 tokens ex...

  16. Audio-Visual Classification of Sports Types

    DEFF Research Database (Denmark)

    Gade, Rikke; Abou-Zleikha, Mohamed; Christensen, Mads Græsbøll;

    2015-01-01

    In this work we propose a method for classification of sports types from combined audio and visual features extracted from thermal video. From audio, Mel Frequency Cepstral Coefficients (MFCC) are extracted, and PCA is applied to reduce the feature space to 10 dimensions. From the visual modality...... short trajectories are constructed to represent the motion of players. From these, four motion features are extracted and combined directly with audio features for classification. A k-nearest neighbour classifier is applied for classification of 180 1-minute video sequences from three sports types...
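
    The pipeline above can be sketched roughly as follows: mean MFCCs per clip reduced to 10 dimensions with PCA, concatenated with a handful of motion features from the thermal video, and classified with k-nearest neighbours. The library choices (librosa, scikit-learn), the MFCC pooling, and k = 1 are assumptions for illustration, not details taken from the paper.

        import numpy as np
        import librosa
        from sklearn.decomposition import PCA
        from sklearn.neighbors import KNeighborsClassifier

        def mean_mfcc(path, n_mfcc=20):
            """Mean MFCC vector for one clip's audio track."""
            y, sr = librosa.load(path, sr=None)
            return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

        def train_sport_classifier(audio_mat, motion_mat, labels):
            """audio_mat: (clips, n_mfcc); motion_mat: (clips, 4) trajectory features."""
            audio_10d = PCA(n_components=10).fit_transform(audio_mat)   # reduce audio features
            fused = np.hstack([audio_10d, motion_mat])                  # direct feature fusion
            return KNeighborsClassifier(n_neighbors=1).fit(fused, labels)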

  17. Audio-visual integration in schizophrenia

    NARCIS (Netherlands)

    Gelder, B.LM.F. de; Vroomen, J.; Annen, L.; Masthoff, E.D.M.; Hodiamont, P.P.G.

    2003-01-01

    Integration of information provided simultaneously by audition and vision was studied in a group of 18 schizophrenic patients. They were compared to a control group, consisting of 12 normal adults of comparable age and education. By administering two tasks, each focusing on one aspect of audio-visua

  18. Audio-visual integration in schizophrenia.

    NARCIS (Netherlands)

    Gelder, B. de; Vroomen, J.; Annen, L.; Masthof, E.; Hodiamont, P.P.G.

    2003-01-01

    Integration of information provided simultaneously by audition and vision was studied in a group of 18 schizophrenic patients. They were compared to a control group, consisting of 12 normal adults of comparable age and education. By administering two tasks, each focusing on one aspect of audio-visua

  19. Acoustic Markers of Prosodic Boundaries in Spanish Spontaneous Alaryngeal Speech

    Science.gov (United States)

    Cuenca, M. H.; Barrio, M. M.

    2010-01-01

    Prosodic information aids segmentation of the continuous speech signal and thereby facilitates auditory speech processing. Durational and pitch variations are prosodic cues especially necessary to convey prosodic boundaries, but alaryngeal speakers have inconsistent control over acoustic parameters such as F0 and duration, being as a result noisy…

  20. Speech recognition interference by the temporal and spectral properties of a single competing talker.

    Science.gov (United States)

    Fogerty, Daniel; Xu, Jiaqian

    2016-08-01

    This study investigated how speech recognition during speech-on-speech masking may be impaired due to the interaction between amplitude modulations of the target and competing talker. Young normal-hearing adults were tested in a competing talker paradigm where the target and/or competing talker was processed to primarily preserve amplitude modulation cues. Effects of talker sex and linguistic interference were also examined. Results suggest that performance patterns for natural speech-on-speech conditions are largely consistent with the same masking patterns observed for signals primarily limited to temporal amplitude modulations. However, results also suggest a role for spectral cues in talker segregation and linguistic competition. PMID:27586780
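
    The "amplitude modulation cues" referred to above are typically captured by the slow temporal envelope of the signal. Below is a minimal sketch of envelope extraction with the Hilbert transform followed by low-pass smoothing; the study's actual vocoder-style processing (filterbank analysis, carrier substitution, and so on) involves more steps, and the 30 Hz cutoff is only an assumed illustrative value.

        import numpy as np
        from scipy.signal import hilbert, butter, filtfilt

        def temporal_envelope(x, sr, cutoff_hz=30.0):
            """Low-pass-filtered Hilbert envelope, i.e. the amplitude modulation of x."""
            env = np.abs(hilbert(x))                     # instantaneous amplitude
            b, a = butter(4, cutoff_hz / (sr / 2))       # keep only slow modulations
            return filtfilt(b, a, env)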

  1. Can you hear my age? Influences of speech rate and speech spontaneity on estimation of speaker age

    Directory of Open Access Journals (Sweden)

    Sara Waller Skoog

    2015-07-01

    Full Text Available Cognitive hearing science is mainly about the study of how cognitive factors contribute to speech comprehension, but cognitive factors also partake in speech processing to infer non-linguistic information from speech signals, such as the intentions of the talker and the speaker's age. Here, we report two experiments on age estimation by naïve listeners. The aim was to study how speech rate influences estimation of speaker age by comparing the speakers' natural speech rate with increased or decreased speech rate. In Experiment 1, listeners were presented with audio samples of read speech from three different speaker age groups (young, middle-aged and old adults). They estimated the speakers as younger when speech rate was faster than normal and as older when speech rate was slower than normal. This speech rate effect was slightly greater in magnitude for older (60-65 years) speakers in comparison with younger (20-25 years) speakers, suggesting that speech rate may gain greater importance as a perceptual age cue with increased speaker age. This pattern was more pronounced in Experiment 2, in which listeners estimated age from spontaneous speech. Faster speech rate was associated with lower age estimates, but only for older and middle-aged (40-45 years) speakers. Taken together, speakers of all age groups were estimated as older when speech rate decreased, except for the youngest speakers in Experiment 2. The absence of a linear speech rate effect in estimates of younger speakers, for spontaneous speech, implies that listeners use different age estimation strategies or cues (possibly vocabulary) depending on the age of the speaker and the spontaneity of the speech. Potential implications for forensic investigations and other applied domains are discussed.

  2. Performance of current models of speech recognition and resulting challenges

    OpenAIRE

    Schubotz, Wiebke

    2015-01-01

    Speech is usually perceived in background noise (masker) that can severely hamper its recognition. Nevertheless, there are mechanisms that enable speech recognition even in difficult listening conditions. Some of them, such as e.g., the combination of across-frequency information or binaural cues, are studied in this dissertation. Moreover, masking aspects such as energetic, amplitude modulation or informational masking are considered. Speech recognition in complex maskers is investigated tha...

  3. Recognizing intentions in infant-directed speech: evidence for universals.

    Science.gov (United States)

    Bryant, Gregory A; Barrett, H Clark

    2007-08-01

    In all languages studied to date, distinct prosodic contours characterize different intention categories of infant-directed (ID) speech. This vocal behavior likely exists universally as a species-typical trait, but little research has examined whether listeners can accurately recognize intentions in ID speech using only vocal cues, without access to semantic information. We recorded native-English-speaking mothers producing four intention categories of utterances (prohibition, approval, comfort, and attention) as both ID and adult-directed (AD) speech, and we then presented the utterances to Shuar adults (South American hunter-horticulturalists). Shuar subjects were able to reliably distinguish ID from AD speech and were able to reliably recognize the intention categories in both types of speech, although performance was significantly better with ID speech. This is the first demonstration that adult listeners in an indigenous, nonindustrialized, and nonliterate culture can accurately infer intentions from both ID speech and AD speech in a language they do not speak. PMID:17680948

  4. Speech coding

    Energy Technology Data Exchange (ETDEWEB)

    Ravishankar, C., Hughes Network Systems, Germantown, MD

    1998-05-08

    Speech is the predominant means of communication between human beings, and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage that the speech signal could be corrupted by noise, cross-talk and distortion. Long-haul transmissions, which use repeaters to compensate for the loss in signal strength on transmission links, also increase the associated noise and distortion. Digital transmission, on the other hand, is relatively immune to noise, cross-talk and distortion, primarily because of the capability to faithfully regenerate the digital signal at each repeater purely on the basis of a binary decision. Hence the end-to-end performance of the digital link becomes essentially independent of the length and operating frequency bands of the link, and from a transmission point of view digital transmission has been the preferred approach due to its higher immunity to noise. The need to carry digital speech also became extremely important from a service-provision point of view. Modern requirements have introduced the need for robust, flexible and secure services that can carry a multitude of signal types (such as voice, data and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term speech coding often refers to techniques that represent or code speech signals either directly as a waveform or as a set of parameters obtained by analyzing the speech signal. In either case, the codes are transmitted to the distant end, where speech is reconstructed or synthesized using the received set of codes. A more generic term that is often used interchangeably with speech coding is voice coding. This term is more generic in the sense that the
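
    As a concrete instance of the waveform-coding family mentioned above, the sketch below applies µ-law companding (the principle behind G.711 logarithmic PCM) followed by uniform 8-bit quantisation. It is a textbook-style illustration of coding the waveform directly, as opposed to coding a set of analysis parameters.

        import numpy as np

        MU = 255.0   # µ-law constant used in G.711

        def mu_law_encode(x, n_bits=8):
            """Compand a signal scaled to [-1, 1] and quantise it to 2**n_bits levels."""
            companded = np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)
            levels = 2 ** n_bits
            return np.round((companded + 1) / 2 * (levels - 1)).astype(int)

        def mu_law_decode(codes, n_bits=8):
            """Invert the quantisation and the µ-law companding curve."""
            levels = 2 ** n_bits
            companded = codes / (levels - 1) * 2 - 1
            return np.sign(companded) * ((1 + MU) ** np.abs(companded) - 1) / MU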

  5. How is the McGurk effect modulated by Cued Speech in deaf and hearing adults ?

    OpenAIRE

    Clémence Bayard

    2014-01-01

    Speech perception for both hearing and deaf people involves an integrative process between auditory and lip-reading information. In order to disambiguate information from lips, manual cues from Cued Speech may be added. Cued Speech (CS) is a system of manual aids developed to help deaf people to clearly and completely understand speech visually (Cornett, 1967). Within this system, both labial and manual information, as lone input sources, remain ambiguous. Perceivers, therefore, have to combi...

  6. How is the McGurk effect modulated by Cued Speech in deaf and hearing adults?

    OpenAIRE

    Bayard, Clémence; Colin, Cécile; Leybaert, Jacqueline

    2014-01-01

    Speech perception for both hearing and deaf people involves an integrative process between auditory and lip-reading information. In order to disambiguate information from lips, manual cues from Cued Speech may be added. Cued Speech (CS) is a system of manual aids developed to help deaf people to clearly and completely understand speech visually (Cornett, 1967). Within this system, both labial and manual information, as lone input sources, remain ambiguous. Perceivers, therefore, have to combi...

  7. Relationship between perceptual learning in speech and statistical learning in younger and older adults

    OpenAIRE

    Thordis Marisa Neger; Esther Janse

    2014-01-01

    Within a few sentences, listeners learn to understand severely degraded speech such as noise-vocoded speech. However, individuals vary in the amount of such perceptual learning and it is unclear what underlies these differences. The present study investigates whether perceptual learning in speech relates to statistical learning, as sensitivity to probabilistic information may aid identification of relevant cues in novel speech input. If statistical learning and perceptual learning (partly) dr...

  8. Emotional speech processing at the intersection of prosody and semantics.

    Directory of Open Access Journals (Sweden)

    Rachel Schwartz

    Full Text Available The ability to accurately perceive emotions is crucial for effective social interaction. Many questions remain regarding how different sources of emotional cues in speech (e.g., prosody, semantic information) are processed during emotional communication. Using a cross-modal emotional priming paradigm (Facial affect decision task), we compared the relative contributions of processing utterances with single-channel (prosody-only) versus multi-channel (prosody and semantic) cues on the perception of happy, sad, and angry emotional expressions. Our data show that emotional speech cues produce robust congruency effects on decisions about an emotionally related face target, although no processing advantage occurred when prime stimuli contained multi-channel as opposed to single-channel speech cues. Our data suggest that utterances with prosodic cues alone and utterances with combined prosody and semantic cues both activate knowledge that leads to emotional congruency (priming) effects, but that the convergence of these two information sources does not always heighten access to this knowledge during emotional speech processing.

  9. Emotional speech processing at the intersection of prosody and semantics.

    Science.gov (United States)

    Schwartz, Rachel; Pell, Marc D

    2012-01-01

    The ability to accurately perceive emotions is crucial for effective social interaction. Many questions remain regarding how different sources of emotional cues in speech (e.g., prosody, semantic information) are processed during emotional communication. Using a cross-modal emotional priming paradigm (Facial affect decision task), we compared the relative contributions of processing utterances with single-channel (prosody-only) versus multi-channel (prosody and semantic) cues on the perception of happy, sad, and angry emotional expressions. Our data show that emotional speech cues produce robust congruency effects on decisions about an emotionally related face target, although no processing advantage occurred when prime stimuli contained multi-channel as opposed to single-channel speech cues. Our data suggest that utterances with prosodic cues alone and utterances with combined prosody and semantic cues both activate knowledge that leads to emotional congruency (priming) effects, but that the convergence of these two information sources does not always heighten access to this knowledge during emotional speech processing. PMID:23118868

  10. Hate speech

    Directory of Open Access Journals (Sweden)

    Anne Birgitta Nilsen

    2014-03-01

    Full Text Available The manifesto of the Norwegian terrorist Anders Behring Breivik is based on the “Eurabia” conspiracy theory. This theory is a key starting point for hate speech amongst many right-wing extremists in Europe, but also has ramifications beyond these environments. In brief, proponents of the Eurabia theory claim that Muslims are occupying Europe and destroying Western culture, with the assistance of the EU and European governments. By contrast, members of Al-Qaeda and other extreme Islamists promote the conspiracy theory “the Crusade” in their hate speech directed against the West. Proponents of the latter theory argue that the West is leading a crusade to eradicate Islam and Muslims, a crusade that is similarly facilitated by their governments. This article presents analyses of texts written by right-wing extremists and Muslim extremists in an effort to shed light on how hate speech promulgates conspiracy theories in order to spread hatred and intolerance. The aim of the article is to contribute to a more thorough understanding of hate speech’s nature by applying rhetorical analysis. Rhetorical analysis is chosen because it offers a means of understanding the persuasive power of speech. It is thus a suitable tool to describe how hate speech works to convince and persuade. The concepts from rhetorical theory used in this article are ethos, logos and pathos. The concept of ethos is used to pinpoint factors that contributed to Osama bin Laden's impact, namely factors that lent credibility to his promotion of the conspiracy theory of the Crusade. In particular, Bin Laden projected common sense, good morals and good will towards his audience. He seemed to have coherent and relevant arguments; he appeared to possess moral credibility; and his use of language demonstrated that he wanted the best for his audience. The concept of pathos is used to define hate speech, since hate speech targets its audience's emotions. In hate speech it is the

  11. Speech enhancement

    CERN Document Server

    Benesty, Jacob; Chen, Jingdong

    2006-01-01

    We live in a noisy world! In all applications (telecommunications, hands-free communications, recording, human-machine interfaces, etc.) that require at least one microphone, the signal of interest is usually contaminated by noise and reverberation. As a result, the microphone signal has to be "cleaned" with digital signal processing tools before it is played out, transmitted, or stored. This book is about speech enhancement. Different well-known and state-of-the-art methods for noise reduction, with one or multiple microphones, are discussed. By speech enhancement, we mean not only noise reduction

  12. Application and design of audio-visual aids in stomatology teaching cariology, endodontology and operative dentistry in non-stomatology students%直观教学法在非口腔医学专业医学生牙体牙髓病教学中的设计与应用

    Institute of Scientific and Technical Information of China (English)

    倪雪岩; 吕亚林; 曹莹; 臧滔; 董坚; 丁芳; 李若萱

    2014-01-01

    Objective: To evaluate the effects of audio-visual aids in teaching cariology, endodontology and operative dentistry to non-stomatology students. Methods: A total of 77 students from the 2010 and 2011 matriculating classes of the Preventive Medicine Department of Capital Medical University were selected, and diversified audio-visual aids were used comprehensively in teaching. A theory examination and a follow-up questionnaire were carried out and analyzed to obtain feedback on the combined teaching methods. Results: The students mastered the theoretical knowledge of cariology and endodontics well, with a mean score of 24.2 ± 1.1. The questionnaire showed that 89.6% (69/77) of the students viewed the improved teaching method positively, and 90.9% (70/77) felt that the audio-visual teaching had improved their learning ability. Conclusions: The application of audio-visual aids in stomatology teaching increases interest in learning and improves the teaching effect; however, it needs to be carefully designed and combined with cross-teaching and elicitation pedagogy in order to achieve optimal teaching results.

  13. Speech Intelligibility

    Science.gov (United States)

    Brand, Thomas

    Speech intelligibility (SI) is important for different fields of research, engineering and diagnostics in order to quantify very different phenomena like the quality of recordings, communication and playback devices, the reverberation of auditoria, characteristics of hearing impairment, benefit using hearing aids or combinations of these things.

  14. Alerting prefixes for speech warning messages. [in helicopters

    Science.gov (United States)

    Bucher, N. M.; Voorhees, J. W.; Karl, R. L.; Werner, E.

    1984-01-01

    A major question posed by the design of an integrated voice information display/warning system for next-generation helicopter cockpits is whether an alerting prefix should precede voice warning messages; if so, the characteristics desirable in such a cue must also be addressed. Attention is presently given to the results of a study which ascertained pilot response time and response accuracy to messages preceded by either neutral cues or the cognitively appropriate semantic cues. Both verbal cues and messages were spoken in direct, phoneme-synthesized speech, and a training manipulation was included to determine the extent to which previous exposure to speech thus produced facilitates these messages' comprehension. Results are discussed in terms of the importance of human factors research in cockpit display design.

  15. Silent Speech Interfaces

    OpenAIRE

    Denby, B; Schultz, T.; Honda, K.; Hueber, T.; Gilbert, J.M.; Brumberg, J.S.

    2010-01-01

    Abstract The possibility of speech processing in the absence of an intelligible acoustic signal has given rise to the idea of a 'silent speech' interface, to be used as an aid for the speech handicapped, or as part of a communications system operating in silence-required or high-background-noise environments. The article first outlines the emergence of the silent speech interface from the fields of speech production, automatic speech processing, speech pathology research, and telec...

  16. Language and Speech Processing

    CERN Document Server

    Mariani, Joseph

    2008-01-01

    Speech processing addresses various scientific and technological areas. It includes speech analysis and variable rate coding, in order to store or transmit speech. It also covers speech synthesis, especially from text, speech recognition, including speaker and language identification, and spoken language understanding. This book covers the following topics: how to realize speech production and perception systems, how to synthesize and understand speech using state-of-the-art methods in signal processing, pattern recognition, stochastic modelling, computational linguistics and human factor studies

  17. Reactivity to nicotine cues over repeated cue reactivity sessions.

    Science.gov (United States)

    LaRowe, Steven D; Saladin, Michael E; Carpenter, Matthew J; Upadhyaya, Himanshu P

    2007-12-01

    The present study investigated whether reactivity to nicotine-related cues would attenuate across four experimental sessions held 1 week apart. Participants were nineteen non-treatment seeking, nicotine-dependent males. Cue reactivity sessions were performed in an outpatient research center using in vivo cues consisting of standardized smoking-related paraphernalia (e.g., cigarettes) and neutral comparison paraphernalia (e.g., pencils). Craving ratings were collected before and after both cue presentations while physiological measures (heart rate, skin conductance) were collected before and during the cue presentations. Although craving levels decreased across sessions, smoking-related cues consistently evoked significantly greater increases in craving relative to neutral cues over all four experimental sessions. Skin conductance was higher in response to smoking cues, though this effect was not as robust as that observed for craving. Results suggest that, under the described experimental parameters, craving can be reliably elicited over repeated cue reactivity sessions.

  18. Sound frequency affects speech emotion perception: results from congenital amusia.

    Science.gov (United States)

    Lolli, Sydney L; Lewenstein, Ari D; Basurto, Julian; Winnik, Sean; Loui, Psyche

    2015-01-01

    Congenital amusics, or "tone-deaf" individuals, show difficulty in perceiving and producing small pitch differences. While amusia has marked effects on music perception, its impact on speech perception is less clear. Here we test the hypothesis that individual differences in pitch perception affect judgment of emotion in speech, by applying low-pass filters to spoken statements of emotional speech. A norming study was first conducted on Mechanical Turk to ensure that the intended emotions from the Macquarie Battery for Evaluation of Prosody were reliably identifiable by US English speakers. The most reliably identified emotional speech samples were used in Experiment 1, in which subjects performed a psychophysical pitch discrimination task, and an emotion identification task under low-pass and unfiltered speech conditions. Results showed a significant correlation between pitch-discrimination threshold and emotion identification accuracy for low-pass filtered speech, with amusics (defined here as those with a pitch discrimination threshold >16 Hz) performing worse than controls. This relationship with pitch discrimination was not seen in unfiltered speech conditions. Given the dissociation between low-pass filtered and unfiltered speech conditions, we inferred that amusics may be compensating for poorer pitch perception by using speech cues that are filtered out in this manipulation. To assess this potential compensation, Experiment 2 was conducted using high-pass filtered speech samples intended to isolate non-pitch cues. No significant correlation was found between pitch discrimination and emotion identification accuracy for high-pass filtered speech. Results from these experiments suggest an influence of low frequency information in identifying emotional content of speech. PMID:26441718

  19. Sound frequency affects speech emotion perception: results from congenital amusia

    Science.gov (United States)

    Lolli, Sydney L.; Lewenstein, Ari D.; Basurto, Julian; Winnik, Sean; Loui, Psyche

    2015-01-01

    Congenital amusics, or “tone-deaf” individuals, show difficulty in perceiving and producing small pitch differences. While amusia has marked effects on music perception, its impact on speech perception is less clear. Here we test the hypothesis that individual differences in pitch perception affect judgment of emotion in speech, by applying low-pass filters to spoken statements of emotional speech. A norming study was first conducted on Mechanical Turk to ensure that the intended emotions from the Macquarie Battery for Evaluation of Prosody were reliably identifiable by US English speakers. The most reliably identified emotional speech samples were used in Experiment 1, in which subjects performed a psychophysical pitch discrimination task, and an emotion identification task under low-pass and unfiltered speech conditions. Results showed a significant correlation between pitch-discrimination threshold and emotion identification accuracy for low-pass filtered speech, with amusics (defined here as those with a pitch discrimination threshold >16 Hz) performing worse than controls. This relationship with pitch discrimination was not seen in unfiltered speech conditions. Given the dissociation between low-pass filtered and unfiltered speech conditions, we inferred that amusics may be compensating for poorer pitch perception by using speech cues that are filtered out in this manipulation. To assess this potential compensation, Experiment 2 was conducted using high-pass filtered speech samples intended to isolate non-pitch cues. No significant correlation was found between pitch discrimination and emotion identification accuracy for high-pass filtered speech. Results from these experiments suggest an influence of low frequency information in identifying emotional content of speech. PMID:26441718
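
    The low-pass filtering manipulation described in the two records above can be reproduced with a standard Butterworth filter, as sketched below; the 400 Hz cutoff is an assumed value for illustration, not the cutoff used in the study.

        from scipy.io import wavfile
        from scipy.signal import butter, filtfilt

        def low_pass_speech(in_wav, out_wav, cutoff_hz=400.0, order=4):
            """Strip spectral detail above cutoff_hz, leaving mainly pitch and prosody cues."""
            sr, x = wavfile.read(in_wav)
            b, a = butter(order, cutoff_hz / (sr / 2), btype="low")
            wavfile.write(out_wav, sr, filtfilt(b, a, x.astype(float), axis=0))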

  20. Recognizing intentions in infant-directed speech: Evidence for universals

    OpenAIRE

    Bryant, GA; Barrett, HC

    2007-01-01

    In all languages studied to date, distinct prosodic contours characterize different intention categories of infant-directed (ID) speech. This vocal behavior likely exists universally as a species-typical trait, but little research has examined whether listeners can accurately recognize intentions in ID speech using only vocal cues, without access to semantic information. We recorded native-English-speaking mothers producing four intention categories of utterances (prohibition, approval, comfo...

  1. An Interaction Between Prosody and Statistics in the Segmentation of Fluent Speech

    Science.gov (United States)

    Shukla, Mohinish; Nespor, Marina; Mehler, Jacques

    2007-01-01

    Sensitivity to prosodic cues might be used to constrain lexical search. Indeed, the prosodic organization of speech is such that words are invariably aligned with phrasal prosodic edges, providing a cue to segmentation. In this paper we devise an experimental paradigm that allows us to investigate the interaction between statistical and prosodic…

  2. Audio-visual Feature Fusion Person Identification Based on SVM and Score Normalization%基于SVM和归一化技术的音视频特征融合身份识别

    Institute of Scientific and Technical Information of China (English)

    丁辉; 安今朝

    2012-01-01

    To address the low recognition rates of face recognition and speaker recognition under severe noise conditions, this paper studies feature-level fusion and, combining score normalization with SVM theory, presents a model for fused face-and-speech identity recognition. Face features are extracted with the discrete cosine transform and locality preserving projections, and speech features are extracted separately; the two are then fused at the feature level. The distance between the test identity and each template is computed, and the matching distances are normalized to reduce computation and improve recognition performance before being passed to an SVM for the final decision. Simulation results show that in noisy environments, as the signal-to-noise ratio decreases, the recognition rate of the fused system is clearly higher than that of either single-modality system, and the fused system performs well in both response time and accuracy, achieving the goal of identity recognition.
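
    A highly simplified sketch of the fusion scheme described above, assuming face and speech feature vectors have already been extracted; for simplicity the normalisation here is applied to the fused feature vectors, whereas the paper normalises matching distances to the stored templates before the SVM decision. All data below are random placeholders.

        import numpy as np
        from sklearn.preprocessing import MinMaxScaler
        from sklearn.svm import SVC

        def fuse_and_train(face_feats, speech_feats, labels):
            """Feature-level fusion, normalisation, and an SVM identity classifier."""
            fused = np.hstack([face_feats, speech_feats])   # concatenate per-sample features
            fused = MinMaxScaler().fit_transform(fused)     # normalise to [0, 1]
            return SVC(kernel="rbf").fit(fused, labels)

        rng = np.random.default_rng(1)
        clf = fuse_and_train(rng.normal(size=(200, 50)),     # 50-dim face features
                             rng.normal(size=(200, 30)),     # 30-dim speech features
                             rng.integers(0, 10, size=200))  # 10 enrolled identities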

  3. Can Prosody Be Used to Discover Hierarchical Structure in Continuous Speech?

    Science.gov (United States)

    Langus, Alan; Marchetto, Erika; Bion, Ricardo Augusto Hoffmann; Nespor, Marina

    2012-01-01

    We tested whether adult listeners can simultaneously keep track of variations in pitch and syllable duration in order to segment continuous speech into phrases and group these phrases into sentences. The speech stream was constructed so that prosodic cues signaled hierarchical structures (i.e., phrases embedded within sentences) and non-adjacent…

  4. A Computational Model of Word Segmentation from Continuous Speech Using Transitional Probabilities of Atomic Acoustic Events

    Science.gov (United States)

    Rasanen, Okko

    2011-01-01

    Word segmentation from continuous speech is a difficult task that is faced by human infants when they start to learn their native language. Several studies indicate that infants might use several different cues to solve this problem, including intonation, linguistic stress, and transitional probabilities between subsequent speech sounds. In this…
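
    One of the cues named above, the transitional probability between successive units, can be illustrated with a toy segmentation routine that places a boundary wherever the transitional probability dips to a local minimum. This is a textbook-style illustration over symbolic syllables, not the model proposed in the paper, which operates on atomic acoustic events.

        from collections import Counter

        def transitional_probs(seq):
            """P(next unit | current unit), estimated from bigram counts."""
            bigrams = Counter(zip(seq, seq[1:]))
            unigrams = Counter(seq[:-1])
            return {(a, b): c / unigrams[a] for (a, b), c in bigrams.items()}

        def segment(seq):
            """Place a word boundary wherever the transitional probability is a local minimum."""
            tp = transitional_probs(seq)
            probs = [tp[(a, b)] for a, b in zip(seq, seq[1:])]
            words, start = [], 0
            for i in range(1, len(probs) - 1):
                if probs[i] < probs[i - 1] and probs[i] < probs[i + 1]:
                    words.append("".join(seq[start:i + 1]))
                    start = i + 1
            words.append("".join(seq[start:]))
            return words

        # Toy stream built from three nonsense words; within-word transitions are
        # frequent, between-word transitions are not, so minima mark the boundaries.
        stream = "tupirogolabupabikutupiropabikugolabutupirogolabu"
        units = [stream[i:i + 2] for i in range(0, len(stream), 2)]
        print(segment(units))   # -> ['tupiro', 'golabu', 'pabiku', ...]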

  5. Performance evaluation of a motor-imagery-based EEG-Brain computer interface using a combined cue with heterogeneous training data in BCI-Naive subjects

    Directory of Open Access Journals (Sweden)

    Lee Youngbum

    2011-10-01

    Full Text Available Background: Subjects using an EEG brain-computer interface (BCI) often have difficulty achieving performance consistent with actual movement when relying on motor imagery alone. It is therefore necessary to find the optimal conditions and stimulus combinations that affect the performance factors of the EEG-BCI system, so that equipment safety and trust can be guaranteed through performance evaluation of the motor imagery characteristics that can be utilized in the EEG-BCI testing environment. Methods: The experiment was carried out with 10 experienced subjects and 32 naive subjects on an EEG-BCI system. There were three experiments: the experienced homogeneous experiment, the naive homogeneous experiment and the naive heterogeneous experiment. Each experiment was compared across the six audio-visual cue combinations and consisted of 50 trials. For the naive subjects, the EEG data were processed with a common spatial pattern filter and classified with a least-squares linear classifier. Accuracy was calculated using the training and test data sets, and the p-value of the accuracy was obtained through a statistical significance test. Results: In the case in which a naive subject was trained by a heterogeneous combined cue and tested by a visual cue, the result was not only the highest accuracy (p Conclusions: We propose the use of this measuring methodology, with a heterogeneous combined cue for training data and a visual cue for test data, together with a typical EEG-BCI algorithm on the EEG-BCI system, to achieve effectiveness in terms of consistency, stability, cost, time, and resource management without the need for a trial-and-error process.
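
    A bare-bones sketch of the classification chain named above: common spatial pattern (CSP) filters computed from class-wise covariance matrices, log-variance features, and a least-squares linear classifier. The trial shapes, the number of CSP component pairs, and labels coded as +1/-1 are illustrative assumptions, not details from the paper.

        import numpy as np
        from scipy.linalg import eigh

        def csp_filters(trials_a, trials_b, n_pairs=3):
            """CSP spatial filters from two classes of trials shaped (trial, channel, sample)."""
            def mean_cov(trials):
                return np.mean([x @ x.T / np.trace(x @ x.T) for x in trials], axis=0)
            ca, cb = mean_cov(trials_a), mean_cov(trials_b)
            _, eigvecs = eigh(ca, ca + cb)                  # generalised eigenvectors, ascending
            filters = eigvecs.T                             # rows are spatial filters
            return np.vstack([filters[:n_pairs], filters[-n_pairs:]])

        def log_var_features(trials, filters):
            proj = np.einsum("fc,tcs->tfs", filters, trials)
            var = proj.var(axis=2)
            return np.log(var / var.sum(axis=1, keepdims=True))

        def least_squares_classifier(X, y):
            """Fit labels coded as +1/-1 with ordinary least squares plus a bias term."""
            Xb = np.hstack([X, np.ones((len(X), 1))])
            w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
            return lambda Xn: np.sign(np.hstack([Xn, np.ones((len(Xn), 1))]) @ w)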

  6. Speech and Communication Disorders

    Science.gov (United States)

    ... or understand speech. Causes include: hearing disorders and deafness; voice problems, such as dysphonia or those caused by cleft lip or palate; speech problems like stuttering; developmental disabilities; learning disorders; autism spectrum disorder; brain injury; and stroke. Some speech and ...

  7. Composition: Cue Wheel

    DEFF Research Database (Denmark)

    Bergstrøm-Nielsen, Carl

    2014-01-01

    Cue Rondo is an open composition to be realised by improvising musicians. See more about my composition practise in the entry "Composition - General Introduction". This work is licensed under a Creative Commons "by-nc" License. You may for non-commercial purposes use and distribute it, performance...

  8. Multisensor image cueing (MUSIC)

    Science.gov (United States)

    Rodvold, David; Patterson, Tim J.

    2002-07-01

    There have been many years of research and development in the Automatic Target Recognition (ATR) community. This development has resulted in numerous algorithms to perform target detection automatically. The morphing of the ATR acronym to Aided Target Recognition provides a succinct commentary regarding the success of the automatic target recognition research. Now that the goal is aided recognition, many of the algorithms which were not able to provide autonomous recognition may now provide valuable assistance in cueing a human analyst where to look in the images under consideration. This paper describes the MUSIC system being developed for the US Air Force to provide multisensor image cueing. The tool works across multiple image phenomenologies and fuses the evidence across the set of available imagery. MUSIC is designed to work with a wide variety of sensors and platforms, and provide cueing to an image analyst in an information-rich environment. The paper concentrates on the current integration of algorithms into an extensible infrastructure to allow cueing in multiple image types.

  9. Review of 50 years Research About Speech Reading

    Directory of Open Access Journals (Sweden)

    Abdollah Mousavi

    2003-08-01

    Full Text Available Watching a speaker's lips is like hearing speech by eye instead of by ear, and it markedly improves speech perception. In this review I summarise studies from the last sixty years on lip reading: its issues, methodological problems, experimental and correlational studies, and questions of cerebral lateralization, localization, and cognitive and neuropsychological function. Several studies on speech reading suggest that hearing-impaired groups do not actually possess superior speech-reading skills compared to normal controls. With functional magnetic resonance imaging (fMRI) it was also found that linguistic visual cues are sufficient to activate auditory cortex in the absence of auditory speech sounds. Here I present data and arguments about all aspects of the phenomenon of lip reading and its use in rehabilitative audiology.

  10. Visual Speech Perception in Children with Language Learning Impairments

    Science.gov (United States)

    Knowland, Victoria C. P.; Evans, Sam; Snell, Caroline; Rosen, Stuart

    2016-01-01

    Purpose: The purpose of the study was to assess the ability of children with developmental language learning impairments (LLIs) to use visual speech cues from the talking face. Method: In this cross-sectional study, 41 typically developing children (mean age: 8 years 0 months, range: 4 years 5 months to 11 years 10 months) and 27 children with…

  11. Mind your pricing cues.

    Science.gov (United States)

    Anderson, Eric; Simester, Duncan

    2003-09-01

    For most of the items they buy, consumers don't have an accurate sense of what the price should be. Ask them to guess how much a four-pack of 35-mm film costs, and you'll get a variety of wrong answers: Most people will underestimate; many will only shrug. Research shows that consumers' knowledge of the market is so far from perfect that it hardly deserves to be called knowledge at all. Yet people happily buy film and other products every day. Is this because they don't care what kind of deal they're getting? No. Remarkably, it's because they rely on retailers to tell them whether they're getting a good price. In subtle and not-so-subtle ways, retailers send signals to customers, telling them whether a given price is relatively high or low. In this article, the authors review several common pricing cues retailers use--"sale" signs, prices that end in 9, signpost items, and price-matching guarantees. They also offer some surprising facts about how--and how well--those cues work. For instance, the authors' tests with several mail-order catalogs reveal that including the word "sale" beside a price can increase demand by more than 50%. The practice of using a 9 at the end of a price to denote a bargain is so common, you'd think customers would be numb to it. Yet in a study the authors did involving a women's clothing catalog, they increased demand by a third just by changing the price of a dress from $34 to $39. Pricing cues are powerful tools for guiding customers' purchasing decisions, but they must be applied judiciously. Used inappropriately, the cues may breach customers' trust, reduce brand equity, and give rise to lawsuits. PMID:12964397

  12. Sound frequency affects speech emotion perception: Results from congenital amusia

    Directory of Open Access Journals (Sweden)

    Sydney Lolli

    2015-09-01

    Full Text Available Congenital amusics, or tone-deaf individuals, show difficulty in perceiving and producing small pitch differences. While amusia has marked effects on music perception, its impact on speech perception is less clear. Here we test the hypothesis that individual differences in pitch perception affect judgment of emotion in speech, by applying band-pass filters to spoken statements of emotional speech. A norming study was first conducted on Mechanical Turk to ensure that the intended emotions from the Macquarie Battery for Evaluation of Prosody (MBEP) were reliably identifiable by US English speakers. The most reliably identified emotional speech samples were used in Experiment 1, in which subjects performed a psychophysical pitch discrimination task, and an emotion identification task under band-pass and unfiltered speech conditions. Results showed a significant correlation between pitch discrimination threshold and emotion identification accuracy for band-pass filtered speech, with amusics (defined here as those with a pitch discrimination threshold > 16 Hz) performing worse than controls. This relationship with pitch discrimination was not seen in unfiltered speech conditions. Given the dissociation between band-pass filtered and unfiltered speech conditions, we inferred that amusics may be compensating for poorer pitch perception by using speech cues that are filtered out in this manipulation.

  13. A Study between College English Autonomous Audio-visual Learning of Network and Cultivating the Ability of Listening Comprehension%大学英语网络自主视听学习与听力理解能力培养研究

    Institute of Scientific and Technical Information of China (English)

    柴春曼

    2014-01-01

    Under the network environment, cultivating college students' autonomous learning ability is an important subject in college English teaching reform. Language acquisition is an autonomous process: foreign-language learning must go through the acquisition process in order to achieve practical use, and foreign-language acquisition cannot do without the students' autonomous learning. In the process of autonomous audio-visual learning, after trying a variety of learning strategies, learners developed their cognitive abilities and improved their scores on English listening tests.

  14. The Empirical Study between College Students'English Autonomous Audio-visual Learning under the Network Environment and Cultivating the Ability of Listening Comprehension%网络环境下大学生英语自主视听学习与听力理解能力培养的实证研究

    Institute of Scientific and Technical Information of China (English)

    刘建国

    2013-01-01

    Under the network environment, cultivating college students' autonomous learning ability is an important subject in college English teaching reform. Language acquisition is an autonomous process: foreign-language learning must go through the acquisition process in order to achieve practical use, and foreign-language acquisition cannot do without the students' autonomous learning. In the process of autonomous audio-visual learning, after trying a variety of learning strategies, learners developed their cognitive abilities and improved their scores on English listening tests.

  15. Coding pitch differences in voiceless fricatives: Whispered relative to normal speech.

    Science.gov (United States)

    Heeren, Willemijn F L

    2015-12-01

    Intonation can be perceived in whispered speech despite the absence of the fundamental frequency. In the past, acoustic correlates of pitch in whisper have been sought in vowel content, but, recently, studies of normal speech demonstrated correlates of intonation in consonants as well. This study examined how consonants may contribute to the coding of intonation in whispered relative to normal speech. The acoustic characteristics of whispered, voiceless fricatives /s/ and /f/, produced at different pitch targets (low, mid, high), were investigated and compared to corresponding normal speech productions to assess if whisper contained secondary or compensatory pitch correlates. Furthermore, listener sensitivity to fricative cues to pitch in whisper was established, also relative to normal speech. Consistent with recent studies, acoustic correlates of whispered and normal speech fricatives systematically varied with pitch target. Comparable findings across speech modes showed that acoustic correlates were secondary. Discrimination of vowel-fricative-vowel stimuli was less accurate and slower in whispered than normal speech, which is attributed to differences in acoustic cues available. Perception of fricatives presented without their vowel contexts, however, revealed comparable processing speeds and response accuracies between speech modes, supporting the finding that within fricatives, acoustic correlates of pitch are similar across speech modes. PMID:26723300

  16. Modeling the Contribution of Phonotactic Cues to the Problem of Word Segmentation

    Science.gov (United States)

    Blanchard, Daniel; Heinz, Jeffrey; Golinkoff, Roberta

    2010-01-01

    How do infants find the words in the speech stream? Computational models help us understand this feat by revealing the advantages and disadvantages of different strategies that infants might use. Here, we outline a computational model of word segmentation that aims both to incorporate cues proposed by language acquisition researchers and to…

  17. Weighting of Acoustic Cues to a Manner Distinction by Children with and without Hearing Loss

    Science.gov (United States)

    Nittrouer, Susan; Lowenstein, Joanna H.

    2015-01-01

    Purpose: Children must develop optimal perceptual weighting strategies for processing speech in their first language. Hearing loss can interfere with that development, especially if cochlear implants are required. The three goals of this study were to measure, for children with and without hearing loss: (a) cue weighting for a manner distinction,…

  18. Influences of Semantic and Prosodic Cues on Word Repetition and Categorization in Autism

    Science.gov (United States)

    Singh, Leher; Harrow, MariLouise S.

    2014-01-01

    Purpose: To investigate sensitivity to prosodic and semantic cues to emotion in individuals with high-functioning autism (HFA). Method: Emotional prosody and semantics were independently manipulated to assess the relative influence of prosody versus semantics on speech processing. A sample of 10-year-old typically developing children (n = 10) and…

  19. Cross-Linguistic Differences in Prosodic Cues to Syntactic Disambiguation in German and English

    Science.gov (United States)

    O'Brien, Mary Grantham; Jackson, Carrie N.; Gardner, Christine E.

    2014-01-01

    This study examined whether late-learning English-German second language (L2) learners and late-learning German-English L2 learners use prosodic cues to disambiguate temporarily ambiguous first language and L2 sentences during speech production. Experiments 1a and 1b showed that English-German L2 learners and German-English L2 learners used a…

  20. Cueing Visual Attention to Spatial Locations With Auditory Cues

    OpenAIRE

    Kean, Matthew; Crawford, Trevor J

    2008-01-01

    We investigated exogenous and endogenous orienting of visual attention to the spatial location of an auditory cue. In Experiment 1, significantly faster saccades were observed to visual targets appearing ipsilateral, compared to contralateral, to the peripherally-presented cue. This advantage was greatest in an 80% target-at-cue (TAC) condition but equivalent in 20% and 50% TAC conditions. In Experiment 2, participants maintained central fixation while making an elevation judgment of the pe...

  1. Music and speech prosody: a common rhythm.

    Science.gov (United States)

    Hausen, Maija; Torppa, Ritva; Salmela, Viljami R; Vainio, Martti; Särkämö, Teppo

    2013-01-01

    Disorders of music and speech perception, known as amusia and aphasia, have traditionally been regarded as dissociated deficits based on studies of brain damaged patients. This has been taken as evidence that music and speech are perceived by largely separate and independent networks in the brain. However, recent studies of congenital amusia have broadened this view by showing that the deficit is associated with problems in perceiving speech prosody, especially intonation and emotional prosody. In the present study the association between the perception of music and speech prosody was investigated with healthy Finnish adults (n = 61) using an on-line music perception test including the Scale subtest of Montreal Battery of Evaluation of Amusia (MBEA) and Off-Beat and Out-of-key tasks as well as a prosodic verbal task that measures the perception of word stress. Regression analyses showed that there was a clear association between prosody perception and music perception, especially in the domain of rhythm perception. This association was evident after controlling for music education, age, pitch perception, visuospatial perception, and working memory. Pitch perception was significantly associated with music perception but not with prosody perception. The association between music perception and visuospatial perception (measured using analogous tasks) was less clear. Overall, the pattern of results indicates that there is a robust link between music and speech perception and that this link can be mediated by rhythmic cues (time and stress).

  2. Music and speech prosody: A common rhythm

    Directory of Open Access Journals (Sweden)

    Maija eHausen

    2013-09-01

    Full Text Available Disorders of music and speech perception, known as amusia and aphasia, have traditionally been regarded as dissociated deficits based on studies of brain damaged patients. This has been taken as evidence that music and speech are perceived by largely separate and independent networks in the brain. However, recent studies of congenital amusia have broadened this view by showing that the deficit is associated with problems in perceiving speech prosody, especially intonation and emotional prosody. In the present study the association between the perception of music and speech prosody was investigated with healthy Finnish adults (n = 61) using an on-line music perception test including the Scale subtest of Montreal Battery of Evaluation of Amusia (MBEA) and Off-Beat and Out-of-key tasks as well as a prosodic verbal task that measures the perception of word stress. Regression analyses showed that there was a clear association between prosody perception and music perception, especially in the domain of rhythm perception. This association was evident after controlling for music education, age, pitch perception, visuospatial perception and working memory. Pitch perception was significantly associated with music perception but not with prosody perception. The association between music perception and visuospatial perception (measured using analogous tasks) was less clear. Overall, the pattern of results indicates that there is a robust link between music and speech perception and that this link can be mediated by rhythmic cues (time and stress).

  3. Speech Repairs, Intonational Boundaries and Discourse Markers Modeling Speakers' Utterances in Spoken Dialog

    CERN Document Server

    Heeman, P A

    1999-01-01

    In this thesis, we present a statistical language model for resolving speech repairs, intonational boundaries and discourse markers. Rather than finding the best word interpretation for an acoustic signal, we redefine the speech recognition problem so that it also identifies the POS tags, discourse markers, speech repairs and intonational phrase endings (a major cue in determining utterance units). Adding these extra elements to the speech recognition problem actually allows it to better predict the words involved, since we are able to make use of the predictions of boundary tones, discourse markers and speech repairs to better account for what word will occur next. Furthermore, we can take advantage of acoustic information, such as silence information, which tends to co-occur with speech repairs and intonational phrase endings, that current language models can only regard as noise in the acoustic signal. The output of this language model is a much fuller account of the speaker's turn, with part-of-speech ...

  4. Criteria for public speech planning : characteristics of language learning

    Directory of Open Access Journals (Sweden)

    Tomaž Petek

    2012-12-01

    Full Text Available Public speaking is understood as monological discourse production, directed at a wider or narrower public or group of people. The theoretical part of this article introduces the characteristics of effective public speaking; criteria were designed for the preparation of a public speech, and four main sections defined, i.e. (a) construction of public speech (consideration of text type characteristics, appropriateness of the topic and selection of content, appropriateness of the mode of topic development, formation of a meaningful, comprehensible and integrated text); (b) integral mode of public speech (fluent, natural and free speaking, clear diction); (c) verbal language (social genre, selection of words consistent with the speech, grammatical correctness, correct pronunciation, formal constructions, formal [dynamic] accent); (d) non-verbal language (auditory non-verbal speech cues, visual non-verbal speech cues). The fulfilment of these criteria was tested in practice, namely on second and third year undergraduate students (prospective teachers) (N = 211). On the whole, all the average marks of third year students were better than those of the second year students. The most common difficulty facing the students was fluent, natural and free speaking as well as appropriate topic development, whereas the most successfully fulfilled criteria were those of appropriate topic selection and consideration of text type characteristics.

  5. Cue conflicts in context

    DEFF Research Database (Denmark)

    Boeg Thomsen, Ditte; Poulsen, Mads

    2015-01-01

    preschoolers. However, object-first clauses may be context-sensitive structures, which are infelicitous in isolation. In a second act-out study we presented OVS clauses in supportive and unsupportive discourse contexts and in isolation and found that five-to-six-year-olds’ OVS comprehension was enhanced ... in discourse-pragmatically felicitous contexts. Our results extend previous findings of preschoolers’ sensitivity to discourse-contextual cues in sentence comprehension (Hurewitz, 2001; Song & Fisher, 2005) to the basic task of assigning agent and patient roles....

  6. Sound of mind : electrophysiological and behavioural evidence for the role of context, variation and informativity in human speech processing

    NARCIS (Netherlands)

    Nixon, Jessie Sophia

    2014-01-01

    Spoken communication involves transmission of a message which takes physical form in acoustic waves. Within any given language, acoustic cues pattern in language-specific ways along language-specific acoustic dimensions to create speech sound contrasts. These cues are utilized by listeners to discri

  7. SynFace—Speech-Driven Facial Animation for Virtual Speech-Reading Support

    Directory of Open Access Journals (Sweden)

    Salvi Giampiero

    2009-01-01

    Full Text Available This paper describes SynFace, a supportive technology that aims at enhancing audio-based spoken communication in adverse acoustic conditions by providing the missing visual information in the form of an animated talking head. Firstly, we describe the system architecture, consisting of a 3D animated face model controlled from the speech input by a specifically optimised phonetic recogniser. Secondly, we report on speech intelligibility experiments with focus on multilinguality and robustness to audio quality. The system, already available for Swedish, English, and Flemish, was optimised for German and for Swedish wide-band speech quality available in TV, radio, and Internet communication. Lastly, the paper covers experiments with nonverbal motions driven from the speech signal. It is shown that turn-taking gestures can be used to affect the flow of human-human dialogues. We have focused specifically on two categories of cues that may be extracted from the acoustic signal: prominence/emphasis and interactional cues (turn-taking/back-channelling).

  8. Speech recognition: Acoustic phonetic and lexical knowledge representation

    Science.gov (United States)

    Zue, V. W.

    1984-02-01

    The purpose of this program is to develop a speech data base facility under which the acoustic characteristics of speech sounds in various contexts can be studied conveniently; investigate the phonological properties of a large lexicon of, say, 10,000 words and determine to what extent the phonotactic constraints can be utilized in speech recognition; study the acoustic cues that are used to mark word boundaries; develop a test bed in the form of a large-vocabulary, IWR system to study the interactions of acoustic, phonetic and lexical knowledge; and develop a limited continuous speech recognition system with the goal of recognizing any English word from its spelling in order to assess the interactions of higher-level knowledge sources.

  9. Speech and Language Impairments

    Science.gov (United States)

    ... easily be mistaken for other disabilities such as autism or learning disabilities, so it’s very important to ensure that the child receives a thorough evaluation by a certified speech-language pathologist.

  10. Frequency band-importance functions for auditory and auditory-visual speech recognition

    Science.gov (United States)

    Grant, Ken W.

    2005-04-01

    In many everyday listening environments, speech communication involves the integration of both acoustic and visual speech cues. This is especially true in noisy and reverberant environments where the speech signal is highly degraded, or when the listener has a hearing impairment. Understanding the mechanisms involved in auditory-visual integration is a primary interest of this work. Of particular interest is whether listeners are able to allocate their attention to various frequency regions of the speech signal differently under auditory-visual conditions and auditory-alone conditions. For auditory speech recognition, the most important frequency regions tend to be around 1500-3000 Hz, corresponding roughly to important acoustic cues for place of articulation. The purpose of this study is to determine the most important frequency region under auditory-visual speech conditions. Frequency band-importance functions for auditory and auditory-visual conditions were obtained by having subjects identify speech tokens under conditions where the speech-to-noise ratio of different parts of the speech spectrum is independently and randomly varied on every trial. Point biserial correlations were computed for each separate spectral region and the normalized correlations are interpreted as weights indicating the importance of each region. Relations among frequency-importance functions for auditory and auditory-visual conditions will be discussed.
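
    The correlational analysis described above lends itself to a short sketch: per-trial band SNRs are correlated with binary correct/incorrect responses, and the resulting point-biserial correlations are normalized into importance weights. This is a minimal sketch under assumptions (array names, band count, and the simulated data are illustrative), not the authors' exact analysis pipeline.

      import numpy as np
      from scipy.stats import pointbiserialr

      def band_importance(band_snrs, correct):
          """band_snrs: (n_trials, n_bands) SNR in dB; correct: (n_trials,) 0/1 responses."""
          n_bands = band_snrs.shape[1]
          r = np.array([pointbiserialr(correct, band_snrs[:, b])[0] for b in range(n_bands)])
          r = np.clip(r, 0.0, None)        # negative correlations carry no importance
          return r / r.sum()               # normalized band-importance weights

      # Simulated example: 500 trials, 5 bands, accuracy driven mostly by the third band.
      rng = np.random.default_rng(0)
      snrs = rng.uniform(-12.0, 12.0, size=(500, 5))
      p_correct = 1.0 / (1.0 + np.exp(-0.3 * snrs[:, 2]))
      responses = rng.binomial(1, p_correct)
      print(band_importance(snrs, responses))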

  11. Speech perception as categorization

    OpenAIRE

    Holt, Lori L.; Lotto, Andrew J.

    2010-01-01

    Speech perception (SP) most commonly refers to the perceptual mapping from the highly variable acoustic speech signal to a linguistic representation, whether it be phonemes, diphones, syllables, or words. This is an example of categorization, in that potentially discriminable speech sounds are assigned to functionally equivalent classes. In this tutorial, we present some of the main challenges to our understanding of the categorization of speech sounds and the conceptualization of SP that has...

  12. Perception of aircraft Deviation Cues

    Science.gov (United States)

    Martin, Lynne; Azuma, Ronald; Fox, Jason; Verma, Savita; Lozito, Sandra

    2005-01-01

    To begin to address the need for new displays, required by a future airspace concept to support new roles that will be assigned to flight crews, a study of potentially informative display cues was undertaken. Two cues were tested on a simple plan display - aircraft trajectory and flight corridor. Of particular interest was the speed and accuracy with which participants could detect an aircraft deviating outside its flight corridor. Presence of the trajectory cue significantly reduced participant reaction time to a deviation while the flight corridor cue did not. Although non-significant, the flight corridor cue seemed to have a relationship with the accuracy of participants' judgments rather than their speed. As this is the second of a series of studies, these issues will be addressed further in future studies.

  13. Infants deploy selective attention to the mouth of a talking face when learning speech.

    Science.gov (United States)

    Lewkowicz, David J; Hansen-Tift, Amy M

    2012-01-31

    The mechanisms underlying the acquisition of speech-production ability in human infancy are not well understood. We tracked 4-12-mo-old English-learning infants' and adults' eye gaze while they watched and listened to a female reciting a monologue either in their native (English) or nonnative (Spanish) language. We found that infants shifted their attention from the eyes to the mouth between 4 and 8 mo of age regardless of language and then began a shift back to the eyes at 12 mo in response to native but not nonnative speech. We posit that the first shift enables infants to gain access to redundant audiovisual speech cues that enable them to learn their native speech forms and that the second shift reflects growing native-language expertise that frees them to shift attention to the eyes to gain access to social cues. On this account, 12-mo-old infants do not shift attention to the eyes when exposed to nonnative speech because increasing native-language expertise and perceptual narrowing make it more difficult to process nonnative speech and require them to continue to access redundant audiovisual cues. Overall, the current findings demonstrate that the development of speech production capacity relies on changes in selective audiovisual attention and that this depends critically on early experience. PMID:22307596

  14. Audio-Visual Realization of the Aesthetic Connotations of Shanghai Images in the Start-up Period of Chinese Film

    Institute of Scientific and Technical Information of China (English)

    衣凤翱

    2011-01-01

    During the start-up period of Chinese film art, the aesthetic connotations of Shanghai images were realized audio-visually in political images, metropolitan images and female images, showing that Shanghai images and Shanghai-style (Haipai) culture continually achieved an aesthetic coupling as society and history changed. In this period, the political aesthetic connotations of Shanghai images presented the city as the birthplace of modern Chinese ideological enlightenment and the source of modern Chinese thinking on national independence. The aesthetic connotations of its modern metropolitan image were realized as the symbolic expression of an "evil" city, embodied in gangs and modern metropolitan landscapes, while the aesthetic connotations of its female image were realized in the metropolitan prostitutes and new urban women of the films.

  15. The Application of Criminal Law to the Sale of Infringing Audio-visual Copies without a Business License: Taking the Object of Crime as the Starting Point

    Institute of Scientific and Technical Information of China (English)

    王志

    2012-01-01

    Practitioners and academics hold differing views on how the criminal law applies to the sale of infringing audio-visual copies without a business license. The main controversy is whether there is a concurrence between the crime of selling infringing copies and the crime of illegal business operation and, if so, what form that concurrence takes. According to judicial interpretation, infringing audio-visual copies are the criminal object of the crime of selling infringing copies and are excluded from the crime of illegal business operation; the interests protected by the two crimes are mutually exclusive, so no concurrence exists between them. The conduct can therefore only be convicted and sentenced as the crime of selling infringing copies.

  16. On the Teaching Mode of a Project-Oriented English Audio-visual Course in a Comprehensive City University

    Institute of Scientific and Technical Information of China (English)

    李萍

    2012-01-01

    Through qualitative and quantitative research, the project team proposes a project-integrated teaching mode for the English audio-visual course in a comprehensive city university, one that matches such a university's orientation toward cultivating applied urban talent. The new mode shifts the traditional knowledge-focused, teacher-centered approach to a knowledge-and-skill-focused, student-centered one, laying great emphasis on teacher-guided self-experience, self-discovery, self-reflection and self-construction by the students. It is hoped that through "learning by doing, participating in course-integrated projects, cultivating practical abilities, and giving priority to international communication", students will benefit from this course and gain the communication competence needed to provide English-language services for urban economic and cultural development as well as international exchange and communication.

  17. Speech-Language Pathologists

    Science.gov (United States)


  18. Talking Speech Input.

    Science.gov (United States)

    Berliss-Vincent, Jane; Whitford, Gigi

    2002-01-01

    This article presents both the factors involved in successful speech input use and the potential barriers that may suggest that other access technologies could be more appropriate for a given individual. Speech input options that are available are reviewed and strategies for optimizing use of speech recognition technology are discussed. (Contains…

  19. Perception of speech in noise: neural correlates.

    Science.gov (United States)

    Song, Judy H; Skoe, Erika; Banai, Karen; Kraus, Nina

    2011-09-01

    The presence of irrelevant auditory information (other talkers, environmental noises) presents a major challenge to listening to speech. The fundamental frequency (F(0)) of the target speaker is thought to provide an important cue for the extraction of the speaker's voice from background noise, but little is known about the relationship between speech-in-noise (SIN) perceptual ability and neural encoding of the F(0). Motivated by recent findings that music and language experience enhance brainstem representation of sound, we examined the hypothesis that brainstem encoding of the F(0) is diminished to a greater degree by background noise in people with poorer perceptual abilities in noise. To this end, we measured speech-evoked auditory brainstem responses to /da/ in quiet and two multitalker babble conditions (two-talker and six-talker) in native English-speaking young adults who ranged in their ability to perceive and recall SIN. Listeners who were poorer performers on a standardized SIN measure demonstrated greater susceptibility to the degradative effects of noise on the neural encoding of the F(0). Particularly diminished was their phase-locked activity to the fundamental frequency in the portion of the syllable known to be most vulnerable to perceptual disruption (i.e., the formant transition period). Our findings suggest that the subcortical representation of the F(0) in noise contributes to the perception of speech in noisy conditions.
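
    One simple way to quantify the phase-locked F0 activity discussed above is to measure the spectral amplitude of the averaged evoked response in a narrow window around the stimulus F0. The sketch below is a minimal illustration under assumptions (sampling rate, F0, and window width are hypothetical), not the authors' analysis.

      import numpy as np

      def f0_encoding_strength(avg_response, fs, f0_hz=100.0, half_width_hz=5.0):
          """Mean spectral amplitude of an averaged evoked response near the stimulus F0."""
          windowed = avg_response * np.hanning(len(avg_response))
          spectrum = np.abs(np.fft.rfft(windowed))
          freqs = np.fft.rfftfreq(len(avg_response), d=1.0 / fs)
          band = (freqs >= f0_hz - half_width_hz) & (freqs <= f0_hz + half_width_hz)
          return spectrum[band].mean()

      # Comparing this value between quiet and babble conditions would index how much
      # noise degrades the neural representation of the F0 for a given listener.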

  20. The effects of noise vocoding on speech quality perception.

    Science.gov (United States)

    Anderson, Melinda C; Arehart, Kathryn H; Kates, James M

    2014-03-01

    Speech perception depends on access to spectral and temporal acoustic cues. Temporal cues include slowly varying amplitude changes (i.e. temporal envelope, TE) and quickly varying amplitude changes associated with the center frequency of the auditory filter (i.e. temporal fine structure, TFS). This study quantifies the effects of TFS randomization through noise vocoding on the perception of speech quality by parametrically varying the amount of original TFS available above 1500 Hz. The two research aims were: 1) to establish the role of TFS in quality perception, and 2) to determine if the role of TFS in quality perception differs between subjects with normal hearing and subjects with sensorineural hearing loss. Ratings were obtained from 20 subjects (10 with normal hearing and 10 with hearing loss) using an 11-point quality scale. Stimuli were processed in three different ways: 1) a 32-channel noise-excited vocoder with random envelope fluctuations in the noise carrier, 2) a 32-channel noise-excited vocoder with the noise-carrier envelope smoothed, and 3) removal of high-frequency bands. Stimuli were presented in quiet and in babble noise at 18 dB and 12 dB signal-to-noise ratios. TFS randomization had a measurable detrimental effect on quality ratings for speech in quiet and a smaller effect for speech in background babble. Subjects with normal hearing and subjects with sensorineural hearing loss provided similar quality ratings for noise-vocoded speech. PMID:24333929
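
    A single channel of the noise-excited vocoding described above can be sketched as band-pass analysis, envelope (TE) extraction, and modulation of a band-limited noise carrier that replaces the original TFS. The sketch below is a minimal, assumption-laden illustration (the study used 32 channels and additional envelope smoothing); all parameters are illustrative.

      import numpy as np
      from scipy.signal import butter, sosfiltfilt, hilbert

      def vocode_channel(x, fs, lo_hz, hi_hz, rng=None):
          """One vocoder channel: keep the band's temporal envelope, replace its TFS with noise."""
          rng = rng if rng is not None else np.random.default_rng()
          sos = butter(4, [lo_hz, hi_hz], btype="bandpass", fs=fs, output="sos")
          band = sosfiltfilt(sos, x)           # analysis band
          envelope = np.abs(hilbert(band))     # temporal envelope (TE)
          noise = rng.standard_normal(len(x))  # carrier with random fine structure
          carrier = sosfiltfilt(sos, noise)    # band-limit the noise carrier
          return envelope * carrier            # envelope-modulated noise band

      # A full vocoder sums such channels over contiguous bands; smoothing `envelope`
      # corresponds to the study's second processing condition.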

  1. The timing and effort of lexical access in natural and degraded speech

    Directory of Open Access Journals (Sweden)

    Anita Eva Wagner

    2016-03-01

    Full Text Available Understanding speech is effortless in ideal situations, and although adverse conditions, such as caused by hearing impairment, often render it an effortful task, they do not necessarily suspend speech comprehension. A prime example of this is speech perception by cochlear implant users, whose hearing prostheses transmit speech as a significantly degraded signal. It is yet unknown how mechanisms of speech processing deal with such degraded signals, and whether they are affected by effortful processing of speech. This paper compares the automatic process of lexical competition between natural and degraded speech, and combines gaze fixations, which capture the course of lexical disambiguation, with pupillometry, which quantifies the mental effort involved in processing speech. Listeners’ ocular responses were recorded during disambiguation of lexical embeddings with matching and mismatching durational cues. Durational cues were selected due to their substantial role in listeners' quick limitation of the number of lexical candidates for lexical access in natural speech. Results showed that lexical competition increased effort in processing natural stimuli in particular in presence of mismatching cues. Signal degradation reduced listeners’ ability to quickly integrate durational cues in lexical selection, and delayed and prolonged lexical competition. The effort of processing degraded speech was increased overall, and because it had its sources at the pre-lexical level this effect can be attributed to listening to degraded speech rather than to lexical disambiguation. In sum, the course of lexical competition was largely comparable for natural and degraded speech, but showed crucial shifts in timing, and different sources of increased mental effort. We argue that well-timed progress of information from sensory to pre-lexical and lexical stages of processing, which is the result of perceptual adaptation during speech development, is the reason why

  2. The Timing and Effort of Lexical Access in Natural and Degraded Speech.

    Science.gov (United States)

    Wagner, Anita E; Toffanin, Paolo; Başkent, Deniz

    2016-01-01

    Understanding speech is effortless in ideal situations, and although adverse conditions, such as caused by hearing impairment, often render it an effortful task, they do not necessarily suspend speech comprehension. A prime example of this is speech perception by cochlear implant users, whose hearing prostheses transmit speech as a significantly degraded signal. It is yet unknown how mechanisms of speech processing deal with such degraded signals, and whether they are affected by effortful processing of speech. This paper compares the automatic process of lexical competition between natural and degraded speech, and combines gaze fixations, which capture the course of lexical disambiguation, with pupillometry, which quantifies the mental effort involved in processing speech. Listeners' ocular responses were recorded during disambiguation of lexical embeddings with matching and mismatching durational cues. Durational cues were selected due to their substantial role in listeners' quick limitation of the number of lexical candidates for lexical access in natural speech. Results showed that lexical competition increased mental effort in processing natural stimuli in particular in presence of mismatching cues. Signal degradation reduced listeners' ability to quickly integrate durational cues in lexical selection, and delayed and prolonged lexical competition. The effort of processing degraded speech was increased overall, and because it had its sources at the pre-lexical level this effect can be attributed to listening to degraded speech rather than to lexical disambiguation. In sum, the course of lexical competition was largely comparable for natural and degraded speech, but showed crucial shifts in timing, and different sources of increased mental effort. We argue that well-timed progress of information from sensory to pre-lexical and lexical stages of processing, which is the result of perceptual adaptation during speech development, is the reason why in ideal

  3. Perception of correlations between acoustic cues in category tuning and speaker adaptation

    Science.gov (United States)

    Holt, Lori; Wade, Travis

    2001-05-01

    In English and many other languages, fundamental frequency (f0) varies with voicing such that voiced consonants are produced with lower f0's than their voiceless counterparts. This regularity robustly influences perception, such that sounds synthesized or spoken with a low f0 are more often perceived as voiced than are sounds with a higher f0. This series of studies exploited these observations to investigate category tuning as a function of incidental exposure to correlations among speech cues and adaptation to speaker idiosyncrasies or accent. Manipulation of f0 across sets of natural speech utterances produced stimulus sets varying in their inherent f0/voicing relationship. Listeners were exposed to these different f0/voicing patterns via spoken word and nonword items in a lexical decision task, and their resulting categorization of ambiguous consonants varying in f0 and voice onset time (VOT) was measured. The results suggest listeners adapt quickly to speaker-specific cues but also remain influenced by more global, naturally occurring covariance patterns of f0 and voicing in English. This pattern contrasts somewhat with studies where idiosyncrasy is represented instead by manipulation of primary, first-order cues to speech sounds, in which listeners are seen to adapt more straightforwardly to the cues they are presented.

  4. Pathomechanisms and compensatory efforts related to Parkinsonian speech

    Directory of Open Access Journals (Sweden)

    Christiane Arnold

    2014-01-01

    Full Text Available Voice and speech in Parkinson's disease (PD) patients are classically affected by a hypophonia, dysprosody, and dysarthria. The underlying pathomechanisms of these disabling symptoms are not well understood. To identify functional anomalies related to pathophysiology and compensation we compared speech-related brain activity and effective connectivity in early PD patients who did not yet develop voice or speech symptoms and matched controls. During fMRI 20 PD patients ON and OFF levodopa and 20 control participants read 75 sentences covertly, overtly with neutral, or with happy intonation. A cue-target reading paradigm allowed for dissociating task preparation from execution. We found pathologically reduced striato-prefrontal preparatory effective connectivity in early PD patients associated with subcortical (OFF state) or cortical (ON state) compensatory networks. While speaking, PD patients showed signs of diminished monitoring of external auditory feedback. During generation of affective prosody, a reduced functional coupling between the ventral and dorsal striatum was observed. Our results suggest three pathomechanisms affecting speech in PD: While diminished energization on the basis of striato-prefrontal hypo-connectivity together with dysfunctional self-monitoring mechanisms could underlie hypophonia, dysarthria may result from fading speech motor representations given that they are not sufficiently well updated by external auditory feedback. A pathological interplay between the limbic and sensorimotor striatum could interfere with affective modulation of speech routines, which affects emotional prosody generation. However, early PD patients show compensatory mechanisms that could help improve future speech therapies.

  5. Digital speech processing using Matlab

    CERN Document Server

    Gopi, E S

    2014-01-01

    Digital Speech Processing Using Matlab deals with digital speech pattern recognition, speech production model, speech feature extraction, and speech compression. The book is written in a manner that is suitable for beginners pursuing basic research in digital speech processing. Matlab illustrations are provided for most topics to enable better understanding of concepts. This book also deals with the basic pattern recognition techniques (illustrated with speech signals using Matlab) such as PCA, LDA, ICA, SVM, HMM, GMM, BPN, and KSOM.

  6. Effects of Within-Talker Variability on Speech Intelligibility in Mandarin-Speaking Adult and Pediatric Cochlear Implant Patients

    Science.gov (United States)

    Su, Qiaotong; Galvin, John J.; Zhang, Guoping; Li, Yongxin

    2016-01-01

    Cochlear implant (CI) speech performance is typically evaluated using well-enunciated speech produced at a normal rate by a single talker. CI users often have greater difficulty with variations in speech production encountered in everyday listening. Within a single talker, speaking rate, amplitude, duration, and voice pitch information may be quite variable, depending on the production context. The coarse spectral resolution afforded by the CI limits perception of voice pitch, which is an important cue for speech prosody and for tonal languages such as Mandarin Chinese. In this study, sentence recognition from the Mandarin speech perception database was measured in adult and pediatric Mandarin-speaking CI listeners for a variety of speaking styles: voiced speech produced at slow, normal, and fast speaking rates; whispered speech; voiced emotional speech; and voiced shouted speech. Recognition of Mandarin Hearing in Noise Test sentences was also measured. Results showed that performance was significantly poorer with whispered speech relative to the other speaking styles and that performance was significantly better with slow speech than with fast or emotional speech. Results also showed that adult and pediatric performance was significantly poorer with Mandarin Hearing in Noise Test than with Mandarin speech perception sentences at the normal rate. The results suggest that adult and pediatric Mandarin-speaking CI patients are highly susceptible to whispered speech, due to the lack of lexically important voice pitch cues and perhaps other qualities associated with whispered speech. The results also suggest that test materials may contribute to differences in performance observed between adult and pediatric CI users. PMID:27363714

  7. Beat synchronization predicts neural speech encoding and reading readiness in preschoolers.

    Science.gov (United States)

    Woodruff Carr, Kali; White-Schwoch, Travis; Tierney, Adam T; Strait, Dana L; Kraus, Nina

    2014-10-01

    Temporal cues are important for discerning word boundaries and syllable segments in speech; their perception facilitates language acquisition and development. Beat synchronization and neural encoding of speech reflect precision in processing temporal cues and have been linked to reading skills. In poor readers, diminished neural precision may contribute to rhythmic and phonological deficits. Here we establish links between beat synchronization and speech processing in children who have not yet begun to read: preschoolers who can entrain to an external beat have more faithful neural encoding of temporal modulations in speech and score higher on tests of early language skills. In summary, we propose precise neural encoding of temporal modulations as a key mechanism underlying reading acquisition. Because beat synchronization abilities emerge at an early age, these findings may inform strategies for early detection of and intervention for language-based learning disabilities. PMID:25246562

  8. Lip movements entrain the observers' low-frequency brain oscillations to facilitate speech intelligibility

    OpenAIRE

    Park, Hyojin; Kayser, Christoph; Thut, Gregor; Gross, Joachim

    2016-01-01

    eLife digest People are able to communicate effectively with each other even in very noisy places where it is difficult to actually hear what others are saying. In a face-to-face conversation, people detect and respond to many physical cues – including body posture, facial expressions, head and eye movements and gestures – alongside the sound cues. Lip movements are particularly important and contain enough information to allow trained observers to understand speech even if they cannot hear the ...

  9. Using auditory-visual speech to probe the basis of noise-impaired consonant-vowel perception in dyslexia and auditory neuropathy

    Science.gov (United States)

    Ramirez, Joshua; Mann, Virginia

    2005-08-01

    Both dyslexics and auditory neuropathy (AN) subjects show inferior consonant-vowel (CV) perception in noise, relative to controls. To better understand these impairments, natural acoustic speech stimuli that were masked in speech-shaped noise at various intensities were presented to dyslexic, AN, and control subjects either in isolation or accompanied by visual articulatory cues. AN subjects were expected to benefit from the pairing of visual articulatory cues and auditory CV stimuli, provided that their speech perception impairment reflects a relatively peripheral auditory disorder. Assuming that dyslexia reflects a general impairment of speech processing rather than a disorder of audition, dyslexics were not expected to similarly benefit from an introduction of visual articulatory cues. The results revealed an increased effect of noise masking on the perception of isolated acoustic stimuli by both dyslexic and AN subjects. More importantly, dyslexics showed less effective use of visual articulatory cues in identifying masked speech stimuli and lower visual baseline performance relative to AN subjects and controls. Last, a significant positive correlation was found between reading ability and the ameliorating effect of visual articulatory cues on speech perception in noise. These results suggest that some reading impairments may stem from a central deficit of speech processing.

  10. Relative cue encoding in the context of sophisticated models of categorization: Separating information from categorization

    Science.gov (United States)

    McMurray, Bob

    2014-01-01

    Traditional studies of human categorization often treat the processes of encoding features and cues as peripheral to the question of how stimuli are categorized. However, in domains where the features and cues are less transparent, how information is encoded prior to categorization may constrain our understanding of the architecture of categorization. This is particularly true in speech perception, where acoustic cues to phonological categories are ambiguous and influenced by multiple factors. Here, it is crucial to consider the joint contributions of the information in the input and the categorization architecture. We contrasted accounts that argue for raw acoustic information encoding with accounts that posit that cues are encoded relative to expectations, and investigated how two categorization architectures—exemplar models and back-propagation parallel distributed processing models—deal with each kind of information. Relative encoding, akin to predictive coding, is a form of noise reduction, so it can be expected to improve model accuracy; however, like predictive coding, the use of relative encoding in speech perception by humans is controversial, so results are compared to patterns of human performance, rather than on the basis of overall accuracy. We found that, for both classes of models, in the vast majority of parameter settings, relative cues greatly helped the models approximate human performance. This suggests that expectation-relative processing is a crucial precursor step in phoneme categorization, and that understanding the information content is essential to understanding categorization processes. PMID:25475048
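
    The contrast between raw and expectation-relative cue encoding can be made concrete with a small sketch: cues are re-coded as deviations from a talker-specific expectation before being passed to a categorizer, here a simple nearest-prototype rule standing in for the exemplar and connectionist architectures actually tested. All values and names below are illustrative assumptions, not the paper's implementation.

      import numpy as np

      def relative_encode(cues, expectations):
          """Code each cue as a deviation from its expected value (e.g., a talker mean)."""
          return cues - expectations

      def categorize(cue_vec, prototypes):
          """Assign a cue vector to the nearest category prototype."""
          distances = np.linalg.norm(prototypes - cue_vec, axis=1)
          return int(np.argmin(distances))

      # Raw cues: [VOT in ms, f0 in Hz]; a high-pitched talker shifts raw f0 upward.
      raw = np.array([20.0, 260.0])
      talker_expectation = np.array([30.0, 240.0])          # running estimate for this talker
      relative = relative_encode(raw, talker_expectation)   # [-10, +20]
      prototypes = np.array([[-15.0,  15.0],                # "voiced"-like deviation pattern
                             [ 25.0, -15.0]])               # "voiceless"-like deviation pattern
      print(categorize(relative, prototypes))               # -> 0 (voiced)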

  11. Evaluation of multimodal ground cues

    DEFF Research Database (Denmark)

    Nordahl, Rolf; Lecuyer, Anatole; Serafin, Stefania;

    2012-01-01

    This chapter presents an array of results on the perception of ground surfaces via multiple sensory modalities, with special attention to non-visual perceptual cues, notably those arising from audition and haptics, as well as interactions between them. It also reviews approaches to combining ... synthetic multimodal cues, from vision, haptics, and audition, in order to realize virtual experiences of walking on simulated ground surfaces or other features....

  12. Real-Time Audio-Visual Analysis for Multiperson Videoconferencing

    Directory of Open Access Journals (Sweden)

    Petr Motlicek

    2013-01-01

    Full Text Available We describe the design of a system consisting of several state-of-the-art real-time audio and video processing components enabling multimodal stream manipulation (e.g., automatic online editing) for multiparty videoconferencing applications in open, unconstrained environments. The underlying algorithms are designed to allow multiple people to enter, interact, and leave the observable scene with no constraints. They comprise continuous localisation of audio objects and its application for spatial audio object coding, detection, and tracking of faces, estimation of head poses and visual focus of attention, detection and localisation of verbal and paralinguistic events, and the association and fusion of these different events. Combined all together, they represent multimodal streams with audio objects and semantic video objects and provide semantic information for stream manipulation systems (like a virtual director). Various experiments have been performed to evaluate the performance of the system. The obtained results demonstrate the effectiveness of the proposed design, the various algorithms, and the benefit of fusing different modalities in this scenario.

  13. Automatic Identification used in Audio-Visual indexing and Analysis

    Directory of Open Access Journals (Sweden)

    A. Satish Chowdary

    2011-09-01

    Full Text Available To locate a video clip in large collections is very important for retrieval applications, especially for digital rights management. We attempt to provide a comprehensive and high-level review of audiovisual features that can be extracted from the standard compressed domains, such as MPEG-1 and MPEG-2. This paper presents a graph transformation and matching approach to identify the occurrence of potentially different ordering or length due to content editing. With a novel batch query algorithm to retrieve similar frames, the mapping relationship between the query and database video is first represented by a bipartite graph. The densely matched parts along the long sequence are then extracted, followed by a filter-and-refine search strategy to prune some irrelevant subsequences. During the filtering stage, Maximum Size Matching is deployed for each sub graph constructed by the query and candidate subsequence to obtain a smaller set of candidates. During the refinement stage, Sub-Maximum Similarity Matching is devised to identify the subsequence with the highest aggregate score from all candidates, according to a robust video similarity model that incorporates visual content, temporal order, and frame alignment information. This new algorithm is based on dynamic programming that fully uses the temporal dimension to measure the similarity between two video sequences. A normalized chromaticity histogram is used as a feature which is illumination invariant. Dynamic programming is applied on shot level to find the optimal nonlinear mapping between video sequences. Two new normalized distance measures are presented for video sequence matching. One measure is based on the normalization of the optimal path found by dynamic programming. The other measure combines both the visual features and the temporal information. The proposed distance measures are suitable for variable-length comparisons.

  14. Audio-visual Training for Lip–reading

    DEFF Research Database (Denmark)

    Gebert, Hermann; Bothe, Hans-Heinrich

    2011-01-01

    This new edited book aims to bring together researchers and developers from various related areas to share their knowledge and experience, to describe the current state of the art in mobile and wireless-based adaptive e-learning and to present innovative techniques and solutions that support a person...

  15. Evaluating audio-visual and computer programs for classroom use.

    Science.gov (United States)

    Van Ort, S

    1989-01-01

    Appropriate faculty decisions regarding adoption of audiovisual and computer programs are critical to the classroom use of these learning materials. The author describes the decision-making process in one college of nursing and the adaptation of an evaluation tool for use by faculty in reviewing audiovisual and computer programs. PMID:2467237

  16. Audio-visual interactions in product sound design

    NARCIS (Netherlands)

    Özcan, E.; Van Egmond, R.

    2010-01-01

    Consistent product experience requires congruity between product properties such as visual appearance and sound. Therefore, for designing appropriate product sounds by manipulating their spectral-temporal structure, product sounds should preferably not be considered in isolation but as an integral p

  17. Preattentive processing of audio-visual emotional signals

    DEFF Research Database (Denmark)

    Föcker, J.; Gondan, Matthias; Röder, B.

    2011-01-01

    Previous research has shown that redundant information in faces and voices leads to faster emotional categorization compared to incongruent emotional information even when attending to only one modality. The aim of the present study was to test whether these crossmodal effects are predominantly d...

  18. A Joint Audio-Visual Approach to Audio Localization

    DEFF Research Database (Denmark)

    Jensen, Jesper Rindom; Christensen, Mads Græsbøll

    2015-01-01

    Localization of audio sources is an important research problem, e.g., to facilitate noise reduction. In recent years, the problem has been tackled using distributed microphone arrays (DMA). A common approach is to apply direction-of-arrival (DOA) estimation on each array (denoted as nodes), a... time-of-flight cameras. Moreover, we propose an optimal method for weighting such DOA and range information for audio localization. Our experiments on both synthetic and real data show that there is a clear, potential advantage of using the joint audiovisual localization framework....
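
    The idea of weighting direction estimates by their reliability can be illustrated with a small sketch: an inverse-variance weighted circular mean of per-node DOA estimates. This is a minimal sketch under assumptions and does not reproduce the paper's actual weighting of DOA and camera range information.

      import numpy as np

      def fuse_doa(doas_deg, variances):
          """Inverse-variance weighted circular mean of per-node azimuth estimates (degrees)."""
          doas = np.radians(np.asarray(doas_deg, dtype=float))
          w = 1.0 / np.asarray(variances, dtype=float)
          x = np.sum(w * np.cos(doas)) / w.sum()
          y = np.sum(w * np.sin(doas)) / w.sum()
          return float(np.degrees(np.arctan2(y, x)))

      # The fused estimate is pulled toward the node with the smallest variance.
      print(fuse_doa([42.0, 47.0, 44.0], [4.0, 1.0, 2.0]))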

  19. Audio-visual interactions in product sound design

    OpenAIRE

    Özcan, E.; Van Egmond, R.

    2010-01-01

    Consistent product experience requires congruity between product properties such as visual appearance and sound. Therefore, for designing appropriate product sounds by manipulating their spectral-temporal structure, product sounds should preferably not be considered in isolation but as an integral part of the main product concept. Because visual aspects of a product are considered to dominate the communication of the desired product concept, sound is usually expected to fit the visual charact...

  20. Uses and Abuses of Audio-Visual Aids in Reading.

    Science.gov (United States)

    Eggers, Edwin H.

    Audiovisual aids are properly used in reading when they "turn students on," and they are abused when they fail to do so or when they actually "turn students off." General guidelines one could use in sorting usable from unusable aids are (1) Has the teacher saved time by using an audiovisual aid? (2) Is the aid appropriate to the sophistication…

  1. Audio-Visual Equipment Depreciation. RDU-75-07.

    Science.gov (United States)

    Drake, Miriam A.; Baker, Martha

    A study was conducted at Purdue University to gather operational and budgetary planning data for the Libraries and Audiovisual Center. The objectives were: (1) to complete a current inventory of equipment including year of purchase, costs, and salvage value; (2) to determine useful life data for general classes of equipment; and (3) to determine…

  2. Utilization of audio-visual aids by family welfare workers.

    Science.gov (United States)

    Naik, V R; Jain, P K; Sharma, B B

    1977-01-01

    Communication efforts have been an important component of the Indian Family Planning Welfare Program since its inception. However, its chief interests in its early years were clinical, until the adoption of the extension approach in 1963. Educational materials were developed, especially in the period 1965-8, to fit mass, group meeting and home visit approaches. Audiovisual aids were developed for use by extension workers, who had previously relied entirely on verbal approaches. This paper examines their use. A questionnaire was designed for workers in motivational programs at 3 levels: Village Level (Family Planning Health Assistant, Auxiliary Nurse-Midwife, Dias), Block Level (Public Health Nurse, Lady Health Visitor, Block Extension Educator), and District (District Extension Educator, District Mass Education and Information Officer). 3 Districts were selected from each State on the basis of overall family planning performance during 1970-2 (good, average, or poor). Units of other agencies were also included on the same basis. Findings: 1) Workers in all 3 categories preferred individual contacts over group meetings or mass approach. 2) 56-64% said they used audiovisual aids "sometimes" (when available). 25% said they used them "many times" and only 15.9% said "rarely." 3) More than 1/2 of workers in each category said they were not properly oriented toward the use of audiovisual aids. Nonavailability of the aids in the market was also cited. About 1/3 of village level and 1/2 of other workers said that the materials were heavy and liable to be damaged. Complexity, inaccuracy and confusion in use were not widely cited (less than 30%).

  3. Combining cues while avoiding perceptual conflicts

    NARCIS (Netherlands)

    Hogervorst, M.A.; Brenner, E.

    2004-01-01

    A common assumption in cue combination models is that small discrepancies between cues are due to the limited resolution of the individual cues. Whenever this assumption holds, information from the separate cues can best be combined to give a single, more accurate estimate of the property of interes

  4. Aggression detection in speech using sensor and semantic information

    NARCIS (Netherlands)

    Lefter, I.; Rothkrantz, L.J.M.; Burghouts, G.J.

    2012-01-01

    By analyzing a multimodal (audio-visual) database with aggressive incidents in trains, we have observed that there are no trivial fusion algorithms to successfully predict multimodal aggression based on unimodal sensor inputs. We proposed a fusion framework that contains a set of intermediate level

  5. Exploration of Speech Planning and Producing by Speech Error Analysis

    Institute of Scientific and Technical Information of China (English)

    冷卉

    2012-01-01

    Speech error analysis is an indirect way to discover speech planning and producing processes. From some speech errors made by people in their daily life, linguists and learners can reveal the planning and producing processes more easily and clearly.

  6. Indirect Speech Acts

    Institute of Scientific and Technical Information of China (English)

    李威

    2001-01-01

    Indirect speech acts are frequently used in verbal communication, and their interpretation is of great importance for developing students' communicative competence. This paper therefore presents Searle's account of indirect speech acts and explores how indirect speech acts are interpreted in accordance with two influential theories. It consists of four parts. Part one gives a general introduction to the notion of speech act theory. Part two elaborates on the conception of indirect speech acts proposed by Searle and his supplement to and development of the theory of illocutionary acts. Part three deals with the interpretation of indirect speech acts. Part four draws implications from the previous study and also serves as the conclusion of the dissertation.

  7. An Eigen-Mouth Based Audio-Visual Continuous Speech Recognition System in Noisy Environments

    Institute of Scientific and Technical Information of China (English)

    谢磊; I.Cravyse; 蒋冬梅; 赵荣椿; H.Sahli; Werner Verhelst; J Cornelis; Ignace Lemahieu

    2003-01-01

    Exploiting the multimodal character of human speech perception, this paper attempts to build a continuous speech recognition system for noisy environments based on combined audio and visual features. For visual feature extraction, an eigen-mouth based method is introduced. Recognition experiments show that this visual feature extraction method achieves higher recognition rates than conventional DCT and DWT methods, and that the eigen-mouth based audio-visual continuous speech recognition system is highly robust to noise.
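
    The eigen-mouth feature extraction mentioned above can be sketched as PCA over vectorized mouth-region images, with each frame projected onto the leading components to give a low-dimensional visual feature vector. The sketch below assumes scikit-learn and uses illustrative image sizes and component counts; it does not reproduce the original system's pipeline.

      import numpy as np
      from sklearn.decomposition import PCA

      def fit_eigen_mouths(mouth_images, n_components=32):
          """mouth_images: (n_frames, H, W) grayscale mouth crops -> fitted PCA basis."""
          X = mouth_images.reshape(len(mouth_images), -1).astype(np.float64)
          return PCA(n_components=n_components).fit(X)

      def eigen_mouth_features(pca, mouth_image):
          """Project one mouth crop onto the eigen-mouth basis (one visual feature vector)."""
          return pca.transform(mouth_image.reshape(1, -1).astype(np.float64))[0]

      # These visual features would then be combined with acoustic features (e.g., MFCCs)
      # in an audio-visual recognizer.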

  8. Speech Alarms Pilot Study

    Science.gov (United States)

    Sandor, Aniko; Moses, Haifa

    2016-01-01

    Speech alarms have been used extensively in aviation and included in International Building Codes (IBC) and National Fire Protection Association's (NFPA) Life Safety Code. However, they have not been implemented on space vehicles. Previous studies conducted at NASA JSC showed that speech alarms lead to faster identification and higher accuracy. This research evaluated updated speech and tone alerts in a laboratory environment and in the Human Exploration Research Analog (HERA) in a realistic setup.

  9. Advances in speech processing

    Science.gov (United States)

    Ince, A. Nejat

    1992-10-01

    The field of speech processing is undergoing a rapid growth in terms of both performance and applications and this is fueled by the advances being made in the areas of microelectronics, computation, and algorithm design. The use of voice for civil and military communications is discussed considering advantages and disadvantages including the effects of environmental factors such as acoustic and electrical noise and interference and propagation. The structure of the existing NATO communications network and the evolving Integrated Services Digital Network (ISDN) concept are briefly reviewed to show how they meet the present and future requirements. The paper then deals with the fundamental subject of speech coding and compression. Recent advances in techniques and algorithms for speech coding now permit high quality voice reproduction at remarkably low bit rates. The subject of speech synthesis is next treated where the principle objective is to produce natural quality synthetic speech from unrestricted text input. Speech recognition where the ultimate objective is to produce a machine which would understand conversational speech with unrestricted vocabulary, from essentially any talker, is discussed. Algorithms for speech recognition can be characterized broadly as pattern recognition approaches and acoustic phonetic approaches. To date, the greatest degree of success in speech recognition has been obtained using pattern recognition paradigms. It is for this reason that the paper is concerned primarily with this technique.

  10. Advances in Speech Recognition

    CERN Document Server

    Neustein, Amy

    2010-01-01

    This volume comprises contributions from eminent leaders in the speech industry and presents a comprehensive, in-depth analysis of the progress of speech technology in the topical areas of mobile settings, healthcare and call centers. The material addresses the technical aspects of voice technology within the framework of societal needs, such as the use of speech recognition software to produce up-to-date electronic health records, notwithstanding patients making changes to health plans and physicians. Included will be discussion of speech engineering, linguistics, and human factors analysis.

  11. Principles of speech coding

    CERN Document Server

    Ogunfunmi, Tokunbo

    2010-01-01

    It is becoming increasingly apparent that all forms of communication-including voice-will be transmitted through packet-switched networks based on the Internet Protocol (IP). Therefore, the design of modern devices that rely on speech interfaces, such as cell phones and PDAs, requires a complete and up-to-date understanding of the basics of speech coding. It outlines key signal processing algorithms used to mitigate impairments to speech quality in VoIP networks. Offering a detailed yet easily accessible introduction to the field, Principles of Speech Coding provides an in-depth examination of the

  12. Increased pain intensity is associated with greater verbal communication difficulty and increased production of speech and co-speech gestures.

    Directory of Open Access Journals (Sweden)

    Samantha Rowbotham

    Full Text Available Effective pain communication is essential if adequate treatment and support are to be provided. Pain communication is often multimodal, with sufferers utilising speech, nonverbal behaviours (such as facial expressions), and co-speech gestures (bodily movements, primarily of the hands and arms that accompany speech and can convey semantic information) to communicate their experience. Research suggests that the production of nonverbal pain behaviours is positively associated with pain intensity, but it is not known whether this is also the case for speech and co-speech gestures. The present study explored whether increased pain intensity is associated with greater speech and gesture production during face-to-face communication about acute, experimental pain. Participants (N = 26) were exposed to experimentally elicited pressure pain to the fingernail bed at high and low intensities and took part in video-recorded semi-structured interviews. Despite rating more intense pain as more difficult to communicate (t(25) = 2.21, p = .037), participants produced significantly longer verbal pain descriptions and more co-speech gestures in the high intensity pain condition (Words: t(25) = 3.57, p = .001; Gestures: t(25) = 3.66, p = .001). This suggests that spoken and gestural communication about pain is enhanced when pain is more intense. Thus, in addition to conveying detailed semantic information about pain, speech and co-speech gestures may provide a cue to pain intensity, with implications for the treatment and support received by pain sufferers. Future work should consider whether these findings are applicable within the context of clinical interactions about pain.

  13. Increased pain intensity is associated with greater verbal communication difficulty and increased production of speech and co-speech gestures.

    Science.gov (United States)

    Rowbotham, Samantha; Wardy, April J; Lloyd, Donna M; Wearden, Alison; Holler, Judith

    2014-01-01

    Effective pain communication is essential if adequate treatment and support are to be provided. Pain communication is often multimodal, with sufferers utilising speech, nonverbal behaviours (such as facial expressions), and co-speech gestures (bodily movements, primarily of the hands and arms that accompany speech and can convey semantic information) to communicate their experience. Research suggests that the production of nonverbal pain behaviours is positively associated with pain intensity, but it is not known whether this is also the case for speech and co-speech gestures. The present study explored whether increased pain intensity is associated with greater speech and gesture production during face-to-face communication about acute, experimental pain. Participants (N = 26) were exposed to experimentally elicited pressure pain to the fingernail bed at high and low intensities and took part in video-recorded semi-structured interviews. Despite rating more intense pain as more difficult to communicate (t(25)  = 2.21, p =  .037), participants produced significantly longer verbal pain descriptions and more co-speech gestures in the high intensity pain condition (Words: t(25)  = 3.57, p  = .001; Gestures: t(25)  = 3.66, p =  .001). This suggests that spoken and gestural communication about pain is enhanced when pain is more intense. Thus, in addition to conveying detailed semantic information about pain, speech and co-speech gestures may provide a cue to pain intensity, with implications for the treatment and support received by pain sufferers. Future work should consider whether these findings are applicable within the context of clinical interactions about pain. PMID:25343486

  14. Speech-Language Therapy (For Parents)

    Science.gov (United States)

    KidsHealth article for parents on speech-language therapy for children with speech and/or language disorders, covering speech disorders, language disorders, and feeding disorders.

  15. Time-expanded speech and speech recognition in older adults.

    Science.gov (United States)

    Vaughan, Nancy E; Furukawa, Izumi; Balasingam, Nirmala; Mortz, Margaret; Fausti, Stephen A

    2002-01-01

    Speech understanding deficits are common in older adults. In addition to hearing sensitivity, changes in certain cognitive functions may affect speech recognition. One such change that may impact the ability to follow a rapidly changing speech signal is processing speed. When speakers slow the rate of their speech naturally in order to speak clearly, speech recognition is improved. The acoustic characteristics of naturally slowed speech are of interest in developing time-expansion algorithms to improve speech recognition for older listeners. In this study, we tested younger normally hearing, older normally hearing, and older hearing-impaired listeners on time-expanded speech using increased duration and increased intensity of unvoiced consonants. Although all groups performed best on unprocessed speech, performance with processed speech was better with the consonant gain feature without time expansion in the noise condition and better at the slowest time-expanded rate in the quiet condition. The effects of signal processing on speech recognition are discussed. PMID:17642020

  16. Speech recognition in natural background noise.

    Directory of Open Access Journals (Sweden)

    Julien Meyer

    Full Text Available In the real world, human speech recognition nearly always involves listening in background noise. The impact of such noise on speech signals and on intelligibility performance increases with the separation of the listener from the speaker. The present behavioral experiment provides an overview of the effects of such acoustic disturbances on speech perception in conditions approaching ecologically valid contexts. We analysed the intelligibility loss in spoken word lists with increasing listener-to-speaker distance in a typical low-level natural background noise. The noise was combined with the simple spherical amplitude attenuation due to distance, basically changing the signal-to-noise ratio (SNR). Therefore, our study draws attention to some of the most basic environmental constraints that have pervaded spoken communication throughout human history. We evaluated the ability of native French participants to recognize French monosyllabic words (spoken at 65.3 dB(A), reference at 1 meter) at distances between 11 and 33 meters, which corresponded to the SNRs most revealing of the progressive effect of the selected natural noise (-8.8 dB to -18.4 dB). Our results showed that in such conditions, the identity of vowels is mostly preserved, with the striking peculiarity of an absence of confusions between vowels. The results also confirmed the functional role of consonants during lexical identification. The extensive analysis of recognition scores, confusion patterns and associated acoustic cues revealed that sonorant, sibilant and burst properties were the most important parameters influencing phoneme recognition. Altogether these analyses allowed us to extract a resistance scale from consonant recognition scores. We also identified specific perceptual consonant confusion groups depending on the position in the words (onset vs. coda). Finally, our data suggested that listeners may access some acoustic cues of the CV transition, opening interesting perspectives for

  17. Processing of audio and visual speech for telecommunication systems

    Science.gov (United States)

    Shah, Druti; Marshall, Stephen

    1999-07-01

    Most verbal communications use cues from both the visual and acoustic modalities to convey messages. During the production of speech, the visible information provided by the external articulatory organs can influence the understanding of the language, by interpreting the combined information into meaningful linguistic expressions. The task of integrating speech and image data to emulate the bimodal human interaction system can be addressed by developing automated systems. These systems have a wide range of applications, such as videophone systems, where the interdependencies between image and speech signals can be exploited for data compression and for solving the task of lip synchronization, which has been a major problem. Therefore, the objective of this work is to investigate and quantify this relationship such that the knowledge gained will assist longer term multimedia and videophone research.

  18. Do We Perceive Others Better than Ourselves? A Perceptual Benefit for Noise-Vocoded Speech Produced by an Average Speaker.

    Directory of Open Access Journals (Sweden)

    William L Schuerman

    Full Text Available In different tasks involving action perception, performance has been found to be facilitated when the presented stimuli were produced by the participants themselves rather than by another participant. These results suggest that the same mental representations are accessed during both production and perception. However, with regard to spoken word perception, evidence also suggests that listeners' representations for speech reflect the input from their surrounding linguistic community rather than their own idiosyncratic productions. Furthermore, speech perception is heavily influenced by indexical cues that may lead listeners to frame their interpretations of incoming speech signals with regard to speaker identity. In order to determine whether word recognition evinces similar self-advantages as found in action perception, it was necessary to eliminate indexical cues from the speech signal. We therefore asked participants to identify noise-vocoded versions of Dutch words that were based on either their own recordings or those of a statistically average speaker. The majority of participants were more accurate for the average speaker than for themselves, even after taking into account differences in intelligibility. These results suggest that the speech representations accessed during perception of noise-vocoded speech are more reflective of the input of the speech community, and hence that speech perception is not necessarily based on representations of one's own speech.
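    For readers unfamiliar with noise vocoding, the sketch below shows a generic channel noise vocoder in which band envelopes modulate band-limited noise carriers; the band count, filter order and function names are assumptions for illustration and do not reproduce the stimulus processing used in the study.

    # Minimal, assumed noise-vocoder sketch (not the authors' processing chain).
    import numpy as np
    from scipy.signal import butter, sosfiltfilt, hilbert

    def noise_vocode(speech, fs, n_bands=6, lo=100.0, hi=8000.0):
        speech = np.asarray(speech, dtype=float)
        edges = np.geomspace(lo, min(hi, 0.95 * fs / 2), n_bands + 1)
        rng = np.random.default_rng(0)
        out = np.zeros_like(speech)
        for f1, f2 in zip(edges[:-1], edges[1:]):
            sos = butter(4, [f1, f2], btype="band", fs=fs, output="sos")
            band = sosfiltfilt(sos, speech)
            env = np.abs(hilbert(band))                      # band envelope
            carrier = sosfiltfilt(sos, rng.standard_normal(speech.size))
            out += env * carrier                             # envelope-modulated noise
        return out / (np.max(np.abs(out)) + 1e-12)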

  19. Speech Compression for Noise-Corrupted Thai Expressive Speech

    Directory of Open Access Journals (Sweden)

    Suphattharachai Chomphan

    2011-01-01

    Full Text Available Problem statement: In speech communication, speech coding aims at preserving speech quality at a lower coding bitrate. In real communication environments, various types of noise deteriorate the speech quality, and expressive speech with different speaking styles may yield different speech quality under the same coding method. Approach: This research presents a study of speech compression for noise-corrupted Thai expressive speech using two coding methods, CS-ACELP and MP-CELP. The speech material included a hundred male and a hundred female speech utterances. Four speaking styles were included: enjoyable, sad, angry and reading styles. Five sentences of Thai speech were chosen. Three types of noise were included (train, car and air conditioner), and five levels of each type of noise were varied from 0-20 dB. The subjective test of mean opinion score was exploited in the evaluation process. Results: The experimental results showed that CS-ACELP gave better speech quality than MP-CELP at all three bitrates of 6000, 8600 and 12600 bps. When considering the levels of noise, the 20-dB noise gave the best speech quality, while 0-dB noise gave the worst speech quality. When considering speech gender, female speech gave better results than male speech. When considering the types of noise, the air-conditioner noise gave the best speech quality, while the train noise gave the worst speech quality. Conclusion: From the study, it can be seen that the coding method, type of noise, level of noise and speech gender all influence the quality of the coded speech.

  20. Improving Alaryngeal Speech Intelligibility.

    Science.gov (United States)

    Christensen, John M.; Dwyer, Patricia E.

    1990-01-01

    Laryngectomized patients using esophageal speech or an electronic artificial larynx have difficulty producing correct voicing contrasts between homorganic consonants. This paper describes a therapy technique that emphasizes "pushing harder" on voiceless consonants to improve alaryngeal speech intelligibility and proposes focusing on the production…

  1. Tracking Speech Sound Acquisition

    Science.gov (United States)

    Powell, Thomas W.

    2011-01-01

    This article describes a procedure to aid in the clinical appraisal of child speech. The approach, based on the work by Dinnsen, Chin, Elbert, and Powell (1990; Some constraints on functionally disordered phonologies: Phonetic inventories and phonotactics. "Journal of Speech and Hearing Research", 33, 28-37), uses a railway idiom to track gains in…

  2. Private Speech in Ballet

    Science.gov (United States)

    Johnston, Dale

    2006-01-01

    Authoritarian teaching practices in ballet inhibit the use of private speech. This paper highlights the critical importance of private speech in the cognitive development of young ballet students, within what is largely a non-verbal art form. It draws upon research by Russian psychologist Lev Vygotsky and contemporary socioculturalists, to…

  3. Free Speech Yearbook 1977.

    Science.gov (United States)

    Phifer, Gregg, Ed.

    The eleven articles in this collection explore various aspects of freedom of speech. Topics include the lack of knowledge on the part of many judges regarding the complex act of communication; the legislatures and free speech in colonial Connecticut and Rhode Island; contributions of sixteenth century Anabaptist heretics to First Amendment…

  4. Speech Situations and TEFL

    Institute of Scientific and Technical Information of China (English)

    吴树奇; 高建国

    2008-01-01

    This paper deals with how speech situations, or rather speech implicatures, affect TEFL. As far as the writer is concerned, they influence many aspects of language teaching. To illustrate this point explicitly, the writer focuses on the influence of speech situations upon pronunciation, intonation, lexical meanings, sentence comprehension and the grammatical study of the English language.

  5. Free Speech. No. 38.

    Science.gov (United States)

    Kane, Peter E., Ed.

    This issue of "Free Speech" contains the following articles: "Daniel Schoor Relieved of Reporting Duties" by Laurence Stern, "The Sellout at CBS" by Michael Harrington, "Defending Dan Schorr" by Tome Wicker, "Speech to the Washington Press Club, February 25, 1976" by Daniel Schorr, "Funds Voted For Schorr Inquiry" by Richard Lyons, "Erosion of the…

  6. Speech processing standards

    Science.gov (United States)

    Ince, A. Nejat

    1990-05-01

    Speech processing standards are given for 64, 32, and 16 kb/s and lower rate speech and, more generally, speech-band signals, which are or will be promulgated by CCITT and NATO. The International Telegraph and Telephone Consultative Committee (CCITT) is the international body which deals, among other things, with speech processing within the context of ISDN. Within NATO there are also bodies promulgating standards which make interoperability possible without complex and expensive interfaces. Some of the applications for low-bit-rate voice, and the related work undertaken by the CCITT Study Groups responsible for developing standards in terms of encoding algorithms, codec design objectives, as well as standards on the assessment of speech quality, are highlighted.

  7. Charisma in business speeches

    DEFF Research Database (Denmark)

    Niebuhr, Oliver; Brem, Alexander; Novák-Tót, Eszter;

    2016-01-01

    Charisma is a key component of spoken language interaction; and it is probably for this reason that charismatic speech has been the subject of intensive research for centuries. However, what is still largely missing is a quantitative and objective line of research that, firstly, involves analyses of the acoustic-prosodic signal, secondly, focuses on business speeches like product presentations, and, thirdly, in doing so, advances the still fairly fragmentary evidence on the prosodic correlates of charismatic speech. We show that the prosodic features of charisma in political speeches also apply to business speeches. Consistent with the public opinion, our findings are indicative of Steve Jobs being a more charismatic speaker than Mark Zuckerberg. Beyond previous studies, our data suggest that rhythm and emphatic accentuation are also involved in conveying charisma. Furthermore, the differences

  8. The temporal binding window for audiovisual speech: Children are like little adults.

    Science.gov (United States)

    Hillock-Dunn, Andrea; Grantham, D Wesley; Wallace, Mark T

    2016-07-29

    During a typical communication exchange, both auditory and visual cues contribute to speech comprehension. The influence of vision on speech perception can be measured behaviorally using a task where incongruent auditory and visual speech stimuli are paired to induce perception of a novel token reflective of multisensory integration (i.e., the McGurk effect). This effect is temporally constrained in adults, with illusion perception decreasing as the temporal offset between the auditory and visual stimuli increases. Here, we used the McGurk effect to investigate the development of the temporal characteristics of audiovisual speech binding in 7-24 year-olds. Surprisingly, results indicated that although older participants perceived the McGurk illusion more frequently, no age-dependent change in the temporal boundaries of audiovisual speech binding was observed. PMID:26920938

  9. The Exploitation of Subphonemic Acoustic Detail in L2 Speech Segmentation

    Science.gov (United States)

    Shoemaker, Ellenor

    2014-01-01

    The current study addresses an aspect of second language (L2) phonological acquisition that has received little attention to date--namely, the acquisition of allophonic variation as a word boundary cue. The role of subphonemic variation in the segmentation of speech by native speakers has been indisputably demonstrated; however, the acquisition of…

  10. Effects of First and Second Language on Segmentation of Non-Native Speech

    Science.gov (United States)

    Hanulikova, Adriana; Mitterer, Holger; McQueen, James M.

    2011-01-01

    Do Slovak-German bilinguals apply native Slovak phonological and lexical knowledge when segmenting German speech? When Slovaks listen to their native language, segmentation is impaired when fixed-stress cues are absent (Hanulikova, McQueen & Mitterer, 2010), and, following the Possible-Word Constraint (PWC; Norris, McQueen, Cutler & Butterfield,…

  11. Children's Responses to Computer-Synthesized Speech in Educational Media: Gender Consistency and Gender Similarity Effects

    Science.gov (United States)

    Lee, Kwan Min; Liao, Katharine; Ryu, Seoungho

    2007-01-01

    This study examines children's social responses to gender cues in synthesized speech in a computer-based instruction setting. Eighty 5th-grade elementary school children were randomly assigned to one of the conditions in a full-factorial 2 (participant gender) x 2 (voice gender) x 2 (content gender) experiment. Results show that children apply…

  12. Durational Patterning at Syntactic and Discourse Boundaries in Mandarin Spontaneous Speech

    Science.gov (United States)

    Fon, Janice; Johnson, Keith; Chen, Sally

    2011-01-01

    This study focused on durational cues (i.e., syllable duration, pause duration, and syllable onset intervals (SOIs)) at discourse boundaries in two dialects of Mandarin, Taiwan and Mainland varieties. Speech was elicited by having 18 participants describe events in "The Pear Story" film. Recorded data were transcribed, labeled, and segmented into…

  13. The Production of Emotional Prosody in Varying Degrees of Severity of Apraxia of Speech.

    Science.gov (United States)

    Van Putten, Steffany M.; Walker, Judy P.

    2003-01-01

    A study examined the abilities of three adults with varying degrees of apraxia of speech (AOS) to produce emotional prosody. Acoustic analyses of the subjects' productions revealed that unlike the control subject, the subjects with AOS did not produce differences in duration and amplitude cues to convey different emotions. (Contains references.)…

  14. Adaptive changes between cue abstraction and exemplar memory in a multiple-cue judgment task with continuous cues.

    Science.gov (United States)

    Karlsson, Linea; Juslin, Peter; Olsson, Henrik

    2007-12-01

    The majority of previous studies on multiple-cue judgment with continuous cues have involved comparisons between judgments and multiple linear regression models that integrated cues into a judgment. The authors present an experiment indicating that in a judgment task with additive combination of multiple continuous cues, people indeed displayed abstract knowledge of the cue criterion relations that was mentally integrated into a judgment, but in a task with multiplicative combination of continuous cues, people instead relied on retrieval of memory traces of similar judgment cases (exemplars). These results suggest that people may adopt qualitatively distinct forms of knowledge, depending on the structure of a multiple-cue judgment task. The authors discuss implications for theories of multiple-cue judgment. PMID:18229487
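    The two forms of knowledge contrasted here can be sketched as an additive cue-abstraction rule versus a similarity-weighted exemplar-retrieval rule; the weights, sensitivity parameter and example values below are purely illustrative assumptions, not the authors' task parameters.

    # Illustrative contrast between cue abstraction and exemplar memory.
    import numpy as np

    def cue_abstraction_judgment(cues, weights, intercept=0.0):
        """Additive rule: integrate abstracted cue-criterion weights."""
        return intercept + float(np.dot(weights, cues))

    def exemplar_judgment(cues, stored_cues, stored_criteria, sensitivity=1.0):
        """Similarity-weighted retrieval of previously judged cases."""
        dists = np.sum(np.abs(stored_cues - cues), axis=1)
        sims = np.exp(-sensitivity * dists)
        return float(np.sum(sims * stored_criteria) / np.sum(sims))

    probe = np.array([0.4, 0.7, 0.1])
    print(cue_abstraction_judgment(probe, weights=np.array([2.0, 1.0, 3.0])))
    print(exemplar_judgment(probe,
                            stored_cues=np.array([[0.5, 0.6, 0.2], [0.1, 0.9, 0.0]]),
                            stored_criteria=np.array([3.1, 1.8])))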

  15. Cue weight in the perception of Trique glottal consonants.

    Science.gov (United States)

    DiCanio, Christian

    2014-02-01

    This paper examines the perceptual weight of cues to the coda glottal consonant contrast in Trique (Oto-Manguean) with native listeners. The language contrasts words with no coda (/Vː/) from words with a coda glottal stop (/VɁ/) or breathy coda (/Vɦ/). The results from a speeded AX (same-different) lexical discrimination task show high accuracy in lexical identification for the /Vː/-/Vɦ/ contrast, but lower accuracy for the other contrasts. The second experiment consists of a labeling task where the three acoustic dimensions that distinguished the glottal consonant codas in production [duration, the amplitude difference between the first two harmonics (H1-H2), and F0] were modified orthogonally using step-wise resynthesis. This task determines the relative weight of each dimension in phonological categorization. The results show that duration was the strongest cue. Listeners were only sensitive to changes in H1-H2 for the /Vː/-/Vɦ/ and /Vː/-/VɁ/ contrasts when duration was ambiguous. Listeners were only sensitive to changes in F0 for the /Vː/-/Vɦ/ contrast when both duration and H1-H2 were ambiguous. The perceptual cue weighting for each contrast closely matches existing production data [DiCanio (2012 a). J. Phon. 40, 162-176] Cue weight differences in speech perception are explained by differences in step-interval size and the notion of adaptive plasticity [Francis et al. (2008). J. Acoust. Soc. Am. 124, 1234-1251; Holt and Lotto (2006). J. Acoust. Soc. Am. 119, 3059-3071]. PMID:25234896

  16. Zebra finches can use positional and transitional cues to distinguish vocal element strings.

    Science.gov (United States)

    Chen, Jiani; Ten Cate, Carel

    2015-08-01

    Learning sequences is of great importance to humans and non-human animals. Many motor and mental actions, such as singing in birds and speech processing in humans, rely on sequential learning. At least two mechanisms are considered to be involved in such learning. The chaining theory proposes that learning of sequences relies on memorizing the transitions between adjacent items, while the positional theory suggests that learners encode the items according to their ordinal position in the sequence. Positional learning is assumed to dominate sequential learning. However, human infants exposed to a string of speech sounds can learn transitional (chaining) cues. So far, it is not clear whether birds, an increasingly important model for examining vocal processing, can do this. In this study we use a Go-Nogo design to examine whether zebra finches can use transitional cues to distinguish artificially constructed strings of song elements. Zebra finches were trained with sequences differing in transitional and positional information and next tested with novel strings sharing positional and transitional similarities with the training strings. The results show that they can attend to both transitional and positional cues and that their sequential coding strategies can be biased toward transitional cues depending on the learning context. This article is part of a Special Issue entitled: In Honor of Jerry Hogan.

  17. Behavioral Cues of Interpersonal Warmth

    Science.gov (United States)

    Bayes, Marjorie A.

    1972-01-01

    The results of this study suggest, first, that interpersonal warmth does seem to be a personality dimension which can be reliably judged and, second, that it was possible to define and demonstrate the relevance of a number of behavioral cues for warmth. (Author)

  18. Optimal assessment of multiple cues

    NARCIS (Netherlands)

    Fawcett, TW; Johnstone, RA

    2003-01-01

    In a wide range of contexts from mate choice to foraging, animals are required to discriminate between alternative options on the basis of multiple cues. How should they best assess such complex multicomponent stimuli? Here, we construct a model to investigate this problem, focusing on a simple case

  19. Book review: Speech and harm: controversies over free speech

    OpenAIRE

    Zarali, Kally

    2013-01-01

    Most liberal societies are deeply committed to the principle of free speech. At the same time, however, there is evidence that some kinds of speech are harmful in ways that are detrimental to important liberal values such as social equality. Might a genuine commitment to free speech require that we legally permit speech even when it is harmful, and even when doing so is in conflict with our commitment to values like equality? Kally Zarali regards Speech & Harm as a valuable guide...

  20. Speech synthesis : Developing a web application implementing speech technology

    OpenAIRE

    Gebremariam, Gudeta

    2016-01-01

    Speech is a natural medium of communication for humans. Text-to-speech (TTS) technology uses a computer to synthesize speech. There are three main techniques of TTS synthesis: formant-based, articulatory and concatenative. The application areas of TTS include accessibility, education, entertainment and communication aids in mass transit. A web application was developed to demonstrate the application of speech synthesis technology. Existing speech synthesis engines for the Finnish ...

  1. Speech Acts In President Barack Obama Victory Speech 2012

    OpenAIRE

    Januarini, Erna

    2016-01-01

    In the thesis, entitled Speech Acts in President Barack Obama's Victory Speech 2012, the author analyzes the illocutionary acts and the direct and indirect speech acts performed by Barack Obama as a speaker, classified as representative, directive, expressive, commissive, and declaration. The purpose of this thesis is to identify the types of illocutionary acts and of direct and indirect speech acts in Barack Obama's 2012 victory speech. In writing this thesis, the author uses a qualitative method from Huberman...

  2. Environmental Contamination of Normal Speech.

    Science.gov (United States)

    Harley, Trevor A.

    1990-01-01

    Environmentally contaminated speech errors (irrelevant words or phrases derived from the speaker's environment and erroneously incorporated into speech) are hypothesized to occur at a high level of speech processing, but with a relatively late insertion point. The data indicate that speech production processes are not independent of other…

  3. Discovering words in fluent speech: the contribution of two kinds of statistical information.

    Science.gov (United States)

    Thiessen, Erik D; Erickson, Lucy C

    2012-01-01

    To efficiently segment fluent speech, infants must discover the predominant phonological form of words in the native language. In English, for example, content words typically begin with a stressed syllable. To discover this regularity, infants need to identify a set of words. We propose that statistical learning plays two roles in this process. First, it provides a cue that allows infants to segment words from fluent speech, even without language-specific phonological knowledge. Second, once infants have identified a set of lexical forms, they can learn from the distribution of acoustic features across those word forms. The current experiments demonstrate both processes are available to 5-month-old infants. This demonstration of sensitivity to statistical structure in speech, weighted more heavily than phonological cues to segmentation at an early age, is consistent with theoretical accounts that claim statistical learning plays a role in helping infants to adapt to the structure of their native language from very early in life. PMID:23335903
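    The statistical segmentation cue referred to above is usually operationalized as the forward transitional probability between adjacent syllables, with low-probability troughs marking likely word boundaries; the short function and toy syllable stream below are hypothetical illustrations, not materials from the experiments.

    # Illustrative computation of forward transitional probabilities (TPs).
    from collections import Counter

    def transitional_probabilities(syllables):
        pair_counts = Counter(zip(syllables[:-1], syllables[1:]))
        first_counts = Counter(syllables[:-1])
        return {(a, b): c / first_counts[a] for (a, b), c in pair_counts.items()}

    stream = "pa bi ku ti bu do pa bi ku go la tu ti bu do".split()
    tps = transitional_probabilities(stream)
    # Within-word TPs (e.g. "pa"->"bi") come out higher than TPs that span a
    # putative word boundary (e.g. "ku"->"ti"), suggesting a boundary there.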

  4. Application and design of audio-visual aids stomatology teaching in orthodontic non-stomatology students

    Institute of Scientific and Technical Information of China (English)

    李若萱; 吕亚林; 王晓庚

    2012-01-01

    Objective This study discusses the effects of audio-visual aids stomatology teaching in undergraduate orthodontic training, delivered in two credit hours, for students majoring in preventive medicine. Methods We selected 85 students from the 2007 and 2008 matriculating classes of the preventive medicine department of Capital Medical University. Using the eight-year orthodontic textbook as our reference, we taught the theory through multimedia in the first class hour and implemented situational role-play teaching in the practicum hour. After teaching, a theory test and a questionnaire survey were used to evaluate the teaching effect and obtain students' feedback on the combined teaching method. Results Students mastered the theoretical knowledge of orthodontics well, and the majority were clear about the teaching objective. Students reported that the audio-visual teaching method significantly enhanced their interest in learning orthodontics and, within the very limited time available, left a deep impression of the subject. Conclusions We conclude that integrating direct audio-visual teaching with situational teaching is of great assistance to orthodontic training; however, the integration must be carefully prepared to ensure student participation, maximize the benefits of integration and improve the course based on direct feedback.

  5. The Rhetoric in English Speech

    Institute of Scientific and Technical Information of China (English)

    马鑫

    2014-01-01

    English speeches have a long history and have always been highly valued. People give speeches in economic activities, political forums and academic reports to express their opinions and to inform or persuade others. The English speech plays an important role in English literature, and the distinctive theme of a speech owes much to its rhetoric. This paper discusses parallelism, repetition and the rhetorical question in English speeches, aiming to help people better appreciate their charm.

  6. Cues for localization in the horizontal plane

    DEFF Research Database (Denmark)

    Jeppesen, Jakob; Møller, Henrik

    2005-01-01

    manipulated in HRTFs used for binaural synthesis of sound in the horizontal plane. The manipulation of cues resulted in HRTFs with cues ranging from correct combinations of spectral information and ITDs to combinations with severely conflicting cues. Both the ITD and the spectral information seem to be...

  7. Fragrances as Cues for Remembering Words

    Science.gov (United States)

    Eich, James Eric

    1978-01-01

    Results of this experiment suggest that specific encoding of a word is not a necessary condition for cue effectiveness. Results imply that the effect of a nominal fragrance cue arises through the mediation of a functional, implicitly generated semantic cue. (Author/SW)

  8. Cue salience influences the use of height cues in reorientation in pigeons (Columba livia).

    Science.gov (United States)

    Du, Yu; Mahdi, Nuha; Paul, Breanne; Spetch, Marcia L

    2016-07-01

    Although orienting ability has been examined with numerous types of cues, most research has focused only on cues from the horizontal plane. The current study investigated pigeons' use of wall height, a vertical cue, in an open-field task and compared it with their use of horizontal cues. Pigeons were trained to locate food in 2 diagonal corners of a rectangular enclosure with 2 opposite high walls as height cues. Before each trial, pigeons were rotated to disorient them. In training, pigeons could use either the horizontal cues from the rectangular enclosure or the height information from the walls to locate the food. In testing, the apparatus was modified to provide (a) horizontal cues only, (b) height cues only, and (c) both height and horizontal cues in conflict. In Experiment 1 the lower and high walls, respectively, were 40 and 80 cm, whereas in Experiment 2 they were made more perceptually salient by shortening them to 20 and 40 cm. Pigeons accurately located the goal corners with horizontal cues alone in both experiments, but they searched accurately with height cues alone only in Experiment 2. When the height cues conflicted with horizontal cues, pigeons preferred the horizontal cues over the height cues in Experiment 1 but not in Experiment 2, suggesting that perceptual salience influences the relative weighting of cues. (PsycINFO Database Record) PMID:27379717

  9. Cue salience influences the use of height cues in reorientation in pigeons (Columba livia).

    Science.gov (United States)

    Du, Yu; Mahdi, Nuha; Paul, Breanne; Spetch, Marcia L

    2016-07-01

    Although orienting ability has been examined with numerous types of cues, most research has focused only on cues from the horizontal plane. The current study investigated pigeons' use of wall height, a vertical cue, in an open-field task and compared it with their use of horizontal cues. Pigeons were trained to locate food in 2 diagonal corners of a rectangular enclosure with 2 opposite high walls as height cues. Before each trial, pigeons were rotated to disorient them. In training, pigeons could use either the horizontal cues from the rectangular enclosure or the height information from the walls to locate the food. In testing, the apparatus was modified to provide (a) horizontal cues only, (b) height cues only, and (c) both height and horizontal cues in conflict. In Experiment 1 the lower and high walls, respectively, were 40 and 80 cm, whereas in Experiment 2 they were made more perceptually salient by shortening them to 20 and 40 cm. Pigeons accurately located the goal corners with horizontal cues alone in both experiments, but they searched accurately with height cues alone only in Experiment 2. When the height cues conflicted with horizontal cues, pigeons preferred the horizontal cues over the height cues in Experiment 1 but not in Experiment 2, suggesting that perceptual salience influences the relative weighting of cues. (PsycINFO Database Record)

  10. An Exploration of Rhythmic Grouping of Speech Sequences by French- and German-Learning Infants

    Science.gov (United States)

    Abboub, Nawal; Boll-Avetisyan, Natalie; Bhatara, Anjali; Höhle, Barbara; Nazzi, Thierry

    2016-01-01

    Rhythm in music and speech can be characterized by a constellation of several acoustic cues. Individually, these cues have different effects on rhythmic perception: sequences of sounds alternating in duration are perceived as short-long pairs (weak-strong/iambic pattern), whereas sequences of sounds alternating in intensity or pitch are perceived as loud-soft, or high-low pairs (strong-weak/trochaic pattern). This perceptual bias—called the Iambic-Trochaic Law (ITL)–has been claimed to be an universal property of the auditory system applying in both the music and the language domains. Recent studies have shown that language experience can modulate the effects of the ITL on rhythmic perception of both speech and non-speech sequences in adults, and of non-speech sequences in 7.5-month-old infants. The goal of the present study was to explore whether language experience also modulates infants’ grouping of speech. To do so, we presented sequences of syllables to monolingual French- and German-learning 7.5-month-olds. Using the Headturn Preference Procedure (HPP), we examined whether they were able to perceive a rhythmic structure in sequences of syllables that alternated in duration, pitch, or intensity. Our findings show that both French- and German-learning infants perceived a rhythmic structure when it was cued by duration or pitch but not intensity. Our findings also show differences in how these infants use duration and pitch cues to group syllable sequences, suggesting that pitch cues were the easier ones to use. Moreover, performance did not differ across languages, failing to reveal early language effects on rhythmic perception. These results contribute to our understanding of the origin of rhythmic perception and perceptual mechanisms shared across music and speech, which may bootstrap language acquisition. PMID:27378887

  11. An Exploration of Rhythmic Grouping of Speech Sequences by French- and German-Learning Infants.

    Science.gov (United States)

    Abboub, Nawal; Boll-Avetisyan, Natalie; Bhatara, Anjali; Höhle, Barbara; Nazzi, Thierry

    2016-01-01

    Rhythm in music and speech can be characterized by a constellation of several acoustic cues. Individually, these cues have different effects on rhythmic perception: sequences of sounds alternating in duration are perceived as short-long pairs (weak-strong/iambic pattern), whereas sequences of sounds alternating in intensity or pitch are perceived as loud-soft, or high-low pairs (strong-weak/trochaic pattern). This perceptual bias-called the Iambic-Trochaic Law (ITL)-has been claimed to be an universal property of the auditory system applying in both the music and the language domains. Recent studies have shown that language experience can modulate the effects of the ITL on rhythmic perception of both speech and non-speech sequences in adults, and of non-speech sequences in 7.5-month-old infants. The goal of the present study was to explore whether language experience also modulates infants' grouping of speech. To do so, we presented sequences of syllables to monolingual French- and German-learning 7.5-month-olds. Using the Headturn Preference Procedure (HPP), we examined whether they were able to perceive a rhythmic structure in sequences of syllables that alternated in duration, pitch, or intensity. Our findings show that both French- and German-learning infants perceived a rhythmic structure when it was cued by duration or pitch but not intensity. Our findings also show differences in how these infants use duration and pitch cues to group syllable sequences, suggesting that pitch cues were the easier ones to use. Moreover, performance did not differ across languages, failing to reveal early language effects on rhythmic perception. These results contribute to our understanding of the origin of rhythmic perception and perceptual mechanisms shared across music and speech, which may bootstrap language acquisition. PMID:27378887

  12. Speech impairment (adult)

    Science.gov (United States)

    MedlinePlus consumer health entry on speech impairment in adults. Causes include brain tumors or degenerative diseases that affect the language areas of the brain (brain tumor being more common in aphasia than in dysarthria) and dementia; the impairment may develop gradually, but anyone can develop a speech and language impairment. The term does not apply to children.

  13. Speech and Swallowing

    Science.gov (United States)

    Patient education page on speech and swallowing problems, including self-check questions for recognizing a swallowing problem (for example, recent weight loss without trying).

  14. Speech disorders - children

    Science.gov (United States)

    MedlinePlus encyclopedia entry (//medlineplus.gov/ency/article/001430.htm) on speech disorders in children, with related topics including autism spectrum disorder, cerebral palsy, hearing loss and intellectual disability (reference: Elsevier Saunders, 2011, chap. 32).

  15. Audiovisual integration for speech during mid-childhood: electrophysiological evidence.

    Science.gov (United States)

    Kaganovich, Natalya; Schumaker, Jennifer

    2014-12-01

    Previous studies have demonstrated that the presence of visual speech cues reduces the amplitude and latency of the N1 and P2 event-related potential (ERP) components elicited by speech stimuli. However, the developmental trajectory of this effect is not yet fully mapped. We examined ERP responses to auditory, visual, and audiovisual speech in two groups of school-age children (7-8-year-olds and 10-11-year-olds) and in adults. Audiovisual speech led to the attenuation of the N1 and P2 components in all groups of participants, suggesting that the neural mechanisms underlying these effects are functional by early school years. Additionally, while the reduction in N1 was largest over the right scalp, the P2 attenuation was largest over the left and midline scalp. The difference in the hemispheric distribution of the N1 and P2 attenuation supports the idea that these components index at least somewhat disparate neural processes within the context of audiovisual speech perception.

  16. Visual cues for data mining

    Science.gov (United States)

    Rogowitz, Bernice E.; Rabenhorst, David A.; Gerth, John A.; Kalin, Edward B.

    1996-04-01

    This paper describes a set of visual techniques, based on principles of human perception and cognition, which can help users analyze and develop intuitions about tabular data. Collections of tabular data are widely available, including, for example, multivariate time series data, customer satisfaction data, stock market performance data, multivariate profiles of companies and individuals, and scientific measurements. In our approach, we show how visual cues can help users perform a number of data mining tasks, including identifying correlations and interaction effects, finding clusters and understanding the semantics of cluster membership, identifying anomalies and outliers, and discovering multivariate relationships among variables. These cues are derived from psychological studies on perceptual organization, visual search, perceptual scaling, and color perception. These visual techniques are presented as a complement to the statistical and algorithmic methods more commonly associated with these tasks, and provide an interactive interface for the human analyst.

  17. Predicting speech intelligibility in conditions with nonlinearly processed noisy speech

    DEFF Research Database (Denmark)

    Jørgensen, Søren; Dau, Torsten

    2013-01-01

    The speech-based envelope power spectrum model (sEPSM; [1]) was proposed in order to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII). The sEPSM applies the signal-to-noise ratio in the envelope domain (SNRenv), which was demonstrated to successfully predict speech intelligibility in conditions with nonlinearly processed noisy speech, such as processing with spectral subtraction. Moreover, a multiresolution version (mr-sEPSM) was demonstrated to account for speech intelligibility in various conditions with stationary and fluctuating
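    A highly simplified, broadband illustration of an envelope-domain SNR in the spirit of the SNRenv metric is sketched below; the published sEPSM computes this per peripheral filter and per modulation filter, which this toy version (with assumed function names and band limits) deliberately omits.

    # Toy envelope-domain SNR; not the actual sEPSM implementation.
    import numpy as np
    from scipy.signal import hilbert

    def envelope_power(x, fs, f_lo=1.0, f_hi=64.0):
        env = np.abs(hilbert(x))
        env = env - env.mean()                       # AC-coupled envelope
        spec = np.abs(np.fft.rfft(env)) ** 2
        freqs = np.fft.rfftfreq(env.size, 1.0 / fs)
        band = (freqs >= f_lo) & (freqs <= f_hi)
        return spec[band].sum() / env.size ** 2      # envelope power in the band

    def snr_env_db(noisy_speech, noise_alone, fs):
        p_mix = envelope_power(np.asarray(noisy_speech, float), fs)
        p_noise = envelope_power(np.asarray(noise_alone, float), fs)
        return 10.0 * np.log10(max(p_mix - p_noise, 1e-12) / p_noise)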

  18. SPEECH DISORDERS ENCOUNTERED DURING SPEECH THERAPY AND THERAPY TECHNIQUES

    Directory of Open Access Journals (Sweden)

    İlhan ERDEM

    2013-06-01

    Full Text Available Speech is a physical and mental process in which agreed-upon signs and sounds are used to convey a message and create meaning in the mind of the listener. To identify the sounds of speech, it is essential to know the structure and function of the various organs that make conversation possible. Because speech is both a physical and a mental process, many factors can lead to speech disorders: they may be related to language acquisition as well as to many medical and psychological causes. Speaking is the collective work of many organs, like an orchestra, and the mental dimension of speech makes it a very complex skill, so it must be determined which of these obstacles inhibits conversation. A speech disorder is a defect in speech flow, rhythm, pitch, stress, composition or vocalization. This study examines speech disorders such as articulation disorders, stuttering, aphasia, dysarthria, local dialect speech, tongue and lip laziness, and overly rapid speech as defects in language skills. The causes of speech disorders are investigated, and suggestions for their remediation are discussed.

  19. The effects of auditory and visual vowel training on speech reading performance

    Science.gov (United States)

    Richie, Carolyn; Kewley-Port, Diane

    2003-10-01

    Speech reading, the use of visual cues to understand speech, may provide a substantial benefit for normal-hearing listeners in noisy environments and for hearing-impaired listeners in everyday communication. However, there exists great individual variability in speech reading ability, and studies have shown that only a modest improvement in speech reading ability is achieved with training. The purpose of this investigation was to determine the effects of a novel approach to speech reading training on word and sentence identification tasks. In contrast to previous research, which involved training on consonant recognition, this study focused on vowels. Two groups of normal-hearing adults participated in auditory-visual (AV) conditions with added background noise. The first group of listeners received training on the recognition of 14 English vowels in isolated words, while the second group of listeners received no training. All listeners performed speech reading pre- and post-tests, on words and sentences. Results are discussed in terms of differences between groups, dependent upon whether training was administered, and a comparison is made between this and other speech reading training methods. Finally, the potential benefit of this vowel-based speech reading training method for the rehabilitation of hearing-impaired listeners is discussed. [Work supported by NIHDCD-02229.]

  20. Level variations in speech: Effect on masking release in hearing-impaired listeners.

    Science.gov (United States)

    Reed, Charlotte M; Desloge, Joseph G; Braida, Louis D; Perez, Zachary D; Léger, Agnès C

    2016-07-01

    Acoustic speech is marked by time-varying changes in the amplitude envelope that may pose difficulties for hearing-impaired listeners. Removal of these variations (e.g., by the Hilbert transform) could improve speech reception for such listeners, particularly in fluctuating interference. Léger, Reed, Desloge, Swaminathan, and Braida [(2015b). J. Acoust. Soc. Am. 138, 389-403] observed that a normalized measure of masking release obtained for hearing-impaired listeners using speech processed to preserve temporal fine-structure (TFS) cues was larger than that for unprocessed or envelope-based speech. This study measured masking release for two other speech signals in which level variations were minimal: peak clipping and TFS processing of an envelope signal. Consonant identification was measured for hearing-impaired listeners in backgrounds of continuous and fluctuating speech-shaped noise. The normalized masking release obtained using speech with normal variations in overall level was substantially less than that observed using speech processed to achieve highly restricted level variations. These results suggest that the performance of hearing-impaired listeners in fluctuating noise may be improved by signal processing that leads to a decrease in stimulus level variations. PMID:27475136
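    The envelope/TFS distinction underlying these stimuli can be illustrated with the Hilbert transform, which factors a signal into a slowly varying amplitude envelope and a unit-amplitude fine-structure carrier; in published studies this is done per frequency band, whereas the assumed, simplified sketch below operates on the broadband signal.

    # Illustrative Hilbert decomposition into envelope and temporal fine structure.
    import numpy as np
    from scipy.signal import hilbert

    def envelope_and_tfs(x):
        analytic = hilbert(np.asarray(x, dtype=float))
        envelope = np.abs(analytic)             # slow amplitude (level) variations
        tfs = np.cos(np.angle(analytic))        # unit-amplitude fine structure
        return envelope, tfs

    # "TFS speech" keeps only the fine structure, discarding level variations;
    # "envelope speech" would instead impose the envelope on a fixed carrier.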

  1. Practical speech user interface design

    CERN Document Server

    Lewis, James R

    2010-01-01

    Although speech is the most natural form of communication between humans, most people find using speech to communicate with machines anything but natural. Drawing from psychology, human-computer interaction, linguistics, and communication theory, Practical Speech User Interface Design provides a comprehensive yet concise survey of practical speech user interface (SUI) design. It offers practice-based and research-based guidance on how to design effective, efficient, and pleasant speech applications that people can really use. Focusing on the design of speech user interfaces for IVR application

  2. Quit interest influences smoking cue-reactivity.

    Science.gov (United States)

    Veilleux, Jennifer C; Skinner, Kayla D; Pollert, Garrett A

    2016-12-01

    Interest in quitting smoking is important to model in cue-reactivity studies, because the craving elicited by cue exposure likely requires different self-regulation efforts for smokers who are interested in quitting compared to those without any quit interest. The objective of the current study was to evaluate the role of quit interest in how cigarette cue exposure influences self-control efforts. Smokers interested in quitting (n=37) and smokers with no interest in quitting (n=53) were randomly assigned to a cigarette or neutral cue exposure task. Following the cue exposure, all participants completed two self-control tasks, a measure of risky gambling (the Iowa Gambling Task) and a cold pressor tolerance task. Results indicated that smokers interested in quitting had worse performance on the gambling task when exposed to a cigarette cue compared to neutral cue exposure. We also found that people interested in quitting tolerated the cold pressor task for a shorter amount of time than people not interested in quitting. Finally, we found that for people interested in quitting, exposure to a cigarette cue was associated with increased motivation to take steps toward decreasing use. Overall these results suggest that including quit interest in studies of cue reactivity is valuable, as quit interest influenced smoking cue-reactivity responses. PMID:27487082

  3. Prosodic cues to word order: what level of representation?

    Directory of Open Access Journals (Sweden)

    Carline eBernard

    2012-10-01

    Full Text Available Within language, systematic correlations exist between syntactic structure and prosody. Prosodic prominence, for instance, falls on the complement and not the head of syntactic phrases, and its realization depends on the phrasal position of the prominent element. Thus, in Japanese, a functor-final language, prominence is phrase-initial and realized as increased pitch (^Tōkyō ni ‘Tokyo to’), whereas in French, English or Italian, functor-initial languages, it manifests itself as phrase-final lengthening (to Rome). Prosody is readily available in the linguistic signal even to the youngest infants. It has, therefore, been proposed that young learners might be able to exploit its correlations with syntax to bootstrap language structure. In this study, we tested this hypothesis, investigating how 8-month-old monolingual French infants processed an artificial grammar manipulating the relative position of prosodic prominence and word frequency. In Condition 1, we created a speech stream in which the two cues, prosody and frequency, were aligned, frequent words being prosodically non-prominent and infrequent ones being prominent, as is the case in natural language (functors are prosodically minimal compared to content words). In Condition 2, the two cues were misaligned, with frequent words carrying prosodic prominence, unlike in natural language. After familiarization with the aligned or the misaligned stream in a headturn preference procedure, we tested infants' preference for test items having a frequent word initial or a frequent word final word order. We found that infants familiarized with the aligned stream showed the expected preference for the frequent word initial test items, mimicking the functor-initial word order of French. Infants in the misaligned condition showed no preference. These results suggest that infants are able to use word frequency and prosody as early cues to word order and they integrate them into a coherent

  4. Robust Speech/Non-Speech Classification in Heterogeneous Multimedia Content

    NARCIS (Netherlands)

    Huijbregts, Marijn; Jong, de Franciska

    2011-01-01

    In this paper we present a speech/non-speech classification method that allows high quality classification without the need to know in advance what kinds of audible non-speech events are present in an audio recording and that does not require a single parameter to be tuned on in-domain data. Because

  5. The influence of masker type on early reflection processing and speech intelligibility (L)

    DEFF Research Database (Denmark)

    Arweiler, Iris; Buchholz, Jörg M.; Dau, Torsten

    2013-01-01

    Arweiler and Buchholz [J. Acoust. Soc. Am. 130, 996-1005 (2011)] showed that, while the energy of early reflections (ERs) in a room improves speech intelligibility, the benefit is smaller than that provided by the energy of the direct sound (DS). In terms of integration of ERs and DS, binaural listening did not provide a benefit from ERs apart from a binaural energy summation, such that monaural auditory processing could account for the data. However, a diffuse speech shaped noise (SSN) was used in the speech intelligibility experiments, which does not provide distinct binaural cues to the auditory system. In the present study, the monaural and binaural benefit from ERs for speech intelligibility was investigated using three directional maskers presented from 90° azimuth: a SSN, a multi-talker babble, and a reversed two-talker masker. For normal-hearing as well as hearing-impaired listeners

  6. Denial Denied: Freedom of Speech

    Directory of Open Access Journals (Sweden)

    Glen Newey

    2009-12-01

    Full Text Available Free speech is a widely held principle. This is in some ways surprising, since formal and informal censorship of speech is widespread, and rather different issues seem to arise depending on whether the censorship concerns who speaks, what content is spoken or how it is spoken. I argue that despite these facts, free speech can indeed be seen as a unitary principle. On my analysis, the core of the free speech principle is the denial of the denial of speech, whether to a speaker, to a proposition, or to a mode of expression. Underlying free speech is the principle of freedom of association, according to which speech is both a precondition of future association (e.g. as a medium for negotiation) and a mode of association in its own right. I conclude by applying this account briefly to two contentious issues: hate speech and pornography.

  7. Explicit authenticity and stimulus features interact to modulate BOLD response induced by emotional speech.

    Science.gov (United States)

    Drolet, Matthis; Schubotz, Ricarda I; Fischer, Julia

    2013-06-01

    Context has been found to have a profound effect on the recognition of social stimuli and correlated brain activation. The present study was designed to determine whether knowledge about emotional authenticity influences emotion recognition expressed through speech intonation. Participants classified emotionally expressive speech in an fMRI experimental design as sad, happy, angry, or fearful. For some trials, stimuli were cued as either authentic or play-acted in order to manipulate participant top-down belief about authenticity, and these labels were presented both congruently and incongruently to the emotional authenticity of the stimulus. Contrasting authentic versus play-acted stimuli during uncued trials indicated that play-acted stimuli spontaneously up-regulate activity in the auditory cortex and regions associated with emotional speech processing. In addition, a clear interaction effect of cue and stimulus authenticity showed up-regulation in the posterior superior temporal sulcus and the anterior cingulate cortex, indicating that cueing had an impact on the perception of authenticity. In particular, when a cue indicating an authentic stimulus was followed by a play-acted stimulus, additional activation occurred in the temporoparietal junction, probably pointing to increased load on perspective taking in such trials. While actual authenticity has a significant impact on brain activation, individual belief about stimulus authenticity can additionally modulate the brain response to differences in emotionally expressive speech.

  8. The minor third communicates sadness in speech, mirroring its use in music.

    Science.gov (United States)

    Curtis, Meagan E; Bharucha, Jamshed J

    2010-06-01

    There is a long history of attempts to explain why music is perceived as expressing emotion. The relationship between pitches serves as an important cue for conveying emotion in music. The musical interval referred to as the minor third is generally thought to convey sadness. We reveal that the minor third also occurs in the pitch contour of speech conveying sadness. Bisyllabic speech samples conveying four emotions were recorded by 9 actresses. Acoustic analyses revealed that the relationship between the 2 salient pitches of the sad speech samples tended to approximate a minor third. Participants rated the speech samples for perceived emotion, and the use of numerous acoustic parameters as cues for emotional identification was modeled using regression analysis. The minor third was the most reliable cue for identifying sadness. Additional participants rated musical intervals for emotion, and their ratings verified the historical association between the musical minor third and sadness. These findings support the theory that human vocal expressions and music share an acoustic code for communicating sadness.
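
    The interval analysis described above can be illustrated with a short, self-contained sketch. The Python snippet below (the F0 values are hypothetical, not taken from the study) measures the interval between two salient pitches in cents and reports how far it falls from an equal-tempered minor third (300 cents):

        import math

        def cents(f_low, f_high):
            """Interval size in cents between two fundamental frequencies."""
            return 1200.0 * math.log2(f_high / f_low)

        # Two salient pitches measured from a (hypothetical) sad utterance, in Hz.
        f1, f2 = 220.0, 262.0
        interval = cents(f1, f2)
        minor_third = 300.0  # equal-tempered minor third, in cents
        print(f"interval = {interval:.1f} cents, "
              f"deviation from minor third = {abs(interval - minor_third):.1f} cents")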

  9. Speech spectrogram expert

    Energy Technology Data Exchange (ETDEWEB)

    Johannsen, J.; Macallister, J.; Michalek, T.; Ross, S.

    1983-01-01

    Various authors have pointed out that humans can become quite adept at deriving phonetic transcriptions from speech spectrograms (as good as 90 percent accuracy at the phoneme level). The authors describe an expert system which attempts to simulate this performance. The speech spectrogram expert (spex) is actually a society made up of three experts: a 2-dimensional vision expert, an acoustic-phonetic expert, and a phonetics expert. The visual reasoning expert finds important visual features of the spectrogram. The acoustic-phonetic expert reasons about how visual features relate to phonemes, and about how phonemes change visually in different contexts. The phonetics expert reasons about allowable phoneme sequences and transformations, and deduces an English spelling for phoneme strings. The speech spectrogram expert is highly interactive, allowing users to investigate hypotheses and edit rules. 10 references.

  10. Punctuation in Quoted Speech

    CERN Document Server

    Doran, C F

    1996-01-01

    Quoted speech is often set off by punctuation marks, in particular quotation marks. Thus, it might seem that the quotation marks would be extremely useful in identifying these structures in texts. Unfortunately, the situation is not quite so clear. In this work, I will argue that quotation marks are not adequate for either identifying or constraining the syntax of quoted speech. More useful information comes from the presence of a quoting verb, which is either a verb of saying or a punctual verb, and the presence of other punctuation marks, usually commas. Using a lexicalized grammar, we can license most quoting clauses as text adjuncts. A distinction will be made not between direct and indirect quoted speech, but rather between adjunct and non-adjunct quoting clauses.

  11. Protection limits on free speech

    Institute of Scientific and Technical Information of China (English)

    李敏

    2014-01-01

    Freedom of speech is one of the basic rights of citizens and should receive broad protection. However, in the real context of China, what kinds of speech can be protected and what kinds restricted, and how to draw the limit between state power and free speech, are questions worth considering. People tend to ignore freedom of speech and its function, so that some arguments cannot be aired in open debate.

  12. Improved perception of speech in noise and Mandarin tones with acoustic simulations of harmonic coding for cochlear implants.

    Science.gov (United States)

    Li, Xing; Nie, Kaibao; Imennov, Nikita S; Won, Jong Ho; Drennan, Ward R; Rubinstein, Jay T; Atlas, Les E

    2012-11-01

    Harmonic and temporal fine structure (TFS) information are important cues for speech perception in noise and music perception. However, due to the inherently coarse spectral and temporal resolution in electric hearing, the question of how to deliver harmonic and TFS information to cochlear implant (CI) users remains unresolved. A harmonic-single-sideband-encoder [(HSSE); Nie et al. (2008). Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing; Lie et al., (2010). Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing] strategy has been proposed that explicitly tracks the harmonics in speech and transforms them into modulators conveying both amplitude modulation and fundamental frequency information. For unvoiced speech, HSSE transforms the TFS into a slowly varying yet still noise-like signal. To investigate its potential, four- and eight-channel vocoder simulations of HSSE and the continuous-interleaved-sampling (CIS) strategy were implemented, respectively. Using these vocoders, five normal-hearing subjects' speech recognition performance was evaluated under different masking conditions; another five normal-hearing subjects' Mandarin tone identification performance was also evaluated. Additionally, the neural discharge patterns evoked by HSSE- and CIS-encoded Mandarin tone stimuli were simulated using an auditory nerve model. All subjects scored significantly higher with HSSE than with CIS vocoders. The modeling analysis demonstrated that HSSE can convey temporal pitch cues better than CIS. Overall, the results suggest that HSSE is a promising strategy to enhance speech perception with CIs. PMID:23145619

  13. Seeing the talker’s face supports executive processing of speech in steady state noise

    Directory of Open Access Journals (Sweden)

    Sushmit eMishra

    2013-11-01

    Full Text Available Listening to speech in noise depletes cognitive resources, affecting speech processing. The present study investigated how remaining resources or cognitive spare capacity (CSC) can be deployed by young adults with normal hearing. We administered a test of CSC (CSCT; Mishra et al., 2013) along with a battery of established cognitive tests to 20 participants with normal hearing. In the CSCT, lists of two-digit numbers were presented with and without visual cues in quiet, as well as in steady-state and speech-like noise at a high intelligibility level. In low load conditions, two numbers were recalled according to instructions inducing executive processing (updating, inhibition) and in high load conditions the participants were additionally instructed to recall one extra number, which was always the first item in the list. In line with previous findings, results showed that CSC was sensitive to memory load and executive function but generally not related to working memory capacity. Furthermore, CSCT scores in quiet were lowered by visual cues, probably due to distraction. In steady-state noise, the presence of visual cues improved CSCT scores, probably by enabling better encoding. Contrary to our expectation, CSCT performance was disrupted more in steady-state than speech-like noise, although only without visual cues, possibly because selective attention could be used to ignore the speech-like background and provide an enriched representation of target items in working memory similar to that obtained in quiet. This interpretation is supported by a consistent association between CSCT scores and updating skills.

  14. Abortion and compelled physician speech.

    Science.gov (United States)

    Orentlicher, David

    2015-01-01

    Informed consent mandates for abortion providers may infringe the First Amendment's freedom of speech. On the other hand, they may reinforce the physician's duty to obtain informed consent. Courts can promote both doctrines by ensuring that compelled physician speech pertains to medical facts about abortion rather than abortion ideology and that compelled speech is truthful and not misleading.

  15. The University and Free Speech

    OpenAIRE

    Grcic, Joseph

    2014-01-01

    Free speech is a necessary condition for the growth of knowledge and the implementation of real and rational democracy. Educational institutions play a central role in socializing individuals to function within their society. Academic freedom is the right to free speech in the context of the university and tenure, properly interpreted, is a necessary component of protecting academic freedom and free speech.

  16. Abortion and compelled physician speech.

    Science.gov (United States)

    Orentlicher, David

    2015-01-01

    Informed consent mandates for abortion providers may infringe the First Amendment's freedom of speech. On the other hand, they may reinforce the physician's duty to obtain informed consent. Courts can promote both doctrines by ensuring that compelled physician speech pertains to medical facts about abortion rather than abortion ideology and that compelled speech is truthful and not misleading. PMID:25846035

  17. Gradient sensitivity to acoustic detail and temporal integration of phonetic cues

    Science.gov (United States)

    McMurray, Bob; Clayards, Meghan A.; Aslin, Richard N.; Tanenhaus, Michael K.

    2001-05-01

    Speech contains systematic covariation at the subphonemic level that could be used to integrate information over time (McMurray et al., 2003; Gow, 2001). Previous research has established sensitivity to this variation: activation for lexical competitors is sensitive to within-category variation in voice-onset-time (McMurray et al., 2002). This study extends this investigation to other subphonemic speech cues by examining formant transitions (r/l and d/g), formant slope (b/w) and VOT (b/p) in an eye-tracking paradigm similar to McMurray et al. (2002). Vowel length was also varied to examine temporal organization (e.g., VOT precedes the vowel). Subjects heard a token from each continuum and selected the target from a screen containing pictures of the target, competitor and unrelated items. Fixations to the competitor increased with distance from the boundary along each of the speech continua. Unlike prior work, there was also an effect on fixations to the target. There was no effect of vowel length on the d/g or r/l continua, but rate dependent continua (b/w and b/p) showed length effects. Importantly, the temporal order of cues was reflected in the pattern of looks to competitors, providing an important window into the processes by which acoustic detail is temporally integrated.

  18. Synchronization by the hand: The sight of gestures modulates low-frequency activity in brain responses to continuous speech

    Directory of Open Access Journals (Sweden)

    Emmanuel eBiau

    2015-09-01

    Full Text Available During social interactions, speakers often produce spontaneous gestures to accompany their speech. These coordinated body movements convey communicative intentions, and modulate how listeners perceive the message in a subtle, but important way. In the present perspective, we put the focus on the role that congruent non-verbal information from beat gestures may play in the neural responses to speech. Whilst delta-theta oscillatory brain responses reflect the time-frequency structure of the speech signal, we argue that beat gestures promote phase resetting at relevant word onsets. This mechanism may facilitate the anticipation of associated acoustic cues relevant for prosodic/syllabic-based segmentation in speech perception. We report recently published data supporting this hypothesis, and discuss the potential of beats (and gestures in general) for further studies investigating continuous AV speech processing through low-frequency oscillations.

  19. Synchronization by the hand: the sight of gestures modulates low-frequency activity in brain responses to continuous speech.

    Science.gov (United States)

    Biau, Emmanuel; Soto-Faraco, Salvador

    2015-01-01

    During social interactions, speakers often produce spontaneous gestures to accompany their speech. These coordinated body movements convey communicative intentions, and modulate how listeners perceive the message in a subtle, but important way. In the present perspective, we put the focus on the role that congruent non-verbal information from beat gestures may play in the neural responses to speech. Whilst delta-theta oscillatory brain responses reflect the time-frequency structure of the speech signal, we argue that beat gestures promote phase resetting at relevant word onsets. This mechanism may facilitate the anticipation of associated acoustic cues relevant for prosodic/syllabic-based segmentation in speech perception. We report recently published data supporting this hypothesis, and discuss the potential of beats (and gestures in general) for further studies investigating continuous AV speech processing through low-frequency oscillations. PMID:26441618

  20. The Practice and Reflection of "Project Teaching": Taking the Electronic Audio-Visual Technology Major of Anhui Broadcasting Movie and Television College as an Example

    Institute of Scientific and Technical Information of China (English)

    孙博文

    2011-01-01

    Since 2009, when the Electronic Audio-Visual Technology Major of Anhui Broadcasting Movie and Television College began its workflow-oriented project teaching reform, the major has undertaken extensive exploration and practice in both teaching content and teaching methods: the personnel training scheme has been revised, the proportion of practice class hours has been increased, and all core courses are required to compile workflow-oriented project teaching syllabi. The teaching reform has produced achievements but has also revealed some problems. In response to these problems, the teaching and research section has carried out a great deal of pedagogical research and proposed some feasible methods.

  1. A measure for assessing the effects of audiovisual speech integration.

    Science.gov (United States)

    Altieri, Nicholas; Townsend, James T; Wenger, Michael J

    2014-06-01

    We propose a measure of audiovisual speech integration that takes into account accuracy and response times. This measure should prove beneficial for researchers investigating multisensory speech recognition, since it relates to normal-hearing and aging populations. As an example, age-related sensory decline influences both the rate at which one processes information and the ability to utilize cues from different sensory modalities. Our function assesses integration when both auditory and visual information are available, by comparing performance on these audiovisual trials with theoretical predictions for performance under the assumptions of parallel, independent self-terminating processing of single-modality inputs. We provide example data from an audiovisual identification experiment and discuss applications for measuring audiovisual integration skills across the life span.
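
    The benchmark underlying such a measure can be sketched in a few lines. The following Python example uses simulated response times (all values are placeholders) to compare an observed audiovisual RT distribution against the prediction of parallel, independent, self-terminating (race) processing of the unimodal inputs; the published measure additionally incorporates accuracy, which is omitted here for brevity:

        import numpy as np

        def ecdf(rts, t_grid):
            """Empirical CDF of response times evaluated on a time grid."""
            rts = np.sort(np.asarray(rts, dtype=float))
            return np.searchsorted(rts, t_grid, side="right") / rts.size

        # Hypothetical correct-response RTs (ms) from unimodal and audiovisual trials.
        rng = np.random.default_rng(0)
        rt_a = rng.normal(520, 60, 200)   # auditory-only
        rt_v = rng.normal(560, 70, 200)   # visual-only
        rt_av = rng.normal(470, 55, 200)  # audiovisual

        t = np.linspace(300, 800, 101)
        f_a, f_v, f_av = ecdf(rt_a, t), ecdf(rt_v, t), ecdf(rt_av, t)

        # Benchmark: parallel, independent, self-terminating (race) processing.
        f_pred = f_a + f_v - f_a * f_v
        gain_over_race = f_av - f_pred  # > 0 suggests benefit beyond independent racing
        print(f"max gain over the independent-race benchmark: {gain_over_race.max():.3f}")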

  2. Visual cues for landmine detection

    Science.gov (United States)

    Staszewski, James J.; Davison, Alan D.; Tischuk, Julia A.; Dippel, David J.

    2007-04-01

    Can human vision supplement the information that handheld landmine detection equipment provides its operators to increase detection rates and reduce the hazard of the task? Contradictory viewpoints exist regarding the viability of visual detection of landmines. Assuming both positions are credible, this work aims to reconcile them by exploring the visual information produced by landmine burial and how any visible signatures change as a function of time in a natural environment. Its objective is to acquire objective, foundational knowledge on which training could be based and subsequently evaluated. A representative set of demilitarized landmines were buried at a field site with bare soil and vegetated surfaces using doctrinal procedures. High resolution photographs of the ground surface were taken for approximately one month starting in April 2006. Photos taken immediately after burial show clearly visible surface signatures. Their features change with time and weather exposure, but the patterns they define persist, as photos taken a month later show. An analysis exploiting the perceptual sensitivity of expert observers showed signature photos to domain experts with instructions to identify the cues and patterns that defined the signatures. Analysis of experts' verbal descriptions identified a small set of easily communicable cues that characterize signatures and their changes over the duration of observation. Findings suggest that visual detection training is viable and has potential to enhance detection capabilities. The photos and descriptions generated offer materials for designing such training and testing its utility. Plans for investigating the generality of the findings, especially potential limiting conditions, are discussed.

  3. Comparison of Speech Features on the Speech Recognition Task

    Directory of Open Access Journals (Sweden)

    Iosif Mporas

    2007-01-01

    Full Text Available In the present work we overview some recently proposed discrete Fourier transform (DFT)- and discrete wavelet packet transform (DWPT)-based speech parameterization methods and evaluate their performance on the speech recognition task. Specifically, in order to assess the practical value of these less studied speech parameterization methods, we evaluate them in a common experimental setup and compare their performance against traditional techniques, such as the Mel-frequency cepstral coefficients (MFCC) and perceptual linear predictive (PLP) cepstral coefficients which presently dominate the speech recognition field. In particular, utilizing the well established TIMIT speech corpus and employing the Sphinx-III speech recognizer, we present comparative results of 8 different speech parameterization techniques.
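
    As a point of reference for the baseline parameterizations mentioned above, the sketch below shows a minimal MFCC extraction in Python using librosa (assumed to be installed; the file name and frame settings are placeholders, not the study's Sphinx-III/TIMIT setup):

        import librosa

        # "utterance.wav" is a placeholder; any speech recording would do.
        y, sr = librosa.load("utterance.wav", sr=16000)

        # 13 Mel-frequency cepstral coefficients per 25 ms frame with a 10 ms hop,
        # the kind of baseline parameterization the study compares against.
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                                    n_fft=int(0.025 * sr),
                                    hop_length=int(0.010 * sr))
        print(mfcc.shape)  # (13, number_of_frames)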

  4. Awareness of rhythm patterns in speech and music in children with specific language impairments

    Directory of Open Access Journals (Sweden)

    Ruth eCumming

    2015-12-01

    Full Text Available Children with specific language impairments (SLIs) show impaired perception and production of language, and also show impairments in perceiving auditory cues to rhythm (amplitude rise time [ART] and sound duration) and in tapping to a rhythmic beat. Here we explore potential links between language development and rhythm perception in 45 children with SLI and 50 age-matched controls. We administered three rhythmic tasks, a musical beat detection task, a tapping-to-music task, and a novel music/speech task, which varied rhythm and pitch cues independently or together in both speech and music. Via low-pass filtering, the music sounded as though it was played from a low-quality radio and the speech sounded as though it was muffled (heard ‘behind the door’). We report data for all of the SLI children (N = 45, IQ varying), as well as for two independent subgroupings with intact IQ. One subgroup, ‘Pure SLI’, had intact phonology and reading (N = 16); the other, ‘SLI PPR’ (N = 15), had impaired phonology and reading. When IQ varied (all SLI children), we found significant group differences in all the rhythmic tasks. For the Pure SLI group, there were rhythmic impairments in the tapping task only. For children with SLI and poor phonology (SLI PPR), group differences were found in all of the filtered speech/music AXB tasks. We conclude that difficulties with rhythmic cues in both speech and music are present in children with SLIs, but that some rhythmic measures are more sensitive than others. The data are interpreted within a ‘prosodic phrasing’ hypothesis, and we discuss the potential utility of rhythmic and musical interventions in remediating speech and language difficulties in children.

  5. Awareness of Rhythm Patterns in Speech and Music in Children with Specific Language Impairments.

    Science.gov (United States)

    Cumming, Ruth; Wilson, Angela; Leong, Victoria; Colling, Lincoln J; Goswami, Usha

    2015-01-01

    Children with specific language impairments (SLIs) show impaired perception and production of language, and also show impairments in perceiving auditory cues to rhythm [amplitude rise time (ART) and sound duration] and in tapping to a rhythmic beat. Here we explore potential links between language development and rhythm perception in 45 children with SLI and 50 age-matched controls. We administered three rhythmic tasks, a musical beat detection task, a tapping-to-music task, and a novel music/speech task, which varied rhythm and pitch cues independently or together in both speech and music. Via low-pass filtering, the music sounded as though it was played from a low-quality radio and the speech sounded as though it was muffled (heard "behind the door"). We report data for all of the SLI children (N = 45, IQ varying), as well as for two independent subgroupings with intact IQ. One subgroup, "Pure SLI," had intact phonology and reading (N = 16), the other, "SLI PPR" (N = 15), had impaired phonology and reading. When IQ varied (all SLI children), we found significant group differences in all the rhythmic tasks. For the Pure SLI group, there were rhythmic impairments in the tapping task only. For children with SLI and poor phonology (SLI PPR), group differences were found in all of the filtered speech/music AXB tasks. We conclude that difficulties with rhythmic cues in both speech and music are present in children with SLIs, but that some rhythmic measures are more sensitive than others. The data are interpreted within a "prosodic phrasing" hypothesis, and we discuss the potential utility of rhythmic and musical interventions in remediating speech and language difficulties in children. PMID:26733848

  6. Infants with Williams syndrome detect statistical regularities in continuous speech.

    Science.gov (United States)

    Cashon, Cara H; Ha, Oh-Ryeong; Graf Estes, Katharine; Saffran, Jenny R; Mervis, Carolyn B

    2016-09-01

    Williams syndrome (WS) is a rare genetic disorder associated with delays in language and cognitive development. The reasons for the language delay are unknown. Statistical learning is a domain-general mechanism recruited for early language acquisition. In the present study, we investigated whether infants with WS were able to detect the statistical structure in continuous speech. Eighteen 8- to 20-month-olds with WS were familiarized with 2 min of a continuous stream of synthesized nonsense words; the statistical structure of the speech was the only cue to word boundaries. They were tested on their ability to discriminate statistically-defined "words" and "part-words" (which crossed word boundaries) in the artificial language. Despite significant cognitive and language delays, infants with WS were able to detect the statistical regularities in the speech stream. These findings suggest that an inability to track the statistical properties of speech is unlikely to be the primary basis for the delays in the onset of language observed in infants with WS. These results provide the first evidence of statistical learning by infants with developmental delays. PMID:27299804
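
    The statistical structure referred to above is usually operationalized as the transitional probability between adjacent syllables. A minimal Python sketch (with a made-up syllable stream, not the study's stimuli) shows how within-word transitions come out more predictable than across-word transitions:

        from collections import Counter

        # Hypothetical syllabified familiarization stream; word boundaries are not
        # marked, so the only cue is the transitional probability between syllables.
        stream = ("pa bi ku ti bu do go la tu pa bi ku go la tu ti bu do "
                  "pa bi ku ti bu do go la tu").split()

        pair_counts = Counter(zip(stream, stream[1:]))
        first_counts = Counter(stream[:-1])

        def transitional_probability(s1, s2):
            """P(s2 | s1): how predictive syllable s1 is of syllable s2."""
            return pair_counts[(s1, s2)] / first_counts[s1]

        print(transitional_probability("pa", "bi"))  # within-word: high (1.0 here)
        print(transitional_probability("ku", "ti"))  # across a word boundary: lower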

  7. Learning to perceptually organize speech signals in native fashion.

    Science.gov (United States)

    Nittrouer, Susan; Lowenstein, Joanna H

    2010-03-01

    The ability to recognize speech involves sensory, perceptual, and cognitive processes. For much of the history of speech perception research, investigators have focused on the first and third of these, asking how much and what kinds of sensory information are used by normal and impaired listeners, as well as how effective amounts of that information are altered by "top-down" cognitive processes. This experiment focused on perceptual processes, asking what accounts for how the sensory information in the speech signal gets organized. Two types of speech signals processed to remove properties that could be considered traditional acoustic cues (amplitude envelopes and sine wave replicas) were presented to 100 listeners in five groups: native English-speaking (L1) adults, 7-, 5-, and 3-year-olds, and native Mandarin-speaking adults who were excellent second-language (L2) users of English. The L2 adults performed more poorly than L1 adults with both kinds of signals. Children performed more poorly than L1 adults but showed disproportionately better performance for the sine waves than for the amplitude envelopes compared to both groups of adults. Sentence context had similar effects across groups, so variability in recognition was attributed to differences in perceptual organization of the sensory information, presumed to arise from native language experience. PMID:20329861

  8. Denial Denied: Freedom of Speech

    OpenAIRE

    Glen Newey

    2009-01-01

    Free speech is a widely held principle. This is in some ways surprising, since formal and informal censorship of speech is widespread, and rather different issues seem to arise depending on whether the censorship concerns who speaks, what content is spoken or how it is spoken. I argue that despite these facts, free speech can indeed be seen as a unitary principle. On my analysis, the core of the free speech principle is the denial of the denial of speech, whether to a speaker, to a propositio...

  9. Hemispheric Asymmetry of Endogenous Neural Oscillations in Young Children: Implications for Hearing Speech In Noise.

    Science.gov (United States)

    Thompson, Elaine C; Woodruff Carr, Kali; White-Schwoch, Travis; Tierney, Adam; Nicol, Trent; Kraus, Nina

    2016-01-25

    Speech signals contain information in hierarchical time scales, ranging from short-duration (e.g., phonemes) to long-duration cues (e.g., syllables, prosody). A theoretical framework to understand how the brain processes this hierarchy suggests that hemispheric lateralization enables specialized tracking of acoustic cues at different time scales, with the left and right hemispheres sampling at short (25 ms; 40 Hz) and long (200 ms; 5 Hz) periods, respectively. In adults, both speech-evoked and endogenous cortical rhythms are asymmetrical: low-frequency rhythms predominate in right auditory cortex, and high-frequency rhythms in left auditory cortex. It is unknown, however, whether endogenous resting state oscillations are similarly lateralized in children. We investigated cortical oscillations in children (3-5 years; N = 65) at rest and tested our hypotheses that this temporal asymmetry is evident early in life and facilitates recognition of speech in noise. We found a systematic pattern of increasing leftward asymmetry for higher frequency oscillations; this pattern was more pronounced in children who better perceived words in noise. The observed connection between left-biased cortical oscillations in phoneme-relevant frequencies and speech-in-noise perception suggests hemispheric specialization of endogenous oscillatory activity may support speech processing in challenging listening environments, and that this infrastructure is present during early childhood.
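
    One way to quantify the kind of asymmetry described above is a laterality index over band-limited power. The Python sketch below uses simulated noise in place of real resting-state recordings; the sampling rate, band limits, and channel roles are assumptions for illustration only:

        import numpy as np
        from scipy.signal import welch

        fs = 250.0  # sampling rate in Hz (hypothetical)
        n = int(60 * fs)
        rng = np.random.default_rng(0)
        left = rng.standard_normal(n)   # stand-in for a left auditory-cortex channel
        right = rng.standard_normal(n)  # stand-in for a right auditory-cortex channel

        def band_power(x, fs, lo, hi):
            """Integrated power spectral density between lo and hi Hz."""
            f, pxx = welch(x, fs=fs, nperseg=int(2 * fs))
            mask = (f >= lo) & (f <= hi)
            return np.trapz(pxx[mask], f[mask])

        # High-frequency (around 40 Hz) power, where a leftward bias is expected.
        p_left = band_power(left, fs, 35, 45)
        p_right = band_power(right, fs, 35, 45)
        laterality = (p_left - p_right) / (p_left + p_right)  # > 0: leftward asymmetry
        print(f"laterality index: {laterality:+.3f}")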

  10. Packet speech systems technology

    Science.gov (United States)

    Weinstein, C. J.; Blankenship, P. E.

    1982-09-01

    The long-range objectives of the Packet Speech Systems Technology Program are to develop and demonstrate techniques for efficient digital speech communications on networks suitable for both voice and data, and to investigate and develop techniques for integrated voice and data communication in packetized networks, including wideband common-user satellite links. Specific areas of concern are: the concentration of statistically fluctuating volumes of voice traffic, the adaptation of communication strategies to varying conditions of network links and traffic volume, and the interconnection of wideband satellite networks to terrestrial systems. Previous efforts in this area have led to new vocoder structures for improved narrowband voice performance and multiple-rate transmission, and to demonstrations of conversational speech and conferencing on the ARPANET and the Atlantic Packet Satellite Network. The current program has two major thrusts: i.e., the development and refinement of practical low-cost, robust, narrowband, and variable-rate speech algorithms and voice terminal structures; and the establishment of an experimental wideband satellite network to serve as a unique facility for the realistic investigation of voice/data networking strategies.

  11. Black History Speech

    Science.gov (United States)

    Noldon, Carl

    2007-01-01

    The author argues in this speech that one cannot expect students in the school system to know and understand the genius of Black history if the curriculum is Eurocentric, which is a residue of racism. He states that his comments are designed for the enlightenment of those who suffer from a school system that "hypocritically manipulates Black…

  12. Hearing speech in music

    Directory of Open Access Journals (Sweden)

    Seth-Reino Ekström

    2011-01-01

    Full Text Available The masking effect of a piano composition, played at different speeds and in different octaves, on speech-perception thresholds was investigated in 15 normal-hearing and 14 moderately-hearing-impaired subjects. Running speech (just follow conversation, JFC) testing and use of hearing aids increased the everyday validity of the findings. A comparison was made with standard audiometric noises [International Collegium of Rehabilitative Audiology (ICRA) noise and speech spectrum-filtered noise (SPN)]. All masking sounds, music or noise, were presented at the same equivalent sound level (50 dBA). The results showed a significant effect of piano performance speed and octave (P<.01). Low octave and fast tempo had the largest effect; and high octave and slow tempo, the smallest. Music had a lower masking effect than did ICRA noise with two or six speakers at normal vocal effort (P<.01) and SPN (P<.05). Subjects with hearing loss had higher masked thresholds than the normal-hearing subjects (P<.01), but there were smaller differences between masking conditions (P<.01). It is pointed out that music offers an interesting opportunity for studying masking under realistic conditions, where spectral and temporal features can be varied independently. The results have implications for composing music with vocal parts, designing acoustic environments and creating a balance between speech perception and privacy in social settings.

  13. Free Speech Yearbook 1979.

    Science.gov (United States)

    Kane, Peter E., Ed.

    The seven articles in this collection deal with theoretical and practical freedom of speech issues. Topics covered are: the United States Supreme Court, motion picture censorship, and the color line; judicial decision making; the established scientific community's suppression of the ideas of Immanuel Velikovsky; the problems of avant-garde jazz,…

  14. Speech intelligibility in hospitals.

    Science.gov (United States)

    Ryherd, Erica E; Moeller, Michael; Hsu, Timothy

    2013-07-01

    Effective communication between staff members is key to patient safety in hospitals. A variety of patient care activities including admittance, evaluation, and treatment rely on oral communication. Surprisingly, published information on speech intelligibility in hospitals is extremely limited. In this study, speech intelligibility measurements and occupant evaluations were conducted in 20 units of five different U.S. hospitals. A variety of unit types and locations were studied. Results show that overall, no unit had "good" intelligibility based on the speech intelligibility index (SII > 0.75), and several locations were found to have "poor" intelligibility; occupied spaces had lower SII than unoccupied spaces on average. Additionally, staff perception of communication problems at nurse stations was significantly correlated with SII ratings. In a targeted second phase, a unit treated with sound absorption had higher SII ratings for a larger percentage of time as compared to an identical untreated unit. Taken as a whole, the study provides an extensive baseline evaluation of speech intelligibility across a variety of hospitals and unit types, offers some evidence of the positive impact of absorption on intelligibility, and identifies areas for future research.

  15. Hearing speech in music.

    Science.gov (United States)

    Ekström, Seth-Reino; Borg, Erik

    2011-01-01

    The masking effect of a piano composition, played at different speeds and in different octaves, on speech-perception thresholds was investigated in 15 normal-hearing and 14 moderately-hearing-impaired subjects. Running speech (just follow conversation, JFC) testing and use of hearing aids increased the everyday validity of the findings. A comparison was made with standard audiometric noises [International Collegium of Rehabilitative Audiology (ICRA) noise and speech spectrum-filtered noise (SPN)]. All masking sounds, music or noise, were presented at the same equivalent sound level (50 dBA). The results showed a significant effect of piano performance speed and octave (P<.01). Low octave and fast tempo had the largest effect; and high octave and slow tempo, the smallest. Music had a lower masking effect than did ICRA noise with two or six speakers at normal vocal effort (P<.01) and SPN (P<.05). Subjects with hearing loss had higher masked thresholds than the normal-hearing subjects (P<.01), but there were smaller differences between masking conditions (P<.01). It is pointed out that music offers an interesting opportunity for studying masking under realistic conditions, where spectral and temporal features can be varied independently. The results have implications for composing music with vocal parts, designing acoustic environments and creating a balance between speech perception and privacy in social settings. PMID:21768731

  16. 1984 Newbery Acceptance Speech.

    Science.gov (United States)

    Cleary, Beverly

    1984-01-01

    This acceptance speech for an award honoring "Dear Mr. Henshaw," a book about feelings of a lonely child of divorce intended for eight-, nine-, and ten-year-olds, highlights children's letters to author. Changes in society that affect children, the inception of "Dear Mr. Henshaw," and children's reactions to books are highlighted. (EJS)

  17. An Exploration of Rhythmic Grouping of Speech Sequences by French- and German-Learning Infants

    OpenAIRE

    Abboub, Nawal; Boll-Avetisyan, Natalie; Bhatara, Anjali; Höhle, Barbara; Nazzi, Thierry

    2016-01-01

    Rhythm in music and speech can be characterized by a constellation of several acoustic cues. Individually, these cues have different effects on rhythmic perception: sequences of sounds alternating in duration are perceived as short-long pairs (weak-strong/iambic pattern), whereas sequences of sounds alternating in intensity or pitch are perceived as loud-soft, or high-low pairs (strong-weak/trochaic pattern). This perceptual bias—called the Iambic-Trochaic Law (ITL)–has been claimed to be an ...

  18. Metaheuristic applications to speech enhancement

    CERN Document Server

    Kunche, Prajna

    2016-01-01

    This book serves as a basic reference for those interested in the application of metaheuristics to speech enhancement. The major goal of the book is to explain the basic concepts of optimization methods and their use in heuristic optimization in speech enhancement to scientists, practicing engineers, and academic researchers in speech processing. The authors discuss why it has been a challenging problem for researchers to develop new enhancement algorithms that aid in the quality and intelligibility of degraded speech. They present powerful optimization methods to speech enhancement that can help to solve the noise reduction problems. Readers will be able to understand the fundamentals of speech processing as well as the optimization techniques, how the speech enhancement algorithms are implemented by utilizing optimization methods, and will be given the tools to develop new algorithms. The authors also provide a comprehensive literature survey regarding the topic.

  19. Gaze in Visual Search Is Guided More Efficiently by Positive Cues than by Negative Cues.

    Directory of Open Access Journals (Sweden)

    Günter Kugler

    Full Text Available Visual search can be accelerated when properties of the target are known. Such knowledge allows the searcher to direct attention to items sharing these properties. Recent work indicates that information about properties of non-targets (i.e., negative cues) can also guide search. In the present study, we examine whether negative cues lead to different search behavior compared to positive cues. We asked observers to search for a target defined by a certain shape singleton (broken line among solid lines). Each line was embedded in a colored disk. In "positive cue" blocks, participants were informed about possible colors of the target item. In "negative cue" blocks, the participants were informed about colors that could not contain the target. Search displays were designed such that with both the positive and negative cues, the same number of items could potentially contain the broken line ("relevant items"). Thus, both cues were equally informative. We measured response times and eye movements. Participants exhibited longer response times when provided with negative cues compared to positive cues. Although negative cues did guide the eyes to relevant items, there were marked differences in eye movements. Negative cues resulted in smaller proportions of fixations on relevant items, longer duration of fixations and in higher rates of fixations per item as compared to positive cues. The effectiveness of both cue types, as measured by fixations on relevant items, increased over the course of each search. In sum, a negative color cue can guide attention to relevant items, but it is less efficient than a positive cue of the same informational value.

  20. Children's recognition of emotions from vocal cues

    NARCIS (Netherlands)

    D.A. Sauter; C. Panattoni; F. Happé

    2013-01-01

    Emotional cues contain important information about the intentions and feelings of others. Despite a wealth of research into children's understanding of facial signals of emotions, little research has investigated the developmental trajectory of interpreting affective cues in the voice. In this study

  1. Guiding Attention by Cooperative Cues

    Institute of Scientific and Technical Information of China (English)

    KangWoo Lee

    2008-01-01

    A common assumption in visual attention is based on the rationale of "limited capacity of information processing". From this viewpoint there is little consideration of how different information channels or modules cooperate, because cells in processing stages are forced to compete for the limited resource. To examine the mechanism behind the cooperative behavior of information channels, a computational model of selective attention is implemented based on two hypotheses. Unlike the traditional view of visual attention, the cooperative behavior is assumed to be a dynamic integration process between the bottom-up and top-down information. Furthermore, top-down information is assumed to provide a contextual cue during the selection process and to guide the attentional allocation among many bottom-up candidates. The results from a series of simulations with still and video images showed some interesting properties that could not be explained by the competitive aspect of selective attention alone.

  2. Speech Motor Control in Fluent and Dysfluent Speech Production of an Individual with Apraxia of Speech and Broca's Aphasia

    Science.gov (United States)

    van Lieshout, Pascal H. H. M.; Bose, Arpita; Square, Paula A.; Steele, Catriona M.

    2007-01-01

    Apraxia of speech (AOS) is typically described as a motor-speech disorder with clinically well-defined symptoms, but without a clear understanding of the underlying problems in motor control. A number of studies have compared the speech of subjects with AOS to the fluent speech of controls, but only a few have included speech movement data and if…

  3. Cue abstraction and exemplar memory in categorization.

    Science.gov (United States)

    Juslin, Peter; Jones, Sari; Olsson, Henrik; Winman, Anders

    2003-09-01

    In this article, the authors compare 3 generic models of the cognitive processes in a categorization task. The cue abstraction model implies abstraction in training of explicit cue-criterion relations that are mentally integrated to form a judgment, the lexicographic heuristic uses only the most valid cue, and the exemplar-based model relies on retrieval of exemplars. The results from 2 experiments showed that, in lieu of the lexicographic heuristic, most participants spontaneously integrate cues. In contrast to single-system views, exemplar memory appeared to dominate when the feedback was poor, but when the feedback was rich enough to allow the participants to discern the task structure, it was exploited for abstraction of explicit cue-criterion relations. PMID:14516225

  4. Kin-informative recognition cues in ants

    DEFF Research Database (Denmark)

    Nehring, Volker; Evison, Sophie E F; Santorelli, Lorenzo A;

    2011-01-01

    ...behaviour is thought to be rare in one of the classic examples of cooperation--social insect colonies--because the colony-level costs of individual selfishness select against cues that would allow workers to recognize their closest relatives. In accord with this, previous studies of wasps and ants have found little or no kin information in recognition cues. Here, we test the hypothesis that social insects do not have kin-informative recognition cues by investigating the recognition cues and relatedness of workers from four colonies of the ant Acromyrmex octospinosus. Contrary to the theoretical prediction, we show that the cuticular hydrocarbons of ant workers in all four colonies are informative enough to allow full-sisters to be distinguished from half-sisters with a high accuracy. These results contradict the hypothesis of non-heritable recognition cues and suggest that there is more potential...

  5. Sensorimotor Interactions in Speech Learning

    Directory of Open Access Journals (Sweden)

    Douglas M Shiller

    2011-10-01

    Full Text Available Auditory input is essential for normal speech development and plays a key role in speech production throughout the life span. In traditional models, auditory input plays two critical roles: (1) establishing the acoustic correlates of speech sounds that serve, in part, as the targets of speech production, and (2) as a source of feedback about a talker's own speech outcomes. This talk will focus on both of these roles, describing a series of studies that examine the capacity of children and adults to adapt to real-time manipulations of auditory feedback during speech production. In one study, we examined sensory and motor adaptation to a manipulation of auditory feedback during production of the fricative “s”. In contrast to prior accounts, adaptive changes were observed not only in speech motor output but also in subjects' perception of the sound. In a second study, speech adaptation was examined following a period of auditory–perceptual training targeting the perception of vowels. The perceptual training was found to systematically improve subjects' motor adaptation response to altered auditory feedback during speech production. The results of both studies support the idea that perceptual and motor processes are tightly coupled in speech production learning, and that the degree and nature of this coupling may change with development.

  6. SPEECH CLASSIFICATION USING ZERNIKE MOMENTS

    Directory of Open Access Journals (Sweden)

    Manisha Pacharne

    2011-07-01

    Full Text Available Speech recognition is a very popular field of research, and speech classification improves the performance of speech recognition. Different patterns are identified using various characteristics or features of speech to perform their classification. A typical speech feature set consists of many parameters, such as standard deviation, magnitude, and zero crossings, representing the speech signal. Considering all these parameters greatly increases the system's computational load and time, so there is a need to reduce them by selecting important features. Feature selection aims to obtain an optimal subset of features from a given space, leading to high classification performance. Thus feature selection methods should derive features that reduce the amount of data used for classification. High recognition accuracy is in demand for speech recognition systems. In this paper Zernike moments of the speech signal are extracted and used as features of the speech signal. Zernike moments are shape descriptors generally used to describe the shape of a region. To extract Zernike moments, the one-dimensional audio signal is converted into a two-dimensional image file. Then various feature selection and ranking algorithms, such as t-Test, Chi Square, Fisher Score, ReliefF, Gini Index and Information Gain, are used to select important features of the speech signal. The performance of the algorithms is evaluated using classifier accuracy. A Support Vector Machine (SVM) is used as the learning algorithm of the classifier, and it is observed that accuracy improves considerably after removing unwanted features.
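
    The selection-plus-classification pipeline described above can be sketched with standard tools. The Python example below (using random placeholder features rather than real Zernike moments) ranks features with the chi-square criterion, one of the methods listed, and evaluates an SVM by cross-validation:

        import numpy as np
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import MinMaxScaler
        from sklearn.feature_selection import SelectKBest, chi2
        from sklearn.svm import SVC
        from sklearn.model_selection import cross_val_score

        # Placeholder feature matrix: rows are utterances, columns stand in for
        # Zernike-moment descriptors; labels are two hypothetical speech classes.
        rng = np.random.default_rng(0)
        X = rng.random((200, 64))
        y = rng.integers(0, 2, size=200)

        # Chi-square ranking followed by an SVM classifier.
        clf = make_pipeline(MinMaxScaler(),            # chi2 needs non-negative inputs
                            SelectKBest(chi2, k=16),   # keep the 16 best-ranked features
                            SVC(kernel="rbf"))
        scores = cross_val_score(clf, X, y, cv=5)
        print(f"cross-validated accuracy: {scores.mean():.2f}")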

  7. Fully Automated Assessment of the Severity of Parkinson's Disease from Speech.

    Science.gov (United States)

    Bayestehtashk, Alireza; Asgari, Meysam; Shafran, Izhak; McNames, James

    2015-01-01

    For several decades now, there has been sporadic interest in automatically characterizing the speech impairment due to Parkinson's disease (PD). Most early studies were confined to quantifying a few speech features that were easy to compute. More recent studies have adopted a machine learning approach where a large number of potential features are extracted and the models are learned automatically from the data. In the same vein, here we characterize the disease using a relatively large cohort of 168 subjects, collected from multiple (three) clinics. We elicited speech using three tasks - the sustained phonation task, the diadochokinetic task and a reading task, all within a time budget of 4 minutes, prompted by a portable device. From these recordings, we extracted 1582 features for each subject using openSMILE, a standard feature extraction tool. We compared the effectiveness of three strategies for learning a regularized regression and find that ridge regression performs better than lasso and support vector regression for our task. We refine the feature extraction to capture pitch-related cues, including jitter and shimmer, more accurately using a time-varying harmonic model of speech. Our results show that the severity of the disease can be inferred from speech with a mean absolute error of about 5.5, explaining 61% of the variance and consistently well-above chance across all clinics. Of the three speech elicitation tasks, we find that the reading task is significantly better at capturing cues than diadochokinetic or sustained phonation task. In all, we have demonstrated that the data collection and inference can be fully automated, and the results show that speech-based assessment has promising practical application in PD. The techniques reported here are more widely applicable to other paralinguistic tasks in clinical domain. PMID:25382935
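
    A minimal sketch of the regularized-regression step (Python, scikit-learn assumed available; the feature matrix and severity scores are random placeholders rather than openSMILE features and clinical ratings) might look like this:

        import numpy as np
        from sklearn.linear_model import RidgeCV
        from sklearn.preprocessing import StandardScaler
        from sklearn.pipeline import make_pipeline
        from sklearn.model_selection import cross_val_predict

        # Stand-ins for a 1582-dimensional acoustic feature vector per subject
        # and the severity score to be predicted.
        rng = np.random.default_rng(1)
        X = rng.standard_normal((168, 1582))
        y = rng.uniform(0, 100, size=168)

        # Ridge regression with the regularization strength chosen by cross-validation.
        model = make_pipeline(StandardScaler(),
                              RidgeCV(alphas=np.logspace(-2, 4, 13)))
        pred = cross_val_predict(model, X, y, cv=10)
        mae = np.mean(np.abs(pred - y))
        print(f"cross-validated mean absolute error: {mae:.1f}")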

  8. Learning foreign sounds in an alien world: videogame training improves non-native speech categorization.

    Science.gov (United States)

    Lim, Sung-joo; Holt, Lori L

    2011-01-01

    Although speech categories are defined by multiple acoustic dimensions, some are perceptually weighted more than others and there are residual effects of native-language weightings in non-native speech perception. Recent research on nonlinguistic sound category learning suggests that the distribution characteristics of experienced sounds influence perceptual cue weights: Increasing variability across a dimension leads listeners to rely upon it less in subsequent category learning (Holt & Lotto, 2006). The present experiment investigated the implications of this among native Japanese learning English /r/-/l/ categories. Training was accomplished using a videogame paradigm that emphasizes associations among sound categories, visual information, and players' responses to videogame characters rather than overt categorization or explicit feedback. Subjects who played the game for 2.5h across 5 days exhibited improvements in /r/-/l/ perception on par with 2-4 weeks of explicit categorization training in previous research and exhibited a shift toward more native-like perceptual cue weights. PMID:21827533

  9. Hate Speech: Power in the Marketplace.

    Science.gov (United States)

    Harrison, Jack B.

    1994-01-01

    A discussion of hate speech and freedom of speech on college campuses examines the difference between hate speech from normal, objectionable interpersonal comments and looks at Supreme Court decisions on the limits of student free speech. Two cases specifically concerning regulation of hate speech on campus are considered: Chaplinsky v. New…

  10. Multilevel Analysis in Analyzing Speech Data

    Science.gov (United States)

    Guddattu, Vasudeva; Krishna, Y.

    2011-01-01

    The speech produced by human vocal tract is a complex acoustic signal, with diverse applications in phonetics, speech synthesis, automatic speech recognition, speaker identification, communication aids, speech pathology, speech perception, machine translation, hearing research, rehabilitation and assessment of communication disorders and many…

  11. Cross-modal cueing in audiovisual spatial attention

    OpenAIRE

    Blurton, Steven Paul; Mark W Greenlee; Gondan, Matthias

    2015-01-01

    Visual processing is most effective at the location of our attentional focus. It has long been known that various spatial cues can direct visuospatial attention and influence the detection of auditory targets. Cross-modal cueing, however, seems to depend on the type of the visual cue: facilitation effects have been reported for endogenous visual cues while exogenous cues seem to be mostly ineffective. In three experiments, we investigated cueing effects on the processing of audiovisual signal...

  12. [Improving speech comprehension using a new cochlear implant speech processor].

    Science.gov (United States)

    Müller-Deile, J; Kortmann, T; Hoppe, U; Hessel, H; Morsnowski, A

    2009-06-01

    The aim of this multicenter clinical field study was to assess the benefits of the new Freedom 24 sound processor for cochlear implant (CI) users implanted with the Nucleus 24 cochlear implant system. The study included 48 postlingually profoundly deaf experienced CI users who demonstrated speech comprehension performance with their current speech processor on the Oldenburg sentence test (OLSA) in quiet conditions of at least 80% correct scores and who were able to perform adaptive speech threshold testing using the OLSA in noisy conditions. Following baseline measures of speech comprehension performance with their current speech processor, subjects were upgraded to the Freedom 24 speech processor. After a take-home trial period of at least 2 weeks, subject performance was evaluated by measuring the speech reception threshold with the Freiburg multisyllabic word test and speech intelligibility with the Freiburg monosyllabic word test at 50 dB and 70 dB in the sound field. The results demonstrated highly significant benefits for speech comprehension with the new speech processor. Significant benefits for speech comprehension were also demonstrated with the new speech processor when tested in competing background noise.In contrast, use of the Abbreviated Profile of Hearing Aid Benefit (APHAB) did not prove to be a suitably sensitive assessment tool for comparative subjective self-assessment of hearing benefits with each processor. Use of the preprocessing algorithm known as adaptive dynamic range optimization (ADRO) in the Freedom 24 led to additional improvements over the standard upgrade map for speech comprehension in quiet and showed equivalent performance in noise. Through use of the preprocessing beam-forming algorithm BEAM, subjects demonstrated a highly significant improved signal-to-noise ratio for speech comprehension thresholds (i.e., signal-to-noise ratio for 50% speech comprehension scores) when tested with an adaptive procedure using the Oldenburg

  13. On Speech Act Theory

    Institute of Scientific and Technical Information of China (English)

    邓仁毅

    2009-01-01

    Speech act theory has developed from the work of linguistic philosophers and originates in Austin's observation and study. It was the particular search for the constative, utterances which describe something outside the text and can therefore be judged true or false, that prompted John L. Austin to direct his attention to the distinction with so-called performatives. The two representative linguists are Austin and Searle.

  14. Cross-modal cueing in audiovisual spatial attention

    DEFF Research Database (Denmark)

    Blurton, Steven Paul; Greenlee, Mark W.; Gondan, Matthias

    2015-01-01

    ...effects have been reported for endogenous visual cues while exogenous cues seem to be mostly ineffective. In three experiments, we investigated cueing effects on the processing of audiovisual signals. In Experiment 1 we used endogenous cues to investigate their effect on the detection of auditory, visual, and audiovisual targets presented with onset asynchrony. Consistent cueing effects were found in all target conditions. In Experiment 2 we used exogenous cues and found cueing effects only for visual target detection, but not auditory target detection. In Experiment 3 we used predictive exogenous cues to examine...

  15. Compensation for complete assimilation in speech perception: The case of Korean labial-to-velar assimilation

    OpenAIRE

    Mitterer, H.; Kim, S.; Cho, T.

    2013-01-01

    In connected speech, phonological assimilation to neighboring words can lead to pronunciation variants (e.g., 'garden bench'→ "gardem bench"). A large body of literature suggests that listeners use the phonetic context to reconstruct the intended word for assimilation types that often lead to incomplete assimilations (e.g., a pronunciation of "garden" that carries cues for both a labial [m] and an alveolar [n]). In the current paper, we show that a similar context effect is observed for an as...

  16. Speech and the Right Hemisphere

    Directory of Open Access Journals (Sweden)

    E. M. R. Critchley

    1991-01-01

    Full Text Available Two facts are well recognized: the location of the speech centre with respect to handedness and early brain damage, and the involvement of the right hemisphere in certain cognitive functions including verbal humour, metaphor interpretation, spatial reasoning and abstract concepts. The importance of the right hemisphere in speech is suggested by pathological studies, blood flow parameters and analysis of learning strategies. An insult to the right hemisphere following left hemisphere damage can affect residual language abilities and may activate non-propositional inner speech. The prosody of speech comprehension even more so than of speech production—identifying the voice, its affective components, gestural interpretation and monitoring one's own speech—may be an essentially right hemisphere task. Errors of a visuospatial type may occur in the learning process. Ease of learning by actors and when learning foreign languages is achieved by marrying speech with gesture and intonation, thereby adopting a right hemisphere strategy.

  17. Language Specific Speech Feature Variation

    Directory of Open Access Journals (Sweden)

    Surbhi Dewan

    2016-04-01

    Full Text Available Speech is basically used to impart a message from one person to another. There are various properties of speech that may vary from person to person or from language to language, and the power of human language is affected by these variations. However, not much work has been done to analyse the similarities and dissimilarities in speech features between English and Hindi. Prosodic properties such as stress and rhythm are essentially coded in intensity, pitch and formants. We have therefore examined the use of pitch and formants to study the linguistic differences in speech properties between English and Hindi. We clustered the speech samples into two categories and concentrated on the pitch and formant values of the speech signals. From our study we observed a significant difference in the values of pitch and formants between English and Hindi.

  18. Action experience changes attention to kinematic cues

    Directory of Open Access Journals (Sweden)

    Courtney eFilippi

    2016-02-01

    Full Text Available The current study used remote corneal reflection eye-tracking to examine the relationship between motor experience and action anticipation in 13-month-old infants. To measure online anticipation of actions, infants watched videos where the actor’s hand provided kinematic information (in its orientation) about the type of object that the actor was going to reach for. The actor’s hand orientation either matched the orientation of a rod (congruent cue) or did not match the orientation of the rod (incongruent cue). To examine relations between motor experience and action anticipation, we used a 2 (reach first vs. observe first) x 2 (congruent kinematic cue vs. incongruent kinematic cue) between-subjects design. We show that 13-month-old infants in the observe first condition spontaneously generate rapid online visual predictions to congruent hand orientation cues and do not visually anticipate when presented with incongruent cues. We further demonstrate that the speed with which these infants generate predictions to congruent motor cues is correlated with their own ability to pre-shape their hands. Finally, we demonstrate that following reaching experience, infants generate rapid predictions to both congruent and incongruent hand shape cues—suggesting that short-term experience changes attention to kinematics.

  19. When unreliable cues are good enough.

    Science.gov (United States)

    Donaldson-Matasci, Matina C; Bergstrom, Carl T; Lachmann, Michael

    2013-09-01

    In many species, nongenetic phenotypic variation helps mitigate risk associated with an uncertain environment. In some cases, developmental cues can be used to match phenotype to environment-a strategy known as predictive plasticity. When environmental conditions are entirely unpredictable, generating random phenotypic diversity may improve the long-term success of a lineage-a strategy known as diversified bet hedging. When partially reliable information is available, a well-adapted developmental strategy may strike a balance between the two strategies. We use information theory to analyze a model of development in an uncertain environment, where cue reliability is affected by variation both within and between generations. We show that within-generation variation in cues decreases the reliability of cues without affecting their fitness value. This transpires because the optimal balance of predictive plasticity and diversified bet hedging is unchanged. However, within-generation variation in cues does change the developmental mechanisms used to create that balance: developmental sensitivity to such cues not only helps match phenotype to environment but also creates phenotypic diversity that may be useful for hedging bets against environmental change. Understanding the adaptive role of developmental sensitivity thus depends on a proper assessment of both the predictive power and the structure of variation in environmental cues. PMID:23933723
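
    The trade-off described above between predictive plasticity and diversified bet hedging is conventionally analyzed through long-run (geometric-mean) fitness. As an illustrative sketch only, under the standard lottery-model assumptions (environment states e occurring with probabilities p(e), a lineage committing a fraction q(φ) of offspring to phenotype φ, and each phenotype surviving only in its matching environment), the long-run growth rate is

        \bar{r} \;=\; \sum_{e} p(e)\,\log\!\Big( \sum_{\phi} q(\phi)\, f(\phi, e) \Big)

    Under these assumptions the optimum without cues is proportional betting, q(e) = p(e); the growth deficit relative to a perfectly informed lineage equals the entropy H(E), and a developmental cue C can recover at most the mutual information I(E;C) of that deficit, which is why cue reliability is naturally measured in information-theoretic terms.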

  20. Cues of maternal condition influence offspring selfishness.

    Directory of Open Access Journals (Sweden)

    Janine W Y Wong

    Full Text Available The evolution of parent-offspring communication was mostly studied from the perspective of parents responding to begging signals conveying information about offspring condition. Parents should respond to begging because of the differential fitness returns obtained from their investment in offspring that differ in condition. For analogous reasons, offspring should adjust their behavior to cues/signals of parental condition: parents that differ in condition pay differential costs of care and, hence, should provide different amounts of food. In this study, we experimentally tested in the European earwig (Forficula auricularia) if cues of maternal condition affect offspring behavior in terms of sibling cannibalism. We experimentally manipulated female condition by providing them with different amounts of food, kept nymph condition constant, allowed for nymph exposure to chemical maternal cues over extended time, quantified nymph survival (deaths being due to cannibalism) and extracted and analyzed the females' cuticular hydrocarbons (CHC). Nymph survival was significantly affected by chemical cues of maternal condition, and this effect depended on the timing of breeding. Cues of poor maternal condition enhanced nymph survival in early broods, but reduced nymph survival in late broods, and vice versa for cues of good condition. Furthermore, female condition affected the quantitative composition of their CHC profile which in turn predicted nymph survival patterns. Thus, earwig offspring are sensitive to chemical cues of maternal condition and nymphs from early and late broods show opposite reactions to the same chemical cues. Together with former evidence on maternal sensitivities to condition-dependent nymph chemical cues, our study shows context-dependent reciprocal information exchange about condition between earwig mothers and their offspring, potentially mediated by cuticular hydrocarbons.

  1. Cues of maternal condition influence offspring selfishness.

    Science.gov (United States)

    Wong, Janine W Y; Lucas, Christophe; Kölliker, Mathias

    2014-01-01

    The evolution of parent-offspring communication was mostly studied from the perspective of parents responding to begging signals conveying information about offspring condition. Parents should respond to begging because of the differential fitness returns obtained from their investment in offspring that differ in condition. For analogous reasons, offspring should adjust their behavior to cues/signals of parental condition: parents that differ in condition pay differential costs of care and, hence, should provide different amounts of food. In this study, we experimentally tested in the European earwig (Forficula auricularia) if cues of maternal condition affect offspring behavior in terms of sibling cannibalism. We experimentally manipulated female condition by providing them with different amounts of food, kept nymph condition constant, allowed for nymph exposure to chemical maternal cues over extended time, quantified nymph survival (deaths being due to cannibalism) and extracted and analyzed the females' cuticular hydrocarbons (CHC). Nymph survival was significantly affected by chemical cues of maternal condition, and this effect depended on the timing of breeding. Cues of poor maternal condition enhanced nymph survival in early broods, but reduced nymph survival in late broods, and vice versa for cues of good condition. Furthermore, female condition affected the quantitative composition of their CHC profile which in turn predicted nymph survival patterns. Thus, earwig offspring are sensitive to chemical cues of maternal condition and nymphs from early and late broods show opposite reactions to the same chemical cues. Together with former evidence on maternal sensitivities to condition-dependent nymph chemical cues, our study shows context-dependent reciprocal information exchange about condition between earwig mothers and their offspring, potentially mediated by cuticular hydrocarbons. PMID:24498046

  2. Lecturer’s Speech Competence

    OpenAIRE

    Svetlana Viktorovna Panina; Svetlana Yurievna Zalutskaya; Galina Egorovna Zhondorova

    2014-01-01

    An analysis of the issue of lecturers’ speech competence is presented. A lecturer’s speech competence is the main component of professional image and an indicator of communicative culture, and it has a great impact on the quality of pedagogical activity. Research objective: to define the main drawbacks in the speech competence of lecturers of North-Eastern Federal University named after M. K. Ammosov (NEFU) (Russia, Yakutsk) and to suggest ways of correcting these drawbacks in terms of multilingual education...

  3. Speech recognition in university classrooms

    OpenAIRE

    Wald, Mike; Bain, Keith; Basson, Sara H

    2002-01-01

    The LIBERATED LEARNING PROJECT (LLP) is an applied research project studying two core questions: 1) Can speech recognition (SR) technology successfully digitize lectures to display spoken words as text in university classrooms? 2) Can speech recognition technology be used successfully as an alternative to traditional classroom notetaking for persons with disabilities? This paper addresses these intriguing questions and explores the underlying complex relationship between speech recognition te...

  4. Visualizing structures of speech expressiveness

    OpenAIRE

    Herbelin, Bruno; Jensen, Karl Kristoffer; Graugaard, Lars

    2008-01-01

    Speech is both beautiful and informative. In this work, a conceptual study of the speech, through investigation of the tower of Babel, the archetypal phonemes, and a study of the reasons of uses of language is undertaken in order to create an artistic work investigating the nature of speech. The Babel myth speaks about distance created when aspiring to the heaven as the reason for language division. Meanwhile, Locquin states through thorough investigations that only a few phonemes are present thro...

  5. Motor Equivalence in Speech Production

    OpenAIRE

    Perrier, Pascal; Fuchs, Susanne

    2015-01-01

    International audience The first section provides a description of the concepts of “motor equivalence” and “degrees of freedom”. It is illustrated with a few examples of motor tasks in general and of speech production tasks in particular. In the second section, the methodology used to investigate experimentally motor equivalence phenomena in speech production is presented. It is mainly based on paradigms that perturb the perception-action loop during on-going speech, either by limiting the...

  6. Contribution of auditory working memory to speech understanding in Mandarin-speaking cochlear implant users.

    Directory of Open Access Journals (Sweden)

    Duoduo Tao

    importance of voice pitch cues (albeit poorly coded by the CI) did not influence the relationship between working memory and speech perception.

  7. What Is Language? What Is Speech?

    Science.gov (United States)

    Kelly's 4-year-old son, Tommy, has speech and language problems. Friends and family have a hard time ...

  8. A Maximum Likelihood Estimation of Vocal-Tract-Related Filter Characteristics for Single Channel Speech Separation

    Directory of Open Access Journals (Sweden)

    Mohammad H. Radfar

    2006-11-01

    Full Text Available We present a new technique for separating two speech signals from a single recording. The proposed method bridges the gap between underdetermined blind source separation techniques and those techniques that model the human auditory system, that is, computational auditory scene analysis (CASA). For this purpose, we decompose the speech signal into the excitation signal and the vocal-tract-related filter and then estimate the components from the mixed speech using a hybrid model. We first express the probability density function (PDF) of the mixed speech's log spectral vectors in terms of the PDFs of the underlying speech signal's vocal-tract-related filters. Then, the mean vectors of PDFs of the vocal-tract-related filters are obtained using a maximum likelihood estimator given the mixed signal. Finally, the estimated vocal-tract-related filters along with the extracted fundamental frequencies are used to reconstruct estimates of the individual speech signals. The proposed technique effectively adds vocal-tract-related filter characteristics as a new cue to CASA models using a new grouping technique based on an underdetermined blind source separation. We compare our model with both an underdetermined blind source separation and a CASA method. The experimental results show that our model outperforms both techniques in terms of SNR improvement and the percentage of crosstalk suppression.
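
    The decomposition into an excitation signal and a vocal-tract-related filter mentioned above is conventionally obtained frame by frame with linear prediction. The Python sketch below illustrates only that generic building block, not the authors' hybrid ML/CASA system; the frame length, model order and FFT size are arbitrary choices.

        import numpy as np

        def lpc(frame, order=12):
            """Autocorrelation-method LPC: returns prediction coefficients a[1..order]."""
            frame = frame * np.hamming(len(frame))
            r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
            R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
            return np.linalg.solve(R, r[1:order + 1])        # normal equations

        def decompose(frame, order=12, nfft=512):
            """Split a frame into a vocal-tract-related log-spectral envelope and an excitation residual."""
            a = lpc(frame, order)
            # prediction from past samples; the residual approximates the excitation signal
            pred = np.convolve(frame, np.r_[0.0, a])[:len(frame)]
            excitation = frame - pred
            # log-spectral envelope of the all-pole vocal-tract-related filter 1/A(z)
            A = np.fft.rfft(np.r_[1.0, -a], nfft)
            envelope = -np.log(np.abs(A) + 1e-12)
            return envelope, excitation

        # toy usage on a synthetic voiced-like frame (noise added to keep the normal equations well conditioned)
        t = np.arange(400) / 8000.0
        frame = np.sin(2 * np.pi * 120 * t) + 0.3 * np.sin(2 * np.pi * 240 * t) + 0.01 * np.random.randn(400)
        env, exc = decompose(frame)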

  9. A Maximum Likelihood Estimation of Vocal-Tract-Related Filter Characteristics for Single Channel Speech Separation

    Directory of Open Access Journals (Sweden)

    Dansereau Richard M

    2007-01-01

    Full Text Available We present a new technique for separating two speech signals from a single recording. The proposed method bridges the gap between underdetermined blind source separation techniques and those techniques that model the human auditory system, that is, computational auditory scene analysis (CASA). For this purpose, we decompose the speech signal into the excitation signal and the vocal-tract-related filter and then estimate the components from the mixed speech using a hybrid model. We first express the probability density function (PDF) of the mixed speech's log spectral vectors in terms of the PDFs of the underlying speech signal's vocal-tract-related filters. Then, the mean vectors of PDFs of the vocal-tract-related filters are obtained using a maximum likelihood estimator given the mixed signal. Finally, the estimated vocal-tract-related filters along with the extracted fundamental frequencies are used to reconstruct estimates of the individual speech signals. The proposed technique effectively adds vocal-tract-related filter characteristics as a new cue to CASA models using a new grouping technique based on an underdetermined blind source separation. We compare our model with both an underdetermined blind source separation and a CASA method. The experimental results show that our model outperforms both techniques in terms of SNR improvement and the percentage of crosstalk suppression.

  10. Differential Gaze Patterns on Eyes and Mouth During Audiovisual Speech Segmentation.

    Science.gov (United States)

    Lusk, Laina G; Mitchel, Aaron D

    2016-01-01

    Speech is inextricably multisensory: both auditory and visual components provide critical information for all aspects of speech processing, including speech segmentation, the visual components of which have been the target of a growing number of studies. In particular, a recent study (Mitchel and Weiss, 2014) established that adults can utilize facial cues (i.e., visual prosody) to identify word boundaries in fluent speech. The current study expanded upon these results, using an eye tracker to identify highly attended facial features of the audiovisual display used in Mitchel and Weiss (2014). Subjects spent the most time watching the eyes and mouth. A significant trend in gaze durations was found with the longest gaze duration on the mouth, followed by the eyes and then the nose. In addition, eye-gaze patterns changed across familiarization as subjects learned the word boundaries, showing decreased attention to the mouth in later blocks while attention on other facial features remained consistent. These findings highlight the importance of the visual component of speech processing and suggest that the mouth may play a critical role in visual speech segmentation.

  11. High visual resolution matters in audiovisual speech perception, but only for some.

    Science.gov (United States)

    Alsius, Agnès; Wayne, Rachel V; Paré, Martin; Munhall, Kevin G

    2016-07-01

    The basis for individual differences in the degree to which visual speech input enhances comprehension of acoustically degraded speech is largely unknown. Previous research indicates that fine facial detail is not critical for visual enhancement when auditory information is available; however, these studies did not examine individual differences in ability to make use of fine facial detail in relation to audiovisual speech perception ability. Here, we compare participants based on their ability to benefit from visual speech information in the presence of an auditory signal degraded with noise, modulating the resolution of the visual signal through low-pass spatial frequency filtering and monitoring gaze behavior. Participants who benefited most from the addition of visual information (high visual gain) were more adversely affected by the removal of high spatial frequency information, compared to participants with low visual gain, for materials with both poor and rich contextual cues (i.e., words and sentences, respectively). Differences as a function of gaze behavior between participants with the highest and lowest visual gains were observed only for words, with participants with the highest visual gain fixating longer on the mouth region. Our results indicate that the individual variance in audiovisual speech in noise performance can be accounted for, in part, by better use of fine facial detail information extracted from the visual signal and increased fixation on mouth regions for short stimuli. Thus, for some, audiovisual speech perception may suffer when the visual input (in addition to the auditory signal) is less than perfect.

  12. Auditory detection of non-speech and speech stimuli in noise: Native speech advantage.

    Science.gov (United States)

    Huo, Shuting; Tao, Sha; Wang, Wenjing; Li, Mingshuang; Dong, Qi; Liu, Chang

    2016-05-01

    Detection thresholds of Chinese vowels, Korean vowels, and a complex tone, with harmonic and noise carriers were measured in noise for Mandarin Chinese-native listeners. The harmonic index was calculated as the difference between detection thresholds of the stimuli with harmonic carriers and those with noise carriers. The harmonic index for Chinese vowels was significantly greater than that for Korean vowels and the complex tone. Moreover, native speech sounds were rated significantly more native-like than non-native speech and non-speech sounds. The results indicate that native speech has an advantage over other sounds in simple auditory tasks like sound detection. PMID:27250202

  13. Behavioral Signal Processing: Deriving Human Behavioral Informatics From Speech and Language

    Science.gov (United States)

    Narayanan, Shrikanth; Georgiou, Panayiotis G.

    2013-01-01

    The expression and experience of human behavior are complex and multimodal and characterized by individual and contextual heterogeneity and variability. Speech and spoken language communication cues offer an important means for measuring and modeling human behavior. Observational research and practice across a variety of domains from commerce to healthcare rely on speech- and language-based informatics for crucial assessment and diagnostic information and for planning and tracking response to an intervention. In this paper, we describe some of the opportunities as well as emerging methodologies and applications of human behavioral signal processing (BSP) technology and algorithms for quantitatively understanding and modeling typical, atypical, and distressed human behavior with a specific focus on speech- and language-based communicative, affective, and social behavior. We describe the three important BSP components of acquiring behavioral data in an ecologically valid manner across laboratory to real-world settings, extracting and analyzing behavioral cues from measured data, and developing models offering predictive and decision-making support. We highlight both the foundational speech and language processing building blocks as well as the novel processing and modeling opportunities. Using examples drawn from specific real-world applications ranging from literacy assessment and autism diagnostics to psychotherapy for addiction and marital well being, we illustrate behavioral informatics applications of these signal processing techniques that contribute to quantifying higher level, often subjectively described, human behavior in a domain-sensitive fashion. PMID:24039277

  14. Hearing impairment and audiovisual speech integration ability: a case study report.

    Science.gov (United States)

    Altieri, Nicholas; Hudock, Daniel

    2014-01-01

    Research in audiovisual speech perception has demonstrated that sensory factors such as auditory and visual acuity are associated with a listener's ability to extract and combine auditory and visual speech cues. This case study report examined audiovisual integration using a newly developed measure of capacity in a sample of hearing-impaired listeners. Capacity assessments are unique because they examine the contribution of reaction-time (RT) as well as accuracy to determine the extent to which a listener efficiently combines auditory and visual speech cues relative to independent race model predictions. Multisensory speech integration ability was examined in two experiments: an open-set sentence recognition and a closed set speeded-word recognition study that measured capacity. Most germane to our approach, capacity illustrated speed-accuracy tradeoffs that may be predicted by audiometric configuration. Results revealed that some listeners benefit from increased accuracy, but fail to benefit in terms of speed on audiovisual relative to unisensory trials. Conversely, other listeners may not benefit in the accuracy domain but instead show an audiovisual processing time benefit.

  15. Parameter masks for close talk speech segregation using deep neural networks

    Directory of Open Access Journals (Sweden)

    Jiang Yi

    2015-01-01

    Full Text Available A deep neural network (DNN) based close-talk speech segregation algorithm is introduced. One nearby microphone is used to collect the target speech, as the term close talk indicates, and another microphone is used to capture the noise in the environment. The time and energy differences between the two microphone signals are used as the segregation cue. A DNN estimator on each frequency channel is used to calculate the parameter masks. The parameter masks represent the target speech energy in each time-frequency (T-F) unit. Experimental results show the good performance of the proposed system: the signal-to-noise ratio (SNR) improvement is 8.1 dB in the 0 dB noisy environment.
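
    The parameter mask described above is a per-channel, per-frame value giving the fraction of energy attributable to the target speech; the DNN is trained to predict it from the two-microphone cues. The Python sketch below shows one common way such an energy-ratio training target can be computed from STFTs and applied to the mixture; the exact mask definition and input features used in the paper may differ.

        import numpy as np
        from scipy.signal import stft, istft

        def energy_ratio_mask(close_mic, far_mic, fs=16000, nperseg=512):
            """Energy-ratio training target: fraction of energy in each T-F unit attributed to the target.
            (In the paper this value is estimated by a per-channel DNN from inter-microphone cues.)"""
            _, _, s_close = stft(close_mic, fs, nperseg=nperseg)   # close-talk mic, target-dominant
            _, _, s_far = stft(far_mic, fs, nperseg=nperseg)       # second mic, noise-dominant
            p_t, p_n = np.abs(s_close) ** 2, np.abs(s_far) ** 2
            return p_t / (p_t + p_n + 1e-12)

        def apply_mask(mixture, mask, fs=16000, nperseg=512):
            """Weight the mixture STFT by the mask and resynthesize the segregated target."""
            _, _, s_mix = stft(mixture, fs, nperseg=nperseg)
            _, target_hat = istft(s_mix * mask, fs, nperseg=nperseg)
            return target_hat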

  16. Discovering Words in Fluent Speech: The Contribution of Two Kinds of Statistical Information

    Directory of Open Access Journals (Sweden)

    Erik D Thiessen

    2013-01-01

    Full Text Available To efficiently segment fluent speech, infants must discover the predominant phonological form of words in the native language. In English, for example, content words typically begin with a stressed syllable. To discover this regularity, infants need to identify a set of words. We propose that statistical learning plays two roles in this process. First, it provides a cue that allows infants to segment words from fluent speech, even without language-specific phonological knowledge. Second, once infants have identified a set of lexical forms, they can learn from the distribution of acoustic features across those word forms. The current experiments demonstrate both processes are available to 5-month-old infants. This is an earlier age than prior demonstration of sensitivity to statistical structure in speech, and consistent with theoretical accounts that claim statistical learning plays a role in helping infants to adapt to the structure of their native language from very early in life.
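
    The statistical cue referred to above is the forward transitional probability P(next syllable | current syllable), which tends to dip at word boundaries. The Python sketch below computes these probabilities over a made-up syllable stream and posits boundaries at the dips; the threshold is an arbitrary illustrative choice.

        from collections import Counter

        def transitional_probabilities(syllables):
            """Forward TPs: P(next syllable | current syllable) for each adjacent pair."""
            pairs = list(zip(syllables, syllables[1:]))
            pair_counts = Counter(pairs)
            first_counts = Counter(a for a, _ in pairs)
            return {(a, b): c / first_counts[a] for (a, b), c in pair_counts.items()}

        def segment(syllables, tps, threshold=0.8):
            """Posit a word boundary wherever the forward TP dips below the threshold."""
            words, current = [], [syllables[0]]
            for a, b in zip(syllables, syllables[1:]):
                if tps[(a, b)] < threshold:
                    words.append(current)
                    current = []
                current.append(b)
            words.append(current)
            return words

        # made-up stream built from two "words" (pa-bi-ku and ti-bu-do)
        stream = "pa bi ku ti bu do ti bu do pa bi ku pa bi ku ti bu do".split()
        print(segment(stream, transitional_probabilities(stream)))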

  17. An Improved Speech Enhancement Method based on Teager Energy Operator and Perceptual Wavelet Packet Decomposition

    Directory of Open Access Journals (Sweden)

    Huan Zhao

    2011-06-01

    Full Text Available According to the distribution characteristics of noise and clean speech signals in the frequency domain, a new speech enhancement method based on the Teager energy operator (TEO) and perceptual wavelet packet decomposition (PWPD) is proposed. Firstly, a modified mask construction method is used to protect the acoustic cues at the low frequencies. Then a level-dependent parameter is introduced to further adjust the thresholds in light of the noise distribution. Finally, the sub-bands which have very little influence are set directly to 0 to improve the signal-to-noise ratio (SNR) and reduce the computational load. Simulation results show that, under different kinds of noise environments, this new method not only enhances the signal-to-noise ratio (SNR) and perceptual evaluation of speech quality (PESQ) scores, but also reduces the computational load, which is very advantageous for real-time implementation.
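
    The Teager energy operator mentioned above has a simple discrete form, psi[n] = x[n]^2 - x[n-1]*x[n+1], which tracks the instantaneous energy of the signal and is typically used to sharpen the speech/noise contrast before thresholding wavelet packet coefficients. The Python sketch below shows the operator and a generic level-dependent soft threshold; the paper's exact threshold rule is not reproduced here.

        import numpy as np

        def teager_energy(x):
            """Discrete Teager energy operator: psi[n] = x[n]**2 - x[n-1]*x[n+1]."""
            x = np.asarray(x, dtype=float)
            psi = np.empty_like(x)
            psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
            psi[0], psi[-1] = psi[1], psi[-2]            # simple edge handling
            return psi

        def soft_threshold(coeffs, noise_sigma, level_weight=1.0):
            """Generic level-dependent soft threshold for one wavelet packet sub-band."""
            coeffs = np.asarray(coeffs, dtype=float)
            thr = level_weight * noise_sigma * np.sqrt(2.0 * np.log(len(coeffs)))
            return np.sign(coeffs) * np.maximum(np.abs(coeffs) - thr, 0.0)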

  18. Gender differences in craving and cue reactivity to smoking and negative affect/stress cues.

    Science.gov (United States)

    Saladin, Michael E; Gray, Kevin M; Carpenter, Matthew J; LaRowe, Steven D; DeSantis, Stacia M; Upadhyaya, Himanshu P

    2012-01-01

    There is evidence that women may be less successful when attempting to quit smoking than men. One potential contributory cause of this gender difference is differential craving and stress reactivity to smoking- and negative affect/stress-related cues. The present human laboratory study investigated the effects of gender on reactivity to smoking and negative affect/stress cues by exposing nicotine dependent women (n = 37) and men (n = 53) smokers to two active cue types, each with an associated control cue: (1) in vivo smoking cues and in vivo neutral control cues, and (2) imagery-based negative affect/stress script and a neutral/relaxing control script. Both before and after each cue/script, participants provided subjective reports of smoking-related craving and affective reactions. Heart rate (HR) and skin conductance (SC) responses were also measured. Results indicated that participants reported greater craving and SC in response to smoking versus neutral cues and greater subjective stress in response to the negative affect/stress versus neutral/relaxing script. With respect to gender differences, women evidenced greater craving, stress and arousal ratings and lower valence ratings (greater negative emotion) in response to the negative affect/stressful script. While there were no gender differences in responses to smoking cues, women trended towards higher arousal ratings. Implications of the findings for treatment and tobacco-related morbidity and mortality are discussed.

  19. Perception of health from facial cues.

    Science.gov (United States)

    Henderson, Audrey J; Holzleitner, Iris J; Talamas, Sean N; Perrett, David I

    2016-05-01

    Impressions of health are integral to social interactions, yet poorly understood. A review of the literature reveals multiple facial characteristics that potentially act as cues to health judgements. The cues vary in their stability across time: structural shape cues including symmetry and sexual dimorphism alter slowly across the lifespan and have been found to have weak links to actual health, but show inconsistent effects on perceived health. Facial adiposity changes over a medium time course and is associated with both perceived and actual health. Skin colour alters over a short time and has strong effects on perceived health, yet links to health outcomes have barely been evaluated. Reviewing suggested an additional influence of demeanour as a perceptual cue to health. We, therefore, investigated the association of health judgements with multiple facial cues measured objectively from two-dimensional and three-dimensional facial images. We found evidence for independent contributions of face shape and skin colour cues to perceived health. Our empirical findings: (i) reinforce the role of skin yellowness; (ii) demonstrate the utility of global face shape measures of adiposity; and (iii) emphasize the role of affect in facial images with nominally neutral expression in impressions of health. PMID:27069057

  20. Global Freedom of Speech

    DEFF Research Database (Denmark)

    Binderup, Lars Grassme

    2007-01-01

    It has been suggested that the multicultural nature of modern liberal states (in particular the formation of immigration minorities from other cultures due to the process of globalisation) provides reasons - from a liberal egalitarian perspective - for recognising a civic or democratic norm......, as opposed to a legal norm, that curbs exercises of the right to free speech that offend the feelings or beliefs of members from other cultural groups. The paper rejects the suggestion that acceptance of such a norm is in line with liberal egalitarian thinking. Following a review of the classical liberal...

  1. Speech is Golden

    DEFF Research Database (Denmark)

    Juel Henrichsen, Peter

    2014-01-01

    on the supply side. The present article reports on a new public action strategy which has taken shape in the course of 2013-14. While Denmark is a small language area, our public sector is well organised and has considerable purchasing power. Across this past year, Danish local authorities have organised around......Most of the Danish municipalities are ready to begin to adopt automatic speech recognition, but at the same time remain nervous following a long series of bad business cases in the recent past. Complaints are voiced over costly licences and low service levels, typical effects of a de facto monopoly...

  2. Perceptual learning in speech

    OpenAIRE

    D. Norris; McQueen, J; Cutler, A.

    2003-01-01

    This study demonstrates that listeners use lexical knowledge in perceptual learning of speech sounds. Dutch listeners first made lexical decisions on Dutch words and nonwords. The final fricative of 20 critical words had been replaced by an ambiguous sound, between [f] and [s]. One group of listeners heard ambiguous [f]-final words (e.g., [WI tlo?], from witlof, chicory) and unambiguous [s]-final words (e.g., naaldbos, pine forest). Another group heard the reverse (e.g., ambiguous [na:ldbo?],...

  3. Taking a Stand for Speech.

    Science.gov (United States)

    Moore, Wayne D.

    1995-01-01

    Asserts that freedom of speech issues were among the first major confrontations in U.S. constitutional law. Maintains that lessons from the controversies surrounding the Sedition Act of 1798 have continuing practical relevance. Describes and discusses the significance of freedom of speech to the U.S. political system. (CFR)

  4. Speech Prosody in Cerebellar Ataxia

    Science.gov (United States)

    Casper, Maureen A.; Raphael, Lawrence J.; Harris, Katherine S.; Geibel, Jennifer M.

    2007-01-01

    Persons with cerebellar ataxia exhibit changes in physical coordination and speech and voice production. Previously, these alterations of speech and voice production were described primarily via perceptual coordinates. In this study, the spatial-temporal properties of syllable production were examined in 12 speakers, six of whom were healthy…

  5. Separating Underdetermined Convolutive Speech Mixtures

    DEFF Research Database (Denmark)

    Pedersen, Michael Syskind; Wang, DeLiang; Larsen, Jan;

    2006-01-01

    a method for underdetermined blind source separation of convolutive mixtures. The proposed framework is applicable for separation of instantaneous as well as convolutive speech mixtures. It is possible to iteratively extract each speech signal from the mixture by combining blind source separation...

  6. Direct and Indirect Cues to Knowledge States during Word Learning

    Science.gov (United States)

    Saylor, Megan M.; Carroll, C. Brooke

    2009-01-01

    The present study investigated three-year-olds' sensitivity to direct and indirect cues to others' knowledge states for word learning purposes. Children were given either direct, physical cues to knowledge or indirect, verbal cues to knowledge. Preschoolers revealed a better ability to learn words from a speaker following direct, physical cues to…

  7. Speech Compression Using Multecirculerletet Transform

    Directory of Open Access Journals (Sweden)

    Sulaiman Murtadha

    2012-01-01

    Full Text Available Compressing speech reduces the data storage requirements, leading to a reduction in the time needed to transmit digitized speech over long-haul links like the internet. To obtain the best performance in speech compression, wavelet transforms require filters that combine a number of desirable properties, such as orthogonality and symmetry. The MCT basis functions are derived from the GHM basis functions using 2D linear convolution. The fast computation algorithms introduced here add desirable features to the current transform. We further assess the performance of the MCT in a speech compression application. This paper discusses the effect of using the DWT and the MCT (in one and two dimensions) on speech compression. DWT and MCT performances in terms of compression ratio (CR), mean square error (MSE) and peak signal to noise ratio (PSNR) are assessed. Computer simulation results indicate that the two-dimensional MCT offers a better compression ratio, MSE and PSNR than the DWT.
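
    The three figures of merit used above have standard definitions: CR is the ratio of total to retained coefficients, MSE is the mean squared reconstruction error, and PSNR = 10*log10(peak^2 / MSE). The Python sketch below illustrates a plain DWT compression loop and these metrics using the PyWavelets package; the MCT itself is not implemented here, and the wavelet, decomposition level and retention fraction are arbitrary choices.

        import numpy as np
        import pywt

        def compress_dwt(x, wavelet="db4", level=4, keep=0.10):
            """Keep only the largest `keep` fraction of DWT coefficients, zeroing the rest."""
            coeffs = pywt.wavedec(x, wavelet, level=level)
            flat = np.concatenate(coeffs)
            cutoff = np.quantile(np.abs(flat), 1.0 - keep)
            kept_coeffs = [pywt.threshold(c, cutoff, mode="hard") for c in coeffs]
            x_hat = pywt.waverec(kept_coeffs, wavelet)[:len(x)]
            n_kept = sum(int(np.count_nonzero(c)) for c in kept_coeffs)
            cr = len(flat) / max(n_kept, 1)               # compression ratio
            return x_hat, cr

        def mse_psnr(x, x_hat):
            """Mean squared error and peak signal-to-noise ratio of the reconstruction."""
            mse = float(np.mean((x - x_hat) ** 2))
            psnr = 10.0 * np.log10(np.max(np.abs(x)) ** 2 / mse)
            return mse, psnr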

  8. Spatial localization of speech segments

    DEFF Research Database (Denmark)

    Karlsen, Brian Lykkegaard

    1999-01-01

    Much is known about human localization of simple stimuli like sinusoids, clicks, broadband noise and narrowband noise in quiet. Less is known about human localization in noise. Even less is known about localization of speech and very few previous studies have reported data from localization of...... distribution of which azimuth angle the target is likely to have originated from. The model is trained on the experimental data. On the basis of the experimental results, it is concluded that the human ability to localize speech segments in adverse noise depends on the speech segment as well as its point of...... speech in noise. This study attempts to answer the question: ``Are there certain features of speech which have an impact on the human ability to determine the spatial location of a speaker in the horizontal plane under adverse noise conditions?''. The study consists of an extensive literature survey on...

  9. Visualizing structures of speech expressiveness

    DEFF Research Database (Denmark)

    Herbelin, Bruno; Jensen, Karl Kristoffer; Graugaard, Lars

    2008-01-01

    Speech is both beautiful and informative. In this work, a conceptual study of the speech, through investigation of the tower of Babel, the archetypal phonemes, and a study of the reasons of uses of language is undertaken in order to create an artistic work investigating the nature of speech....... The Babel myth speaks about distance created when aspiring to the heaven as the reason for language division. Meanwhile, Locquin states through thorough investigations that only a few phonemes are present throughout history. Our interpretation is that a system able to recognize archetypal phonemes through...... vowels and consonants, and which converts the speech energy into visual particles that form complex visual structures, provides us with a mean to present the expressiveness of speech into a visual mode. This system is presented in an artwork whose scenario is inspired from the reasons of language...

  10. Hammerstein Model for Speech Coding

    Directory of Open Access Journals (Sweden)

    Turunen Jari

    2003-01-01

    Full Text Available A nonlinear Hammerstein model is proposed for coding speech signals. Using Tsay's nonlinearity test, we first show that the great majority of speech frames contain nonlinearities (over 80% in our test data when using 20-millisecond speech frames). Frame length correlates with the level of nonlinearity: the longer the frames the higher the percentage of nonlinear frames. Motivated by this result, we present a nonlinear structure using a frame-by-frame adaptive identification of the Hammerstein model parameters for speech coding. Finally, the proposed structure is compared with the LPC coding scheme for three phonemes /a/, /s/, and /k/ by calculating the Akaike information criterion of the corresponding residual signals. The tests show clearly that the residual of the nonlinear model presented in this paper contains significantly less information compared to that of the LPC scheme. The presented method is a potential tool to shape the residual signal in an encode-efficient form in speech coding.
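
    A Hammerstein model is a static (memoryless) nonlinearity followed by a linear filter; with a polynomial nonlinearity and an FIR filter, the combined parameters are linear in the unknowns and can be identified frame by frame with ordinary least squares, which is the kind of adaptive identification described above. The Python sketch below uses an arbitrary polynomial order and filter length, not the paper's settings.

        import numpy as np

        def hammerstein_fit(x, y, poly_order=3, fir_len=10):
            """Least-squares fit of y[n] ~ sum_{p,k} c[p, k] * x[n-k]**p (polynomial nonlinearity -> FIR filter)."""
            x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
            n, cols = len(x), []
            for p in range(1, poly_order + 1):
                xp = x ** p
                for k in range(fir_len):
                    col = np.zeros(n)
                    col[k:] = xp[:n - k]          # k-sample-delayed copy of the p-th power of the input
                    cols.append(col)
            theta, *_ = np.linalg.lstsq(np.column_stack(cols), y, rcond=None)
            return theta.reshape(poly_order, fir_len)

        def hammerstein_predict(x, theta):
            """Run the identified model on an input frame."""
            x = np.asarray(x, dtype=float)
            n, y = len(x), np.zeros(len(x))
            for p in range(theta.shape[0]):
                xp = x ** (p + 1)
                for k in range(theta.shape[1]):
                    y[k:] += theta[p, k] * xp[:n - k]
            return y

    Because the fitted coefficients factor as c[p, k] = b_p * h[k], the separate nonlinearity and filter can be recovered afterwards, for example from a rank-1 approximation of the coefficient matrix.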

  11. Relative saliency of pitch versus phonetic cues in infancy

    Science.gov (United States)

    Cardillo, Gina; Kuhl, Patricia; Sundara, Megha

    2005-09-01

    Infants in their first year are highly sensitive to different acoustic components of speech, including phonetic detail and pitch information. The present investigation examined whether relative sensitivity to these two dimensions changes during this period, as the infant acquires language-specific phonetic categories. If pitch and phonetic discrimination are hierarchical, then the relative salience of pitch and phonetic change may become reversed between 8 and 12 months of age. Thirty-two- and 47-week-old infants were tested using an auditory preference paradigm in which they first heard a recording of a person singing a 4-note song (i.e., ``go-bi-la-tu'') and were then presented with both the familiar and an unfamiliar, modified version of that song. Modifications were either a novel pitch order (keeping syllables constant) or a novel syllable order (keeping melody constant). Compared to the younger group, older infants were predicted to show greater relative sensitivity to syllable order than pitch order, in accordance with an increased tendency to attend to linguistically relevant information (phonetic patterns) as opposed to cues that are initially more salient (pitch patterns). Preliminary data show trends toward the predicted interaction, with preference patterns commensurate with previously reported data. [Work supported by the McDonnell Foundation and NIH.]

  12. Effects of Verbal Cues versus Pictorial Cues on the Transfer of Stimulus Control for Children with Autism

    Science.gov (United States)

    West, Elizabeth Anne

    2008-01-01

    The author examined the transfer of stimulus control from instructor assistance to verbal cues and pictorial cues. The intent was to determine whether it is easier to transfer stimulus control to one form of cue or the other. No studies have conducted such comparisons to date; however, literature exists to suggest that visual cues may be…

  13. PCA-Based Speech Enhancement for Distorted Speech Recognition

    Directory of Open Access Journals (Sweden)

    Tetsuya Takiguchi

    2007-09-01

    Full Text Available We investigated a robust speech feature extraction method using kernel PCA (Principal Component Analysis) for distorted speech recognition. Kernel PCA has been suggested for various image processing tasks requiring an image model, such as denoising, where a noise-free image is constructed from a noisy input image. Much research on robust speech feature extraction has been done, but it remains difficult to completely remove additive or convolution noise (distortion). The most commonly used noise-removal techniques are based on the spectral-domain operation, and then for speech recognition, the MFCC (Mel Frequency Cepstral Coefficient) is computed, where DCT (Discrete Cosine Transform) is applied to the mel-scale filter bank output. This paper describes a new PCA-based speech enhancement algorithm using kernel PCA instead of DCT, where the main speech element is projected onto low-order features, while the noise or distortion element is projected onto high-order features. Its effectiveness is confirmed by word recognition experiments on distorted speech.
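
    The core idea above is to replace the DCT step of MFCC extraction with a kernel PCA projection of the log mel filter-bank output, keeping only the leading components on the assumption that the speech content concentrates there while distortion falls into higher-order components. A minimal scikit-learn sketch, assuming the filter-bank frames are computed elsewhere and using placeholder parameter values:

        import numpy as np
        from sklearn.decomposition import KernelPCA

        def kpca_features(log_mel_frames, n_components=13, gamma=0.05):
            """Project log mel filter-bank frames onto the leading kernel principal components.
            `log_mel_frames` is an (n_frames, n_bands) array assumed to be computed elsewhere."""
            kpca = KernelPCA(n_components=n_components, kernel="rbf", gamma=gamma)
            return kpca.fit_transform(log_mel_frames)

        # toy usage with random stand-in frames (real features would come from a mel filter bank)
        frames = np.random.rand(200, 24)
        features = kpca_features(frames)

    In practice the kernel PCA model would be fitted on training frames and then applied to new utterances with transform(), so that train and test features share the same projection.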

  14. Preschoolers' Learning of Brand Names from Visual Cues.

    OpenAIRE

    Macklin, M Carole

    1996-01-01

    This research addresses the question of how perceptual cues affect preschoolers' learning of brand names. It is found that when visual cues are provided in addition to brand names that are prior-associated in children's memory structures, children better remember the brand names. Although two cues (a picture and a color) improve memory over the imposition of a single cue, extensive visual cues may overtax young children's processing abilities. The study contributes to our understanding of how...

  15. Hate Speech or Free Speech: Can Broad Campus Speech Regulations Survive Current Judicial Reasoning?

    Science.gov (United States)

    Heiser, Gregory M.; Rossow, Lawrence F.

    1993-01-01

    Federal courts have found speech regulations overbroad in suits against the University of Michigan and the University of Wisconsin System. Attempts to assess the theoretical justification and probable fate of broad speech regulations that have not been explicitly rejected by the courts. Concludes that strong arguments for broader regulation will…

  16. Counterconditioning reduces cue-induced craving and actual cue-elicited consumption.

    NARCIS (Netherlands)

    D. van Gucht; F. Baeyens; D. Vansteenwegen; D. Hermans; T. Beckers

    2010-01-01

    Cue-induced craving is not easily reduced by an extinction or exposure procedure and may constitute an important route toward relapse in addictive behavior after treatment. In the present study, we investigated the effectiveness of counterconditioning as an alternative procedure to reduce cue-induce

  17. Cues for Better Writing: Empirical Assessment of a Word Counter and Cueing Application's Effectiveness

    Science.gov (United States)

    Vijayasarathy, Leo R.; Gould, Susan Martin; Gould, Michael

    2015-01-01

    Written clarity and conciseness are desired by employers and emphasized in business communication courses. We developed and tested the efficacy of a cueing tool--Scribe Bene--to help students reduce their use of imprecise and ambiguous words and wordy phrases. Effectiveness was measured by comparing cue word usage between a treatment group given…

  18. Effect of stimuli presentation method on perception of room size using only acoustic cues

    Science.gov (United States)

    Hunt, Jeffrey Barnabas

    People listen to music and speech in a large variety of rooms, and many room parameters, including the size of the room, can drastically affect how well the speech is understood or the music enjoyed. While multi-modal (typically hearing and sight) tests may be more realistic, a listening-only test is conducted here in order to isolate what acoustic cues listeners use to determine the size of a room. Nearly all of the studies to date on the perception of room volume using acoustic cues have presented the stimuli only over headphones, and these studies have reported that, in most cases, the perceived room volume is more highly correlated with the perceived reverberation (reverberance) than with actual room volume. While reverberance may be a salient acoustic cue used for the determination of room size, the actual sound field in a room is not accurately reproduced when presented over headphones, and it is thought that some of the complexities of the sound field that relate to perception of geometric volume, specifically directional information of reflections, may be lost. It is possible that the importance of reverberance may be overemphasized when using only headphones to present stimuli, so a comparison of room-size perception is proposed where the sound field (from modeled and recorded impulse responses) is presented both over headphones and also over a surround system using higher-order ambisonics to more accurately reproduce directional sound information. Major results are that, in this study, no difference could be seen between the two presentation methods and that reverberation time is highly correlated with room-size perception while real room size is not.

  19. Hate Speech/Free Speech: Using Feminist Perspectives To Foster On-Campus Dialogue.

    Science.gov (United States)

    Cornwell, Nancy; Orbe, Mark P.; Warren, Kiesha

    1999-01-01

    Explores the complex issues inherent in the tension between hate speech and free speech, focusing on the phenomenon of hate speech on college campuses. Describes the challenges to hate speech made by critical race theorists and explains how a feminist critique can reorient the parameters of hate speech. (SLD)

  20. Early syllabic segmentation of fluent speech by infants acquiring French.

    Directory of Open Access Journals (Sweden)

    Louise Goyet

    Full Text Available Word form segmentation abilities emerge during the first year of life, and it has been proposed that infants initially rely on two types of cues to extract words from fluent speech: Transitional Probabilities (TPs) and rhythmic units. The main goal of the present study was to use the behavioral method of the Headturn Preference Procedure (HPP) to investigate again rhythmic segmentation of syllabic units by French-learning infants at the onset of segmentation abilities (around 8 months), given repeated failure to find syllabic segmentation at such a young age. The second goal was to explore the interaction between the use of TPs and syllabic units for segmentation by French-learning infants. The rationale was that decreasing TP cues around target syllables embedded in bisyllabic words would block bisyllabic word segmentation and facilitate the observation of syllabic segmentation. In Experiments 1 and 2, infants were tested in a condition of moderate TP decrease; no evidence of either syllabic or bisyllabic word segmentation was found. In Experiment 3, infants were tested in a condition of more marked TP decrease, and a novelty syllabic segmentation effect was observed. Therefore, the present study first establishes early syllabic segmentation in French-learning infants, bringing support from a syllable-based language to the proposal that rhythmic units are used at the onset of segmentation abilities. Second, it confirms that French-learning infants are sensitive to TP cues. Third, it demonstrates that they are sensitive to the relative weight of TP and rhythmic cues, explaining why effects of syllabic segmentation are not observed in contexts of high TPs. These findings are discussed in relation to theories of word segmentation bootstrapping, and the larger debate about statistically- versus prosodically-based accounts of early language acquisition.