WorldWideScience

Sample records for audio-visual speech cue

  1. Cross-Modal Matching of Audio-Visual German and French Fluent Speech in Infancy

    OpenAIRE

    Kubicek, Claudia; Hillairet de Boisferon, Anne; Dupierrix, Eve; Pascalis, Olivier; Lœvenbruck, Hélène; Gervain, Judit; Schwarzer, Gudrun

    2014-01-01

    The present study examined when and how the ability to cross-modally match audio-visual fluent speech develops in 4.5-, 6- and 12-month-old German-learning infants. In Experiment 1, 4.5- and 6-month-old infants' audio-visual matching ability of native (German) and non-native (French) fluent speech was assessed by presenting auditory and visual speech information sequentially, that is, in the absence of temporal synchrony cues. The results showed that 4.5-month-old infants were capable of matc...

  2. Deep Multimodal Learning for Audio-Visual Speech Recognition

    OpenAIRE

    Mroueh, Youssef; Marcheret, Etienne; Goel, Vaibhava

    2015-01-01

    In this paper, we present methods in deep multimodal learning for fusing speech and visual modalities for Audio-Visual Automatic Speech Recognition (AV-ASR). First, we study an approach where uni-modal deep networks are trained separately and their final hidden layers fused to obtain a joint feature space in which another deep network is built. While the audio network alone achieves a phone error rate (PER) of 41% under clean conditions on the IBM large vocabulary audio-visual studio datase...
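
    The hidden-layer fusion described above can be illustrated with a short sketch. This is not the authors' IBM system; the network sizes, feature dimensions and the PyTorch implementation are assumptions made purely for illustration.

    ```python
    # Hypothetical sketch of fusing the final hidden layers of two uni-modal networks.
    import torch
    import torch.nn as nn

    class UnimodalNet(nn.Module):
        """Small uni-modal network; its last hidden layer doubles as a feature extractor."""
        def __init__(self, in_dim, hid_dim, n_phones):
            super().__init__()
            self.hidden = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())
            self.out = nn.Linear(hid_dim, n_phones)

        def forward(self, x):
            h = self.hidden(x)          # hidden representation, reused for fusion
            return self.out(h), h

    audio_net = UnimodalNet(in_dim=40, hid_dim=128, n_phones=42)   # e.g. log-mel input (assumed)
    video_net = UnimodalNet(in_dim=64, hid_dim=128, n_phones=42)   # e.g. mouth-region features (assumed)

    # A joint network built on the concatenated hidden layers of the two uni-modal nets.
    joint_net = nn.Sequential(nn.Linear(128 + 128, 256), nn.ReLU(), nn.Linear(256, 42))

    audio = torch.randn(8, 40)          # one batch of frame-level audio features
    video = torch.randn(8, 64)          # time-aligned visual features
    _, h_a = audio_net(audio)
    _, h_v = video_net(video)
    phone_logits = joint_net(torch.cat([h_a, h_v], dim=1))
    print(phone_logits.shape)           # torch.Size([8, 42])
    ```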

  3. Do gender differences in audio-visual benefit and visual influence in audio-visual speech perception emerge with age?

    Directory of Open Access Journals (Sweden)

    Magnus eAlm

    2015-07-01

    Gender and age have been found to affect adults’ audio-visual (AV) speech perception. However, research on adult aging focuses on adults over 60 years, who have an increasing likelihood of cognitive and sensory decline, which may confound positive effects of age-related AV experience and its interaction with gender. Observed age and gender differences in AV speech perception may also depend on measurement sensitivity and AV task difficulty. Consequently, both AV benefit and visual influence were used to measure visual contribution for gender-balanced groups of young (20-30 years) and middle-aged adults (50-60 years), with task difficulty varied using AV syllables from different talkers in alternative auditory backgrounds. Females had better speech-reading performance than males. Whereas no gender differences in AV benefit or visual influence were observed for young adults, visually influenced responses were significantly greater for middle-aged females than middle-aged males. That speech-reading performance did not influence AV benefit may be explained by visual speech extraction and AV integration constituting independent abilities. In contrast, the gender difference in visually influenced responses in middle adulthood may reflect an experience-related shift in females’ general AV perceptual strategy. Although young females’ speech-reading proficiency may not readily contribute to greater visual influence, between young and middle adulthood recurrent confirmation of the contribution of visual cues, induced by speech-reading proficiency, may gradually shift females’ AV perceptual strategy towards more visually dominated responses.

  4. Audio-Visual Speech Perception: A Developmental ERP Investigation

    Science.gov (United States)

    Knowland, Victoria C. P.; Mercure, Evelyne; Karmiloff-Smith, Annette; Dick, Fred; Thomas, Michael S. C.

    2014-01-01

    Being able to see a talking face confers a considerable advantage for speech perception in adulthood. However, behavioural data currently suggest that children fail to make full use of these available visual speech cues until age 8 or 9. This is particularly surprising given the potential utility of multiple informational cues during language…

  5. Audio-Visual Speech Intelligibility Benefits with Bilateral Cochlear Implants when Talker Location Varies

    OpenAIRE

    van Hoesel, Richard J. M.

    2015-01-01

    One of the key benefits of using cochlear implants (CIs) in both ears rather than just one is improved localization. It is likely that in complex listening scenes, improved localization allows bilateral CI users to orient toward talkers to improve signal-to-noise ratios and gain access to visual cues, but to date, that conjecture has not been tested. To obtain an objective measure of that benefit, seven bilateral CI users were assessed for both auditory-only and audio-visual speech intelligib...

  6. Effects of Audio-Visual Information on the Intelligibility of Alaryngeal Speech

    Science.gov (United States)

    Evitts, Paul M.; Portugal, Lindsay; Van Dine, Ami; Holler, Aline

    2010-01-01

    Background: There is minimal research on the contribution of visual information on speech intelligibility for individuals with a laryngectomy (IWL). Aims: The purpose of this project was to determine the effects of mode of presentation (audio-only, audio-visual) on alaryngeal speech intelligibility. Method: Twenty-three naive listeners were…

  7. Speech Enhancement and Recognition in Meetings with an Audio-Visual Sensor Array

    OpenAIRE

    Maganti, Hari Krishna; Gatica-Perez, Daniel; McCowan, Iain A.

    2006-01-01

    We address the problem of distant speech acquisition in multi-party meetings, using multiple microphones and cameras. Microphone array beamforming techniques present a potential alternative to close-talking microphones by providing speech enhancement through spatial filtering and directional discrimination. Beamforming techniques rely on the knowledge of a speaker location. In this paper, we present an integrated approach, in which an audio-visual multi-person tracker is used to track active ...

  8. Audio-Visual Speech Recognition Using MPEG-4 Compliant Visual Features

    Directory of Open Access Journals (Sweden)

    Aleksic Petar S

    2002-01-01

    We describe an audio-visual automatic continuous speech recognition system, which significantly improves speech recognition performance over a wide range of acoustic noise levels, as well as under clean audio conditions. The system utilizes facial animation parameters (FAPs) supported by the MPEG-4 standard for the visual representation of speech. We also describe a robust and automatic algorithm we have developed to extract FAPs from visual data, which does not require hand labeling or extensive training procedures. Principal component analysis (PCA) was performed on the FAPs in order to decrease the dimensionality of the visual feature vectors, and the derived projection weights were used as visual features in the audio-visual automatic speech recognition (ASR) experiments. Both single-stream and multistream hidden Markov models (HMMs) were used to model the ASR system, integrate audio and visual information, and perform relatively large vocabulary (approximately 1000 words) speech recognition experiments. The experiments performed use clean audio data and audio data corrupted by stationary white Gaussian noise at various SNRs. The proposed system reduces the word error rate (WER) by 20% to 23% relative to audio-only speech recognition WERs, at various SNRs (0–30 dB) with additive white Gaussian noise, and by 19% relative to audio-only speech recognition WER under clean audio conditions.
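
    As a concrete illustration of the PCA step described above, the following sketch projects per-frame FAP vectors onto their leading principal components and keeps the projection weights as visual features. The frame count, the 10-component cut-off and the random placeholder data are assumptions, not the paper's settings.

    ```python
    # Illustrative only: PCA-based dimensionality reduction of FAP visual features.
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    faps = rng.normal(size=(5000, 68))          # placeholder: 5000 frames x 68 FAP values

    pca = PCA(n_components=10)                  # keep a low-dimensional visual feature vector
    visual_features = pca.fit_transform(faps)   # projection weights used as ASR visual features
    print(visual_features.shape)                # (5000, 10)
    print(pca.explained_variance_ratio_.sum())  # fraction of variance retained
    ```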

  9. Noise Adaptive Stream Weighting in Audio-Visual Speech Recognition

    Directory of Open Access Journals (Sweden)

    Martin Heckmann

    2002-11-01

    It has been shown that integration of acoustic and visual information, especially in noisy conditions, yields improved speech recognition results. This raises the question of how to weight the two modalities in different noise conditions. Throughout this paper we develop a weighting process adaptive to various background noise situations. In the presented recognition system, audio and video data are combined following a Separate Integration (SI) architecture. A hybrid Artificial Neural Network/Hidden Markov Model (ANN/HMM) system is used for the experiments. The neural networks were in all cases trained on clean data. Firstly, we evaluate the performance of different weighting schemes in a manually controlled recognition task with different types of noise. Next, we compare different criteria to estimate the reliability of the audio stream. Based on this, a mapping between the measurements and the free parameter of the fusion process is derived and its applicability is demonstrated. Finally, the possibilities and limitations of adaptive weighting are compared and discussed.
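
    The core of such a weighting process is a mapping from an audio reliability estimate to the free fusion parameter. The sketch below is a minimal stand-in: the linear SNR-to-weight mapping and the toy posteriors are assumptions, not the mapping derived in the paper.

    ```python
    # A minimal sketch of SNR-driven stream weighting in a Separate Integration setup.
    import numpy as np

    def audio_weight(snr_db, lo=-5.0, hi=25.0):
        """Map an estimated SNR (dB) to an audio stream weight in [0, 1] (assumed mapping)."""
        return float(np.clip((snr_db - lo) / (hi - lo), 0.0, 1.0))

    def fuse_log_likelihoods(ll_audio, ll_video, snr_db):
        """Weighted combination of per-class log-likelihoods from the two streams."""
        lam = audio_weight(snr_db)
        return lam * np.asarray(ll_audio) + (1.0 - lam) * np.asarray(ll_video)

    ll_a = np.log([0.7, 0.2, 0.1])   # toy audio posteriors for three candidate words
    ll_v = np.log([0.3, 0.5, 0.2])   # toy video posteriors
    print(fuse_log_likelihoods(ll_a, ll_v, snr_db=0.0).argmax())   # video stream dominates in noise
    print(fuse_log_likelihoods(ll_a, ll_v, snr_db=25.0).argmax())  # audio stream dominates when clean
    ```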

  10. ANALYSIS OF MULTIMODAL FUSION TECHNIQUES FOR AUDIO-VISUAL SPEECH RECOGNITION

    Directory of Open Access Journals (Sweden)

    D.V. Ivanko

    2016-05-01

    The paper deals with an analytical review covering the latest achievements in the field of audio-visual (AV) fusion (integration) of multimodal information. We discuss the main challenges and report on approaches to address them. One of the most important tasks of AV integration is to understand how the modalities interact and influence each other. The paper addresses this problem in the context of AV speech processing and speech recognition. In the first part of the review we set out the basic principles of AV speech recognition and give the classification of audio and visual features of speech. Special attention is paid to the systematization of the existing techniques and the AV data fusion methods. In the second part, based on the analysis of the research area carried out, we provide a consolidated list of tasks and applications that use AV fusion. We also indicate the methods, techniques, and audio and video features used. We propose a classification of AV integration approaches and discuss the advantages and disadvantages of the different approaches. We draw conclusions and offer our assessment of the future of the field of AV fusion. In further research we plan to implement a system for audio-visual Russian continuous speech recognition using advanced methods of multimodal fusion.

  11. Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images

    Directory of Open Access Journals (Sweden)

    Koji Iwano

    2007-03-01

    This paper proposes an audio-visual speech recognition method using lip information extracted from side-face images as an attempt to increase noise robustness in mobile environments. Our proposed method assumes that lip images can be captured using a small camera installed in a handset. Two different kinds of lip features, lip-contour geometric features and lip-motion velocity features, are used individually or jointly, in combination with audio features. Phoneme HMMs modeling the audio and visual features are built based on the multistream HMM technique. Experiments conducted using Japanese connected digit speech contaminated with white noise in various SNR conditions show the effectiveness of the proposed method. Recognition accuracy is improved by using the visual information in all SNR conditions. These visual features were confirmed to be effective even when the audio HMM was adapted to noise by the MLLR method.

  12. The effect of combined sensory and semantic components on audio-visual speech perception in older adults

    Directory of Open Access Journals (Sweden)

    Corrina Maguinness

    2011-12-01

    Previous studies have found that perception in older people benefits from multisensory over uni-sensory information. As normal speech recognition is affected by both the auditory input and the visual lip-movements of the speaker, we investigated the efficiency of audio and visual integration in an older population by manipulating the relative reliability of the auditory and visual information in speech. We also investigated the role of the semantic context of the sentence to assess whether audio-visual integration is affected by top-down semantic processing. We presented participants with audio-visual sentences in which the visual component was either blurred or not blurred. We found that there was a greater cost in recall performance for semantically meaningless speech in the audio-visual blur condition compared to the audio-visual no-blur condition, and this effect was specific to the older group. Our findings have implications for understanding how aging affects efficient multisensory integration for the perception of speech and suggest that multisensory inputs may benefit speech perception in older adults when the semantic content of the speech is unpredictable.

  13. Audio-visual speech timing sensitivity is enhanced in cluttered conditions.

    Directory of Open Access Journals (Sweden)

    Warrick Roseboom

    Events encoded in separate sensory modalities, such as audition and vision, can seem to be synchronous across a relatively broad range of physical timing differences. This may suggest that the precision of audio-visual timing judgments is inherently poor. Here we show that this is not necessarily true. We contrast timing sensitivity for isolated streams of audio and visual speech, and for streams of audio and visual speech accompanied by additional, temporally offset, visual speech streams. We find that the precision with which synchronous streams of audio and visual speech are identified is enhanced by the presence of additional streams of asynchronous visual speech. Our data suggest that timing perception is shaped by selective grouping processes, which can result in enhanced precision in temporally cluttered environments. The imprecision suggested by previous studies might therefore be a consequence of examining isolated pairs of audio and visual events. We argue that when an isolated pair of cross-modal events is presented, they tend to group perceptually and to seem synchronous as a consequence. We have revealed greater precision by providing multiple visual signals, possibly allowing a single auditory speech stream to group selectively with the most synchronous visual candidate. The grouping processes we have identified might be important in daily life, such as when we attempt to follow a conversation in a crowded room.

  14. Neural entrainment to rhythmically-presented auditory, visual and audio-visual speech in children

    Directory of Open Access Journals (Sweden)

    Alan James Power

    2012-07-01

    Auditory cortical oscillations have been proposed to play an important role in speech perception. It is suggested that the brain may take temporal ‘samples’ of information from the speech stream at different rates, phase-resetting ongoing oscillations so that they are aligned with similar frequency bands in the input (‘phase locking’). Information from these frequency bands is then bound together for speech perception. To date, there are no explorations of neural phase-locking and entrainment to speech input in children. However, it is clear from studies of language acquisition that infants use both visual speech information and auditory speech information in learning. In order to study neural entrainment to speech in typically-developing children, we use a rhythmic entrainment paradigm (underlying 2 Hz or delta rate) based on repetition of the syllable ba, presented in either the auditory modality alone, the visual modality alone, or as auditory-visual speech (via a talking head). To ensure attention to the task, children aged 13 years were asked to press a button as fast as possible when the ba stimulus violated the rhythm for each stream type. Rhythmic violation depended on delaying the occurrence of a ba in the isochronous stream. Neural entrainment was demonstrated for all stream types, and individual differences in standardized measures of language processing were related to auditory entrainment at the theta rate. Further, there was significant modulation of the preferred phase of auditory entrainment in the theta band when visual speech cues were present, indicating cross-modal phase resetting. The rhythmic entrainment paradigm developed here offers a method for exploring individual differences in oscillatory phase locking during development. In particular, a method for assessing neural entrainment and cross-modal phase resetting would be useful for exploring developmental learning difficulties thought to involve temporal sampling
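
    Neural entrainment at a stimulation rate is commonly quantified with an inter-trial phase coherence measure; the sketch below illustrates that idea on simulated epochs. The sampling rate, trial count and the single-frequency phase estimate are assumptions and do not reproduce the authors' analysis pipeline.

    ```python
    # Illustrative sketch: inter-trial phase coherence (ITPC) at the 2 Hz stimulation rate.
    import numpy as np

    def itpc(epochs, fs, freq):
        """epochs: (n_trials, n_samples) array; returns phase coherence at `freq`."""
        n = epochs.shape[1]
        t = np.arange(n) / fs
        # Project each trial onto a complex exponential at the target frequency
        phasors = epochs @ np.exp(-2j * np.pi * freq * t) / n
        phases = phasors / np.abs(phasors)          # keep phase, discard amplitude
        return np.abs(phases.mean())                # ~1 = consistent phase locking, ~0 = none

    fs = 250.0
    rng = np.random.default_rng(1)
    t = np.arange(int(2 * fs)) / fs
    # Simulated trials: a 2 Hz component with a consistent phase plus noise
    trials = np.array([np.cos(2 * np.pi * 2.0 * t) + rng.normal(scale=2.0, size=t.size)
                       for _ in range(60)])
    print(round(itpc(trials, fs, freq=2.0), 2))     # well above chance for these toy data
    ```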

  15. Superior temporal activation in response to dynamic audio-visual emotional cues

    OpenAIRE

    Robins, Diana L.; Hunyadi, Elinora; Schultz, Robert T.

    2008-01-01

    Perception of emotion is critical for successful social interaction, yet the neural mechanisms underlying the perception of dynamic, audiovisual emotional cues are poorly understood. Evidence from language and sensory paradigms suggests that the superior temporal sulcus and gyrus (STS/STG) play a key role in the integration of auditory and visual cues. Emotion perception research has focused on static facial cues; however, dynamic audiovisual (AV) cues mimic real-world social cues more accura...

  16. A Novel Algorithm for Acoustic and Visual Classifiers Decision Fusion in Audio-Visual Speech Recognition System

    Directory of Open Access Journals (Sweden)

    P.S. Sathidevi

    2010-03-01

    Audio-visual speech recognition (AVSR) using acoustic and visual signals of speech has received attention recently because of its robustness in noisy environments. Perceptual studies also support this approach by emphasizing the importance of visual information for speech recognition in humans. An important issue in decision-fusion-based AVSR systems is how to obtain the appropriate integration weight for the speech modalities and ensure that the combined AVSR system’s performance is better than that of the audio-only and visual-only systems under various noise conditions. To solve this issue, we present a genetic algorithm (GA)-based optimization scheme to obtain the appropriate integration weight from the relative reliability of each modality. The performance of the proposed GA-optimized reliability-ratio-based weight estimation scheme is demonstrated via single-speaker, mobile-functions isolated word recognition experiments. The results show that the proposed scheme improves robust recognition accuracy over the conventional unimodal systems and the baseline reliability-ratio-based AVSR system under various signal-to-noise ratio conditions.
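
    A toy version of the GA-based weight search might look like the following; the population size, mutation scale, fitness definition and the synthetic dev-set likelihoods are all assumptions for illustration only, not the paper's scheme.

    ```python
    # Toy genetic-algorithm search for a single audio/visual integration weight.
    import numpy as np

    rng = np.random.default_rng(0)
    n_utt, n_words = 200, 10
    labels = rng.integers(0, n_words, size=n_utt)
    # Synthetic per-utterance log-likelihoods from the audio and visual classifiers
    ll_audio = rng.normal(size=(n_utt, n_words))
    ll_audio[np.arange(n_utt), labels] += 1.0       # audio is mildly informative
    ll_video = rng.normal(size=(n_utt, n_words))
    ll_video[np.arange(n_utt), labels] += 2.0       # video is more informative here

    def fitness(w):
        fused = w * ll_audio + (1.0 - w) * ll_video
        return (fused.argmax(axis=1) == labels).mean()   # dev-set word accuracy

    pop = rng.uniform(0.0, 1.0, size=20)                 # initial population of weights
    for _ in range(30):                                  # generations
        scores = np.array([fitness(w) for w in pop])
        parents = pop[np.argsort(scores)[-10:]]          # selection: keep the best half
        children = np.clip(rng.choice(parents, 10) + rng.normal(scale=0.05, size=10), 0, 1)
        pop = np.concatenate([parents, children])        # mutation-based offspring

    best = max(pop, key=fitness)
    print(round(best, 2), round(fitness(best), 3))
    ```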

  17. A Cognitive Science Reasoning in Recognition of Emotions in Audio-Visual Speech

    OpenAIRE

    Slavova, Velina; Verhelst, Werner; Sahli, Hichem

    2008-01-01

    In this report we summarize the state-of-the-art of speech emotion recognition from the signal processing point of view. On the basis of multi-corpus experiments with machine-learning classifiers, the observation is made that existing approaches for supervised machine learning lead to database-dependent classifiers which cannot be applied for multi-language speech emotion recognition without additional training because they discriminate the emotion classes following the use...

  18. The Effect of Onset Asynchrony in Audio Visual Speech and the Uncanny Valley in Virtual Characters

    DEFF Research Database (Denmark)

    Tinwell, Angela; Grimshaw, Mark; Abdel Nabi, Deborah

    2015-01-01

    This study investigates whether the Uncanny Valley phenomenon is increased for realistic, human-like characters with an asynchrony of lip movement during speech. An experiment was conducted in which 113 participants rated a human and a realistic, talking-head, human-like virtual character over a ran...

  19. Audio-visual speech perception in infants and toddlers with Down syndrome, fragile X syndrome, and Williams syndrome.

    Science.gov (United States)

    D'Souza, Dean; D'Souza, Hana; Johnson, Mark H; Karmiloff-Smith, Annette

    2016-08-01

    Typically-developing (TD) infants can construct unified cross-modal percepts, such as a speaking face, by integrating auditory-visual (AV) information. This skill is a key building block upon which higher-level skills, such as word learning, are built. Because word learning is seriously delayed in most children with neurodevelopmental disorders, we assessed the hypothesis that this delay partly results from a deficit in integrating AV speech cues. AV speech integration has rarely been investigated in neurodevelopmental disorders, and never previously in infants. We probed for the McGurk effect, which occurs when the auditory component of one sound (/ba/) is paired with the visual component of another sound (/ga/), leading to the perception of an illusory third sound (/da/ or /tha/). We measured AV integration in 95 infants/toddlers with Down, fragile X, or Williams syndrome, whom we matched on Chronological and Mental Age to 25 TD infants. We also assessed a more basic AV perceptual ability: sensitivity to matching vs. mismatching AV speech stimuli. Infants with Williams syndrome failed to demonstrate a McGurk effect, indicating poor AV speech integration. Moreover, while the TD children discriminated between matching and mismatching AV stimuli, none of the other groups did, hinting at a basic deficit or delay in AV speech processing, which is likely to constrain subsequent language development. PMID:27498221

  20. Audio-visual gender recognition

    Science.gov (United States)

    Liu, Ming; Xu, Xun; Huang, Thomas S.

    2007-11-01

    Combining different modalities for pattern recognition tasks is a very promising field. Basically, humans always fuse information from different modalities to recognize objects, perform inference, etc. Audio-visual gender recognition is one of the most common tasks in human social communication. Humans can identify gender by facial appearance, by speech and also by body gait. Indeed, human gender recognition is a multi-modal data acquisition and processing procedure. However, computational multimodal gender recognition has not been extensively investigated in the literature. In this paper, speech and facial images are fused to perform multi-modal gender recognition, exploring the improvement gained by combining different modalities.

  1. Audio-Visual Tibetan Speech Recognition Based on a Deep Dynamic Bayesian Network for Natural Human Robot Interaction

    Directory of Open Access Journals (Sweden)

    Yue Zhao

    2012-12-01

    Audio-visual speech recognition is a natural and robust approach to improving human-robot interaction in noisy environments. Although multi-stream Dynamic Bayesian Networks and coupled HMMs are widely used for audio-visual speech recognition, they fail to learn the shared features between modalities and ignore the dependency of features among the frames within each discrete state. In this paper, we propose a Deep Dynamic Bayesian Network (DDBN) to perform unsupervised extraction of spatial-temporal multimodal features from Tibetan audio-visual speech data and build an accurate audio-visual speech recognition model without a frame-independency assumption. The experimental results on Tibetan speech data from real-world environments showed that the proposed DDBN outperforms state-of-the-art methods in word recognition accuracy.

  2. Basic to Applied Research: The Benefits of Audio-Visual Speech Perception Research in Teaching Foreign Languages

    Science.gov (United States)

    Erdener, Dogu

    2016-01-01

    Traditionally, second language (L2) instruction has emphasised auditory-based instruction methods. However, this approach is restrictive in the sense that speech perception by humans is not just an auditory phenomenon but a multimodal one, and specifically, a visual one as well. In the past decade, experimental studies have shown that the…

  3. Audio-Visual Aids: Historians in Blunderland.

    Science.gov (United States)

    Decarie, Graeme

    1988-01-01

    A history professor relates his experiences producing and using audio-visual material and warns teachers not to rely on audio-visual aids for classroom presentations. Includes examples of popular audio-visual aids on Canada that communicate unintended, inaccurate, or unclear ideas. Urges teachers to exercise caution in the selection and use of…

  4. Alfasecuencialización: la enseñanza del cine en la era del audiovisual Sequential literacy: the teaching of cinema in the age of audio-visual speech

    Directory of Open Access Journals (Sweden)

    José Antonio Palao Errando

    2007-10-01

    In the so-called «information society», film studies have been diluted in the pragmatic and technological approach to audio-visual discourse, just as the enjoyment of cinema itself has been caught in the net of the DVD and hypertext. Cinema itself reacts to this through complex narrative structures that distance it from standard audio-visual discourse. The function of film studies, and of their teaching at university level, should be the reintroduction of the subject rejected by informative knowledge, by means of the interpretation of the filmic text.

  5. Audio-Visual Aids in Universities

    Science.gov (United States)

    Douglas, Jackie

    1970-01-01

    A report on the proceedings and ideas expressed at a one day seminar on "Audio-Visual Equipment--Its Uses and Applications for Teaching and Research in Universities." The seminar was organized by England's National Committee for Audio-Visual Aids in Education in conjunction with the British Universities Film Council. (LS)

  6. Temporal structure and complexity affect audio-visual correspondence detection

    Directory of Open Access Journals (Sweden)

    Rachel N Denison

    2013-01-01

    Synchrony between events in different senses has long been considered the critical temporal cue for multisensory integration. Here, using rapid streams of auditory and visual events, we demonstrate how humans can use temporal structure (rather than mere temporal coincidence) to detect multisensory relatedness. We find psychophysically that participants can detect matching auditory and visual streams via shared temporal structure for crossmodal lags of up to 200 ms. Performance on this task reproduced features of past findings based on explicit timing judgments but did not show any special advantage for perfectly synchronous streams. Importantly, the complexity of temporal patterns influences sensitivity to correspondence. Stochastic, irregular streams – with richer temporal pattern information – led to higher audio-visual matching sensitivity than predictable, rhythmic streams. Our results reveal that temporal structure and its complexity are key determinants for human detection of audio-visual correspondence. The distinctive emphasis of our new paradigms on temporal patterning could be useful for studying special populations with suspected abnormalities in audio-visual temporal perception and multisensory integration.

  7. Crossmodal and incremental perception of audiovisual cues to emotional speech

    NARCIS (Netherlands)

    Barkhuysen, Pashiera; Krahmer, E.J.; Swerts, M.G.J.

    2010-01-01

    In this article we report on two experiments about the perception of audiovisual cues to emotional speech. The article addresses two questions: (1) how do visual cues from a speaker's face to emotion relate to auditory cues, and (2) what is the recognition speed for various facial cues to emotion? B

  8. Crossmodal and Incremental Perception of Audiovisual Cues to Emotional Speech

    Science.gov (United States)

    Barkhuysen, Pashiera; Krahmer, Emiel; Swerts, Marc

    2010-01-01

    In this article we report on two experiments about the perception of audiovisual cues to emotional speech. The article addresses two questions: (1) how do visual cues from a speaker's face to emotion relate to auditory cues, and (2) what is the recognition speed for various facial cues to emotion? Both experiments reported below are based on tests…

  9. Joint evaluation of communication quality and user experience in an audio-visual virtual reality meeting

    DEFF Research Database (Denmark)

    Møller, Anders Kalsgaard; Hoffmann, Pablo F.; Carrozzino, Marcello;

    2013-01-01

    The state-of-the-art speech intelligibility tests are created with the purpose of evaluating acoustic communication devices and not for evaluating audio-visual virtual reality systems. This paper presents a novel method to evaluate a communication situation based on both the speech intelligibility...

  10. Audio-visual classification video browser

    OpenAIRE

    Scott, David; Zhang, ZhenXing; Albatal, Rami; McGuinness, Kevin; Acar, Esra; Hopfgartner, Frank; Gurrin, Cathal; O'Connor, Noel; Smeaton, Alan

    2014-01-01

    This paper presents our third participation in the Video Browser Showdown. Building on the experience that we gained while participating in this event, we compete in the 2014 showdown with a more advanced browsing system based on incorporating several audio-visual retrieval techniques. This paper provides a short overview of the features and functionality of our new system.

  11. Segmentation cues in conversational speech: Robust semantics and fragile phonotactics

    Directory of Open Access Journals (Sweden)

    Laurence eWhite

    2012-10-01

    Multiple cues influence listeners’ segmentation of connected speech into words, but most previous studies have used stimuli elicited in careful readings rather than natural conversation. Discerning word boundaries in conversational speech may differ from the laboratory setting. In particular, a speaker’s articulatory effort – hyperarticulation vs hypoarticulation (H&H) – may vary according to communicative demands, suggesting a compensatory relationship whereby acoustic-phonetic cues are attenuated when other information sources strongly guide segmentation. We examined how listeners’ interpretation of segmentation cues is affected by speech style (spontaneous conversation vs read), using cross-modal identity priming. To elicit spontaneous stimuli, we used a map task in which speakers discussed routes around stylised landmarks. These landmarks were two-word phrases in which the strength of potential segmentation cues – semantic likelihood and cross-boundary diphone phonotactics – was systematically varied. Landmark-carrying utterances were transcribed and later re-recorded as read speech. Independent of speech style, we found an interaction between cue valence (favourable/unfavourable) and cue type (phonotactics/semantics). Thus, there was an effect of semantic plausibility, but no effect of cross-boundary phonotactics, indicating that the importance of phonotactic segmentation may have been overstated in studies where lexical information was artificially suppressed. These patterns were unaffected by whether the stimuli were elicited in a spontaneous or read context, even though the difference in speech styles was evident in a main effect. Durational analyses suggested speaker-driven cue trade-offs congruent with an H&H account, but these modulations did not impact on listener behaviour. We conclude that previous research exploiting read speech is reliable in indicating the primacy of lexically-based cues in the segmentation of natural

  12. Stream Weight Training Based on MCE for Audio-Visual LVCSR

    Institute of Scientific and Technical Information of China (English)

    LIU Peng; WANG Zuoying

    2005-01-01

    In this paper we address the problem of audio-visual speech recognition in the framework of the multi-stream hidden Markov model. Stream weight training based on minimum classification error criterion is discussed for use in large vocabulary continuous speech recognition (LVCSR). We present the lattice re-scoring and Viterbi approaches for calculating the loss function of continuous speech. The experimental results show that in the case of clean audio, the system performance can be improved by 36.1% in relative word error rate reduction when using state-based stream weights trained by a Viterbi approach, compared to an audio only speech recognition system. Further experimental results demonstrate that our audio-visual LVCSR system provides significant enhancement of robustness in noisy environments.

  13. Audio-visual affective expression recognition

    Science.gov (United States)

    Huang, Thomas S.; Zeng, Zhihong

    2007-11-01

    Automatic affective expression recognition has attracted more and more attention of researchers from different disciplines, which will significantly contribute to a new paradigm for human computer interaction (affect-sensitive interfaces, socially intelligent environments) and advance the research in the affect-related fields including psychology, psychiatry, and education. Multimodal information integration is a process that enables humans to assess affective states robustly and flexibly. In order to understand the richness and subtleness of human emotion behavior, the computer should be able to integrate information from multiple sensors. We introduce in this paper our efforts toward machine understanding of audio-visual affective behavior, based on both deliberate and spontaneous displays. Some promising methods are presented to integrate information from both audio and visual modalities. Our experiments show the advantage of audio-visual fusion in affective expression recognition over audio-only or visual-only approaches.

  14. Learning bimodal structure in audio-visual data

    OpenAIRE

    Monaci, Gianluca; Vandergheynst, Pierre; Sommer, Friederich T.

    2009-01-01

    A novel model is presented to learn bimodally informative structures from audio-visual signals. The signal is represented as a sparse sum of audio-visual kernels. Each kernel is a bimodal function consisting of synchronous snippets of an audio waveform and a spatio-temporal visual basis function. To represent an audio-visual signal, the kernels can be positioned independently and arbitrarily in space and time. The proposed algorithm uses unsupervised learning to form dicti...

  15. Cues That Language Users Exploit to Segment Speech

    Institute of Scientific and Technical Information of China (English)

    陈冰茹

    2015-01-01

    The capability to segment words from fluent speech is an important step in learning and acquiring a language (Jusczyk, 1999). Therefore, a number of studies have focused on the various cues that language learners exploit to locate word boundaries. Over the past half century, it has been argued that there are mainly four crucial cues that listeners can use to segment words in speech. In particular, they are: (1) prosody (Echols et al. 1997; Jusczyk et al. 1996); (2) statistical and distributional regularities (Brent et al. 1996; Saffran et al. 1996); (3) phonotactics (Brent et al. 1996; Myers et al. 1996);

  16. Omnidirectional Audio-Visual Talker Localization Based on Dynamic Fusion of Audio-Visual Features Using Validity and Reliability Criteria

    Science.gov (United States)

    Denda, Yuki; Nishiura, Takanobu; Yamashita, Yoichi

    This paper proposes a robust omnidirectional audio-visual (AV) talker localizer for AV applications. The proposed localizer comprises two innovations. The first is robust omnidirectional audio and visual features: direction-of-arrival (DOA) estimation using an equilateral triangular microphone array and human position estimation using an omnidirectional video camera extract the AV features. The second is a dynamic fusion of the AV features. The validity criterion, called the audio- or visual-localization counter, validates each audio or visual feature. The reliability criterion, called the speech arriving evaluator, acts as a dynamic weight to eliminate any prior statistical properties from its fusion procedure. The proposed localizer can achieve both talker localization during speech activity and user localization during non-speech activity under the same fusion rule. Talker localization experiments were conducted in an actual room to evaluate the effectiveness of the proposed localizer. The results confirmed that the talker localization performance of the proposed AV localizer using the validity and reliability criteria is superior to that of conventional localizers.

  17. The Practical Audio-Visual Handbook for Teachers.

    Science.gov (United States)

    Scuorzo, Herbert E.

    The use of audio/visual media as an aid to instruction is a common practice in today's classroom. Most teachers, however, have little or no formal training in this field and rarely a knowledgeable coordinator to help them. "The Practical Audio-Visual Handbook for Teachers" discusses the types and mechanics of many of these media forms and proposes…

  18. Audio visual information materials for risk communication

    International Nuclear Information System (INIS)

    Japan Nuclear Cycle Development Institute (JNC), Tokai Works set up the Risk Communication Study Team in January 2001 to promote mutual understanding between the local residents and JNC. The Team has studied risk communication from various viewpoints and developed new methods of public relations which are useful for the local residents' perception of risk regarding nuclear issues. We aim to develop more effective risk communication, promoting a better mutual understanding with the local residents, by providing risk information on the nuclear fuel facilities such as a Reprocessing Plant and other research and development facilities. We explain the development process of audio-visual information materials which describe our actual activities and devices for risk management in nuclear fuel facilities, and our discussion of the effectiveness measurements. (author)

  19. Audio-Visual Perception System for a Humanoid Robotic Head

    Directory of Open Access Journals (Sweden)

    Raquel Viciana-Abad

    2014-05-01

    One of the main issues within the field of social robotics is to endow robots with the ability to direct attention to people with whom they are interacting. Different approaches follow bio-inspired mechanisms, merging audio and visual cues to localize a person using multiple sensors. However, most of these fusion mechanisms have been used in fixed systems, such as those used in video-conference rooms, and thus, they may incur difficulties when constrained to the sensors with which a robot can be equipped. Besides, within the scope of interactive autonomous robots, there is a lack in terms of evaluating the benefits of audio-visual attention mechanisms, compared to only audio or visual approaches, in real scenarios. Most of the tests conducted have been within controlled environments, at short distances and/or with off-line performance measurements. With the goal of demonstrating the benefit of fusing sensory information with a Bayes inference for interactive robotics, this paper presents a system for localizing a person by processing visual and audio data. Moreover, the performance of this system is evaluated and compared by considering the technical limitations of unimodal systems. The experiments show the promise of the proposed approach for the proactive detection and tracking of speakers in a human-robot interactive framework.
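
    At its simplest, Bayes-style fusion of an audio (DOA) estimate with a visual estimate of the speaker's azimuth reduces to a precision-weighted combination, as in the hedged sketch below; the Gaussian observation model and the variance values are illustrative assumptions, not the paper's inference scheme.

    ```python
    # A minimal sketch: fuse independent audio and visual azimuth estimates of a speaker.
    import numpy as np

    def fuse_gaussian(mu_a, var_a, mu_v, var_v):
        """Posterior mean/variance for two independent Gaussian observations of azimuth (degrees)."""
        var = 1.0 / (1.0 / var_a + 1.0 / var_v)
        mu = var * (mu_a / var_a + mu_v / var_v)
        return mu, var

    # Assume the audio DOA is noisier (wider variance) than the camera-based estimate.
    mu, var = fuse_gaussian(mu_a=32.0, var_a=36.0, mu_v=25.0, var_v=4.0)
    print(round(mu, 1), round(np.sqrt(var), 2))   # fused estimate leans toward the visual cue
    ```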

  20. Audio-visual perception system for a humanoid robotic head.

    Science.gov (United States)

    Viciana-Abad, Raquel; Marfil, Rebeca; Perez-Lorenzo, Jose M; Bandera, Juan P; Romero-Garces, Adrian; Reche-Lopez, Pedro

    2014-01-01

    One of the main issues within the field of social robotics is to endow robots with the ability to direct attention to people with whom they are interacting. Different approaches follow bio-inspired mechanisms, merging audio and visual cues to localize a person using multiple sensors. However, most of these fusion mechanisms have been used in fixed systems, such as those used in video-conference rooms, and thus, they may incur difficulties when constrained to the sensors with which a robot can be equipped. Besides, within the scope of interactive autonomous robots, there is a lack in terms of evaluating the benefits of audio-visual attention mechanisms, compared to only audio or visual approaches, in real scenarios. Most of the tests conducted have been within controlled environments, at short distances and/or with off-line performance measurements. With the goal of demonstrating the benefit of fusing sensory information with a Bayes inference for interactive robotics, this paper presents a system for localizing a person by processing visual and audio data. Moreover, the performance of this system is evaluated and compared by considering the technical limitations of unimodal systems. The experiments show the promise of the proposed approach for the proactive detection and tracking of speakers in a human-robot interactive framework. PMID:24878593

  1. HIFI-AV: An Audio-visual Corpus for Spoken Language Human-Machine Dialogue Research in Spanish

    OpenAIRE

    Fernández Martínez, Fernando; Lucas Cuesta, Juan Manuel; Barra Chicote, Roberto; Ferreiros López, Javier; Macías Guarasa, Javier

    2010-01-01

    In this paper, we describe a new multi-purpose audio-visual database in the context of speech interfaces for controlling household electronic devices. The database comprises speech and video recordings of 19 speakers interacting with a HIFI audio box by means of a spoken dialogue system. Dialogue management is based on Bayesian Networks and the system is provided with contextual information handling strategies. Each speaker was requested to fulfil different sets of specific goals following pred...

  2. Semantic Framing of Speech : Emotional and Topical Cues in Perception of Poorly Specified Speech

    OpenAIRE

    Lidestam, Björn

    2003-01-01

    The general aim of this thesis was to test the effects of paralinguistic (emotional) and prior contextual (topical) cues on perception of poorly specified visual, auditory, and audiovisual speech. The specific purposes were (1) to examine whether facially displayed emotions can facilitate speechreading performance; (2) to study the mechanism for such facilitation; (3) to map information-processing factors that are involved in processing of poorly specified speech; and (4) to present a comprehensiv...

  3. Proper Use of Audio-Visual Aids: Essential for Educators.

    Science.gov (United States)

    Dejardin, Conrad

    1989-01-01

    Criticizes educators as the worst users of audio-visual aids and among the worst public speakers. Offers guidelines for the proper use of an overhead projector and the development of transparencies. (DMM)

  4. CAVA (human Communication: an Audio-Visual Archive)

    OpenAIRE

    Mahon, M. S.

    2009-01-01

    In order to investigate human communication and interaction, researchers need hours of audio-visual data, sometimes recorded over periods of months or years. The process of collecting, cataloguing and transcribing such valuable data is time-consuming and expensive. Once it is collected and ready to use, it makes sense to get the maximum value from it by reusing it and sharing it among the research community. But unlike highly-controlled experimental data, natural audio-visual data tends t...

  5. The Fungible Audio-Visual Mapping and its Experience

    Directory of Open Access Journals (Sweden)

    Adriana Sa

    2014-12-01

    This article questions how different sorts of audio-visual mappings may be perceived. Clearly perceivable cause and effect relationships can be problematic if one desires the audience to experience the music. Indeed, perception would bias those sonic qualities that fit previous concepts of causation, subordinating other sonic qualities, which may form the relations between the sounds themselves. The question is how an audio-visual mapping can produce a sense of causation while simultaneously confounding the actual cause-effect relationships. We call this a fungible audio-visual mapping; the present investigation seeks to glean its constitution and aspect. We report a study, which draws upon methods from experimental psychology to inform audio-visual instrument design and composition. The participants are shown several audio-visual mapping prototypes and asked quantitative and qualitative questions. These questions pertain to their sense of causation and their sense of understanding the cause-effect relationships. The study shows that a fungible mapping requires both synchronized and seemingly non-related components – sufficient complexity to be confusing. As the specific cause-effect concepts remain inconclusive, the sense of causation embraces the whole.

  6. The unity assumption facilitates cross-modal binding of musical, non-speech stimuli: The role of spectral and amplitude envelope cues.

    Science.gov (United States)

    Chuen, Lorraine; Schutz, Michael

    2016-07-01

    An observer's inference that multimodal signals originate from a common underlying source facilitates cross-modal binding. This 'unity assumption' causes asynchronous auditory and visual speech streams to seem simultaneous (Vatakis & Spence, Perception & Psychophysics, 69(5), 744-756, 2007). Subsequent tests of non-speech stimuli such as musical and impact events found no evidence for the unity assumption, suggesting the effect is speech-specific (Vatakis & Spence, Acta Psychologica, 127(1), 12-23, 2008). However, the role of amplitude envelope (the changes in energy of a sound over time) was not previously appreciated within this paradigm. Here, we explore whether previous findings suggesting speech-specificity of the unity assumption were confounded by similarities in the amplitude envelopes of the contrasted auditory stimuli. Experiment 1 used natural events with clearly differentiated envelopes: single notes played on either a cello (bowing motion) or marimba (striking motion). Participants performed an unspeeded temporal order judgment task, viewing audio-visually matched (e.g., marimba auditory with marimba video) and mismatched (e.g., cello auditory with marimba video) versions of stimuli at various stimulus onset asynchronies, and were required to indicate which modality was presented first. As predicted, participants were less sensitive to temporal order in matched conditions, demonstrating that the unity assumption can facilitate the perception of synchrony outside of speech stimuli. Results from Experiments 2 and 3 revealed that when spectral information was removed from the original auditory stimuli, amplitude envelope alone could not facilitate the influence of audiovisual unity. We propose that both amplitude envelope and spectral acoustic cues affect the percept of audiovisual unity, working in concert to help an observer determine when to integrate across modalities. PMID:27084701

  7. Learning to Match Auditory and Visual Speech Cues: Social Influences on Acquisition of Phonological Categories

    Science.gov (United States)

    Altvater-Mackensen, Nicole; Grossmann, Tobias

    2015-01-01

    Infants' language exposure largely involves face-to-face interactions providing acoustic and visual speech cues but also social cues that might foster language learning. Yet, both audiovisual speech information and social information have so far received little attention in research on infants' early language development. Using a preferential…

  8. Audio-Visual Aid in Teaching "Fatty Liver"

    Science.gov (United States)

    Dash, Sambit; Kamath, Ullas; Rao, Guruprasad; Prakash, Jay; Mishra, Snigdha

    2016-01-01

    Use of audio visual tools to aid in medical education is ever on a rise. Our study intends to find the efficacy of a video prepared on "fatty liver," a topic that is often a challenge for pre-clinical teachers, in enhancing cognitive processing and ultimately learning. We prepared a video presentation of 11:36 min, incorporating various…

  9. Audio/Visual Aids: A Study of the Effect of Audio/Visual Aids on the Comprehension Recall of Students.

    Science.gov (United States)

    Bavaro, Sandra

    A study investigated whether the use of audio/visual aids had an effect upon comprehension recall. Thirty fourth-grade students from an urban public school were randomly divided into two equal samples of 15. One group was given a story to read (print only), while the other group viewed a filmstrip of the same story, thereby utilizing audio/visual…

  10. Audio-visual voice activity detection

    Institute of Scientific and Technical Information of China (English)

    LIU Peng; WANG Zuo-ying

    2006-01-01

    In speech signal processing systems, frame-energy based voice activity detection (VAD) may be disturbed by background noise and by the non-stationary character of frame energy within voice segments. The purpose of this paper is to improve the performance and robustness of VAD by introducing visual information. A data-driven linear transformation is adopted in visual feature extraction, and a general statistical VAD model is designed. Using the general model and a two-stage fusion strategy presented in this paper, a concrete multimodal VAD system is built. Experiments show that a 55.0% relative reduction in frame error rate and a 98.5% relative reduction in sentence-breaking error rate are obtained when using multimodal VAD, compared to frame-energy based audio VAD. The results show that with the multimodal method, sentence-breaking errors are almost entirely avoided and frame-detection performance is clearly improved, which proves the effectiveness of the visual modality in VAD.
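
    The two-stage idea, unimodal decisions first and a fusion step second, can be sketched as follows; the thresholds, the fixed audio weight and the toy feature values are assumptions rather than the statistical model used in the paper.

    ```python
    # Illustrative only: per-frame audio and visual scores are thresholded, then fused.
    import numpy as np

    def vad_decision(audio_energy, mouth_opening, e_thresh=0.5, m_thresh=0.3, w_audio=0.6):
        """Return a boolean speech/non-speech decision per frame."""
        audio_score = (audio_energy > e_thresh).astype(float)    # stage 1: unimodal decisions
        visual_score = (mouth_opening > m_thresh).astype(float)
        fused = w_audio * audio_score + (1.0 - w_audio) * visual_score  # stage 2: fusion
        return fused >= 0.5

    energy = np.array([0.9, 0.8, 0.2, 0.1, 0.7])     # toy noisy frame energies
    mouth = np.array([0.6, 0.5, 0.4, 0.0, 0.1])      # toy normalized vertical lip opening
    print(vad_decision(energy, mouth))               # per-frame decisions after fusion
    ```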

  11. Audio-visual feedback improves the BCI performance in the navigational control of a humanoid robot.

    Science.gov (United States)

    Tidoni, Emmanuele; Gergondet, Pierre; Kheddar, Abderrahmane; Aglioti, Salvatore M

    2014-01-01

    Advancement in brain computer interfaces (BCI) technology allows people to actively interact in the world through surrogates. Controlling real humanoid robots using BCI as intuitively as we control our body represents a challenge for current research in robotics and neuroscience. In order to successfully interact with the environment the brain integrates multiple sensory cues to form a coherent representation of the world. Cognitive neuroscience studies demonstrate that multisensory integration may imply a gain with respect to a single modality and ultimately improve the overall sensorimotor performance. For example, reactivity to simultaneous visual and auditory stimuli may be higher than to the sum of the same stimuli delivered in isolation or in temporal sequence. Yet, knowledge about whether audio-visual integration may improve the control of a surrogate is meager. To explore this issue, we provided human footstep sounds as audio feedback to BCI users while controlling a humanoid robot. Participants were asked to steer their robot surrogate and perform a pick-and-place task through BCI-SSVEPs. We found that audio-visual synchrony between footsteps sound and actual humanoid's walk reduces the time required for steering the robot. Thus, auditory feedback congruent with the humanoid actions may improve motor decisions of the BCI's user and help in the feeling of control over it. Our results shed light on the possibility to increase robot's control through the combination of multisensory feedback to a BCI user. PMID:24987350

  12. Acoustic Features and Perceptive Cues of Songs and Dialogues in Whistled Speech: Convergences with Sung Speech

    CERN Document Server

    Meyer, Julien

    2007-01-01

    Whistled speech is a little studied local use of language shaped by several cultures of the world either for distant dialogues or for rendering traditional songs. This practice consists of an emulation of the voice thanks to a simple modulated pitch. It is therefore the result of a transformation of the vocal signal that implies simplifications in the frequency domain. The whistlers adapt their productions to the way each language combines the qualities of height perceived simultaneously by the human ear in the complex frequency spectrum of the spoken or sung voice (pitch, timbre). As a consequence, this practice underlines key acoustic cues for the intelligibility of the concerned languages. The present study provides an analysis of the acoustic and phonetic features selected by whistled speech in several traditions either in purely oral whistles (Spanish, Turkish, Mazatec) or in whistles produced with an instrument like a leaf (Akha, Hmong). It underlines the convergences with the strategies of the singing ...

  13. Phase Synchronization in Human EEG During Audio-Visual Stimulation

    Czech Academy of Sciences Publication Activity Database

    Teplan, M.; Šušmáková, K.; Paluš, Milan; Vejmelka, Martin

    2009-01-01

    Vol. 28 (2009), pp. 80-84. ISSN 1536-8378. Other grants: bilateral project between the Slovak AS and the AS CR (CZ-SK), "Modern methods for evaluation of electrophysiological signals". Source of funding: other public sources. Keywords: synchronization * EEG * wavelet * audio-visual stimulation. Subject RIV: FH - Neurology. Impact factor: 0.729, year: 2009

  14. Normal-Hearing Listeners’ and Cochlear Implant Users’ Perception of Pitch Cues in Emotional Speech

    OpenAIRE

    Gilbers, Steven; Fuller, Christina; Gilbers, Dicky; Broersma, Mirjam; Goudbeek, Martijn; Free, Rolien; Başkent, Deniz

    2015-01-01

    In cochlear implants (CIs), acoustic speech cues, especially for pitch, are delivered in a degraded form. This study’s aim is to assess whether due to degraded pitch cues, normal-hearing listeners and CI users employ different perceptual strategies to recognize vocal emotions, and, if so, how these differ. Voice actors were recorded pronouncing a nonce word in four different emotions: anger, sadness, joy, and relief. These recordings’ pitch cues were phonetically analyzed. The recordings were...

  15. Learning one-to-many mapping functions for audio-visual integrated perception

    Science.gov (United States)

    Lim, Jung-Hui; Oh, Do-Kwan; Lee, Soo-Young

    2010-04-01

    In noisy environments, human speech perception utilizes visual lip-reading as well as audio phonetic classification. This audio-visual integration may be done by combining the two sensory features at an early stage. Also, top-down attention may integrate the two modalities. For the sensory feature fusion we introduce mapping functions between the audio and visual manifolds. In particular, we present an algorithm that provides a one-to-many mapping function for the video-to-audio mapping. Top-down attention is also presented to integrate both the sensory features and the classification results of both modalities, which is able to explain the McGurk effect. Each classifier is separately implemented as a hidden Markov model (HMM), but the two classifiers are combined at the top level and interact through the top-down attention.

  16. Audio-Visual Based Multi-Sample Fusion to Enhance Correlation Filters Speaker Verification System

    Directory of Open Access Journals (Sweden)

    Dzati Athiar Ramli

    2010-07-01

    In this study, we propose a novel approach for a speaker verification system that uses spectrogram images as features and Unconstrained Minimum Average Correlation Energy (UMACE) filters as classifiers. Since speech is a behavioral signal, the speech data tend not to reproduce consistently due to changes in speaking rate, health, emotional condition, temperature and humidity. To overcome this problem, a modification of the UMACE filter architecture is proposed that performs multi-sample fusion using speech and lipreading data. To identify the best fusion scheme, five multi-sample fusion strategies, i.e. maximum, minimum, median, average and majority vote, are first evaluated using the speech signal data. Afterward, the performance of the audio-visual system using the enhanced UMACE filters is tested. Here, lipreading data are added to the audio sample pool and the best fusion scheme found in the prior experiment is used as the multi-sample fusion scheme. The Digit Database was used for performance evaluation, and performance of up to 99.64% is achieved using the enhanced UMACE filters for the speech-only system, a 6.89% improvement over the baseline approach. Subsequently, the audio-visual system is observed to broaden the PSR score interval between authentic and impostor data and to further improve on the audio-only system, moving toward a robust verification system.
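
    The five multi-sample fusion strategies listed above can be expressed compactly; the PSR scores and the acceptance threshold in this sketch are invented for illustration, not values from the Digit Database experiments.

    ```python
    # Sketch of multi-sample score fusion over assumed PSR scores for one claimant.
    import numpy as np

    def fuse_scores(psr_scores, rule="average", threshold=20.0):
        s = np.asarray(psr_scores, dtype=float)
        if rule == "maximum":  return float(s.max())
        if rule == "minimum":  return float(s.min())
        if rule == "median":   return float(np.median(s))
        if rule == "average":  return float(s.mean())
        if rule == "majority": return float((s > threshold).mean() > 0.5)  # 1.0 = accept
        raise ValueError(rule)

    claimant = [24.1, 19.5, 26.8, 22.3]     # PSR scores from four speech samples (toy values)
    for rule in ("maximum", "minimum", "median", "average", "majority"):
        print(rule, fuse_scores(claimant, rule))
    ```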

  17. Voice activity detection using audio-visual information

    DEFF Research Database (Denmark)

    Petsatodis, Theodore; Pnevmatikakis, Aristodemos; Boukis, Christos

    2009-01-01

    An audio-visual voice activity detector that uses sensors positioned distantly from the speaker is presented. Its constituting unimodal detectors are based on the modeling of the temporal variation of audio and visual features using Hidden Markov Models; their outcomes are fused using a post-decision scheme. The Mel-Frequency Cepstral Coefficients and the vertical mouth opening are the chosen audio and visual features respectively, both augmented with their first-order derivatives. The proposed system is assessed using far-field recordings from four different speakers and under various levels of...

  18. Effects of virtual speaker density and room reverberation on spatiotemporal thresholds of audio-visual motion coherence.

    Directory of Open Access Journals (Sweden)

    Narayan Sankaran

    Full Text Available The present study examined the effects of spatial sound-source density and reverberation on the spatiotemporal window for audio-visual motion coherence. Three different acoustic stimuli were generated in Virtual Auditory Space: two acoustically "dry" stimuli via the measurement of anechoic head-related impulse responses recorded at either 1° or 5° spatial intervals (Experiment 1), and a reverberant stimulus rendered from binaural room impulse responses recorded at 5° intervals in situ in order to capture reverberant acoustics in addition to head-related cues (Experiment 2). A moving visual stimulus with invariant localization cues was generated by sequentially activating LEDs along the same radial path as the virtual auditory motion. Stimuli were presented at 25°/s, 50°/s and 100°/s with a random spatial offset between audition and vision. In a 2AFC task, subjects made a judgment of the leading modality (auditory or visual). No significant differences were observed in the spatial threshold based on the point of subjective equivalence (PSE) or the slope of psychometric functions (β) across all three acoustic conditions. Additionally, both the PSE and β did not significantly differ across velocity, suggesting a fixed spatial window of audio-visual separation. Findings suggest that there was no loss in spatial information accompanying the reduction in spatial cues and reverberation levels tested, and establish a perceptual measure for assessing the veracity of motion generated from discrete locations and in echoic environments.

  19. Perceiving speech in context: Compensation for contextual variability during acoustic cue encoding and categorization

    Science.gov (United States)

    Toscano, Joseph Christopher

    Several fundamental questions about speech perception concern how listeners understand spoken language despite considerable variability in speech sounds across different contexts (the problem of lack of invariance in speech). This contextual variability is caused by several factors, including differences between individual talkers' voices, variation in speaking rate, and effects of coarticulatory context. A number of models have been proposed to describe how the speech system handles differences across contexts. Critically, these models make different predictions about (1) whether contextual variability is handled at the level of acoustic cue encoding or categorization, (2) whether it is driven by feedback from category-level processes or interactions between cues, and (3) whether listeners discard fine-grained acoustic information to compensate for contextual variability. Separating the effects of cue- and category-level processing has been difficult because behavioral measures tap processes that occur well after initial cue encoding and are influenced by task demands and linguistic information. Recently, we have used the event-related brain potential (ERP) technique to examine cue encoding and online categorization. Specifically, we have looked at differences in the auditory N1 as a measure of acoustic cue encoding and the P3 as a measure of categorization. This allows us to examine multiple levels of processing during speech perception and can provide a useful tool for studying effects of contextual variability. Here, I apply this approach to determine the point in processing at which context has an effect on speech perception and to examine whether acoustic cues are encoded continuously. Several types of contextual variability (talker gender, speaking rate, and coarticulation), as well as several acoustic cues (voice onset time, formant frequencies, and bandwidths), are examined in a series of experiments. The results suggest that (1) at early stages of speech

  20. Aided and Unaided Speech Supplementation Strategies: Effect of Alphabet Cues and Iconic Hand Gestures on Dysarthric Speech

    Science.gov (United States)

    Hustad, Katherine C.; Garcia, Jane Mertz

    2005-01-01

    Purpose: This study compared the influence of speaker-implemented iconic hand gestures and alphabet cues on speech intelligibility scores and strategy helpfulness ratings for 3 adults with cerebral palsy and dysarthria who differed from one another in their overall motor abilities. Method: A total of 144 listeners (48 per speaker) orthographically…

  1. Durational cues to word boundaries in clear speech

    OpenAIRE

    Cutler, A.; Butterfield, S.

    1990-01-01

    One of a listener’s major tasks in understanding continuous speech is segmenting the speech signal into separate words. When listening conditions are difficult, speakers can help listeners by producing deliberately clear speech. We found that speakers do indeed attempt to mark word boundaries; moreover, they differentiate between word boundaries in a way which suggests they are sensitive to listener needs. Application of heuristic segmentation strategies makes word boundaries before strong syllables eas...

  2. The sound of your lips: electrophysiological cross-modal interactions during hand-to-face and face-to-face speech perception

    Directory of Open Access Journals (Sweden)

    Avril Treille

    2014-05-01

    Full Text Available Recent magneto-encephalographic and electro-encephalographic studies provide evidence for cross-modal integration during audio-visual and audio-haptic speech perception, with speech gestures viewed or felt from manual tactile contact with the speaker’s face. Given the temporal precedence of the haptic and visual signals on the acoustic signal in these studies, the observed modulation of N1/P2 auditory evoked responses during bimodal compared to unimodal speech perception suggests that relevant and predictive visual and haptic cues may facilitate auditory speech processing. To further investigate this hypothesis, auditory evoked potentials were here compared during auditory-only, audio-visual and audio-haptic speech perception in live dyadic interactions between a listener and a speaker. In line with previous studies, auditory evoked potentials were attenuated and speeded up during both audio-haptic and audio-visual compared to auditory speech perception. Importantly, the observed latency and amplitude reduction did not significantly depend on the degree of visual and haptic recognition of the speech targets. Altogether, these results further demonstrate cross-modal interactions between the auditory, visual and haptic speech signals. Although they do not contradict the hypothesis that visual and haptic sensory inputs convey predictive information with respect to the incoming auditory speech input, these results suggest that, at least in live conversational interactions, systematic conclusions on sensory predictability in bimodal speech integration have to be taken with caution, with the extraction of predictive cues likely depending on the variability of the speech stimuli.

  3. Increasing observer objectivity with audio-visual technology: the Sphygmocorder.

    Science.gov (United States)

    Atkins; O'Brien; Wesseling; Guelen

    1997-10-01

    The most fallible component of blood pressure measurement is the human observer. The traditional technique of measuring blood pressure does not allow the result of the measurement to be checked by independent observers, thereby leaving the method open to bias. In the Sphygmocorder, several components used to measure blood pressure have been combined innovatively with audio-visual recording technology to produce a system consisting of a mercury sphygmomanometer, an occluding cuff, an automatic inflation-deflation source, a stethoscope, a microphone capable of detecting Korotkoff sounds, a camcorder and a display screen. The accuracy of the Sphygmocorder against the trained human observer has been confirmed previously using the protocol of the British Hypertension Society and in this article the updated system incorporating a number of innovations is described. PMID:10234128

  4. Psychoacoustic cues to emotion in speech prosody and music.

    Science.gov (United States)

    Coutinho, Eduardo; Dibben, Nicola

    2013-01-01

    There is strong evidence of shared acoustic profiles common to the expression of emotions in music and speech, yet relatively limited understanding of the specific psychoacoustic features involved. This study combined a controlled experiment and computational modelling to investigate the perceptual codes associated with the expression of emotion in the acoustic domain. The empirical stage of the study provided continuous human ratings of emotions perceived in excerpts of film music and natural speech samples. The computational stage created a computer model that retrieves the relevant information from the acoustic stimuli and makes predictions about the emotional expressiveness of speech and music close to the responses of human subjects. We show that a significant part of the listeners' second-by-second reported emotions to music and speech prosody can be predicted from a set of seven psychoacoustic features: loudness, tempo/speech rate, melody/prosody contour, spectral centroid, spectral flux, sharpness, and roughness. The implications of these results are discussed in the context of cross-modal similarities in the communication of emotion in the acoustic domain. PMID:23057507
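
    Two of the seven psychoacoustic features named above, spectral centroid and spectral flux, can be computed directly from a short-time Fourier transform. The sketch below is a generic frame-based computation (the frame and hop sizes are arbitrary assumptions), not the feature extractor used in the study:

      import numpy as np

      def spectral_centroid_and_flux(signal, sr, frame=1024, hop=512):
          """Frame-wise spectral centroid (Hz) and spectral flux of a mono signal."""
          window = np.hanning(frame)
          freqs = np.fft.rfftfreq(frame, d=1.0 / sr)
          centroids, fluxes, prev_mag = [], [], None
          for start in range(0, len(signal) - frame + 1, hop):
              mag = np.abs(np.fft.rfft(signal[start:start + frame] * window))
              centroids.append(np.sum(freqs * mag) / (np.sum(mag) + 1e-12))
              if prev_mag is not None:
                  fluxes.append(np.sqrt(np.sum((mag - prev_mag) ** 2)))
              prev_mag = mag
          return np.array(centroids), np.array(fluxes)

      # e.g. a 1-second 440 Hz test tone at 22.05 kHz
      sr = 22050
      t = np.arange(sr) / sr
      centroid, flux = spectral_centroid_and_flux(np.sin(2 * np.pi * 440 * t), sr)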

  5. Something for Everyone? An Evaluation of the Use of Audio-Visual Resources in Geographical Learning in the UK.

    Science.gov (United States)

    McKendrick, John H.; Bowden, Annabel

    1999-01-01

    Reports from a survey of geographers that canvassed experiences using audio-visual resources to support teaching. Suggests that geographical learning has embraced audio-visual resources and that they are employed effectively. Concludes that integration of audio-visual resources into mainstream curriculum is essential to ensure effective and…

  6. Paragraph-based Prosodic Cues for Speech Synthesis Applications

    OpenAIRE

    Farrús, Mireia; Lai, Catherine; Moore, Johanna

    2016-01-01

    Speech synthesis has improved in both expressiveness and voice quality in recent years. However, obtaining full expressiveness when dealing with large multi-sentential synthesized discourse is still a challenge, since speech synthesizers do not take into account the prosodic differences that have been observed in discourse units such as paragraphs. The current study validates and extends previous work by analyzing the prosody of paragraph units in a large and diverse corpus of TED Talks using autom...

  7. Real-time decreased sensitivity to an audio-visual illusion during goal-directed reaching.

    Directory of Open Access Journals (Sweden)

    Luc Tremblay

    Full Text Available In humans, sensory afferences are combined and integrated by the central nervous system (Ernst MO, Bülthoff HH (2004) Trends Cogn. Sci. 8: 162-169) and appear to provide a holistic representation of the environment. Empirical studies have repeatedly shown that vision dominates the other senses, especially for tasks with spatial demands. In contrast, it has also been observed that sound can strongly alter the perception of visual events. For example, when presented with 2 flashes and 1 beep in a very brief period of time, humans often report seeing 1 flash (i.e. fusion illusion; Andersen TS, Tiippana K, Sams M (2004) Brain Res. Cogn. Brain Res. 21: 301-308). However, it is not known how an unfolding movement modulates the contribution of vision to perception. Here, we used the audio-visual illusion to demonstrate that goal-directed movements can alter visual information processing in real-time. Specifically, the fusion illusion was linearly reduced as a function of limb velocity. These results suggest that cue combination and integration can be modulated in real-time by goal-directed behaviors; perhaps through sensory gating (Chapman CE, Beauchamp E (2006) J. Neurophysiol. 96: 1664-1675) and/or altered sensory noise (Ernst MO, Bülthoff HH (2004) Trends Cogn. Sci. 8: 162-169) during limb movements.

  8. Audio-visual assistance in co-creating transition knowledge

    Science.gov (United States)

    Hezel, Bernd; Broschkowski, Ephraim; Kropp, Jürgen P.

    2013-04-01

    Earth system and climate impact research results point to the tremendous ecologic, economic and societal implications of climate change. Specifically, people will have to adopt lifestyles that are very different from those they currently strive for in order to mitigate severe changes of our known environment. It will most likely not suffice to transfer the scientific findings into international agreements and appropriate legislation. A transition is rather reliant on pioneers that define new role models, on change agents that mainstream the concept of sufficiency and on narratives that make different futures appealing. In order for the research community to be able to provide sustainable transition pathways that are viable, an integration of the physical constraints and the societal dynamics is needed. Hence the necessary transition knowledge is to be co-created by social and natural science and society. To this end, the Climate Media Factory - in itself a massively transdisciplinary venture - strives to provide an audio-visual connection between the different scientific cultures and a bi-directional link to stakeholders and society. Since the methodology, particular language and knowledge levels of those involved are not the same, we develop new entertaining formats on the basis of a "complexity on demand" approach. They present scientific information in an integrated and entertaining way with different levels of detail that provide entry points to users with different requirements. Two examples shall illustrate the advantages and restrictions of the approach.

  9. Audio-visual aid in teaching "fatty liver".

    Science.gov (United States)

    Dash, Sambit; Kamath, Ullas; Rao, Guruprasad; Prakash, Jay; Mishra, Snigdha

    2016-05-01

    Use of audio-visual tools to aid medical education is ever on the rise. Our study intends to determine the efficacy of a video prepared on "fatty liver," a topic that is often a challenge for pre-clinical teachers, in enhancing cognitive processing and ultimately learning. We prepared a video presentation of 11:36 min, incorporating various concepts of the topic, while keeping in view Mayer's and Ellaway's guidelines for multimedia presentation. A pre-post test study on subject knowledge was conducted for 100 students with the video shown as the intervention. A retrospective pre-study was conducted as a survey which inquired about students' understanding of the key concepts of the topic, and feedback on our video was taken. Students performed significantly better in the post-test (mean score 8.52 vs. 5.45 in the pre-test), responded positively in the retrospective pre-test and gave positive feedback on our video presentation. Well-designed multimedia tools can aid cognitive processing and enhance working memory capacity, as shown in our study. In times when "smart" device penetration is high, information and communication tools in medical education, which can act as an essential aid and not as a replacement for traditional curriculums, can be beneficial to students. © 2015 by The International Union of Biochemistry and Molecular Biology, 44:241-245, 2016. PMID:26625860

  10. User requirements for multimedia indexing and retrieval of unedited audio-visual footage - RUSHES

    OpenAIRE

    Schreer, O; Fuentes Ardeo, L; Sotiriou, D.; Sadka, A.H.; Izquierdo, E

    2008-01-01

    Multimedia analysis and reuse of raw un-edited audio visual content known as rushes is gaining acceptance by a large number of research labs and companies. A set of research projects are considering multimedia indexing, annotation, search and retrieval in the context of European funded research, but only the FP6 project RUSHES is focusing on automatic semantic annotation, indexing and retrieval of raw and un-edited audio-visual content. Even professional content creators and providers as well...

  11. Audio/visual analysis for high-speed TV advertisement detection from MPEG bitstream

    OpenAIRE

    Sadlier, David A.

    2002-01-01

    Advertisement breaks during or between television programmes are typically flagged by series of black-and-silent video frames, which recurrently occur in order to audio-visually separate individual advertisement spots from one another. It is the regular prevalence of these flags that enables automatic differentiation between what is programme content and what is advertisement break. Detection of these audio-visual depressions within broadcast television content provides a basis on which advertise...

  12. Listener deficits in hypokinetic dysarthria: Which cues are most important in speech segmentation?

    Science.gov (United States)

    Wade, Carolyn Ann

    Listeners use prosodic cues to help them quickly process running speech. In English, listeners effortlessly use strong syllables to help them to find words in the continuous stream of speech produced by neurologically-intact individuals. However, listeners are not always presented with speech under such ideal circumstances. This thesis explores the question of word segmentation of English speech under one of these less ideal conditions; specifically, when the speaker may be impaired in his/her production of strong syllables, as in the case of hypokinetic dysarthria. Further, we attempt to discern which acoustic cue(s) are most degraded in hypokinetic dysarthria and the effect that this degradation has on listeners' segmentation when no additional semantic or pragmatic cues are present. Two individuals with Parkinson's disease, one with a rate disturbance and one with articulatory disruption, along with a typically aging control, were recorded repeating a series of nonsense syllables. Young adult listeners were then presented with recordings from one of these three speakers producing non-words (imprecise consonant articulation, rate disturbance, and control). After familiarization, the listeners were asked to rate the familiarity of the non-words produced by a second typically aging speaker. Results indicated speakers with hypokinetic dysarthria were able to modulate their intensity and duration for stressed and unstressed syllables in a way similar to that of control speakers. In addition, their mean and peak fundamental frequency for both stressed and unstressed syllables were significantly higher than that of the normally aging controls. ANOVA results revealed a marginal main effect of frequency in normal and consonant conditions for word versus nonwords listener ratings.

  13. A Psychophysical Imaging Method Evidencing Auditory Cue Extraction during Speech Perception: A Group Analysis of Auditory Classification Images

    OpenAIRE

    Varnet, Léo; Knoblauch, Kenneth; Serniclaes, Willy; Meunier, Fanny; Hoen, Michel

    2015-01-01

    Although there is a large consensus regarding the involvement of specific acoustic cues in speech perception, the precise mechanisms underlying the transformation from continuous acoustical properties into discrete perceptual units remains undetermined. This gap in knowledge is partially due to the lack of a turnkey solution for isolating critical speech cues from natural stimuli. In this paper, we describe a psychoacoustic imaging method known as the Auditory Classification Image technique t...

  14. Suprasegmental lexical stress cues in visual speech can guide spoken-word recognition.

    Science.gov (United States)

    Jesse, Alexandra; McQueen, James M

    2014-01-01

    Visual cues to the individual segments of speech and to sentence prosody guide speech recognition. The present study tested whether visual suprasegmental cues to the stress patterns of words can also constrain recognition. Dutch listeners use acoustic suprasegmental cues to lexical stress (changes in duration, amplitude, and pitch) in spoken-word recognition. We asked here whether they can also use visual suprasegmental cues. In two categorization experiments, Dutch participants saw a speaker say fragments of word pairs that were segmentally identical but differed in their stress realization (e.g., 'ca-vi from cavia "guinea pig" vs. 'ka-vi from kaviaar "caviar"). Participants were able to distinguish between these pairs from seeing a speaker alone. Only the presence of primary stress in the fragment, not its absence, was informative. Participants were able to distinguish visually primary from secondary stress on first syllables, but only when the fragment-bearing target word carried phrase-level emphasis. Furthermore, participants distinguished fragments with primary stress on their second syllable from those with secondary stress on their first syllable (e.g., pro-'jec from projector "projector" vs. 'pro-jec from projectiel "projectile"), independently of phrase-level emphasis. Seeing a speaker thus contributes to spoken-word recognition by providing suprasegmental information about the presence of primary lexical stress. PMID:24134065

  15. Audio-visual biofeedback for respiratory-gated radiotherapy: Impact of audio instruction and audio-visual biofeedback on respiratory-gated radiotherapy

    International Nuclear Information System (INIS)

    Purpose: Respiratory gating is a commercially available technology for reducing the deleterious effects of motion during imaging and treatment. The efficacy of gating is dependent on the reproducibility within and between respiratory cycles during imaging and treatment. The aim of this study was to determine whether audio-visual biofeedback can improve respiratory reproducibility by decreasing residual motion and therefore increasing the accuracy of gated radiotherapy. Methods and Materials: A total of 331 respiratory traces were collected from 24 lung cancer patients. The protocol consisted of five breathing training sessions spaced about a week apart. Within each session the patients initially breathed without any instruction (free breathing), with audio instructions and with audio-visual biofeedback. Residual motion was quantified by the standard deviation of the respiratory signal within the gating window. Results: Audio-visual biofeedback significantly reduced residual motion compared with free breathing and audio instruction. Displacement-based gating has lower residual motion than phase-based gating. Little reduction in residual motion was found for duty cycles less than 30%; for duty cycles above 50% there was a sharp increase in residual motion. Conclusions: The efficiency and reproducibility of gating can be improved by: incorporating audio-visual biofeedback, using a 30-50% duty cycle, gating during exhalation, and using displacement-based gating
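
    A minimal sketch of the residual-motion measure described above, assuming displacement-based exhale gating in which the beam is on for the fraction of samples with the smallest displacement (the duty-cycle selection rule and the synthetic trace are assumptions made here for illustration, not the study's protocol):

      import numpy as np

      def residual_motion(displacement, duty_cycle=0.4):
          """Standard deviation of the respiratory signal inside the gating window."""
          displacement = np.asarray(displacement, dtype=float)
          threshold = np.quantile(displacement, duty_cycle)   # exhale assumed to be the minimum
          return float(np.std(displacement[displacement <= threshold]))

      # e.g. a synthetic 4-minute breathing trace (cm) sampled at 25 Hz, ~4 s period
      t = np.arange(0, 240, 1 / 25)
      trace = 1.0 - np.cos(np.pi * t / 4) ** 4
      print(residual_motion(trace, duty_cycle=0.3))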

  16. Pitch and spectral resolution: A systematic comparison of bottom-up cues for top-down repair of degraded speech.

    Science.gov (United States)

    Clarke, Jeanne; Başkent, Deniz; Gaudrain, Etienne

    2016-01-01

    The brain is capable of restoring missing parts of speech, a top-down repair mechanism that enhances speech understanding in noisy environments. This enhancement can be quantified using the phonemic restoration paradigm, i.e., the improvement in intelligibility when silent interruptions of interrupted speech are filled with noise. Benefit from top-down repair of speech differs between cochlear implant (CI) users and normal-hearing (NH) listeners. This difference could be due to poorer spectral resolution and/or weaker pitch cues inherent to CI-transmitted speech. In CIs, those two degradations cannot be teased apart because spectral degradation leads to weaker pitch representation. A vocoding method was developed to evaluate independently the roles of pitch and spectral resolution for restoration in NH individuals. Sentences were resynthesized with different spectral resolutions and with the original pitch cues either retained or discarded altogether. The addition of pitch significantly improved restoration only at six-band spectral resolution. However, overall intelligibility of interrupted speech was improved both with the addition of pitch and with the increase in spectral resolution. This improvement may be due to better discrimination of speech segments from the filler noise, better grouping of speech segments together, and/or better bottom-up cues available in the speech segments. PMID:26827034

  17. The contribution of visual information to the perception of speech in noise with and without informative temporal fine structure.

    Science.gov (United States)

    Stacey, Paula C; Kitterick, Pádraig T; Morris, Saffron D; Sumner, Christian J

    2016-06-01

    Understanding what is said in demanding listening situations is assisted greatly by looking at the face of a talker. Previous studies have observed that normal-hearing listeners can benefit from this visual information when a talker's voice is presented in background noise. These benefits have also been observed in quiet listening conditions in cochlear-implant users, whose device does not convey the informative temporal fine structure cues in speech, and when normal-hearing individuals listen to speech processed to remove these informative temporal fine structure cues. The current study (1) characterised the benefits of visual information when listening in background noise; and (2) used sine-wave vocoding to compare the size of the visual benefit when speech is presented with or without informative temporal fine structure. The accuracy with which normal-hearing individuals reported words in spoken sentences was assessed across three experiments. The availability of visual information and informative temporal fine structure cues was varied within and across the experiments. The results showed that visual benefit was observed using open- and closed-set tests of speech perception. The size of the benefit increased when informative temporal fine structure cues were removed. This finding suggests that visual information may play an important role in the ability of cochlear-implant users to understand speech in many everyday situations. Models of audio-visual integration were able to account for the additional benefit of visual information when speech was degraded and suggested that auditory and visual information was being integrated in a similar way in all conditions. The modelling results were consistent with the notion that audio-visual benefit is derived from the optimal combination of auditory and visual sensory cues. PMID:27085797
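
    The "optimal combination" account mentioned in the final sentence is usually formalized as maximum-likelihood (inverse-variance weighted) cue integration. The sketch below states that standard formula; it is not the specific model fitted in the study, and the example numbers are arbitrary:

      def mle_av_estimate(audio_est, audio_var, visual_est, visual_var):
          """Inverse-variance weighted fusion of an auditory and a visual estimate.
          The fused variance is never larger than either unimodal variance."""
          w_a = (1.0 / audio_var) / (1.0 / audio_var + 1.0 / visual_var)
          fused = w_a * audio_est + (1.0 - w_a) * visual_est
          fused_var = 1.0 / (1.0 / audio_var + 1.0 / visual_var)
          return fused, fused_var

      # Noisy audio (high variance) shifts the weight toward the visual cue.
      print(mle_av_estimate(audio_est=0.2, audio_var=4.0, visual_est=0.8, visual_var=1.0))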

  18. The development and use of audio-visual technology in terms of economy and socio-economic trends in society

    OpenAIRE

    Mikšík, Jan

    2014-01-01

    The aim of this work is to describe the history of audio-visual technology and to analyse the influence of digitalization. The text describes the history of cinematography and television, as well as the introduction of audio-visual technology into people's homes. It contains information on the present situation, on new trends, and on the influence of the Internet on audio-visual production. There is a comparison of past and present technologies. The new technologies are accessible even for amateur creators wh...

  19. Acoustic Features and Perceptive Cues of Songs and Dialogues in Whistled Speech: Convergences with Sung Speech

    OpenAIRE

    Meyer, Julien

    2007-01-01

    Whistled speech is a little-studied local use of language shaped by several cultures of the world, either for distant dialogues or for rendering traditional songs. This practice consists of an emulation of the voice by means of a simple modulated pitch. It is therefore the result of a transformation of the vocal signal that implies simplifications in the frequency domain. The whistlers adapt their productions to the way each language combines the qualities of height per...

  20. An Audio-Visual Resource Notebook for Adult Consumer Education. An Annotated Bibliography of Selected Audio-Visual Aids for Adult Consumer Education, with Special Emphasis on Materials for Elderly, Low-Income and Handicapped Consumers.

    Science.gov (United States)

    Virginia State Dept. of Agriculture and Consumer Services, Richmond, VA.

    This document is an annotated bibliography of audio-visual aids in the field of consumer education, intended especially for use among low-income, elderly, and handicapped consumers. It was developed to aid consumer education program planners in finding audio-visual resources to enhance their presentations. Materials listed include 293 resources…

  1. Temporal structure and complexity affect audio-visual correspondence detection

    OpenAIRE

    Denison, Rachel N.; Driver, Jon; Ruff, Christian C.

    2013-01-01

    Synchrony between events in different senses has long been considered the critical temporal cue for multisensory integration. Here, using rapid streams of auditory and visual events, we demonstrate how humans can use temporal structure (rather than mere temporal coincidence) to detect multisensory relatedness. We find psychophysically that participants can detect matching auditory and visual streams via shared temporal structure for crossmodal lags of up to 200 ms. Performance on this task re...

  4. Audio-visual training-aid for speechreading

    DEFF Research Database (Denmark)

    Bothe, Hans-Heinrich; Gebert, H.

    2011-01-01

    employment of computer‐based communication aids for hearing‐impaired, deaf and deaf‐blind people [6]. This paper presents the complete system that is composed of a 3D‐facial animation with synchronized speech synthesis, a natural language dialogue unit and a student‐teacher‐training module. Due to the very...... is important for hard‐of‐hearing students and acoustic reverberation effects of the prospective room for people with low residual hearing. Speechreading requires thorough understanding of spoken language but first and foremost, also of the situational context and the pragmatic meaning of an utterance...... without the need of fundamental knowledge of other words. The present version of the training aid can be used for the training of speechreading in English, this as a consequence of the integrated English language models for facial animation and speech synthesis. Nevertheless, the training aid is prepared...

  5. Training the brain to weight speech cues differently: a study of Finnish second-language users of English.

    Science.gov (United States)

    Ylinen, Sari; Uther, Maria; Latvala, Antti; Vepsäläinen, Sara; Iverson, Paul; Akahane-Yamada, Reiko; Näätänen, Risto

    2010-06-01

    Foreign-language learning is a prime example of a task that entails perceptual learning. The correct comprehension of foreign-language speech requires the correct recognition of speech sounds. The most difficult speech-sound contrasts for foreign-language learners often are the ones that have multiple phonetic cues, especially if the cues are weighted differently in the foreign and native languages. The present study aimed to determine whether non-native-like cue weighting could be changed by using phonetic training. Before the training, we compared the use of spectral and duration cues of English /i/ and /I/ vowels (e.g., beat vs. bit) between native Finnish and English speakers. In Finnish, duration is used phonologically to separate short and long phonemes, and therefore Finns were expected to weight duration cues more than native English speakers. The cross-linguistic differences and training effects were investigated with behavioral and electrophysiological methods, in particular by measuring the MMN brain response that has been used to probe long-term memory representations for speech sounds. The behavioral results suggested that before the training, the Finns indeed relied more on duration in vowel recognition than the native English speakers did. After the training, however, the Finns were able to use the spectral cues of the vowels more reliably than before. Accordingly, the MMN brain responses revealed that the training had enhanced the Finns' ability to preattentively process the spectral cues of the English vowels. This suggests that as a result of training, plastic changes had occurred in the weighting of phonetic cues at early processing stages in the cortex. PMID:19445609

  6. Automated Apprenticeship Training (AAT). A Systematized Audio-Visual Approach to Self-Paced Job Training.

    Science.gov (United States)

    Pieper, William J.; And Others

    Two Automated Apprenticeship Training (AAT) courses were developed for Air Force Security Police Law Enforcement and Security specialists. The AAT was a systematized audio-visual approach to self-paced job training employing an easily operated teaching device. AAT courses were job specific and based on a behavioral task analysis of the two…

  7. A comparative study on automatic audio-visual fusion for aggression detection using meta-information

    NARCIS (Netherlands)

    Lefter, I.; Rothkrantz, L.J.M.; Burghouts, G.J.

    2013-01-01

    Multimodal fusion is a complex topic. For surveillance applications audio-visual fusion is very promising given the complementary nature of the two streams. However, drawing the correct conclusion from multi-sensor data is not straightforward. In previous work we have analysed a database with audio-

  8. Automatic audio-visual fusion for aggression detection using meta-information

    NARCIS (Netherlands)

    Lefter, I.; Burghouts, G.J.; Rothkrantz, L.J.M.

    2012-01-01

    We propose a new method for audio-visual sensor fusion and apply it to automatic aggression detection. While a variety of definitions of aggression exist, in this paper we see it as any kind of behavior that has a disturbing effect on others. We have collected multi- and unimodal assessments by huma

  9. Multimodal indexing of digital audio-visual documents: A case study for cultural heritage data

    NARCIS (Netherlands)

    J. Carmichael; M. Larson; J. Marlow; E. Newman; P. Clough; J. Oomen; S. Sav

    2008-01-01

    This paper describes a multimedia multimodal information access sub-system (MIAS) for digital audio-visual documents, typically presented in streaming media format. The system is designed to provide both professional and general users with entry points into video documents that are relevant to their

  10. Technical Considerations in the Delivery of Audio-Visual Course Content.

    Science.gov (United States)

    Lightfoot, Jay M.

    2002-01-01

    In an attempt to provide students with the benefit of the latest technology, some instructors include multimedia content on their class Web sites. This article introduces the basic terms and concepts needed to understand the multimedia domain. Provides a brief tutorial designed to help instructors create good, consistent audio-visual content. (AEF)

  11. The Use of Video as an Audio-visual Material in Foreign Language Teaching Classroom

    Science.gov (United States)

    Cakir, Ismail

    2006-01-01

    In recent years, a great tendency towards the use of technology and its integration into the curriculum has gained a great importance. Particularly, the use of video as an audio-visual material in foreign language teaching classrooms has grown rapidly because of the increasing emphasis on communicative techniques, and it is obvious that the use of…

  12. Audio-visual synchrony and feature-selective attention co-amplify early visual processing.

    Science.gov (United States)

    Keitel, Christian; Müller, Matthias M

    2016-05-01

    Our brain relies on neural mechanisms of selective attention and converging sensory processing to efficiently cope with rich and unceasing multisensory inputs. One prominent assumption holds that audio-visual synchrony can act as a strong attractor for spatial attention. Here, we tested for a similar effect of audio-visual synchrony on feature-selective attention. We presented two superimposed Gabor patches that differed in colour and orientation. On each trial, participants were cued to selectively attend to one of the two patches. Over time, spatial frequencies of both patches varied sinusoidally at distinct rates (3.14 and 3.63 Hz), giving rise to pulse-like percepts. A simultaneously presented pure tone carried a frequency modulation at the pulse rate of one of the two visual stimuli to introduce audio-visual synchrony. Pulsed stimulation elicited distinct time-locked oscillatory electrophysiological brain responses. These steady-state responses were quantified in the spectral domain to examine individual stimulus processing under conditions of synchronous versus asynchronous tone presentation and when respective stimuli were attended versus unattended. We found that both, attending to the colour of a stimulus and its synchrony with the tone, enhanced its processing. Moreover, both gain effects combined linearly for attended in-sync stimuli. Our results suggest that audio-visual synchrony can attract attention to specific stimulus features when stimuli overlap in space. PMID:26226930
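
    Quantifying the steady-state responses "in the spectral domain", as described above, amounts to reading the amplitude spectrum of an EEG epoch at the two tagging frequencies. A generic single-channel sketch follows; the sampling rate, epoch length and noise level are assumptions, not the study's parameters:

      import numpy as np

      def tagged_amplitudes(eeg, sr, tag_freqs=(3.14, 3.63)):
          """Amplitude at the visual tagging frequencies of a single-channel epoch.
          The epoch must be long enough (well over 2 s) to resolve 3.14 vs 3.63 Hz."""
          eeg = np.asarray(eeg, dtype=float)
          n = len(eeg)
          spectrum = np.abs(np.fft.rfft(eeg * np.hanning(n))) * 2.0 / n
          freqs = np.fft.rfftfreq(n, d=1.0 / sr)
          return {f: float(spectrum[np.argmin(np.abs(freqs - f))]) for f in tag_freqs}

      # e.g. a 20-second synthetic epoch at 500 Hz containing a 3.14 Hz component
      sr = 500
      t = np.arange(0, 20.0, 1 / sr)
      epoch = np.sin(2 * np.pi * 3.14 * t) + 0.5 * np.random.randn(len(t))
      print(tagged_amplitudes(epoch, sr))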

  13. Primary School Pupils' Response to Audio-Visual Learning Process in Port-Harcourt

    Science.gov (United States)

    Olube, Friday K.

    2015-01-01

    The purpose of this study is to examine primary school children's response on the use of audio-visual learning processes--a case study of Chokhmah International Academy, Port-Harcourt (owned by Salvation Ministries). It looked at the elements that enhance pupils' response to educational television programmes and their hindrances to these…

  14. Rehabilitation of balance-impaired stroke patients through audio-visual biofeedback

    DEFF Research Database (Denmark)

    Gheorghe, Cristina; Nissen, Thomas; Juul Rosengreen Christensen, Daniel;

    2015-01-01

    This study explored how audio-visual biofeedback influences the physical balance of seven balance-impaired stroke patients aged 33–70 years. The setup included a bespoke balance board and a music rhythm game. The procedure was designed as follows: (1) a control group who performed a balance...

  15. Audio-visual stimulation improves oculomotor patterns in patients with hemianopia.

    Science.gov (United States)

    Passamonti, Claudia; Bertini, Caterina; Làdavas, Elisabetta

    2009-01-01

    Patients with visual field disorders often exhibit impairments in visual exploration and a typical defective oculomotor scanning behaviour. Recent evidence [Bolognini, N., Rasi, F., Coccia, M., & Làdavas, E. (2005b). Visual search improvement in hemianopic patients after audio-visual stimulation. Brain, 128, 2830-2842] suggests that systematic audio-visual stimulation of the blind hemifield can improve accuracy and search times in visual exploration, probably due to the stimulation of Superior Colliculus (SC), an important multisensory structure involved in both the initiation and execution of saccades. The aim of the present study is to verify this hypothesis by studying the effects of multisensory training on oculomotor scanning behaviour. Oculomotor responses during a visual search task and a reading task were studied before and after visual (control) or audio-visual (experimental) training, in a group of 12 patients with chronic visual field defects and 12 controls subjects. Eye movements were recorded using an infra-red technique which measured a range of spatial and temporal variables. Prior to treatment, patients' performance was significantly different from that of controls in relation to fixations and saccade parameters; after Audio-Visual Training, all patients reported an improvement in ocular exploration characterized by fewer fixations and refixations, quicker and larger saccades, and reduced scanpath length. Overall, these improvements led to a reduction of total exploration time. Similarly, reading parameters were significantly affected by the training, with respect to specific impairments observed in both left- and right-hemianopia readers. Our findings provide evidence that Audio-Visual Training, by stimulating the SC, may induce a more organized pattern of visual exploration due to an implementation of efficient oculomotor strategies. Interestingly, the improvement was found to be stable at a 1 year follow-up control session, indicating a long

  16. Enhanced audio-visual interactions in the auditory cortex of elderly cochlear-implant users.

    Science.gov (United States)

    Schierholz, Irina; Finke, Mareike; Schulte, Svenja; Hauthal, Nadine; Kantzke, Christoph; Rach, Stefan; Büchner, Andreas; Dengler, Reinhard; Sandmann, Pascale

    2015-10-01

    Auditory deprivation and the restoration of hearing via a cochlear implant (CI) can induce functional plasticity in auditory cortical areas. How these plastic changes affect the ability to integrate combined auditory (A) and visual (V) information is not yet well understood. In the present study, we used electroencephalography (EEG) to examine whether age, temporary deafness and altered sensory experience with a CI can affect audio-visual (AV) interactions in post-lingually deafened CI users. Young and elderly CI users and age-matched NH listeners performed a speeded response task on basic auditory, visual and audio-visual stimuli. Regarding the behavioral results, a redundant signals effect, that is, faster response times to cross-modal (AV) than to both of the two modality-specific stimuli (A, V), was revealed for all groups of participants. Moreover, in all four groups, we found evidence for audio-visual integration. Regarding event-related responses (ERPs), we observed a more pronounced visual modulation of the cortical auditory response at N1 latency (approximately 100 ms after stimulus onset) in the elderly CI users when compared with young CI users and elderly NH listeners. Thus, elderly CI users showed enhanced audio-visual binding which may be a consequence of compensatory strategies developed due to temporary deafness and/or degraded sensory input after implantation. These results indicate that the combination of aging, sensory deprivation and CI facilitates the coupling between the auditory and the visual modality. We suggest that this enhancement in multisensory interactions could be used to optimize auditory rehabilitation, especially in elderly CI users, by the application of strong audio-visually based rehabilitation strategies after implant switch-on. PMID:26302946

  17. Brain networks engaged in audiovisual integration during speech perception revealed by persistent homology-based network filtration.

    Science.gov (United States)

    Kim, Heejung; Hahm, Jarang; Lee, Hyekyoung; Kang, Eunjoo; Kang, Hyejin; Lee, Dong Soo

    2015-05-01

    The human brain naturally integrates audiovisual information to improve speech perception. However, in noisy environments, understanding speech is difficult and may require much effort. Although the brain network is supposed to be engaged in speech perception, it is unclear how speech-related brain regions are connected during natural bimodal audiovisual or unimodal speech perception with counterpart irrelevant noise. To investigate the topological changes of speech-related brain networks at all possible thresholds, we used a persistent homological framework through hierarchical clustering, such as single linkage distance, to analyze the connected component of the functional network during speech perception using functional magnetic resonance imaging. For speech perception, bimodal (audio-visual speech cue) or unimodal speech cues with counterpart irrelevant noise (auditory white-noise or visual gum-chewing) were delivered to 15 subjects. In terms of positive relationship, similar connected components were observed in bimodal and unimodal speech conditions during filtration. However, during speech perception by congruent audiovisual stimuli, the tighter couplings of left anterior temporal gyrus-anterior insula component and right premotor-visual components were observed than auditory or visual speech cue conditions, respectively. Interestingly, visual speech is perceived under white noise by tight negative coupling in the left inferior frontal region-right anterior cingulate, left anterior insula, and bilateral visual regions, including right middle temporal gyrus, right fusiform components. In conclusion, the speech brain network is tightly positively or negatively connected, and can reflect efficient or effortful processes during natural audiovisual integration or lip-reading, respectively, in speech perception. PMID:25495216
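
    The filtration idea described above, tracking connected components of the functional network over all possible thresholds via single-linkage clustering, can be sketched with SciPy. The toy correlation matrix and the distance definition d = 1 - correlation are assumptions for illustration, not the study's data or exact pipeline:

      import numpy as np
      from scipy.cluster.hierarchy import linkage, fcluster
      from scipy.spatial.distance import squareform

      def components_over_filtration(corr, thresholds):
          """Number of connected components as the filtration threshold grows."""
          dist = 1.0 - np.asarray(corr, dtype=float)
          np.fill_diagonal(dist, 0.0)
          Z = linkage(squareform(dist, checks=False), method="single")
          return [len(set(fcluster(Z, t=thr, criterion="distance")))
                  for thr in thresholds]

      # toy 4-region correlation matrix: regions {0,1} and {2,3} couple early
      corr = np.array([[1.0, 0.8, 0.2, 0.1],
                       [0.8, 1.0, 0.3, 0.2],
                       [0.2, 0.3, 1.0, 0.7],
                       [0.1, 0.2, 0.7, 1.0]])
      print(components_over_filtration(corr, np.linspace(0.0, 1.0, 6)))  # [4, 3, 2, 2, 1, 1]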

  18. Relative roles of consonants and vowels in perceiving phonetic versus talker cues in speech

    Science.gov (United States)

    Cardillo, Gina; Owren, Michael J.

    2002-05-01

    Perceptual experiments tested whether consonants and vowels differentially contribute to phonetic versus indexical cueing in speech. In 2 experiments, 62 total participants each heard 128 American-English word pairs recorded by 8 male and 8 female talkers. Half the pairs were synonyms, while half were nonsynonyms. Further, half the pairs were words from the same talker, and half from different, same-sex talkers. The first word heard was unaltered, while the second was edited by setting either all vowels ("Consonants-Only") or all consonants ("Vowels-Only") to silence. Each participant responded to half Consonants-Only and half Vowels-Only trials, always hearing the unaltered word once and the edited word twice. In experiment 1, participants judged whether the two words had the same or different meanings. Participants in experiment 2 indicated whether the word pairs were from the same or different talkers. Performance was measured as latencies and d′ values, and indicated significantly greater sensitivity to phonetic content when consonants rather than vowels were heard, but the converse when talker identity was judged. These outcomes suggest important functional differences in the roles played by consonants and vowels in normative speech.
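
    The d′ sensitivity values reported above follow from the standard signal-detection computation on hit and false-alarm rates. A generic sketch with a common correction for perfect rates (the trial counts below are made up):

      from scipy.stats import norm

      def d_prime(hits, misses, false_alarms, correct_rejections):
          """Sensitivity index d' from same/different judgement counts."""
          n_s = hits + misses
          n_n = false_alarms + correct_rejections
          # clamp rates away from 0 and 1 to avoid infinite z-scores
          hr = min(max(hits / n_s, 0.5 / n_s), 1 - 0.5 / n_s)
          far = min(max(false_alarms / n_n, 0.5 / n_n), 1 - 0.5 / n_n)
          return norm.ppf(hr) - norm.ppf(far)

      print(d_prime(hits=52, misses=12, false_alarms=10, correct_rejections=54))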

  19. El tratamiento documental del mensaje audiovisual Documentary treatment of the audio-visual message

    Directory of Open Access Journals (Sweden)

    Blanca Rodríguez Bravo

    2005-06-01

    Full Text Available Peculiarities of the audio-visual document and the treatment it undergoes in TV broadcasting stations are analyzed. The particular features of images condition their analysis and recovery; this paper establishes stages and proceedings for the representation of audio-visual messages with a view to their re-usability. Also, some considerations about the automatic processing of video and the changes introduced by digital TV are made.

  20. Prioritized MPEG-4 Audio-Visual Objects Streaming over the DiffServ

    Institute of Scientific and Technical Information of China (English)

    HUANG Tian-yun; ZHENG Chan

    2005-01-01

    The object-based scalable coding in MPEG-4 is investigated, and a prioritized transmission scheme for MPEG-4 audio-visual objects (AVOs) over the DiffServ network with QoS guarantees is proposed. MPEG-4 AVOs are extracted and classified into different groups according to their priority values and scalable layers (visual importance). These priority values are mapped to the IP DiffServ per-hop behaviors (PHBs). This scheme can selectively discard packets of low importance in order to avoid network congestion. Simulation results show that the quality of the received video gracefully adapts to the network state, as compared with best-effort delivery. Also, by allowing the content provider to define the prioritization of each audio-visual object, the adaptive transmission of object-based scalable video can be customized based on the content.
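
    The priority-to-PHB mapping described above can be illustrated with a small lookup table. The particular DSCP assignments below (EF for the most important base layers, assured-forwarding classes for enhancement layers, best effort for the rest) are a plausible assumption for illustration, not the mapping defined in the paper:

      DSCP_BY_PRIORITY = {
          0: 46,  # most important AVO base layers  -> EF
          1: 34,  # first enhancement layer         -> AF41
          2: 26,  # second enhancement layer        -> AF31
          3: 0,   # least important objects         -> best effort (default PHB)
      }

      def dscp_for(avo_priority: int) -> int:
          """DiffServ codepoint used to mark packets of an AVO layer."""
          return DSCP_BY_PRIORITY.get(avo_priority, 0)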

  1. Cueing musical emotions: An empirical analysis of 24-piece sets by Bach and Chopin documents parallels with emotional speech

    Directory of Open Access Journals (Sweden)

    Matthew Poon

    2015-11-01

    Full Text Available Acoustic cues such as pitch height and timing are effective at communicating emotion in both music and speech. Numerous experiments altering musical passages have shown that higher and faster melodies generally sound happier than lower and slower melodies, findings consistent with corpus analyses of emotional speech. However, equivalent corpus analyses of complex time-varying cues in music are less common, due in part to the challenges of assembling an appropriate corpus. Here we describe a novel, score-based exploration of the use of pitch height and timing in a set of balanced major and minor key compositions. Our corpus contained all 24 Preludes and 24 Fugues from Bach’s Well Tempered Clavier (book 1), as well as all 24 of Chopin’s Preludes for piano. These three sets are balanced with respect to both modality (major/minor) and key chroma (A, B, C, etc.). Consistent with predictions derived from speech, we found major-key (nominally happy) pieces to be two semitones higher in pitch height and 29% faster than minor-key (nominally sad) pieces. This demonstrates that our balanced corpus of major and minor key pieces uses low-level acoustic cues for emotion in a manner consistent with speech. A series of post-hoc analyses illustrate interesting trade-offs, with

  2. THE IMPROVEMENT OF AUDIO-VISUAL BASED DANCE APPRECIATION LEARNING AMONG PRIMARY TEACHER EDUCATION STUDENTS OF MAKASSAR STATE UNIVERSITY

    OpenAIRE

    Wahira

    2014-01-01

    This research aimed to improve the dance appreciation skills of students in Primary Teacher Education at Makassar State University, to improve their perception of audio-visual based art appreciation, to increase the students’ interest in the audio-visual based art education subject, and to increase the students’ responses to the subject. This research was classroom action research using the design created by Kemmis & McTaggart, conducted with 42 students of Prim...

  3. Internet video telephony allows speech reading by deaf individuals and improves speech perception by cochlear implant users.

    Directory of Open Access Journals (Sweden)

    Georgios Mantokoudis

    Full Text Available OBJECTIVE: To analyze speech reading through Internet video calls by profoundly hearing-impaired individuals and cochlear implant (CI) users. METHODS: Speech reading skills of 14 deaf adults and 21 CI users were assessed using the Hochmair Schulz Moser (HSM) sentence test. We presented video simulations using different video resolutions (1280 × 720, 640 × 480, 320 × 240, 160 × 120 px), frame rates (30, 20, 10, 7, 5 frames per second (fps)), speech velocities (three different speakers), webcameras (Logitech Pro9000, C600 and C500) and image/sound delays (0-500 ms). All video simulations were presented with and without sound and in two screen sizes. Additionally, scores for live Skype™ video connection and live face-to-face communication were assessed. RESULTS: Higher frame rate (>7 fps), higher camera resolution (>640 × 480 px) and shorter picture/sound delay (<100 ms) were associated with increased speech perception scores. Scores were strongly dependent on the speaker but were not influenced by physical properties of the camera optics or the full screen mode. There is a significant median gain of +8.5%pts (p = 0.009) in speech perception for all 21 CI users if visual cues are additionally shown. CI users with poor open-set speech perception scores (n = 11) showed the greatest benefit under combined audio-visual presentation (median speech perception +11.8%pts, p = 0.032). CONCLUSION: Webcameras have the potential to improve telecommunication of hearing-impaired individuals.

  4. Investigating the impact of audio instruction and audio-visual biofeedback for lung cancer radiation therapy

    Science.gov (United States)

    George, Rohini

    Lung cancer accounts for 13% of all cancers in the Unites States and is the leading cause of deaths among both men and women. The five-year survival for lung cancer patients is approximately 15%.(ACS facts & figures) Respiratory motion decreases accuracy of thoracic radiotherapy during imaging and delivery. To account for respiration, generally margins are added during radiation treatment planning, which may cause a substantial dose delivery to normal tissues and increase the normal tissue toxicity. To alleviate the above-mentioned effects of respiratory motion, several motion management techniques are available which can reduce the doses to normal tissues, thereby reducing treatment toxicity and allowing dose escalation to the tumor. This may increase the survival probability of patients who have lung cancer and are receiving radiation therapy. However the accuracy of these motion management techniques are inhibited by respiration irregularity. The rationale of this thesis was to study the improvement in regularity of respiratory motion by breathing coaching for lung cancer patients using audio instructions and audio-visual biofeedback. A total of 331 patient respiratory motion traces, each four minutes in length, were collected from 24 lung cancer patients enrolled in an IRB-approved breathing-training protocol. It was determined that audio-visual biofeedback significantly improved the regularity of respiratory motion compared to free breathing and audio instruction, thus improving the accuracy of respiratory gated radiotherapy. It was also observed that duty cycles below 30% showed insignificant reduction in residual motion while above 50% there was a sharp increase in residual motion. The reproducibility of exhale based gating was higher than that of inhale base gating. Modeling the respiratory cycles it was found that cosine and cosine 4 models had the best correlation with individual respiratory cycles. The overall respiratory motion probability distribution

  5. A Psychophysical Imaging Method Evidencing Auditory Cue Extraction during Speech Perception: A Group Analysis of Auditory Classification Images : Auditory Classification Images

    OpenAIRE

    Varnet, Léo; Knoblauch, Kenneth; Serniclaes, Willy; Meunier, Fanny; Hoen, Michel

    2015-01-01

    Although there is a large consensus regarding the involvement of specific acoustic cues in speech perception, the precise mechanisms underlying the transformation from continuous acoustical properties into discrete perceptual units remain undetermined. This gap in knowledge is partially due to the lack of a turnkey solution for isolating critical speech cues from natural stimuli. In this paper, we describe a psychoacoustic imaging method known as the Auditory Classification Image technique t...

  6. The perception of speech modulation cues in lexical tones is guided by early language-specific experience

    Directory of Open Access Journals (Sweden)

    Laurianne eCabrera

    2015-08-01

    Full Text Available A number of studies showed that infants reorganize their perception of speech sounds according to their native language categories during their first year of life. Still, information is lacking about the contribution of basic auditory mechanisms to this process. This study aimed to evaluate when native language experience starts to noticeably affect the perceptual processing of basic acoustic cues (i.e., frequency-modulation (FM) and amplitude-modulation (AM) information) known to be crucial for speech perception in adults. The discrimination of a lexical-tone contrast (rising versus low) was assessed in 6- and 10-month-old infants learning either French or Mandarin using a visual habituation paradigm. The lexical tones were presented in two conditions designed to either keep intact or severely degrade the FM and fine spectral cues needed to accurately perceive the voice-pitch trajectory. A third condition was designed to assess the discrimination of the same voice-pitch trajectories using click trains containing only the FM cues related to the fundamental frequency (F0) in French- and Mandarin-learning 10-month-old infants. Results showed that the younger infants of both language groups and the Mandarin-learning 10-month-olds discriminated the intact lexical-tone contrast while French-learning 10-month-olds failed. However, only the French 10-month-olds discriminated degraded lexical tones when FM, and thus voice-pitch cues, were reduced. Moreover, Mandarin-learning 10-month-olds were found to discriminate the pitch trajectories as presented in click trains better than French infants. Altogether, these results reveal that the perceptual reorganization occurring during the first year of life for lexical tones is coupled with changes in the auditory ability to use speech modulation cues.

  7. PHYSIOLOGICAL MONITORING OPERATORS ACS IN AUDIO-VISUAL SIMULATION OF AN EMERGENCY

    Directory of Open Access Journals (Sweden)

    S. S. Aleksanin

    2016-01-01

    Full Text Available Using a ship-simulator automated control system (ACS), we investigated the information content of physiological monitoring of cardiac rhythm for assessing the reliability and noise immunity of operators of various specializations during audio-visual simulation of an emergency. In parallel, we studied the effectiveness of protection against the adverse effects of electromagnetic fields. Monitoring cardiac rhythm during a virtual emergency makes it possible to differentiate the degree of strain on the operators' regulatory systems according to their specialization, and to note the positive effect of using protective means against exposure to electromagnetic fields.

  8. Simulating Synesthesia In Spatially-Based Real-Time Audio-Visual Performance

    OpenAIRE

    Gibson, Stephen

    2013-01-01

    In this paper I will describe and present examples of my live audio-visual work for 3D spatial environments. These projects use motion-tracking technology to enable users to interact with sound, light and video using their body movements in 3D space. Specific video examples of one past project (Virtual DJ) and one current project (Virtual VJ) will be shown to illustrate how flexible user interaction is enabled through a complex and precise mapping of 3D space to media control. In these projec...

  9. The influence of infant-directed speech on 12-month-olds' intersensory perception of fluent speech

    OpenAIRE

    Kubicek, Claudia; Gervain, Judit; Hillairet de Boisferon, Anne; Pascalis, Olivier; Lœvenbruck, Hélène; Schwarzer, Gudrun

    2015-01-01

    The present study examined whether infant-directed (ID) speech facilitates intersensory matching of audio-visual fluent speech in 12-month-old infants. German-learning infants’ audio-visual matching ability of German and French fluent speech was assessed by using a variant of the intermodal matching procedure, with auditory and visual speech information presented sequentially. In Experiment 1, the sentences were spoken in an adult-directed (AD) manner. Results showed that 12-month-old ...

  10. Attitude of medical students towards the use of audio visual aids during didactic lectures in pharmacology in a medical college of central India

    OpenAIRE

    Mehul Agrawal; Rajanish Kumar Sankdia

    2016-01-01

    Background: Students favour teaching methods employing audio-visual aids over didactic lectures that do not use such aids. However, the optimum use of audio-visual aids is essential for deriving their benefits. During a lecture, both the visual and auditory senses are used to absorb information. The different lecture methods are chalk and board, PowerPoint presentations (PPT) and a mix of aids. This study was done to determine the students' preference regarding the various audio-visual aids, ...

  11. Effects of audio-visual aids on foreign language test anxiety, reading and listening comprehension, and retention in EFL learners.

    Science.gov (United States)

    Lee, Shu-Ping; Lee, Shin-Da; Liao, Yuan-Lin; Wang, An-Chi

    2015-04-01

    This study examined the effects of audio-visual aids on anxiety, comprehension test scores, and retention in reading and listening to short stories in English as a Foreign Language (EFL) classrooms. Reading and listening tests, general and test anxiety, and retention were measured in English-major college students in an experimental group with audio-visual aids (n=83) and a control group without audio-visual aids (n=94) with similar general English proficiency. Lower reading test anxiety, unchanged reading comprehension scores, and better reading short-term and long-term retention after four weeks were evident in the audiovisual group relative to the control group. In addition, lower listening test anxiety, higher listening comprehension scores, and unchanged short-term and long-term retention were found in the audiovisual group relative to the control group after the intervention. Audio-visual aids may help to reduce EFL learners' listening test anxiety and enhance their listening comprehension scores without facilitating retention of such materials. Although audio-visual aids did not increase reading comprehension scores, they helped reduce EFL learners' reading test anxiety and facilitated retention of reading materials. PMID:25914939

  12. Role of contextual cues on the perception of spectrally reduced interrupted speech.

    Science.gov (United States)

    Patro, Chhayakanta; Mendel, Lisa Lucks

    2016-08-01

    Understanding speech within an auditory scene is constantly challenged by interfering noise in suboptimal listening environments when noise hinders the continuity of the speech stream. In such instances, a typical auditory-cognitive system perceptually integrates available speech information and "fills in" missing information in the light of semantic context. However, individuals with cochlear implants (CIs) find it difficult and effortful to understand interrupted speech compared to their normal-hearing counterparts. This inefficiency in the perceptual integration of speech could be attributed to further degradations in the spectral-temporal domain imposed by CIs, making it difficult to utilize contextual evidence effectively. To address these issues, 20 normal-hearing adults listened to speech that was spectrally reduced, and to speech that was both spectrally reduced and interrupted, in a manner similar to CI processing. The Revised Speech Perception in Noise test, which includes contextually rich and contextually poor sentences, was used to evaluate the influence of semantic context on speech perception. Results indicated that listeners benefited more from semantic context when they listened to spectrally reduced speech alone. For the spectrally reduced interrupted speech, contextual information was not as helpful under significant spectral reductions, but became beneficial as the spectral resolution improved. These results suggest that top-down processing facilitates speech perception up to a point, and fails to facilitate speech understanding when the speech signals are significantly degraded. PMID:27586760
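
    For readers unfamiliar with how such stimuli are produced, the sketch below shows one common way to spectrally reduce speech (noise-band vocoding, often used to approximate CI processing) and then interrupt it periodically. The channel count, filter settings and interruption rate are illustrative assumptions, not the parameters used in the study.

# Sketch: noise-band vocoding (spectral reduction) plus periodic interruption.
# Channel count, band edges and interruption rate are illustrative assumptions.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def vocode(x, fs, n_channels=8, lo=100.0, hi=7000.0):
    # Replace each band's fine structure with noise, keeping its envelope.
    edges = np.geomspace(lo, hi, n_channels + 1)
    out = np.zeros_like(x)
    for f1, f2 in zip(edges[:-1], edges[1:]):
        sos = butter(4, [f1, f2], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        envelope = np.abs(hilbert(band))
        carrier = sosfiltfilt(sos, np.random.randn(len(x)))  # band-limited noise
        out += envelope * carrier
    return out

def interrupt(x, fs, rate_hz=2.0, duty=0.5):
    # Gate the signal on and off periodically (silent interruptions).
    t = np.arange(len(x)) / fs
    return x * (((t * rate_hz) % 1.0) < duty)

fs = 16000
speech = np.random.randn(fs * 2)  # stand-in for a 2-second speech waveform
degraded = interrupt(vocode(speech, fs), fs)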

  13. Open-Loop Audio-Visual Stimulation (AVS): A Useful Tool for Management of Insomnia?

    Science.gov (United States)

    Tang, Hsin-Yi Jean; Riegel, Barbara; McCurry, Susan M; Vitiello, Michael V

    2016-03-01

    Audio-Visual Stimulation (AVS), a form of neurofeedback, is a non-pharmacological intervention that has been used for both performance enhancement and symptom management. We review the history of AVS, its two sub-types (closed- and open-loop), and discuss its clinical implications. We also describe a promising new application of AVS to improve sleep, and potentially decrease pain. AVS research can be traced back to the late 1800s. AVS's efficacy has been demonstrated for both performance enhancement and symptom management. Although AVS is commonly used in clinical settings, there is limited literature evaluating clinical outcomes and mechanisms of action. One of the challenges to AVS research is the lack of standardized terms, which makes systematic review and literature consolidation difficult. Future studies using AVS as an intervention should: (1) use operational definitions that are consistent with the existing literature, such as AVS, Audio-visual Entrainment, or Light and Sound Stimulation, (2) provide a clear rationale for the chosen training frequency modality, (3) use a randomized controlled design, and (4) follow the Consolidated Standards of Reporting Trials and/or related guidelines when disseminating results. PMID:26294268

  14. The presentation of expert testimony via live audio-visual communication.

    Science.gov (United States)

    Miller, R D

    1991-01-01

    As part of a national effort to improve efficiency in court procedures, the American Bar Association has recommended, on the basis of a number of pilot studies, increased use of current audio-visual technology, such as telephone and live video communication, to eliminate delays caused by unavailability of participants in both civil and criminal procedures. Although these recommendations were made to facilitate court proceedings, and for the convenience of attorneys and judges, they also have the potential to save significant time for clinical expert witnesses as well. The author reviews the studies of telephone testimony that were done by the American Bar Association and other legal research groups, as well as the experience in one state forensic evaluation and treatment center. He also reviewed the case law on the issue of remote testimony. He then presents data from a national survey of state attorneys general concerning the admissibility of testimony via audio-visual means, including video depositions. Finally, he concludes that the option to testify by telephone provides a significant savings in precious clinical time for forensic clinicians in public facilities, and urges that such clinicians work actively to convince courts and/or legislatures in states that do not permit such testimony (currently the majority), to consider accepting it, to improve the effective use of scarce clinical resources in public facilities. PMID:2039847

  15. A new alley in Opinion Mining using Senti Audio Visual Algorithm

    Directory of Open Access Journals (Sweden)

    Mukesh Rawat,

    2016-02-01

    Full Text Available People share their views about products and services over social media, blogs, forums, etc. Someone who is willing to spend resources and money on these products and services can learn about them from the past experiences of their peers. Opinion mining plays a vital role in tracking the growing interests of a particular community, social and political events, making business strategies, marketing campaigns, etc. This data exists in unstructured form on the Internet, but when analyzed properly it can be of great use. Sentiment analysis focuses on detecting the polarity of emotions such as happy, sad or neutral. In this paper we propose an algorithm, Senti Audio Visual, for examining both video and audio sentiment. A review in the form of video/audio may contain several opinions/emotions; this algorithm classifies the reviews with the help of Bayes classifiers into three different classes, i.e., positive, negative or neutral. The algorithm uses smiles, cries, gazes, pauses, pitch, and intensity as relevant audio-visual features.
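
    A minimal sketch of the kind of Bayes classification described in this record is shown below, assuming the audio-visual cues (smiles, cries, gazes, pauses, pitch, intensity) have already been extracted into one numeric feature vector per review; the feature values, training data and use of scikit-learn are hypothetical illustrations rather than the authors' implementation.

# Sketch: Bayes polarity classification of audio-visual review features.
# Feature values and the tiny training set are hypothetical placeholders.
from sklearn.naive_bayes import GaussianNB

# Each row: [smiles, cries, gaze_aversion, pause_rate, mean_pitch_hz, intensity_db]
X_train = [
    [5, 0, 0.1, 0.2, 220.0, 65.0],   # upbeat, expressive review
    [0, 3, 0.7, 0.6, 180.0, 55.0],   # distressed, hesitant review
    [1, 0, 0.4, 0.4, 200.0, 60.0],   # flat, matter-of-fact review
]
y_train = ["positive", "negative", "neutral"]

clf = GaussianNB().fit(X_train, y_train)
new_review = [[3, 0, 0.2, 0.3, 215.0, 63.0]]
print(clf.predict(new_review))        # predicted polarity class
print(clf.predict_proba(new_review))  # class probabilities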

  16. Designing Promotion Strategy of Malang Raya’s Tourism Destination Branding through Audio Visual Media

    Directory of Open Access Journals (Sweden)

    Chanira Nuansa

    2014-04-01

    Full Text Available This study examines how well the concept of destination branding fits the existing models of Malang tourism promotion. The research is qualitative, drawing its data directly from Malang's existing promotional channels, namely information portal sites, blogs, social networks, and video on the Internet. SWOT analysis was used to identify the strengths, weaknesses, opportunities, and threats of the existing promotion models, and the data were analyzed against the indicators of the destination branding concept. The results of the analysis serve as a basis for designing a new, integrated tourism advertising model for Malang. The analysis shows that video is the most suitable medium for promoting Malang tourism in the form of advertisements: video conveys the facts more completely and objectively through its audio-visual form, making it easier for viewers to associate their thoughts with the destination. Moreover, well-conceptualized video advertisements for Malang tourism are still rare. This is an opportunity, because the audio-visual advertisement models produced in this study are expected to serve as an example for the parties concerned when conceptualizing future Malang tourism advertising. Keywords: advertising, SWOT analysis, Malang City, tourism promotion

  17. THE IMPROVEMENT OF AUDIO-VISUAL BASED DANCE APPRECIATION LEARNING AMONG PRIMARY TEACHER EDUCATION STUDENTS OF MAKASSAR STATE UNIVERSITY

    Directory of Open Access Journals (Sweden)

    Wahira

    2014-06-01

    Full Text Available This research aimed to improve the dance appreciation skills of Primary Teacher Education students at Makassar State University, to improve their perception of audio-visual based art appreciation, to increase their interest in the audio-visual based art education subject, and to increase their responses to the subject. This classroom action research used the design of Kemmis & McTaggart and was conducted with 42 Primary Teacher Education students of Makassar State University. Data were collected through observation, questionnaires, and interviews, and were analyzed using descriptive qualitative and quantitative techniques. The results were: (1) the students' achievement in audio-visual based dance appreciation improved: pre-cycle 33.33%, cycle I 42.85%, and cycle II 83.33%; (2) the students' perception of audio-visual based dance appreciation improved: cycle I 59.52% and cycle II 71.42%, and their perception of the subject obtained through structured interviews in cycles I and II was 69.83%, a high category; (3) the students' interest in the art education subject, especially audio-visual based dance appreciation, increased: cycle I 52.38% and cycle II 64.28%, and their interest in the subject obtained through structured interviews was 69.50%, a high category; (4) the students' response to audio-visual based dance appreciation increased: cycle I 54.76% and cycle II 69.04%, a good category.

  18. Natural speech cues to word segmentation under difficult listening conditions

    OpenAIRE

    Cutler, A.; Butterfield, S.

    1989-01-01

    One of a listener's major tasks in understanding continuous speech is segmenting the speech signal into separate words. When listening conditions are difficult, speakers can help listeners by deliberately speaking more clearly. In three experiments, we examined how word boundaries are produced in deliberately clear speech. We found that speakers do indeed attempt to mark word boundaries; moreover, they differentiate between word boundaries in a way which suggests they are sensitive to listene...

  19. Integrating Audio-Visual Features and Text Information for Story Segmentation of News Video

    Institute of Scientific and Technical Information of China (English)

    Liu Hua-yong; Zhou Dong-ru

    2003-01-01

    Video data are composed of multimodal information streams including visual, auditory and textual streams, so an approach of story segmentation for news video using multimodal analysis is described in this paper. The proposed approach detects the topic-caption frames, and integrates them with silence clips detection results, as well as shot segmentation results to locate the news story boundaries. The integration of audio-visual features and text information overcomes the weakness of the approach using only image analysis techniques. On test data with 135 400 frames, when the boundaries between news stories are detected, the accuracy rate 85.8% and the recall rate 97.5% are obtained. The experimental results show the approach is valid and robust.

  20. Integrating Audio-Visual Features and Text Information for Story Segmentation of News Video

    Institute of Scientific and Technical Information of China (English)

    Liu Hua-yong; Zhou Dong-ru

    2003-01-01

    Video data are composed of multimodal information streams including visual, auditory and textual streams, so an approach of story segmentation for news video using multimodal analysis is described in this paper. The proposed approach detects the topic-caption frames, and integrates them with silence clips detection results, as well as shot segmentation results to locate the news story boundaries. The integration of audio-visual features and text information overcomes the weakness of the approach using only image analysis techniques. On test data with 135 400 frames, when the boundaries between news stories are detected, the accuracy rate 85.8% and the recall rate 97.5% are obtained. The experimental results show the approach is valid and robust.
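
    A simplified sketch of the fusion step described in the two records above is given below: a shot boundary is accepted as a story boundary when a detected topic-caption frame and a silence clip both fall within a small temporal window of it. The data structures, frame numbers and window size are illustrative assumptions, not the authors' implementation.

# Sketch: locating news story boundaries by fusing caption, silence and shot cues.
# Detector outputs (frame indices) and the fusion window are assumptions.
def find_story_boundaries(caption_frames, silence_frames, shot_boundaries, window=25):
    # Keep shot boundaries supported by a nearby topic caption and a silence clip.
    def near(frame, candidates):
        return any(abs(frame - c) <= window for c in candidates)

    return [b for b in shot_boundaries
            if near(b, caption_frames) and near(b, silence_frames)]

# Hypothetical detector outputs (frame numbers at 25 fps).
captions = [1200, 5400, 9800]
silences = [1190, 5420, 7300, 9805]
shots = [600, 1205, 3300, 5410, 7310, 9810]
print(find_story_boundaries(captions, silences, shots))  # -> [1205, 5410, 9810]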

  1. How is the McGurk effect modulated by Cued Speech in deaf and hearing adults?

    Science.gov (United States)

    Bayard, Clémence; Colin, Cécile; Leybaert, Jacqueline

    2014-01-01

    Speech perception for both hearing and deaf people involves an integrative process between auditory and lip-reading information. In order to disambiguate information from the lips, manual cues from Cued Speech may be added. Cued Speech (CS) is a system of manual aids developed to help deaf people to clearly and completely understand speech visually (Cornett, 1967). Within this system, both labial and manual information, as lone input sources, remain ambiguous. Perceivers, therefore, have to combine both types of information in order to get one coherent percept. In this study, we examined how audio-visual (AV) integration is affected by the presence of manual cues and on which form of information (auditory, labial or manual) the CS receptors primarily rely. To address this issue, we designed a unique experiment that implemented the use of AV McGurk stimuli (audio /pa/ and lip-reading /ka/) which were produced with or without manual cues. The manual cue was congruent with either auditory information, lip information or the expected fusion. Participants were asked to repeat the perceived syllable aloud. Their responses were then classified into four categories: audio (when the response was /pa/), lip-reading (when the response was /ka/), fusion (when the response was /ta/) and other (when the response was something other than /pa/, /ka/ or /ta/). Data were collected from hearing-impaired individuals who were experts in CS (all of whom had either cochlear implants or binaural hearing aids; N = 8), hearing individuals who were experts in CS (N = 14) and hearing individuals who were completely naïve of CS (N = 15). Results confirmed that, like hearing people, deaf people can merge auditory and lip-reading information into a single unified percept. Without manual cues, McGurk stimuli induced the same percentage of fusion responses in both groups. Results also suggest that manual cues can modify the AV integration and that their impact differs between hearing and deaf people

  2. How is the McGurk effect modulated by Cued Speech in deaf and hearing adults ?

    Directory of Open Access Journals (Sweden)

    Clémence eBayard

    2014-05-01

    Full Text Available Speech perception for both hearing and deaf people involves an integrative process between auditory and lip-reading information. In order to disambiguate information from the lips, manual cues from Cued Speech may be added. Cued Speech (CS) is a system of manual aids developed to help deaf people to clearly and completely understand speech visually (Cornett, 1967). Within this system, both labial and manual information, as lone input sources, remain ambiguous. Perceivers, therefore, have to combine both types of information in order to get one coherent percept. In this study, we examined how audio-visual (AV) integration is affected by the presence of manual cues and on which form of information (auditory, labial or manual) the CS receptors primarily rely. To address this issue, we designed a unique experiment that implemented the use of AV McGurk stimuli (audio /pa/ and lip-reading /ka/) which were produced with or without manual cues. The manual cue was congruent with either auditory information, lip information or the expected fusion. Participants were asked to repeat the perceived syllable aloud. Their responses were then classified into four categories: audio (when the response was /pa/), lip-reading (when the response was /ka/), fusion (when the response was /ta/) and other (when the response was something other than /pa/, /ka/ or /ta/). Data were collected from hearing-impaired individuals who were experts in CS (all of whom had either cochlear implants or binaural hearing aids; N = 8), hearing individuals who were experts in CS (N = 14) and hearing individuals who were completely naïve of CS (N = 15). Results confirmed that, like hearing people, deaf people can merge auditory and lip-reading information into a single unified percept. Without manual cues, McGurk stimuli induced the same percentage of fusion responses in both groups. Results also suggest that manual cues can modify the AV integration and that their impact differs between hearing and deaf

  3. Equipped for the 21st Century?: Audio-Visual Resource Standards and Product Demands from Geography Departments in the UK.

    Science.gov (United States)

    McKendrick, John H.; Bowden, Annabel

    2000-01-01

    Reports on a survey of United Kingdom geography departments where data were collected on the availability, use, and opinions about the role of audio visual resources (AVRs) in teaching and learning. Reveals that AVRs are seen positively, hardware is readily available, software provision is uneven, and AVR commitment varies. (CMK)

  4. Changes in the Management of Information in Audio-Visual Archives following Digitization: Current and Future Outlook

    Science.gov (United States)

    Caldera-Serrano, Jorge

    2008-01-01

    This article attempts to offer an overview of the current changes that are being experienced in the management of audio-visual documentation and those that can be forecast in the future as a result of the migration from analogue to digital information. For this purpose the documentary chain will be used as a basis to analyse individually the tasks…

  5. The role of vowel perceptual cues in compensatory responses to perturbations of speech auditory feedback

    OpenAIRE

    Reilly, Kevin J.; Dougherty, Kathleen E.

    2013-01-01

    The perturbation of acoustic features in a speaker's auditory feedback elicits rapid compensatory responses that demonstrate the importance of auditory feedback for control of speech output. The current study investigated whether responses to a perturbation of speech auditory feedback vary depending on the importance of the perturbed feature to perception of the vowel being produced. Auditory feedback of speakers' first formant frequency (F1) was shifted upward by 130 mels in randomly selecte...

  6. Standard operating procedure for audio visual recording of informed consent: An initiative to facilitate regulatory compliance

    Directory of Open Access Journals (Sweden)

    P M Parikh

    2014-01-01

    Full Text Available The office of the Drugs Controller General (India), vide order dated 19th November 2013, has made audio-visual (AV) recording of the informed consent mandatory for the conduct of all clinical trials in India. We therefore developed a standard operating procedure (SOP) to ensure that this is performed in compliance with the regulatory requirements and internationally accepted ethical standards, and that the recording is stored and archived in an appropriate manner. The SOP was developed keeping in mind all relevant orders, regulations, laws and guidelines, and has been made available online. Since we are faced with unique legal and regulatory requirements that are unprecedented globally, this SOP will allow the AV recording of the informed consent to be performed, archived and retrieved to demonstrate ethical, legal and regulatory compliance. We also compared it to the draft guidelines for AV recording dated 9th January 2014 developed by the Central Drugs Standard Control Organization. Our future efforts will include regular testing, feedback and updating of the SOP.

  7. Effects of hearing loss on the subcortical representation of speech cues.

    Science.gov (United States)

    Anderson, Samira; Parbery-Clark, Alexandra; White-Schwoch, Travis; Drehobl, Sarah; Kraus, Nina

    2013-05-01

    Individuals with sensorineural hearing loss often report frustration with speech being loud but not clear, especially in background noise. Despite advanced digital technology, hearing aid users may resort to removing their hearing aids in noisy environments due to the perception of excessive loudness. In an animal model, sensorineural hearing loss results in greater auditory nerve coding of the stimulus envelope, leading to a relative deficit of stimulus fine structure. Based on the hypothesis that brainstem encoding of the temporal envelope is greater in humans with sensorineural hearing loss, speech-evoked brainstem responses were recorded in normal hearing and hearing impaired age-matched groups of older adults. In the hearing impaired group, there was a disruption in the balance of envelope-to-fine structure representation compared to that of the normal hearing group. This imbalance may underlie the difficulty experienced by individuals with sensorineural hearing loss when trying to understand speech in background noise. This finding advances the understanding of the effects of sensorineural hearing loss on central auditory processing of speech in humans. Moreover, this finding has clinical potential for developing new amplification or implantation technologies, and in developing new training regimens to address this relative deficit of fine structure representation. PMID:23654406

  8. Universal and language-specific sublexical cues in speech perception: a novel electroencephalography-lesion approach.

    Science.gov (United States)

    Obrig, Hellmuth; Mentzel, Julia; Rossi, Sonja

    2016-06-01

    See Cappa (doi:10.1093/brain/aww090) for a scientific commentary on this article. The phonological structure of speech supports the highly automatic mapping of sound to meaning. While it is uncontroversial that phonotactic knowledge acts upon lexical access, it is unclear at what stage these combinatorial rules, which govern phonological well-formedness in a given language, shape speech comprehension. Moreover, few studies have investigated the neuronal network affording this important step in speech comprehension. We therefore asked 70 participants, half of whom suffered from a chronic left-hemispheric lesion, to listen to 252 different monosyllabic pseudowords. The material models universal preferences of phonotactic well-formedness by including naturally spoken pseudowords and digitally reversed exemplars. The latter partially violate the phonological structure of all human speech and are rich in universally dispreferred phoneme sequences while preserving basic auditory parameters. Language-specific constraints were modelled in that half of the naturally spoken pseudowords complied with the phonotactics of the native language of the monolingual participants (German) while the other half did not. To ensure universal well-formedness and naturalness, the latter stimuli comply with Slovak phonotactics and all stimuli were produced by an early bilingual speaker. To maximally attenuate lexico-semantic influences, transparent pseudowords were avoided and participants had to detect immediate repetitions, a task orthogonal to the contrasts of interest. The results show that phonological 'well-formedness' modulates implicit processing of speech at different levels: universally dispreferred phonological structure elicits early, medium and late latency differences in the evoked potential. In contrast, the language-specific phonotactic contrast selectively modulates a medium-latency component of the event-related potentials around 400 ms. Using a novel event-related potential

  9. The effectiveness of mnemonic audio-visual aids in teaching content words to EFL students at a Turkish university

    OpenAIRE

    Kılınç, A Reha

    1996-01-01

    Ankara: Institute of Economics and Social Sciences, Bilkent University, 1996. Thesis (Master's) -- Bilkent University, 1996. Includes bibliographical references (leaves 63-67). This experimental study aimed at investigating the effects of mnemonic audio-visual aids on the recognition and recall of vocabulary items, in comparison with a control group that used a dictionary. The study was conducted at the Middle East Technical University, Department of Basic English. The participants were 64 beginner and u...

  10. From vibration to perception: using Large Multi-Actuator Panels (LaMAPs) to create coherent audio-visual environments

    OpenAIRE

    Rébillat, Marc; Corteel, Etienne; Katz, Brian,; Boutillon, Xavier

    2012-01-01

    International audience Virtual reality aims at providing users with audio-visual worlds where they will behave and learn as if they were in the real world. In this context, specific acoustic transducers are needed to fulfill simultaneous spatial requirements on visual and audio rendering in order to make them coherent. Large multi-actuator panels (LaMAPs) allow for the combined construction of a projection screen and loudspeaker array, and thus allows for the coherent creation of an audio ...

  11. Classification of cooperative and competitive overlaps in speech using cues from the context, overlapper, and overlappee

    OpenAIRE

    Truong, Khiet P.

    2013-01-01

    One of the major properties of overlapping speech is that it can be perceived as competitive or cooperative. For the development of real-time spoken dialog systems and the analysis of affective and social human behavior in conversations, it is important to (automatically) distinguish between these two types of overlap. We investigate acoustic characteristics of cooperative and competitive overlaps with the aim to develop automatic classifiers for the classification of overlaps. In addition to...

  12. Speech recall and word recognition depending on prosodic and musical cues as well as voice pitch

    OpenAIRE

    Rozanovskaya, Anna; Sokolova, Taisia

    2011-01-01

    Within this study, speech perception in different conditions was examined. The aim of the research was to compare perception results based on stimuli mode (plain spoken, rhythmic spoken or rhythmic sung stimuli) and pitch (normal, lower and higher). In the study, an experiment was conducted on 44 participants who had been asked to listen to 9 recorded sentences in Russian language (unknown to them) and write them down using Latin letters. These 9 sentences were specially prepared using differ...

  13. Effects of hearing loss on the subcortical representation of speech cues

    OpenAIRE

    Anderson, Samira; Parbery-Clark, Alexandra; White-Schwoch, Travis; Drehobl, Sarah; Kraus, Nina

    2013-01-01

    Individuals with sensorineural hearing loss often report frustration with speech being loud but not clear, especially in background noise. Despite advanced digital technology, hearing aid users may resort to removing their hearing aids in noisy environments due to the perception of excessive loudness. In an animal model, sensorineural hearing loss results in greater auditory nerve coding of the stimulus envelope, leading to a relative deficit of stimulus fine structure. Based on the hypothesi...

  14. Normal Gaze Cueing in Children with Autism Is Disrupted by Simultaneous Speech Utterances in “Live” Face-to-Face Interactions

    Directory of Open Access Journals (Sweden)

    Douglas D. Potter

    2011-01-01

    Full Text Available Gaze cueing was assessed in children with autism and in typically developing children, using a computer-controlled “live” face-to-face procedure. Sensitivity to gaze direction was assessed using a Posner cuing paradigm. Both static and dynamic directional gaze cues were used. Consistent with many previous studies, using photographic and cartoon faces, gaze cueing was present in children with autism and was not developmentally delayed. However, in the same children, gaze cueing was abolished when a mouth movement occurred at the same time as the gaze cue. In contrast, typical children were able to use gaze cues in all conditions. The findings indicate that gaze cueing develops successfully in some children with autism but that their attention is disrupted by speech utterances. Their ability to learn to read nonverbal emotional and intentional signals provided by the eyes may therefore be significantly impaired. This may indicate a problem with cross-modal attention control or an abnormal sensitivity to peripheral motion in general or the mouth region in particular.

  15. Audibility, speech perception and processing of temporal cues in ribbon synaptic disorders due to OTOF mutations.

    Science.gov (United States)

    Santarelli, Rosamaria; del Castillo, Ignacio; Cama, Elona; Scimemi, Pietro; Starr, Arnold

    2015-12-01

    Mutations in the OTOF gene encoding otoferlin result in a disrupted function of the ribbon synapses with impairment of the multivesicular glutamate release. Most affected subjects present with congenital hearing loss and abnormal auditory brainstem potentials associated with preserved cochlear hair cell activities (otoacoustic emissions, cochlear microphonics [CMs]). Transtympanic electrocochleography (ECochG) has recently been proposed for defining the details of potentials arising in both the cochlea and auditory nerve in this disorder, and with a view to shedding light on the pathophysiological mechanisms underlying auditory dysfunction. We review the audiological and electrophysiological findings in children with congenital profound deafness carrying two mutant alleles of the OTOF gene. We show that cochlear microphonic (CM) amplitude and summating potential (SP) amplitude and latency are normal, consistently with a preserved outer and inner hair cell function. In the majority of OTOF children, the SP component is followed by a markedly prolonged low-amplitude negative potential replacing the compound action potential (CAP) recorded in normally-hearing children. This potential is identified at intensities as low as 90 dB below the behavioral threshold. In some ears, a synchronized CAP is superimposed on the prolonged responses at high intensity. Stimulation at high rates reduces the amplitude and duration of the prolonged potentials, consistently with their neural generation. In some children, however, the ECochG response only consists of the SP, with no prolonged potential. Cochlear implants restore hearing sensitivity, speech perception and neural CAP by electrically stimulating the auditory nerve fibers. These findings indicate that an impaired multivesicular glutamate release in OTOF-related disorders leads to abnormal auditory nerve fiber activation and a consequent impairment of spike generation. The magnitude of these effects seems to vary, ranging from

  16. Audio-Visual and Autogenic Relaxation Alter Amplitude of Alpha EEG Band, Causing Improvements in Mental Work Performance in Athletes.

    Science.gov (United States)

    Mikicin, Mirosław; Kowalczyk, Marek

    2015-09-01

    The aim of the present study was to investigate the effect of regular audio-visual relaxation combined with Schultz's autogenic training on (1) the results of behavioral tests that evaluate work performance during burdensome cognitive tasks (Kraepelin test), and (2) changes in the classical EEG alpha frequency band (7-12 Hz) across neocortical regions (frontal, temporal, occipital, parietal) and hemispheres (left, right) during relaxation. Both the experimental group (EG) and the age- and skill-matched control group (CG) consisted of eighteen athletes (ten males and eight females). After 7 months of training, the EG demonstrated changes in the amplitude of mean electrical activity of the EEG alpha band at rest and a significant improvement in almost all components of the Kraepelin test. The same variables in the CG were unchanged following the period without the intervention. Summing up, combining audio-visual relaxation with autogenic training significantly improves athletes' ability to sustain prolonged mental effort. These changes are accompanied by greater alpha-band amplitude in the relaxed state. The results suggest the usefulness of relaxation techniques during the performance of mentally difficult sports tasks (sports based on speed and stamina, sports games, combat sports) and during athletes' relaxation. PMID:26016588

  17. Speech emotion recognition in emotional feedback for Human-Robot Interaction

    Directory of Open Access Journals (Sweden)

    Javier G. Rázuri

    2015-02-01

    Full Text Available For robots to plan their actions autonomously and interact with people, recognizing human emotions is crucial. For most humans, nonverbal cues such as pitch, loudness, spectrum and speech rate are efficient carriers of emotion. The features of the sound of a spoken voice probably contain crucial information on the emotional state of the speaker; within this framework, a machine might use such properties of sound to recognize emotions. This work evaluated six different kinds of classifiers to predict six basic universal emotions from non-verbal features of human speech. The classification techniques used information from six audio files extracted from the eNTERFACE05 audio-visual emotion database. The information gain from a decision tree was also used to choose the most significant speech features from a set of acoustic features commonly extracted in emotion analysis. The classifiers were evaluated with the proposed features and with the features selected by the decision tree. With this feature selection it could be observed that each of the compared classifiers increased in global accuracy and recall. The best performance was obtained with Support Vector Machines and BayesNet.
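
    The feature-selection-then-classify pipeline summarized in this record can be sketched as follows. Here a decision tree's entropy-based importances stand in for the information-gain ranking, and an SVM is then trained on the top-ranked acoustic features; the feature names, random data and use of scikit-learn are placeholder assumptions rather than the toolkit or corpus used in the paper.

# Sketch: rank acoustic features with an entropy-based decision tree,
# then train an SVM on the top-ranked ones. Data and names are placeholders.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
feature_names = ["pitch_mean", "pitch_range", "loudness", "speech_rate",
                 "spectral_centroid", "jitter"]
X = rng.normal(size=(120, len(feature_names)))  # placeholder acoustic features
y = rng.integers(0, 6, size=120)                # six emotion classes

# Impurity decrease with criterion="entropy" approximates information gain.
tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
top = np.argsort(tree.feature_importances_)[::-1][:3]
print("selected features:", [feature_names[i] for i in top])

# Train the final classifier on the selected features only.
svm = SVC(kernel="rbf").fit(X[:, top], y)
print("training accuracy:", svm.score(X[:, top], y))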

  18. Audio-visual feedback improves the BCI performance in the navigational control of a humanoid robot

    OpenAIRE

    Emmanuele Tidoni; Pierre Gergondet

    2014-01-01

    Advancement in brain computer interfaces (BCI) technology allows people to actively interact in the world through surrogates. Controlling real humanoid robots using BCI as intuitively as we control our body represents a challenge for current research in robotics and neuroscience. In order to successfully interact with the environment the brain integrates multiple sensory cues to form a coherent representation of the world. Cognitive neuroscience studies demonstrate that multisensory integrati...

  19. Hand gestures as visual prosody: BOLD responses to audio-visual alignment are modulated by the communicative nature of the stimuli.

    Science.gov (United States)

    Biau, Emmanuel; Morís Fernández, Luis; Holle, Henning; Avila, César; Soto-Faraco, Salvador

    2016-05-15

    During public addresses, speakers accompany their discourse with spontaneous hand gestures (beats) that are tightly synchronized with the prosodic contour of the discourse. It has been proposed that speech and beat gestures originate from a common underlying linguistic process whereby both speech prosody and beats serve to emphasize relevant information. We hypothesized that breaking the consistency between beats and prosody by temporal desynchronization, would modulate activity of brain areas sensitive to speech-gesture integration. To this aim, we measured BOLD responses as participants watched a natural discourse where the speaker used beat gestures. In order to identify brain areas specifically involved in processing hand gestures with communicative intention, beat synchrony was evaluated against arbitrary visual cues bearing equivalent rhythmic and spatial properties as the gestures. Our results revealed that left MTG and IFG were specifically sensitive to speech synchronized with beats, compared to the arbitrary vision-speech pairing. Our results suggest that listeners confer beats a function of visual prosody, complementary to the prosodic structure of speech. We conclude that the emphasizing function of beat gestures in speech perception is instantiated through a specialized brain network sensitive to the communicative intent conveyed by a speaker with his/her hands. PMID:26892858

  20. Brain responses and looking behaviour during audiovisual speech integration in infants predict auditory speech comprehension in the second year of life.

    Directory of Open Access Journals (Sweden)

    Elena V. Kushnerenko

    2013-07-01

    Full Text Available The use of visual cues during the processing of audiovisual speech is known to be less efficient in children and adults with language difficulties, and such difficulties are known to be more prevalent in children from low-income populations. In the present study, we followed an economically diverse group of thirty-seven infants longitudinally from 6-9 months to 14-16 months of age. We used eye-tracking to examine whether individual differences in visual attention during audiovisual processing of speech in 6 to 9 month old infants, particularly when processing congruent and incongruent auditory and visual speech cues, might be indicative of their later language development. Twenty-two of these 6-9 month old infants also participated in an event-related potential (ERP) audiovisual task within the same experimental session. Language development was then followed up at the age of 14-16 months, using two measures of language development, the Preschool Language Scale (PLS) and the Oxford Communicative Development Inventory (CDI). The results show that those infants who were less efficient in auditory speech processing at the age of 6-9 months had lower receptive language scores at 14-16 months. A correlational analysis revealed that the pattern of face scanning and ERP responses to audio-visually incongruent stimuli at 6-9 months were both significantly associated with language development at 14-16 months. These findings add to the understanding of individual differences in neural signatures of audiovisual processing and associated looking behaviour in infants.

  1. Audiovisual Speech Perception in Infancy: The Influence of Vowel Identity and Infants' Productive Abilities on Sensitivity to (Mis)Matches between Auditory and Visual Speech Cues

    Science.gov (United States)

    Altvater-Mackensen, Nicole; Mani, Nivedita; Grossmann, Tobias

    2016-01-01

    Recent studies suggest that infants' audiovisual speech perception is influenced by articulatory experience (Mugitani et al., 2008; Yeung & Werker, 2013). The current study extends these findings by testing if infants' emerging ability to produce native sounds in babbling impacts their audiovisual speech perception. We tested 44 6-month-olds…

  2. New method for mathematical modelling of human visual speech

    OpenAIRE

    Sadaghiani, Mohammad Hossein/Mr.

    2015-01-01

    Audio-visual speech recognition and visual speech synthesisers are used as interfaces between humans and machines. Such interactions specifically rely on the analysis and synthesis of both audio and visual information, which humans use for face-to-face communication. Currently, there is no global standard to describe these interactions nor is there a standard mathematical tool to describe lip movements. Furthermore, the visual lip movement for each phoneme is considered in isolation rather th...

  3. Clever Use of Audio-visual Media to Promote the Teaching of History%巧用电教媒体推进历史教学

    Institute of Scientific and Technical Information of China (English)

    刘艳丽

    2012-01-01

    Introducing audio-visual media into classroom teaching is a relatively new form of instructional innovation, particularly in the teaching of history. Using audio-visual media in the history classroom not only increases students' concrete perception of past history, but also strengthens their thinking through the objective description of historical facts. This article describes in detail the characteristics of teaching with audio-visual media and discusses how to use such media to advance the teaching of history.

  4. Twenty-Fifth Annual Audio-Visual Aids Conference, Wednesday 9th to Friday 11th July 1975, Whitelands College, Putney SW15. Conference Preprints.

    Science.gov (United States)

    National Committee for Audio-Visual Aids in Education, London (England).

    Preprints of papers to be presented at the 25th annual Audio-Visual Aids Conference are collected along with the conference program. Papers include official messages, a review of the conference's history, and presentations on photography in education, using school broadcasts, flexibility in the use of television, the "communications generation,"…

  5. Attitude of medical students towards the use of audio visual aids during didactic lectures in pharmacology in a medical college of central India

    Directory of Open Access Journals (Sweden)

    Mehul Agrawal

    2016-04-01

    Conclusions: In our study we found that students preferred a mixture of audio-visual aids over other teaching methods. Teachers should consider the suggestions given by the students while preparing their lectures. [Int J Basic Clin Pharmacol 2016; 5(2): 416-422]

  6. Arousal and valence prediction in spontaneous emotional speech: felt versus perceived emotion

    OpenAIRE

    Truong, K. P.; Leeuwen, D.A. van; Neerincx, M.A.; de Jong, F.M.G.

    2009-01-01

    In this paper, we describe emotion recognition experiments carried out for spontaneous affective speech with the aim to compare the added value of annotation of felt emotion versus annotation of perceived emotion. Using speech material available in the TNO-GAMING corpus (a corpus containing audio-visual recordings of people playing videogames), speech-based affect recognizers were developed that can predict Arousal and Valence scalar values. Two types of recognizers were developed ...

  7. Design Foundations for Content-Rich Acoustic Interfaces: Investigating Audemes as Referential Non-Speech Audio Cues

    Science.gov (United States)

    Ferati, Mexhid Adem

    2012-01-01

    To access interactive systems, blind and visually impaired users can leverage their auditory senses by using non-speech sounds. The current structure of non-speech sounds, however, is geared toward conveying user interface operations (e.g., opening a file) rather than large theme-based information (e.g., a history passage) and, thus, is ill-suited…

  8. Pitch and spectral resolution: A systematic comparison of bottom-up cues for top-down repair of degraded speech

    NARCIS (Netherlands)

    Clarke, Jeanne; Başkent, Deniz; Gaudrain, Etienne

    2016-01-01

    The brain is capable of restoring missing parts of speech, a top-down repair mechanism that enhances speech understanding in noisy environments. This enhancement can be quantified using the phonemic restoration paradigm, i.e., the improvement in intelligibility when silent interruptions of interrupt

  9. A scheme for racquet sports video analysis with the combination of audio-visual information

    Science.gov (United States)

    Xing, Liyuan; Ye, Qixiang; Zhang, Weigang; Huang, Qingming; Yu, Hua

    2005-07-01

    Although racquet sports video (e.g. table tennis, tennis and badminton) is an important category of sports video, it has received little attention in past years. Considering the characteristics of this kind of sports video, we propose a new scheme for structure indexing and highlight generation based on the combination of audio and visual information. First, a supervised classification method is employed to detect important audio symbols, including impacts (ball hits), audience cheers and commentator speech. Meanwhile, an unsupervised algorithm is proposed to group video shots into clusters. Then, by exploiting the temporal relationship between the audio and visual signals, the scene clusters are assigned semantic labels such as rally scenes and break scenes. Third, a refinement procedure is developed to reduce false rally scenes by further audio analysis. Finally, an excitement model is proposed to rank the detected rally scenes, from which many exciting video clips such as game (match) points can be correctly retrieved. Experiments on two representative types of racquet sports video, table tennis and tennis, demonstrate encouraging results.
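
    A compact sketch of the rally/break labelling idea described in this record is given below: a shot is labelled a rally scene when detected ball-impact sounds occur densely within its time span. The shot boundaries, impact times and density threshold are illustrative assumptions, not the authors' detectors or parameters.

# Sketch: labelling shots as rally or break scenes from ball-impact density.
# Shot times, impact times and the threshold are illustrative assumptions.
def label_rally_shots(shots, impact_times, min_impacts_per_sec=0.5):
    # shots: list of (start_s, end_s); impact_times: detected ball hits in seconds.
    labels = []
    for start, end in shots:
        hits = sum(start <= t < end for t in impact_times)
        rate = hits / max(end - start, 1e-6)
        labels.append("rally" if rate >= min_impacts_per_sec else "break")
    return labels

shots = [(0.0, 8.0), (8.0, 20.0), (20.0, 25.0)]
impacts = [8.5, 9.3, 10.1, 11.0, 12.4, 13.8, 15.2, 17.0]
print(label_rally_shots(shots, impacts))  # -> ['break', 'rally', 'break']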

  10. MENINGKATKAN KEMAMPUAN SISWA KELAS B DALAM PENGENALAN HURUF (AKSARA) DENGAN MENGGUNAKAN MEDIA AUDIO-VISUAL DI TK NEGERI PEMBINA 3 TARAKAN

    OpenAIRE

    Salbiah

    2011-01-01

    SALBIAH. 2011. Improving class B students' ability in letter (script) recognition by using audio-visual learning media in TK Negeri Pembina 3 Tarakan. Thesis. Teacher Education Pedagogy, Faculty of Education, University of Tarakan. Main supervisor: Zulkifli; assistant supervisor: Wiwit Ike. Early childhood literacy develops in diverse ways: children engage with various forms of communication and become familiar with symbols long before they can read and write. Development of early literacy learning a...

  11. THE EFFECT OF USING AUDIO-VISUAL AIDS VERSUS PICTURES ON FOREIGN LANGUAGE VOCABULARY LEARNING OF INDIVIDUALS WITH MILD INTELLECTUAL DISABILITY

    Directory of Open Access Journals (Sweden)

    Zahra Sadat NOORI

    2016-04-01

    Full Text Available This study aimed to examine the effect of using audio-visual aids and pictures on foreign language vocabulary learning of individuals with mild intellectual disability. Method: To this end, a comparison group quasi-experimental study was conducted along with a pre-test and a post-test. The participants were 16 individuals with mild intellectual disability living in a center for mentally disabled individuals in Dezfoul, Iran. They were all male individuals with the age range of 20 to 30. Their mother tongue was Persian, and they did not have any English background. In order to ensure that all participants were within the same IQ level, a standard IQ test, i.e. Colored Progressive Matrices test, was run. Afterwards, the participants were randomly assigned to two experimental groups; one group received the instruction through audio-visual aids, while the other group was taught through pictures. The treatment lasted for four weeks, 20 sessions on aggregate. A total number of 60 English words selected from the English package named 'The Smart Child' were taught. After the treatment, the participants took the posttest in which the researchers randomly selected 40 words from among the 60 target words. Results: The results of Mann-Whitney U-test indicated that using audio-visual aids was more effective than pictures in foreign language vocabulary learning of individuals with mild intellectual disability. Conclusions: It can be concluded that the use of audio-visual aids can be more effective than pictures in foreign language vocabulary learning of individuals with mild intellectual disability.

  12. UNDERSTANDING PROSE THROUGH TASK ORIENTED AUDIO-VISUAL ACTIVITY: AN AMERICAN MODERN PROSE COURSE AT THE FACULTY OF LETTERS, PETRA CHRISTIAN UNIVERSITY

    Directory of Open Access Journals (Sweden)

    Sarah Prasasti

    2001-01-01

    Full Text Available The method presented here provides the basis for a course in American prose for EFL students. Understanding and appreciating American prose is a difficult task for the students because they come into contact with works that carry heavy cultural baggage and are far removed from their own world. Audio-visual aids are one alternative for sensitizing students to the topic and the cultural background. Instead of providing ready-made audio-visual aids, teachers can involve students in a more task-oriented audio-visual project. Here, the teachers encourage their students to create their own audio-visual aids using colors, pictures, sound and gestures as a point of initiation for further discussion. The students can use color, which has become a strong element of fiction, to help them call up a forceful visual representation. Pictures can also stimulate the students to build their mental image. Sound and silence, which are part of the fabric of literature, may also help them to increase the emotional impact.

  13. Characterizing sensory and cognitive factors of human speech processing through eye movements

    OpenAIRE

    Wendt, Dorothea Christine

    2013-01-01

    The primary goal of this thesis is to gain a better insight into any impediments in speech processing that occur due to sensory and cognitive factors. To achieve this, a new audio-visual paradigm based on the analysis of eye-movements is developed here which allows for an online analysis of the speech understanding process with possible applications in the field of audiology. The proposed paradigm is used to investigate the influence of background noise and linguistic complexity on the proces...

  14. The Multi-Channel Wall Street Journal Audio Visual Corpus (MC-WSJ-AV): Specification and Initial Experiments

    OpenAIRE

    Lincoln, Mike; McCowan, Iain A.; Vepa, Jithendra; Maganti, Hari Krishna

    2005-01-01

    The recognition of speech in meetings poses a number of challenges to current Automatic Speech Recognition (ASR) techniques. Meetings typically take place in rooms with non-ideal acoustic conditions and significant background noise, and may contain large sections of overlapping speech. In such circumstances, headset microphones have to date provided the best recognition performance, however participants are often reluctant to wear them. Microphone arrays provide an alternative to close-talkin...

  15. Multisensory and Modality Specific Processing of Visual Speech in Different Regions of the Premotor Cortex

    Directory of Open Access Journals (Sweden)

    Daniel eCallan

    2014-05-01

    Full Text Available Behavioral and neuroimaging studies have demonstrated that brain regions involved with speech production also support speech perception, especially under degraded conditions. The premotor cortex has been shown to be active during both observation and execution of action (‘Mirror System’ properties), and may facilitate speech perception by mapping unimodal and multimodal sensory features onto articulatory speech gestures. For this functional magnetic resonance imaging (fMRI) study, participants identified vowels produced by a speaker in audio-visual (saw the speaker’s articulating face and heard her voice), visual only (only saw the speaker’s articulating face), and audio only (only heard the speaker’s voice) conditions with varying audio signal-to-noise ratios in order to determine the regions of the premotor cortex involved with multisensory and modality specific processing of visual speech gestures. The task was designed so that identification could be made with a high level of accuracy from visual only stimuli to control for task difficulty and differences in intelligibility. The results of the fMRI analysis for visual only and audio-visual conditions showed overlapping activity in inferior frontal gyrus and premotor cortex. The left ventral inferior premotor cortex showed properties of multimodal (audio-visual) enhancement with a degraded auditory signal. The left inferior parietal lobule and right cerebellum also showed these properties. The left ventral superior and dorsal premotor cortex did not show this multisensory enhancement effect, but there was greater activity for the visual only over audio-visual conditions in these areas. The results suggest that the inferior regions of the ventral premotor cortex are involved with integrating multisensory information, whereas, more superior and dorsal regions of the premotor cortex are involved with mapping unimodal (in this case visual) sensory features of the speech signal with

  16. AUDIO VISUAL MATERIALS.

    Science.gov (United States)

    ROBINAULT, ISABEL P.

    THIS PUBLICATION LISTS 127 FILMS AND FILMSTRIPS RELATED TO THE DIAGNOSIS AND HABILITATION OF CEREBRAL PALSIED PERSONS WITH VARYING AGES, NEEDS, AND CIRCUMSTANCES. THE TITLES ARE LISTED ALPHABETICALLY IN SECTIONS--BASIC SCIENCES AND BASIC INFORMATION, ACTIVITIES OF DAILY LIVING, MEDICAL ASPECTS AND THERAPEUTIC MANAGEMENT, EVALUATION AND…

  17. A framework for event detection in field-sports video broadcasts based on SVM generated audio-visual feature model. Case-study: soccer video

    OpenAIRE

    Sadlier, David A.; O'Connor, Noel E.; Murphy, Noel; Marlow, Seán

    2004-01-01

    In this paper we propose a novel audio-visual feature-based framework, for event detection in field sports broadcast video. The system is evaluated via a case-study involving MPEG encoded soccer video. Specifically, the evidence gathered by various feature detectors is combined by means of a learning algorithm (a support vector machine), which infers the occurrence of an event, based on a model generated during a training phase, utilizing a corpus of 25 hours of content. The system is evaluat...
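
    To make the fusion step concrete, the sketch below trains a support vector machine on shot-level audio-visual feature scores and uses it to flag candidate events; the feature names and values are hypothetical placeholders, not the detectors or data described in the paper.

        # Hedged sketch: SVM fusion of shot-level audio-visual feature detector outputs
        import numpy as np
        from sklearn.svm import SVC

        # Each row: [crowd_audio_level, close_up_ratio, scoreboard_presence, motion_activity]
        X_train = np.array([
            [0.9, 0.8, 0.7, 0.8],   # shots around a scoring event (label 1)
            [0.8, 0.7, 0.9, 0.7],
            [0.2, 0.1, 0.1, 0.3],   # ordinary play (label 0)
            [0.1, 0.2, 0.2, 0.4],
        ])
        y_train = np.array([1, 1, 0, 0])

        clf = SVC(kernel="rbf", probability=True).fit(X_train, y_train)
        new_shot = np.array([[0.85, 0.75, 0.6, 0.8]])
        print("P(event) =", clf.predict_proba(new_shot)[0, 1])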

  18. THE EFFECT OF USING AUDIO-VISUAL AIDS VERSUS PICTURES ON FOREIGN LANGUAGE VOCABULARY LEARNING OF INDIVIDUALS WITH MILD INTELLECTUAL DISABILITY

    OpenAIRE

    Zahra Sadat NOORI; FARVARDIN Mohammad Taghi

    2016-01-01

    This study aimed to examine the effect of using audio-visual aids and pictures on foreign language vocabulary learning of individuals with mild intellectual disability. Method: To this end, a comparison group quasi-experimental study was conducted along with a pre-test and a post-test. The participants were 16 individuals with mild intellectual disability living in a center for mentally disabled individuals in Dezfoul, Iran. They were all male individuals with the age range of 20 to 30. Th...

  19. Audio-visual speechreading in a group of hearing aid users. The effects of onset age, handicap age, and degree of hearing loss.

    Science.gov (United States)

    Tillberg, I; Rönnberg, J; Svärd, I; Ahlner, B

    1996-01-01

    Speechreading ability was investigated among hearing aid users with different times of onset and different degrees of hearing loss. Audio-visual and visual-only performance were assessed. One group of subjects had been hearing-impaired for a large part of their lives, with impairments that appeared early in life. The other group of subjects had been impaired for fewer years, with impairments that appeared later in life. Differences between the groups were obtained. There was no significant difference between the groups on the audio-visual test, in spite of the fact that the early-onset group scored very poorly auditorily. However, the early-onset group performed significantly better on the visual test. It was concluded that visual information constituted the dominant coding strategy for the early-onset group. An interpretation chiefly in terms of early onset may be the most appropriate, since dB loss variations as such are not related to speechreading skill. PMID:8976000

  20. Cardiac and pulmonary dose reduction for tangentially irradiated breast cancer, utilizing deep inspiration breath-hold with audio-visual guidance, without compromising target coverage

    International Nuclear Information System (INIS)

    Background and purpose. Cardiac disease and pulmonary complications are documented risk factors in tangential breast irradiation. Respiratory gating radiotherapy provides a possibility to substantially reduce cardiopulmonary doses. This CT planning study quantifies the reduction of radiation doses to the heart and lung, using deep inspiration breath-hold (DIBH). Patients and methods. Seventeen patients with early breast cancer, referred for adjuvant radiotherapy, were included. For each patient two CT scans were acquired; the first during free breathing (FB) and the second during DIBH. The scans were monitored by the Varian RPM respiratory gating system. Audio coaching and visual feedback (audio-visual guidance) were used. The treatment planning of the two CT studies was performed with conformal tangential fields, focusing on good coverage (V95>98%) of the planning target volume (PTV). Dose-volume histograms were calculated and compared. Doses to the heart, left anterior descending (LAD) coronary artery, ipsilateral lung and the contralateral breast were assessed. Results. Compared to FB, the DIBH-plans obtained lower cardiac and pulmonary doses, with equal coverage of PTV. The average mean heart dose was reduced from 3.7 to 1.7 Gy and the number of patients with >5% heart volume receiving 25 Gy or more was reduced from four to one of the 17 patients. With DIBH the heart was completely out of the beam portals for ten patients, with FB this could not be achieved for any of the 17 patients. The average mean dose to the LAD coronary artery was reduced from 18.1 to 6.4 Gy. The average ipsilateral lung volume receiving more than 20 Gy was reduced from 12.2 to 10.0%. Conclusion. Respiratory gating with DIBH, utilizing audio-visual guidance, reduces cardiac and pulmonary doses for tangentially treated left sided breast cancer patients without compromising the target coverage

  1. Audio-Visual Biofeedback Does Not Improve the Reliability of Target Delineation Using Maximum Intensity Projection in 4-Dimensional Computed Tomography Radiation Therapy Planning

    International Nuclear Information System (INIS)

    Purpose: To investigate whether coaching patients' breathing would improve the match between ITVMIP (internal target volume generated by contouring in the maximum intensity projection scan) and ITV10 (generated by combining the gross tumor volumes contoured in 10 phases of a 4-dimensional CT [4DCT] scan). Methods and Materials: Eight patients with a thoracic tumor and 5 patients with an abdominal tumor were included in an institutional review board-approved prospective study. Patients underwent 3 4DCT scans with: (1) free breathing (FB); (2) coaching using audio-visual (AV) biofeedback via the Real-Time Position Management system; and (3) coaching via a spirometer system (Active Breathing Coordinator or ABC). One physician contoured all scans to generate the ITV10 and ITVMIP. The match between ITVMIP and ITV10 was quantitatively assessed with volume ratio, centroid distance, root mean squared distance, and overlap/Dice coefficient. We investigated whether coaching (AV or ABC) or uniform expansions (1, 2, 3, or 5 mm) of ITVMIP improved the match. Results: Although both AV and ABC coaching techniques improved frequency reproducibility and ABC improved displacement regularity, neither improved the match between ITVMIP and ITV10 over FB. On average, ITVMIP underestimated ITV10 by 19%, 19%, and 21%, with centroid distance of 1.9, 2.3, and 1.7 mm and Dice coefficient of 0.87, 0.86, and 0.88 for FB, AV, and ABC, respectively. Separate analyses indicated a better match for lung cancers or tumors not adjacent to high-intensity tissues. Uniform expansions of ITVMIP did not correct for the mismatch between ITVMIP and ITV10. Conclusions: In this pilot study, audio-visual biofeedback did not improve the match between ITVMIP and ITV10. In general, ITVMIP should be limited to lung cancers, and modification of ITVMIP in each phase of the 4DCT data set is recommended
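
    For readers unfamiliar with the match metrics, the following sketch computes a volume ratio and Dice coefficient from two synthetic binary target masks; the grid, voxel size, and masks are invented for illustration and are not study data.

        # Hedged sketch: volume ratio and Dice coefficient between two target masks
        import numpy as np

        voxel_volume_cm3 = 0.1 * 0.1 * 0.25            # assumed voxel size
        itv_mip = np.zeros((40, 40, 40), dtype=bool)   # ITV contoured on the MIP scan
        itv_10  = np.zeros((40, 40, 40), dtype=bool)   # union of the 10-phase GTVs
        itv_mip[10:28, 10:28, 10:28] = True
        itv_10[8:30, 8:30, 8:30] = True                # ITV10 is typically larger

        volume_ratio = itv_mip.sum() / itv_10.sum()
        dice = 2.0 * np.logical_and(itv_mip, itv_10).sum() / (itv_mip.sum() + itv_10.sum())

        print(f"ITV_MIP = {itv_mip.sum() * voxel_volume_cm3:.1f} cm^3, "
              f"ITV_10 = {itv_10.sum() * voxel_volume_cm3:.1f} cm^3")
        print(f"Volume ratio = {volume_ratio:.2f}, Dice = {dice:.2f}")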

  2. Cardiac and pulmonary dose reduction for tangentially irradiated breast cancer, utilizing deep inspiration breath-hold with audio-visual guidance, without compromising target coverage

    Energy Technology Data Exchange (ETDEWEB)

    Vikstroem, Johan; Hjelstuen, Mari H.B.; Mjaaland, Ingvil; Dybvik, Kjell Ivar (Dept. of Radiotherapy, Stavanger Univ. Hospital, Stavanger (Norway)), e-mail: vijo@sus.no

    2011-01-15

    Background and purpose. Cardiac disease and pulmonary complications are documented risk factors in tangential breast irradiation. Respiratory gating radiotherapy provides a possibility to substantially reduce cardiopulmonary doses. This CT planning study quantifies the reduction of radiation doses to the heart and lung, using deep inspiration breath-hold (DIBH). Patients and methods. Seventeen patients with early breast cancer, referred for adjuvant radiotherapy, were included. For each patient two CT scans were acquired; the first during free breathing (FB) and the second during DIBH. The scans were monitored by the Varian RPM respiratory gating system. Audio coaching and visual feedback (audio-visual guidance) were used. The treatment planning of the two CT studies was performed with conformal tangential fields, focusing on good coverage (V95>98%) of the planning target volume (PTV). Dose-volume histograms were calculated and compared. Doses to the heart, left anterior descending (LAD) coronary artery, ipsilateral lung and the contralateral breast were assessed. Results. Compared to FB, the DIBH-plans obtained lower cardiac and pulmonary doses, with equal coverage of PTV. The average mean heart dose was reduced from 3.7 to 1.7 Gy and the number of patients with >5% heart volume receiving 25 Gy or more was reduced from four to one of the 17 patients. With DIBH the heart was completely out of the beam portals for ten patients, with FB this could not be achieved for any of the 17 patients. The average mean dose to the LAD coronary artery was reduced from 18.1 to 6.4 Gy. The average ipsilateral lung volume receiving more than 20 Gy was reduced from 12.2 to 10.0%. Conclusion. Respiratory gating with DIBH, utilizing audio-visual guidance, reduces cardiac and pulmonary doses for tangentially treated left sided breast cancer patients without compromising the target coverage

  3. 音像制品出版数量变化对图书馆音像资源建设的影响%The influence of changes in the publication quantities of audio-visual products on the construction of library audio-visual resources

    Institute of Scientific and Technical Information of China (English)

    宾锋

    2012-01-01

    Statistics and analysis of audio-visual product publications from 2005 to 2010 show that the annual publication numbers and the annual numbers of newly published titles for sound recordings and video products have declined markedly. Output on CD, VCD and other older carriers decreased, while DVD-A, DVD-V and other new carriers increased; publication of DVD-A titles in education, languages and related subjects, and of DVD-V titles in the social sciences, education, general reference, and music and dance, grew. These changes affect the construction of library audio-visual resources. In response to changing demand, libraries should promptly revise their procurement rules for audio-visual materials, increase procurement efforts, build focused and distinctive collections, optimize collection development, strengthen the construction of audio-visual databases, expand procurement channels and methods, and consider appointing acquisition librarians dedicated to audio-visual materials.

  4. Contour identification with pitch and loudness cues using cochlear implants

    OpenAIRE

    Luo, Xin; Masterson, Megan E.; Wu, Ching-Chih

    2013-01-01

    Different from speech, pitch and loudness cues may or may not co-vary in music. Cochlear implant (CI) users with poor pitch perception may use loudness contour cues more than normal-hearing (NH) listeners. Contour identification was tested in CI users and NH listeners; the five-note contours contained either pitch cues alone, loudness cues alone, or both. Results showed that NH listeners' contour identification was better with pitch cues than with loudness cues; CI users performed similarly w...

  5. Contour identification with pitch and loudness cues using cochlear implants.

    Science.gov (United States)

    Luo, Xin; Masterson, Megan E; Wu, Ching-Chih

    2014-01-01

    Different from speech, pitch and loudness cues may or may not co-vary in music. Cochlear implant (CI) users with poor pitch perception may use loudness contour cues more than normal-hearing (NH) listeners. Contour identification was tested in CI users and NH listeners; the five-note contours contained either pitch cues alone, loudness cues alone, or both. Results showed that NH listeners' contour identification was better with pitch cues than with loudness cues; CI users performed similarly with either cues. When pitch and loudness cues were co-varied, CI performance significantly improved, suggesting that CI users were able to integrate the two cues. PMID:24437857

  6. Audio-Visual Biofeedback Does Not Improve the Reliability of Target Delineation Using Maximum Intensity Projection in 4-Dimensional Computed Tomography Radiation Therapy Planning

    Energy Technology Data Exchange (ETDEWEB)

    Lu, Wei, E-mail: wlu@umm.edu [Department of Radiation Oncology, University of Maryland School of Medicine, Baltimore, Maryland (United States); Neuner, Geoffrey A.; George, Rohini; Wang, Zhendong; Sasor, Sarah [Department of Radiation Oncology, University of Maryland School of Medicine, Baltimore, Maryland (United States); Huang, Xuan [Research and Development, Care Management Department, Johns Hopkins HealthCare LLC, Glen Burnie, Maryland (United States); Regine, William F.; Feigenberg, Steven J.; D'Souza, Warren D. [Department of Radiation Oncology, University of Maryland School of Medicine, Baltimore, Maryland (United States)

    2014-01-01

    Purpose: To investigate whether coaching patients' breathing would improve the match between ITVMIP (internal target volume generated by contouring in the maximum intensity projection scan) and ITV10 (generated by combining the gross tumor volumes contoured in 10 phases of a 4-dimensional CT [4DCT] scan). Methods and Materials: Eight patients with a thoracic tumor and 5 patients with an abdominal tumor were included in an institutional review board-approved prospective study. Patients underwent 3 4DCT scans with: (1) free breathing (FB); (2) coaching using audio-visual (AV) biofeedback via the Real-Time Position Management system; and (3) coaching via a spirometer system (Active Breathing Coordinator or ABC). One physician contoured all scans to generate the ITV10 and ITVMIP. The match between ITVMIP and ITV10 was quantitatively assessed with volume ratio, centroid distance, root mean squared distance, and overlap/Dice coefficient. We investigated whether coaching (AV or ABC) or uniform expansions (1, 2, 3, or 5 mm) of ITVMIP improved the match. Results: Although both AV and ABC coaching techniques improved frequency reproducibility and ABC improved displacement regularity, neither improved the match between ITVMIP and ITV10 over FB. On average, ITVMIP underestimated ITV10 by 19%, 19%, and 21%, with centroid distance of 1.9, 2.3, and 1.7 mm and Dice coefficient of 0.87, 0.86, and 0.88 for FB, AV, and ABC, respectively. Separate analyses indicated a better match for lung cancers or tumors not adjacent to high-intensity tissues. Uniform expansions of ITVMIP did not correct for the mismatch between ITVMIP and ITV10. Conclusions: In this pilot study, audio-visual biofeedback did not improve the match between ITVMIP and ITV10. In general, ITVMIP should be limited to lung cancers, and modification of ITVMIP in each phase of the 4DCT data set is recommended.

  7. The challenge of reducing scientific complexity for different target groups (without losing the essence) - experiences from interdisciplinary audio-visual media production

    Science.gov (United States)

    Hezel, Bernd; Broschkowski, Ephraim; Kropp, Jürgen

    2013-04-01

    The Climate Media Factory originates from an interdisciplinary media lab run by the Film and Television University "Konrad Wolf" Potsdam-Babelsberg (HFF) and the Potsdam Institute for Climate Impact Research (PIK). Climate scientists, authors, producers and media scholars work together to develop media products on climate change and sustainability. We strive to communicate scientific content via different media platforms, reconciling the communication needs of scientists with the audience's need to understand the complexity of topics that are relevant in their everyday life. By presenting four audio-visual examples that have been designed for very different target groups, we show (i) the interdisciplinary challenges during the production process and the lessons learnt and (ii) possibilities for reaching the required degree of simplification without dumbing down the content. "We know enough about climate change" is a short animated film that was produced for the German Agency for International Cooperation (GIZ) for training programs and conferences on adaptation in target countries including Indonesia, Tunisia and Mexico. "Earthbook" is a short animation produced for "The Year of Science" to raise awareness of sustainability among digital natives. "What is Climate Engineering?", produced for the Institute for Advanced Sustainability Studies (IASS), is meant for an informed and interested public. "Wimmelwelt Energie!" is a prototype of an iPad application for children from 4-6 years of age to help them learn about different forms of energy and related greenhouse gas emissions.

  8. 运用电教手段优化竞技健美操专业教学%Improvement of Sports Aerobics Teaching by Electrical Audio-visual Aids

    Institute of Scientific and Technical Information of China (English)

    赵静

    2011-01-01

    This paper discusses how the use of audio-visual teaching aids in sports aerobics instruction helps optimize teaching methods, the teaching process, course content, teaching objectives and teaching outcomes, and it provides a scientific basis for the rational use of such aids in sports aerobics teaching.

  9. Learning Words' Sounds before Learning How Words Sound: 9-Month-Olds Use Distinct Objects as Cues to Categorize Speech Information

    Science.gov (United States)

    Yeung, H. Henny; Werker, Janet F.

    2009-01-01

    One of the central themes in the study of language acquisition is the gap between the linguistic knowledge that learners demonstrate, and the apparent inadequacy of linguistic input to support induction of this knowledge. One of the first linguistic abilities in the course of development to exemplify this problem is in speech perception:…

  10. 巧用电教媒体开拓课改的新渠道%Using Audio-Visual Media to Open Up New Channels for Curriculum Reform

    Institute of Scientific and Technical Information of China (English)

    冯力

    2011-01-01

    This paper describes the role of audio-visual media in the new curriculum reform from four aspects: first, creating situations to stimulate interest and encourage reading; second, leading students into the scene so that they read closely and think carefully; third, drawing on situations to help students learn to accumulate language; and fourth, using situations to stimulate interest and guide speaking. By skillfully combining the network and multimedia to create learning situations, and with the teacher's guidance, students read, comprehend, practice and express on their own; information technology is used to open up channels for students' independent learning, thereby opening up new channels for the new curriculum reform.

  11. La regulación audiovisual: argumentos a favor y en contra The audio-visual regulation: the arguments for and against

    Directory of Open Access Journals (Sweden)

    Jordi Sopena Palomar

    2008-03-01

    Full Text Available The article analyzes the effectiveness of audio-visual regulation and assesses the various arguments for and against the existence of broadcasting regulatory councils at the state level. The debate about the need for such a body in Spain is still active. Most European countries have created competent authorities in this field, such as OFCOM in the United Kingdom and the CSA in France. In Spain, audio-visual regulation is limited to bodies with regional scope, such as the Consejo Audiovisual de Navarra, the Consejo Audiovisual de Andalucía and the Consell de l'Audiovisual de Catalunya (CAC), whose model is also examined in this article.

  12. Audio-visual perception of 3D cinematography: an fMRI study using condition-based and computation-based analyses.

    Science.gov (United States)

    Ogawa, Akitoshi; Bordier, Cecile; Macaluso, Emiliano

    2013-01-01

    The use of naturalistic stimuli to probe sensory functions in the human brain is gaining increasing interest. Previous imaging studies examined brain activity associated with the processing of cinematographic material using both standard "condition-based" designs, as well as "computational" methods based on the extraction of time-varying features of the stimuli (e.g. motion). Here, we exploited both approaches to investigate the neural correlates of complex visual and auditory spatial signals in cinematography. In the first experiment, the participants watched a piece of a commercial movie presented in four blocked conditions: 3D vision with surround sounds (3D-Surround), 3D with monaural sound (3D-Mono), 2D-Surround, and 2D-Mono. In the second experiment, they watched two different segments of the movie both presented continuously in 3D-Surround. The blocked presentation served for standard condition-based analyses, while all datasets were submitted to computation-based analyses. The latter assessed where activity co-varied with visual disparity signals and the complexity of auditory multi-sources signals. The blocked analyses associated 3D viewing with the activation of the dorsal and lateral occipital cortex and superior parietal lobule, while the surround sounds activated the superior and middle temporal gyri (S/MTG). The computation-based analyses revealed the effects of absolute disparity in dorsal occipital and posterior parietal cortices and of disparity gradients in the posterior middle temporal gyrus plus the inferior frontal gyrus. The complexity of the surround sounds was associated with activity in specific sub-regions of S/MTG, even after accounting for changes of sound intensity. These results demonstrate that the processing of naturalistic audio-visual signals entails an extensive set of visual and auditory areas, and that computation-based analyses can track the contribution of complex spatial aspects characterizing such life-like stimuli. PMID

  13. Audio-visual perception of 3D cinematography: an fMRI study using condition-based and computation-based analyses.

    Directory of Open Access Journals (Sweden)

    Akitoshi Ogawa

    Full Text Available The use of naturalistic stimuli to probe sensory functions in the human brain is gaining increasing interest. Previous imaging studies examined brain activity associated with the processing of cinematographic material using both standard "condition-based" designs, as well as "computational" methods based on the extraction of time-varying features of the stimuli (e.g. motion). Here, we exploited both approaches to investigate the neural correlates of complex visual and auditory spatial signals in cinematography. In the first experiment, the participants watched a piece of a commercial movie presented in four blocked conditions: 3D vision with surround sounds (3D-Surround), 3D with monaural sound (3D-Mono), 2D-Surround, and 2D-Mono. In the second experiment, they watched two different segments of the movie both presented continuously in 3D-Surround. The blocked presentation served for standard condition-based analyses, while all datasets were submitted to computation-based analyses. The latter assessed where activity co-varied with visual disparity signals and the complexity of auditory multi-sources signals. The blocked analyses associated 3D viewing with the activation of the dorsal and lateral occipital cortex and superior parietal lobule, while the surround sounds activated the superior and middle temporal gyri (S/MTG). The computation-based analyses revealed the effects of absolute disparity in dorsal occipital and posterior parietal cortices and of disparity gradients in the posterior middle temporal gyrus plus the inferior frontal gyrus. The complexity of the surround sounds was associated with activity in specific sub-regions of S/MTG, even after accounting for changes of sound intensity. These results demonstrate that the processing of naturalistic audio-visual signals entails an extensive set of visual and auditory areas, and that computation-based analyses can track the contribution of complex spatial aspects characterizing such life-like stimuli.

  14. The Galker test of speech reception in noise

    DEFF Research Database (Denmark)

    Lauritsen, Maj-Britt Glenn; Söderström, Margareta; Kreiner, Svend;

    2016-01-01

    PURPOSE: We tested "the Galker test", a speech reception in noise test developed for primary care for Danish preschool children, to explore if the children's ability to hear and understand speech was associated with gender, age, middle ear status, and the level of background noise. METHODS......: The Galker test is a 35-item audio-visual, computerized word discrimination test in background noise. Included were 370 normally developed children attending day care center. The children were examined with the Galker test, tympanometry, audiometry, and the Reynell test of verbal comprehension. Parents...... to Reynell test scores (Gamma (G)=0.35), the children's age group (G=0.33), and the day care teachers' assessment of the children's vocabulary (G=0.26). CONCLUSIONS: The Galker test of speech reception in noise appears promising as an easy and quick tool for evaluating preschool children's understanding...

  15. The effect of audibility, signal-to-noise ratio, and temporal speech cues on the benefit from fast-acting compression in modulated noise.

    Science.gov (United States)

    Olsen, Henrik L; Olofsson, Ake; Hagerman, Björn

    2005-07-01

    The objective of the experiment was to investigate three aspects that might contribute to the benefit of fast-acting compression seen in normal-hearing listeners. Six normal-hearing listeners were tested with speech recognition in a fully modulated noise (FUM) either through a fast-acting compressor or through linear amplification. In the first experiment, three different presentation levels of the FUM noise (15, 30, and 45 dB SL) were tested. The second experiment manipulated the control signal of the compressor independently of the audio input signal at four signal-to-noise ratios (-15, -10, -5, and 0 dB). A signal-correlated noise version of the speech signal was tested in the third experiment at three speech-to-noise ratios (-20, -15 and -10 dB). Results showed that performance was better with compression than with linear amplification across all of the tested conditions, at least when the signal-to-noise ratio was negative. The results suggest that aspects of the hearing impairment other than those simulated here are involved in the degraded performance seen for some hearing-impaired listeners with fast-acting compression. PMID:16136792
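
    To illustrate what "fast-acting compression" refers to (a generic sketch, not the amplification system used in the study), the function below applies a level-dependent gain driven by a short-time envelope follower; the time constants, threshold, and compression ratio are illustrative assumptions.

        # Hedged sketch: a simple fast-acting (syllabic) compressor
        import numpy as np

        def fast_compressor(x, fs, attack_ms=5.0, release_ms=50.0,
                            threshold_db=-40.0, ratio=3.0):
            a_att = np.exp(-1.0 / (fs * attack_ms / 1000.0))
            a_rel = np.exp(-1.0 / (fs * release_ms / 1000.0))
            env = 0.0
            out = np.empty_like(x)
            for i, sample in enumerate(x):
                mag = abs(sample)
                a = a_att if mag > env else a_rel          # fast attack, slower release
                env = a * env + (1.0 - a) * mag
                level_db = 20.0 * np.log10(max(env, 1e-9))
                gain_db = 0.0 if level_db < threshold_db else \
                    (threshold_db - level_db) * (1.0 - 1.0 / ratio)
                out[i] = sample * 10.0 ** (gain_db / 20.0)
            return out

        fs = 16000
        t = np.arange(fs) / fs
        test_signal = 0.5 * np.sin(2 * np.pi * 440 * t) * (t > 0.5)   # silence, then a tone
        compressed = fast_compressor(test_signal, fs)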

  16. Acoustic cues for emotions in vocal expression and music

    OpenAIRE

    Erixon, Pauline

    2015-01-01

    Previous research shows that emotional expressions in speech and music use similar patterns of acoustic cues to communicate discrete emotions. The aim of the present study was to experimentally test if manipulation of the acoustic cues; F0, F0 variability, loudness, loudness variability and speech rate/tempo, affects the identification of discrete emotions in speech and music. Forty recordings of actors and musicians expressing anger, fear, happiness, sadness and tenderness were manipulated t...

  17. Learned audio-visual cross-modal associations in observed piano playing activate the left planum temporale. An fMRI study.

    Science.gov (United States)

    Hasegawa, Takehiro; Matsuki, Ken-Ichi; Ueno, Takashi; Maeda, Yasuhiro; Matsue, Yoshihiko; Konishi, Yukuo; Sadato, Norihiro

    2004-08-01

    Lip reading is known to activate the planum temporale (PT), a brain region which may integrate visual and auditory information. To find out whether other types of learned audio-visual integration occur in the PT, we investigated "key-touch reading" using functional magnetic resonance imaging (fMRI). As well-trained pianists are able to identify pieces of music by watching the key-touching movements of the hands, we hypothesised that the visual information of observed sequential finger movements is transformed into the auditory modality during "key-touch reading" as is the case during lip reading. We therefore predicted activation of the PT during key-touch reading. Twenty-six healthy right-handed volunteers were recruited for fMRI. Of these, 7 subjects had never experienced piano training (naïve group), 10 had a little experience of piano playing (less trained group), and the remaining 9 had been trained for more than 8 years (well trained group). During task periods, subjects were required to view the bimanual hand movements of a piano player making key presses. During control periods, subjects viewed the same hands sliding from side to side without tapping movements of the fingers. No sound was provided. Sequences of key presses during task periods consisted of pieces of familiar music, unfamiliar music, or random sequences. Well-trained subjects were able to identify the familiar music, whereas less-trained subjects were not. The left PT of the well-trained subjects was equally activated by observation of familiar music, unfamiliar music, and random sequences. The naïve and less trained groups did not show activation of the left PT during any of the tasks. These results suggest that PT activation reflects a learned process. As the activation was elicited by viewing key pressing actions regardless of whether they constituted a piece of music, the PT may be involved in processes that occur prior to the identification of a piece of music, that is, mapping the

  18. Respiratory motion management using audio-visual biofeedback for respiratory-gated radiotherapy of synchrotron-based pulsed heavy-ion beam delivery

    International Nuclear Information System (INIS)

    Purpose: To efficiently deliver respiratory-gated radiation during synchrotron-based pulsed heavy-ion radiotherapy, a novel respiratory guidance method combining a personalized audio-visual biofeedback (BFB) system, breath hold (BH), and synchrotron-based gating was designed to help patients synchronize their respiratory patterns with synchrotron pulses and to overcome typical limitations such as low efficiency, residual motion, and discomfort. Methods: In-house software was developed to acquire body surface marker positions and display BFB, gating signals, and real-time beam profiles on a LED screen. Patients were prompted to perform short BHs or short deep breath holds (SDBH) with the aid of BFB following a personalized standard BH/SDBH (stBH/stSDBH) guiding curve or their own representative BH/SDBH (reBH/reSDBH) guiding curve. A practical simulation was performed for a group of 15 volunteers to evaluate the feasibility and effectiveness of this method. Effective dose rates (EDRs), mean absolute errors between the guiding curves and the measured curves, and mean absolute deviations of the measured curves were obtained within 10%–50% duty cycles (DCs) that were synchronized with the synchrotron’s flat-top phase. Results: All maneuvers for an individual volunteer took approximately half an hour, and no one experienced discomfort during the maneuvers. Using the respiratory guidance methods, the magnitude of residual motion was almost ten times less than during nongated irradiation, and increases in the average effective dose rate by factors of 2.39–4.65, 2.39–4.59, 1.73–3.50, and 1.73–3.55 for the stBH, reBH, stSDBH, and reSDBH guiding maneuvers, respectively, were observed in contrast with conventional free breathing-based gated irradiation, depending on the respiratory-gated duty cycle settings. Conclusions: The proposed respiratory guidance method with personalized BFB was confirmed to be feasible in a group of volunteers. Increased effective dose
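
    As a rough illustration of the quantities reported above (synthetic waveforms, not volunteer data), the sketch below computes the mean absolute error between a guiding respiratory curve and a measured curve, and the fraction of time the measured signal stays inside an assumed gating window.

        # Hedged sketch: guiding-curve error and gated duty cycle from a breathing trace
        import numpy as np

        fs = 25.0                                    # assumed marker sampling rate (Hz)
        t = np.arange(0, 60, 1.0 / fs)               # one minute of breathing
        guiding = 0.5 * (1 + np.cos(2 * np.pi * t / 4.0))        # idealized guiding curve
        measured = guiding + 0.05 * np.random.randn(t.size)      # volunteer follows with noise

        mae = np.mean(np.abs(measured - guiding))

        gate_open = measured < 0.2                   # beam on inside an assumed amplitude window
        duty_cycle = gate_open.mean()
        print(f"MAE = {mae:.3f} (relative units), duty cycle = {100 * duty_cycle:.0f}%")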

  19. Respiratory motion management using audio-visual biofeedback for respiratory-gated radiotherapy of synchrotron-based pulsed heavy-ion beam delivery

    Energy Technology Data Exchange (ETDEWEB)

    He, Pengbo; Ma, Yuanyuan; Huang, Qiyan; Yan, Yuanlin [Institute of Modern Physics, Chinese Academy of Sciences, Lanzhou 730000 (China); Key Laboratory of Heavy Ion Radiation Biology and Medicine of Chinese Academy of Sciences, Lanzhou 730000 (China); School of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049 (China); Li, Qiang, E-mail: liqiang@impcas.ac.cn; Liu, Xinguo; Dai, Zhongying; Zhao, Ting; Fu, Tingyan; Shen, Guosheng [Institute of Modern Physics, Chinese Academy of Sciences, Lanzhou 730000 (China); Key Laboratory of Heavy Ion Radiation Biology and Medicine of Chinese Academy of Sciences, Lanzhou 730000 (China)

    2014-11-01

    Purpose: To efficiently deliver respiratory-gated radiation during synchrotron-based pulsed heavy-ion radiotherapy, a novel respiratory guidance method combining a personalized audio-visual biofeedback (BFB) system, breath hold (BH), and synchrotron-based gating was designed to help patients synchronize their respiratory patterns with synchrotron pulses and to overcome typical limitations such as low efficiency, residual motion, and discomfort. Methods: In-house software was developed to acquire body surface marker positions and display BFB, gating signals, and real-time beam profiles on a LED screen. Patients were prompted to perform short BHs or short deep breath holds (SDBH) with the aid of BFB following a personalized standard BH/SDBH (stBH/stSDBH) guiding curve or their own representative BH/SDBH (reBH/reSDBH) guiding curve. A practical simulation was performed for a group of 15 volunteers to evaluate the feasibility and effectiveness of this method. Effective dose rates (EDRs), mean absolute errors between the guiding curves and the measured curves, and mean absolute deviations of the measured curves were obtained within 10%–50% duty cycles (DCs) that were synchronized with the synchrotron’s flat-top phase. Results: All maneuvers for an individual volunteer took approximately half an hour, and no one experienced discomfort during the maneuvers. Using the respiratory guidance methods, the magnitude of residual motion was almost ten times less than during nongated irradiation, and increases in the average effective dose rate by factors of 2.39–4.65, 2.39–4.59, 1.73–3.50, and 1.73–3.55 for the stBH, reBH, stSDBH, and reSDBH guiding maneuvers, respectively, were observed in contrast with conventional free breathing-based gated irradiation, depending on the respiratory-gated duty cycle settings. Conclusions: The proposed respiratory guidance method with personalized BFB was confirmed to be feasible in a group of volunteers. Increased effective dose

  20. Colliding Cues in Word Segmentation: The Role of Cue Strength and General Cognitive Processes

    Science.gov (United States)

    Weiss, Daniel J.; Gerfen, Chip; Mitchel, Aaron D.

    2010-01-01

    The process of word segmentation is flexible, with many strategies potentially available to learners. This experiment explores how segmentation cues interact, and whether successful resolution of cue competition is related to general executive functioning. Participants listened to artificial speech streams that contained both statistical and…

  1. Implementing Speech Supplementation Strategies: Effects on Intelligibility and Speech Rate of Individuals with Chronic Severe Dysarthria.

    Science.gov (United States)

    Hustad, Katherine C.; Jones, Tabitha; Dailey, Suzanne

    2003-01-01

    A study compared intelligibility and speech rate differences following speaker implementation of 3 strategies (topic, alphabet, and combined topic and alphabet supplementation) and a habitual speech control condition for 5 speakers with severe dysarthria. Combined cues and alphabet cues yielded significantly higher intelligibility scores and…

  2. The Application of Audio-visual Media in Junior High School English Teaching%关于初中英语教学中电教手段的应用

    Institute of Scientific and Technical Information of China (English)

    江介香

    2012-01-01

    In junior high school English teaching, audio-visual media can stimulate students' interest in learning English. As a teaching aid, audio-visual media supplement and extend English classroom teaching; they help improve overall classroom teaching effectiveness and are important for developing students' comprehensive ability to use English.

  3. 以情境构建为主线的高职英语视听说课堂教学模式探究%Exploring a classroom teaching mode for higher vocational English viewing-listening-speaking courses with situation building as the main line

    Institute of Scientific and Technical Information of China (English)

    张蕾; 肖建云

    2015-01-01

    Based on constructivist situation theory and situated cognition theory, and drawing on teaching practice combined with interviews and a questionnaire survey, this study examines higher vocational English viewing-listening-speaking classroom teaching from three aspects: teaching content, the implementation of teaching, and the achievement of teaching objectives. Situation building is made to run through the integrated training of viewing, listening and speaking skills, so that the intended goals of the course are reached and students' English communicative competence is effectively improved.

  4. 网络资源辅助高职英语视听说教学的应用研究%Research on the Application of Network Resources to Support English Audio-Visual Course Teaching in Higher Vocational Colleges

    Institute of Scientific and Technical Information of China (English)

    孙敏

    2016-01-01

    This study explores the use of network resources to support English audio-visual (viewing-listening-speaking) teaching in higher vocational education. By integrating network resources organically into the teaching process, the teaching mode is reformed and improved and the time and space limits of teaching activities are overcome, which raises teaching efficiency and quality; emphasis is placed on cultivating students' interest in learning and their abilities for autonomous, cooperative and inquiry-based learning, and on the coordinated development of their English viewing, listening and speaking skills.

  5. Moving-Talker, Speaker-Independent Feature Study, and Baseline Results Using the CUAVE Multimodal Speech Corpus

    Directory of Open Access Journals (Sweden)

    Patterson Eric K

    2002-01-01

    Full Text Available Strides in computer technology and the search for deeper, more powerful techniques in signal processing have brought multimodal research to the forefront in recent years. Audio-visual speech processing has become an important part of this research because it holds great potential for overcoming certain problems of traditional audio-only methods. Difficulties, due to background noise and multiple speakers in an application environment, are significantly reduced by the additional information provided by visual features. This paper presents information on a new audio-visual database, a feature study on moving speakers, and on baseline results for the whole speaker group. Although a few databases have been collected in this area, none has emerged as a standard for comparison. Also, efforts to date have often been limited, focusing on cropped video or stationary speakers. This paper seeks to introduce a challenging audio-visual database that is flexible and fairly comprehensive, yet easily available to researchers on one DVD. The Clemson University Audio-Visual Experiments (CUAVE) database is a speaker-independent corpus of both connected and continuous digit strings totaling over 7000 utterances. It contains a wide variety of speakers and is designed to meet several goals discussed in this paper. One of these goals is to allow testing of adverse conditions such as moving talkers and speaker pairs. A feature study of connected digit strings is also discussed. It compares stationary and moving talkers in a speaker-independent grouping. An image-processing-based contour technique, an image transform method, and a deformable template scheme are used in this comparison to obtain visual features. This paper also presents methods and results in an attempt to make these techniques more robust to speaker movement. Finally, initial baseline speaker-independent results are included using all speakers, and conclusions as well as suggested areas of research are

  6. New developments in speech pattern element hearing aids for the profoundly deaf.

    Science.gov (United States)

    Faulkner, A; Walliker, J R; Howard, I S; Ball, V; Fourcin, A J

    1993-01-01

    Two new developments in speech pattern processing hearing aids will be described. The first development is the use of compound speech pattern coding. Speech information which is invisible to the lipreader was encoded in terms of three acoustic speech factors; the voice fundamental frequency pattern, coded as a sinusoid, the presence of aperiodic excitation, coded as a low-frequency noise, and the wide-band amplitude envelope, coded by amplitude modulation of the sinusoid and noise signals. Each element of the compound stimulus was individually matched in frequency and intensity to the listener's receptive range. Audio-visual speech receptive assessments in five profoundly hearing-impaired listeners were performed to examine the contributions of adding voiceless and amplitude information to the voice fundamental frequency pattern, and to compare these codings to amplified speech. In both consonant recognition and connected discourse tracking (CDT), all five subjects showed an advantage from the addition of amplitude information to the fundamental frequency pattern. In consonant identification, all five subjects showed further improvements in performance when voiceless speech excitation was additionally encoded together with amplitude information, but this effect was not found in CDT. The addition of voiceless information to voice fundamental frequency information did not improve performance in the absence of amplitude information. Three of the subjects performed significantly better in at least one of the compound speech pattern conditions than with amplified speech, while the other two performed similarly with amplified speech and the best compound speech pattern condition. The three speech pattern elements encoded here may represent a near-optimal basis for an acoustic aid to lipreading for this group of listeners. The second development is the use of a trained multi-layer-perceptron (MLP) pattern classification algorithm as the basis for a robust real-time voice

  7. Contrastive study of audio-visual prosody of social affects in Mandarin Chinese vs. French: an application for foreign or second language learning

    OpenAIRE

    Lu, Yan

    2015-01-01

    In human face-to-face interaction, social affects should be distinguished from emotional expressions, which are triggered by innate and involuntary controls of the speaker: social affects are voluntarily controlled, expressed within audio-visual prosody, and play an important role in the realization of speech acts. They also put into circulation between the interlocutors information about the social context and the social relationship. Prosody is a main vector of social affects, and its cross-language variabili...

  8. 鼻咽癌放疗患者康复视听教材的制作与应用%Development and application of audio-visual materials in radiotherapy patients with nasopharyngeal carcinoma

    Institute of Scientific and Technical Information of China (English)

    潘海卿; 席淑新; 吴沛霞; 叶向红; 王苏丹

    2015-01-01

    Objective: To develop rehabilitation audio-visual materials and evaluate their effect in patients with nasopharyngeal carcinoma receiving radiotherapy. Methods: Audio-visual materials were produced on the basis of relevant literature, professional demonstration of the rehabilitation exercises, and digital video recording. A total of 84 patients with nasopharyngeal carcinoma were selected from Jinhua Hospital of Zhejiang University and divided, according to admission time, into a control group (n=42) and an intervention group (n=42). The control group received the usual one-to-one health education; the intervention group systematically watched the audio-visual materials and imitated the exercises under professional guidance. Compliance with rehabilitation exercise and patient satisfaction with nursing service were compared between the two groups. Results: Compliance scores in the intervention group at 1 month and at 3 months after discharge were higher than those of the control group (P<0.05), and patient satisfaction differed significantly between the two groups (P<0.01). Conclusions: The self-made audio-visual materials are intuitive, concrete, easy to follow and easy to imitate, and they can effectively improve patients' compliance with rehabilitation exercise and their satisfaction with nursing service.

  9. Semantic Indexing of Multimedia Content Using Visual, Audio, and Text Cues

    Directory of Open Access Journals (Sweden)

    W. H. Adams

    2003-02-01

    Full Text Available We present a learning-based approach to the semantic indexing of multimedia content using cues derived from audio, visual, and text features. We approach the problem by developing a set of statistical models for a predefined lexicon. Novel concepts are then mapped in terms of the concepts in the lexicon. To achieve robust detection of concepts, we exploit features from multiple modalities, namely, audio, video, and text. Concept representations are modeled using Gaussian mixture models (GMM), hidden Markov models (HMM), and support vector machines (SVM). Models such as Bayesian networks and SVMs are used in a late-fusion approach to model concepts that are not explicitly modeled in terms of features. Our experiments indicate promise in the proposed classification and fusion methodologies: our proposed fusion scheme achieves more than 10% relative improvement over the best unimodal concept detector.
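
    A minimal sketch of the late-fusion idea follows (synthetic scores and labels; the unimodal GMM/HMM detector outputs are simply assumed to exist as per-concept confidence values): each unimodal detector emits a score, and a second-stage SVM combines them.

        # Hedged sketch: late fusion of unimodal concept-detector scores with an SVM
        import numpy as np
        from sklearn.svm import SVC

        # Columns: [audio_score, visual_score, text_score] for one concept
        unimodal_scores = np.array([
            [0.9, 0.7, 0.8],
            [0.2, 0.3, 0.1],
            [0.8, 0.6, 0.4],
            [0.1, 0.2, 0.3],
        ])
        labels = np.array([1, 0, 1, 0])              # concept present / absent

        fusion = SVC(kernel="linear", probability=True).fit(unimodal_scores, labels)
        print("P(concept) =", fusion.predict_proba([[0.7, 0.5, 0.6]])[0, 1])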

  10. Automatic discrimination between laughter and speech

    NARCIS (Netherlands)

    Truong, K.; Leeuwen, D. van

    2007-01-01

    Emotions can be recognized by audible paralinguistic cues in speech. By detecting these paralinguistic cues that can consist of laughter, a trembling voice, coughs, changes in the intonation contour etc., information about the speaker’s state and emotion can be revealed. This paper describes the dev

  11. Temporal visual cues aid speech recognition

    DEFF Research Database (Denmark)

    Zhou, Xiang; Ross, Lars; Lehn-Schiøler, Tue;

    2006-01-01

    BACKGROUND: It is well known that under noisy conditions, viewing a speaker's articulatory movement aids the recognition of spoken words. Conventionally it is thought that the visual input disambiguates otherwise confusing auditory input. HYPOTHESIS: In contrast we hypothesize that it is the temporal synchronicity of the visual input that aids parsing of the auditory stream. More specifically, we expected that purely temporal information, which does not convey information such as place of articulation, may facilitate word recognition. METHODS: To test this prediction we used temporal features of the audio to generate an artificial talking-face video and measured word recognition performance on simple monosyllabic words. RESULTS: When presenting words together with the artificial video we find that word recognition is improved over purely auditory presentation. The effect is significant (p...
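
    The stimulus idea can be sketched as follows (an illustration of the general approach, not the authors' stimulus-generation code): the wide-band amplitude envelope of the audio is downsampled to the video frame rate and used as a single "mouth opening" parameter for a synthetic face.

        # Hedged sketch: deriving a per-frame mouth-opening parameter from the audio envelope
        import numpy as np
        from scipy.signal import hilbert, resample

        def mouth_opening_from_audio(audio, fs, video_fps=25):
            envelope = np.abs(hilbert(audio))              # instantaneous amplitude
            n_frames = int(len(audio) / fs * video_fps)
            frame_env = resample(envelope, n_frames)       # one value per video frame
            frame_env = np.clip(frame_env, 0.0, None)
            return frame_env / (frame_env.max() + 1e-9)    # normalized opening in [0, 1]

        fs = 16000
        t = np.arange(fs) / fs
        audio = np.sin(2 * np.pi * 150 * t) * (0.5 + 0.5 * np.sin(2 * np.pi * 3 * t))
        opening = mouth_opening_from_audio(audio, fs)      # drives the artificial talking face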

  12. Application of audio visual education in health education for elderly patients with chronic hepatitis B%电化教育在老年慢性乙型肝炎病人健康教育中的应用

    Institute of Scientific and Technical Information of China (English)

    杨茜; 李雨昕; 黄艳芳; 陈燕华

    2016-01-01

    Abstract Objective: To evaluate the effect of audio-visual education in health education for elderly patients with chronic hepatitis B. Methods: A total of 112 elderly patients with chronic hepatitis B were randomly divided, using a random number table, into an observation group and a control group, with 56 cases in each. The control group received routine oral health education; the observation group received audio-visual education in addition to routine oral health education. After the health education, mastery of disease knowledge, liver function and quality of life were compared between the two groups. Results: After health education, the observation group showed better mastery of disease knowledge, better liver function and significantly higher quality-of-life scores than the control group; the differences were statistically significant (P<0.01 or P<0.05). Conclusion: Providing health education through audio-visual education for elderly patients with hepatitis B encourages patients to participate actively in learning, improves their mastery of disease knowledge and their treatment compliance, improves liver function, and enhances their quality of life.

  13. Promoting smoke-free homes: a novel behavioral intervention using real-time audio-visual feedback on airborne particle levels.

    Directory of Open Access Journals (Sweden)

    Neil E Klepeis

    Full Text Available Interventions are needed to protect the health of children who live with smokers. We pilot-tested a real-time intervention for promoting behavior change in homes that reduces second hand tobacco smoke (SHS) levels. The intervention uses a monitor and feedback system to provide immediate auditory and visual signals triggered at defined thresholds of fine particle concentration. Dynamic graphs of real-time particle levels are also shown on a computer screen. We experimentally evaluated the system, field-tested it in homes with smokers, and conducted focus groups to obtain general opinions. Laboratory tests of the monitor demonstrated SHS sensitivity, stability, precision equivalent to at least 1 µg/m³, and low noise. A linear relationship (R² = 0.98) was observed between the monitor and average SHS mass concentrations up to 150 µg/m³. Focus groups and interviews with intervention participants showed in-home use to be acceptable and feasible. The intervention was evaluated in 3 homes with combined baseline and intervention periods lasting 9 to 15 full days. Two families modified their behavior by opening windows or doors, smoking outdoors, or smoking less. We observed evidence of lower SHS levels in these homes. The remaining household voiced reluctance to changing their smoking activity and did not exhibit lower SHS levels in main smoking areas or clear behavior change; however, family members expressed receptivity to smoking outdoors. This study established the feasibility of the real-time intervention, laying the groundwork for controlled trials with larger sample sizes. Visual and auditory cues may prompt family members to take immediate action to reduce SHS levels. Dynamic graphs of SHS levels may help families make decisions about specific mitigation approaches.

  14. Speech Problems

    Science.gov (United States)

    ... your treatment plan may include seeing a speech therapist, a person who is trained to treat speech disorders. How often you have to see the speech therapist will vary — you'll probably start out seeing ...

  15. Visual Cues, Verbal Cues and Child Development

    Science.gov (United States)

    Valentini, Nadia

    2004-01-01

    In this article, the author discusses two strategies--visual cues (modeling) and verbal cues (short, accurate phrases) which are related to teaching motor skills in maximizing learning in physical education classes. Both visual and verbal cues are strong influences in facilitating and promoting day-to-day learning. Both strategies reinforce…

  16. Perception and the temporal properties of speech

    Science.gov (United States)

    Gordon, Peter C.

    1991-11-01

    Four experiments addressing the role of attention in phonetic perception are reported. The first experiment shows that the relative importance of two cues to the voicing distinction changes when subjects must perform an arithmetic distractor task at the same time as identifying a speech stimulus. The voice onset time cue loses phonetic significance when subjects are distracted, while the F0 onset frequency cue does not. The second experiment shows a similar pattern for two cues to the distinction between the vowels /i/ (as in 'beat') and /I/ (as in 'bit'). Together these experiments indicate that careful attention to speech perception is necessary for strong acoustic cues to achieve their full phonetic impact, while weaker acoustic cues achieve their full phonetic impact without close attention. Experiment 3 shows that this pattern is obtained when the distractor task places little demand on verbal short term memory. Experiment 4 provides a large data set for testing formal models of the role of attention in speech perception. Attention is shown to influence the signal to noise ratio in phonetic encoding. This principle is instantiated in a network model in which the role of attention is to reduce noise in the phonetic encoding of acoustic cues. Implications of this work for understanding speech perception and general theories of the role of attention in perception are discussed.

  17. Speech Development

    Science.gov (United States)

    ... Speech Development. To download the PDF version of this factsheet, ...

  18. Exploring the Role of Brain Oscillations in Speech Perception in Noise: Intelligibility of Isochronously Retimed Speech

    Science.gov (United States)

    Aubanel, Vincent; Davis, Chris; Kim, Jeesun

    2016-01-01

    A growing body of evidence shows that brain oscillations track speech. This mechanism is thought to maximize processing efficiency by allocating resources to important speech information, effectively parsing speech into units of appropriate granularity for further decoding. However, some aspects of this mechanism remain unclear. First, while periodicity is an intrinsic property of this physiological mechanism, speech is only quasi-periodic, so it is not clear whether periodicity would present an advantage in processing. Second, it is still a matter of debate which aspect of speech triggers or maintains cortical entrainment, from bottom-up cues such as fluctuations of the amplitude envelope of speech to higher level linguistic cues such as syntactic structure. We present data from a behavioral experiment assessing the effect of isochronous retiming of speech on speech perception in noise. Two types of anchor points were defined for retiming speech, namely syllable onsets and amplitude envelope peaks. For each anchor point type, retiming was implemented at two hierarchical levels, a slow time scale around 2.5 Hz and a fast time scale around 4 Hz. Results show that while any temporal distortion resulted in reduced speech intelligibility, isochronous speech anchored to P-centers (approximated by stressed syllable vowel onsets) was significantly more intelligible than a matched anisochronous retiming, suggesting a facilitative role of periodicity defined on linguistically motivated units in processing speech in noise.

  19. Alfasecuencialización: la enseñanza del cine en la era del audiovisual Sequential literacy: the teaching of cinema in the age of audio-visual speech

    OpenAIRE

    José Antonio Palao Errando

    2007-01-01

    In the so-called "information society", film studies have been diluted into the pragmatic and technological treatment of audio-visual discourse, just as the enjoyment of cinema itself has become caught in the web of the DVD and hypertext. Cinema itself reacts to this through complex narrative structures that distance it from standard audio-visual discourse. The role of film studies and of their university teaching must be the reintroduction of the rejected subject...

  20. Can you hear me yet? An intracranial investigation of speech and non-speech audiovisual interactions in human cortex

    Science.gov (United States)

    Rhone, Ariane E.; Nourski, Kirill V.; Oya, Hiroyuki; Kawasaki, Hiroto; Howard, Matthew A.; McMurray, Bob

    2016-01-01

    In everyday conversation, viewing a talker's face can provide information about the timing and content of an upcoming speech signal, resulting in improved intelligibility. Using electrocorticography, we tested whether human auditory cortex in Heschl's gyrus (HG) and on superior temporal gyrus (STG) and motor cortex on precentral gyrus (PreC) were responsive to visual/gestural information prior to the onset of sound and whether early stages of auditory processing were sensitive to the visual content (speech syllable versus non-speech motion). Event-related band power (ERBP) in the high gamma band was content-specific prior to acoustic onset on STG and PreC, and ERBP in the beta band differed in all three areas. Following sound onset, we found no evidence for content-specificity in HG, evidence for visual specificity in PreC, and specificity for both modalities in STG. These results support models of audio-visual processing in which sensory information is integrated in non-primary cortical areas. PMID:27182530

  1. Prediction and constraint in audiovisual speech perception.

    Science.gov (United States)

    Peelle, Jonathan E; Sommers, Mitchell S

    2015-07-01

    During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing the precision of prediction. Electrophysiological studies demonstrate that oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to acoustic information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration

  2. The Perception of Place Cues in a Second Language

    OpenAIRE

    Fujino, Mako

    2007-01-01

    Many researchers have studied second language (L2) speech perception in terms of perceptual assimilation into first language (L1) sound categories. However, this approach does not always show whether L2 learners perceive and weigh individual acoustic cues in the same way as native speakers, even when they show native-like identification/discrimination of L2 sound contrasts. This issue arises especially in the perception of place contrasts, which have two important cues: a consonant and...

  3. Using the Visual Phonics System to Improve Speech Skills: A Preliminary Study.

    Science.gov (United States)

    Wilson-Favors, Vanessa

    1987-01-01

    The "Visual Phonics" system, which uses 43 hand cues and corresponding written symbols to help deaf students improve their speech and reading skills, was evaluated with six deaf upper elementary grade students in a speech therapy program. Pre- and posttesting indicated substantially improved articulation both with and without hand cues. (DB)

  4. Speech & Language Therapy for Children and Adolescents with Down Syndrome

    Science.gov (United States)

    ... some children, the written word can provide helpful cues when using expressive language. What Are the Speech ... as providing the student with written rather than verbal instructions or including fewer items on a class ...

  5. Melodic and Rhythmic Contrasts in Emotional Speech and Music

    OpenAIRE

    Quinto, Lena; Thompson, William Forde; Keating, Felicity Louise

    2013-01-01

    Many cues convey emotion similarly in speech and music. Researchers have established that acoustic cues such as pitch height, tempo, and intensity carry important emotional information in both domains. In this investigation, we examined the emotional significance of melodic and rhythmic contrasts between successive syllables or tones in speech and music, referred to as Melodic Interval Variability (MIV) and the normalized Pairwise Variability Index (nPVI). The spoken stimuli were 96 tokens ex...

  6. Audio-Visual Classification of Sports Types

    DEFF Research Database (Denmark)

    Gade, Rikke; Abou-Zleikha, Mohamed; Christensen, Mads Græsbøll;

    2015-01-01

    In this work we propose a method for classification of sports types from combined audio and visual features extracted from thermal video. From the audio, Mel Frequency Cepstral Coefficients (MFCC) are extracted, and PCA is applied to reduce the feature space to 10 dimensions. From the visual modality, short trajectories are constructed to represent the motion of players. From these, four motion features are extracted and combined directly with the audio features for classification. A k-nearest neighbour classifier is applied for classification of 180 1-minute video sequences from three sports types...
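
    A rough sketch of how such a pipeline could be assembled, assuming librosa and scikit-learn are available. The per-clip MFCC summary, the 10-dimensional PCA reduction, the four-column motion_features array, and the choice of k are illustrative stand-ins, not the authors' actual features or settings.

```python
import numpy as np
import librosa
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

def audio_feature(wav_path, n_mfcc=13):
    """Mean MFCC vector for one clip, a crude per-sequence audio summary."""
    y, sr = librosa.load(wav_path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape (n_mfcc, n_frames)
    return mfcc.mean(axis=1)

def train_sport_classifier(wav_paths, motion_features, labels, k=1):
    """Fuse PCA-reduced audio features with motion features and fit a k-NN classifier.

    motion_features: array of shape (n_clips, 4), e.g. trajectory statistics.
    labels: sport type for each 1-minute sequence.
    """
    audio = np.vstack([audio_feature(p) for p in wav_paths])
    audio_10d = PCA(n_components=10).fit_transform(audio)     # reduce audio feature space
    X = np.hstack([audio_10d, motion_features])               # direct feature-level combination
    return KNeighborsClassifier(n_neighbors=k).fit(X, labels)
```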

  7. Newborn Infants' Sensitivity to Perceptual Cues to Lexical and Grammatical Words.

    Science.gov (United States)

    Shi, Rushen; Werker, Janet F.; Morgan, James L.

    1999-01-01

    Presented neonates with lexical and grammatical words prepared from natural maternal speech. Found that neonates could categorically discriminate the sets based on a constellation of perceptual cues that distinguished them. Suggested that this ability to discriminate words on basis of multiple acoustic/phonological cues provides a perceptual base…

  8. Acoustic Markers of Prosodic Boundaries in Spanish Spontaneous Alaryngeal Speech

    Science.gov (United States)

    Cuenca, M. H.; Barrio, M. M.

    2010-01-01

    Prosodic information aids segmentation of the continuous speech signal and thereby facilitates auditory speech processing. Durational and pitch variations are prosodic cues especially necessary to convey prosodic boundaries, but alaryngeal speakers have inconsistent control over acoustic parameters such as F0 and duration, being as a result noisy…

  9. Speech recognition interference by the temporal and spectral properties of a single competing talker.

    Science.gov (United States)

    Fogerty, Daniel; Xu, Jiaqian

    2016-08-01

    This study investigated how speech recognition during speech-on-speech masking may be impaired due to the interaction between amplitude modulations of the target and competing talker. Young normal-hearing adults were tested in a competing talker paradigm where the target and/or competing talker was processed to primarily preserve amplitude modulation cues. Effects of talker sex and linguistic interference were also examined. Results suggest that performance patterns for natural speech-on-speech conditions are largely consistent with the same masking patterns observed for signals primarily limited to temporal amplitude modulations. However, results also suggest a role for spectral cues in talker segregation and linguistic competition. PMID:27586780
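
    One common way to construct stimuli that "primarily preserve amplitude modulation cues" is envelope vocoding: extract the temporal envelope and use it to modulate a noise carrier. The sketch below is a generic one-band version under that assumption; the cutoff frequency and the broadband carrier are illustrative choices, not the processing used in the cited study.

```python
import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt

def am_only_speech(signal, fs, env_cutoff_hz=30.0, seed=0):
    """Keep mainly the temporal amplitude-modulation cues of a speech signal.

    Extracts the Hilbert envelope, low-pass filters it, and uses it to
    modulate a broadband noise carrier (a one-band noise vocoder).
    """
    envelope = np.abs(hilbert(signal))                        # instantaneous amplitude
    sos = butter(4, env_cutoff_hz / (fs / 2), btype="low", output="sos")
    envelope = sosfiltfilt(sos, envelope)                     # smooth the envelope
    carrier = np.random.default_rng(seed).standard_normal(len(signal))
    out = envelope * carrier
    return out / (np.max(np.abs(out)) + 1e-12)                # normalize to avoid clipping
```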

  10. The perception of isochrony and phonetic synchronisation in dubbing. An introduction to how Spanish cinema-goers perceive French and English dubbed films in terms of the audio-visual matching experience

    OpenAIRE

    Iturregui Gallardo, Gonzalo

    2014-01-01

    The McGurk-MacDonald effect explains the perception of speech as a duality separately perceived by the cognitive system. Dubbing combines two stimuli of different linguistic origin. The study is an analysis of the perception of the stimuli in speech (auditory and visual) and of the dyschronies in the matching in dubbing. English and French scenes dubbed into Spanish were selected. The experiment reveals that Spanish viewers develop a high acceptance of dyschronies in dubbing. Furthermore, subjec...

  11. Performance of current models of speech recognition and resulting challenges

    OpenAIRE

    Schubotz, Wiebke

    2015-01-01

    Speech is usually perceived in background noise (masker) that can severely hamper its recognition. Nevertheless, there are mechanisms that enable speech recognition even in difficult listening conditions. Some of them, such as e.g., the combination of across-frequency information or binaural cues, are studied in this dissertation. Moreover, masking aspects such as energetic, amplitude modulation or informational masking are considered. Speech recognition in complex maskers is investigated tha...

  12. Can you hear my age? Influences of speech rate and speech spontaneity on estimation of speaker age.

    Science.gov (United States)

    Skoog Waller, Sara; Eriksson, Mårten; Sörqvist, Patrik

    2015-01-01

    Cognitive hearing science is mainly about the study of how cognitive factors contribute to speech comprehension, but cognitive factors also partake in speech processing to infer non-linguistic information from speech signals, such as the intentions of the talker and the speaker's age. Here, we report two experiments on age estimation by "naïve" listeners. The aim was to study how speech rate influences estimation of speaker age by comparing the speakers' natural speech rate with increased or decreased speech rate. In Experiment 1, listeners were presented with audio samples of read speech from three different speaker age groups (young, middle aged, and old adults). They estimated the speakers as younger when speech rate was faster than normal and as older when speech rate was slower than normal. This speech rate effect was slightly greater in magnitude for older (60-65 years) speakers in comparison with younger (20-25 years) speakers, suggesting that speech rate may gain greater importance as a perceptual age cue with increased speaker age. This pattern was more pronounced in Experiment 2, in which listeners estimated age from spontaneous speech. Faster speech rate was associated with lower age estimates, but only for older and middle aged (40-45 years) speakers. Taken together, speakers of all age groups were estimated as older when speech rate decreased, except for the youngest speakers in Experiment 2. The absence of a linear speech rate effect in estimates of younger speakers, for spontaneous speech, implies that listeners use different age estimation strategies or cues (possibly vocabulary) depending on the age of the speaker and the spontaneity of the speech. Potential implications for forensic investigations and other applied domains are discussed. PMID:26236259

  13. Recognizing intentions in infant-directed speech: evidence for universals.

    Science.gov (United States)

    Bryant, Gregory A; Barrett, H Clark

    2007-08-01

    In all languages studied to date, distinct prosodic contours characterize different intention categories of infant-directed (ID) speech. This vocal behavior likely exists universally as a species-typical trait, but little research has examined whether listeners can accurately recognize intentions in ID speech using only vocal cues, without access to semantic information. We recorded native-English-speaking mothers producing four intention categories of utterances (prohibition, approval, comfort, and attention) as both ID and adult-directed (AD) speech, and we then presented the utterances to Shuar adults (South American hunter-horticulturalists). Shuar subjects were able to reliably distinguish ID from AD speech and were able to reliably recognize the intention categories in both types of speech, although performance was significantly better with ID speech. This is the first demonstration that adult listeners in an indigenous, nonindustrialized, and nonliterate culture can accurately infer intentions from both ID speech and AD speech in a language they do not speak. PMID:17680948

  14. The Acquisition of Verbal Communication Skills by Severely Hearing-Impaired Children through the Modified Cued Speech-Phonetic Alphabet Method.

    Science.gov (United States)

    Duffy, John K.

    The paper describes the potential of cued speech to provide verbal language and intelligible speech to severely hearing impaired students. The approach, which combines auditory-visual-oral and manual cues, is designed as a visual supplement to normal speech. The paper traces the development of cued speech and discusses modifications made to the R.…

  15. Speech coding

    Energy Technology Data Exchange (ETDEWEB)

    Ravishankar, C., Hughes Network Systems, Germantown, MD

    1998-05-08

    Speech is the predominant means of communication between human beings, and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage of the speech signal getting corrupted by noise, cross-talk and distortion. Long-haul transmissions, which use repeaters to compensate for the loss in signal strength on transmission links, also increase the associated noise and distortion. On the other hand, digital transmission is relatively immune to noise, cross-talk and distortion, primarily because of the capability to faithfully regenerate the digital signal at each repeater purely based on a binary decision. Hence end-to-end performance of the digital link essentially becomes independent of the length and operating frequency bands of the link. From a transmission point of view, digital transmission has therefore been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service provision point of view as well. Modern requirements have introduced the need for robust, flexible and secure services that can carry a multitude of signal types (such as voice, data and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term speech coding often refers to techniques that represent or code speech signals either directly as a waveform or as a set of parameters obtained by analyzing the speech signal. In either case, the codes are transmitted to the distant end where speech is reconstructed or synthesized using the received set of codes. A more generic term that is often used interchangeably with speech coding is voice coding. This term is more generic in the sense that the
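
    As a small illustration of the "waveform coding" side of the distinction drawn above, the sketch below implements 8-bit mu-law companding of the kind used in classical digital telephony (G.711-style). It is a textbook example, not code from the cited chapter. Parametric coders, by contrast, transmit analysis parameters (for example linear-prediction coefficients) rather than quantized samples.

```python
import numpy as np

MU = 255.0  # standard mu-law parameter for 8-bit telephony codecs

def mulaw_encode(x):
    """Compress samples in [-1, 1] to 8-bit codes (waveform coding)."""
    y = np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)
    return np.round((y + 1) / 2 * 255).astype(np.uint8)

def mulaw_decode(codes):
    """Expand 8-bit codes back to approximate samples in [-1, 1]."""
    y = codes.astype(np.float64) / 255 * 2 - 1
    return np.sign(y) * (np.power(1 + MU, np.abs(y)) - 1) / MU
```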

  16. Relationship between perceptual learning in speech and statistical learning in younger and older adults

    OpenAIRE

    Thordis Marisa Neger; Esther Janse

    2014-01-01

    Within a few sentences, listeners learn to understand severely degraded speech such as noise-vocoded speech. However, individuals vary in the amount of such perceptual learning and it is unclear what underlies these differences. The present study investigates whether perceptual learning in speech relates to statistical learning, as sensitivity to probabilistic information may aid identification of relevant cues in novel speech input. If statistical learning and perceptual learning (partly) dr...

  17. How is the McGurk effect modulated by Cued Speech in deaf and hearing adults ?

    OpenAIRE

    Clémence Bayard

    2014-01-01

    Speech perception for both hearing and deaf people involves an integrative process between auditory and lip-reading information. In order to disambiguate information from lips, manual cues from Cued Speech may be added. Cued Speech (CS) is a system of manual aids developed to help deaf people to clearly and completely understand speech visually (Cornett, 1967). Within this system, both labial and manual information, as lone input sources, remain ambiguous. Perceivers, therefore, have to combi...

  18. How is the McGurk effect modulated by Cued Speech in deaf and hearing adults?

    OpenAIRE

    Bayard, Clémence; Colin, Cécile; Leybaert, Jacqueline

    2014-01-01

    Speech perception for both hearing and deaf people involves an integrative process between auditory and lip-reading information. In order to disambiguate information from lips, manual cues from Cued Speech may be added. Cued Speech (CS) is a system of manual aids developed to help deaf people to clearly and completely understand speech visually (Cornett, 1967). Within this system, both labial and manual information, as lone input sources, remain ambiguous. Perceivers, therefore, have to combi...

  19. Cue conflicts in context

    DEFF Research Database (Denmark)

    Boeg Thomsen, Ditte; Poulsen, Mads

    2015-01-01

    When learning their first language, children develop strategies for assigning semantic roles to sentence structures, depending on morphosyntactic cues such as case and word order. Traditionally, comprehension experiments have presented transitive clauses in isolation, and crosslinguistically chil...

  20. Emotional speech processing at the intersection of prosody and semantics.

    Science.gov (United States)

    Schwartz, Rachel; Pell, Marc D

    2012-01-01

    The ability to accurately perceive emotions is crucial for effective social interaction. Many questions remain regarding how different sources of emotional cues in speech (e.g., prosody, semantic information) are processed during emotional communication. Using a cross-modal emotional priming paradigm (Facial affect decision task), we compared the relative contributions of processing utterances with single-channel (prosody-only) versus multi-channel (prosody and semantic) cues on the perception of happy, sad, and angry emotional expressions. Our data show that emotional speech cues produce robust congruency effects on decisions about an emotionally related face target, although no processing advantage occurred when prime stimuli contained multi-channel as opposed to single-channel speech cues. Our data suggest that utterances with prosodic cues alone and utterances with combined prosody and semantic cues both activate knowledge that leads to emotional congruency (priming) effects, but that the convergence of these two information sources does not always heighten access to this knowledge during emotional speech processing. PMID:23118868

  1. Emotional speech processing at the intersection of prosody and semantics.

    Directory of Open Access Journals (Sweden)

    Rachel Schwartz

    Full Text Available The ability to accurately perceive emotions is crucial for effective social interaction. Many questions remain regarding how different sources of emotional cues in speech (e.g., prosody, semantic information) are processed during emotional communication. Using a cross-modal emotional priming paradigm (Facial affect decision task), we compared the relative contributions of processing utterances with single-channel (prosody-only) versus multi-channel (prosody and semantic) cues on the perception of happy, sad, and angry emotional expressions. Our data show that emotional speech cues produce robust congruency effects on decisions about an emotionally related face target, although no processing advantage occurred when prime stimuli contained multi-channel as opposed to single-channel speech cues. Our data suggest that utterances with prosodic cues alone and utterances with combined prosody and semantic cues both activate knowledge that leads to emotional congruency (priming) effects, but that the convergence of these two information sources does not always heighten access to this knowledge during emotional speech processing.

  2. Application and design of audio-visual aids in stomatology teaching cariology, endodontology and operative dentistry in non-stomatology students%直观教学法在非口腔医学专业医学生牙体牙髓病教学中的设计与应用

    Institute of Scientific and Technical Information of China (English)

    倪雪岩; 吕亚林; 曹莹; 臧滔; 董坚; 丁芳; 李若萱

    2014-01-01

    Objective To evaluate the effects of audio-visual aids on teaching cariology, endodontology and operative dentistry to non-stomatology students. Methods A total of 77 students from the 2010-2011 matriculating classes of the Preventive Medicine Department of Capital Medical University were selected. Diversified audio-visual aids were used comprehensively in teaching. An examination of theory and a follow-up survey were carried out and analyzed to obtain feedback on the combined teaching methods. Results The students had better theoretical knowledge of endodontics; the mean score was 24.2 ± 1.1. The questionnaire survey showed that 89.6% (69/77) of students had a positive attitude towards the improvement of the teaching method, and 90.9% (70/77) of the students taught with audio-visual aids in stomatology teaching showed good learning ability. Conclusions Application of audio-visual aids in stomatology teaching increases interest in learning and improves the teaching effect. However, the integration should be carefully prepared in combination with cross-teaching methods and elicitation pedagogy in order to accomplish optimal teaching results.

  3. Hate speech

    Directory of Open Access Journals (Sweden)

    Anne Birgitta Nilsen

    2014-03-01

    Full Text Available The manifesto of the Norwegian terrorist Anders Behring Breivik is based on the “Eurabia” conspiracy theory. This theory is a key starting point for hate speech amongst many right-wing extremists in Europe, but also has ramifications beyond these environments. In brief, proponents of the Eurabia theory claim that Muslims are occupying Europe and destroying Western culture, with the assistance of the EU and European governments. By contrast, members of Al-Qaeda and other extreme Islamists promote the conspiracy theory “the Crusade” in their hate speech directed against the West. Proponents of the latter theory argue that the West is leading a crusade to eradicate Islam and Muslims, a crusade that is similarly facilitated by their governments. This article presents analyses of texts written by right-wing extremists and Muslim extremists in an effort to shed light on how hate speech promulgates conspiracy theories in order to spread hatred and intolerance. The aim of the article is to contribute to a more thorough understanding of hate speech’s nature by applying rhetorical analysis. Rhetorical analysis is chosen because it offers a means of understanding the persuasive power of speech. It is thus a suitable tool to describe how hate speech works to convince and persuade. The concepts from rhetorical theory used in this article are ethos, logos and pathos. The concept of ethos is used to pinpoint factors that contributed to Osama bin Laden's impact, namely factors that lent credibility to his promotion of the conspiracy theory of the Crusade. In particular, Bin Laden projected common sense, good morals and good will towards his audience. He seemed to have coherent and relevant arguments; he appeared to possess moral credibility; and his use of language demonstrated that he wanted the best for his audience. The concept of pathos is used to define hate speech, since hate speech targets its audience's emotions. In hate speech it is the

  4. Alerting prefixes for speech warning messages. [in helicopters

    Science.gov (United States)

    Bucher, N. M.; Voorhees, J. W.; Karl, R. L.; Werner, E.

    1984-01-01

    A major question posed by the design of an integrated voice information display/warning system for next-generation helicopter cockpits is whether an alerting prefix should precede voice warning messages; if so, the characteristics desirable in such a cue must also be addressed. Attention is presently given to the results of a study which ascertained pilot response time and response accuracy to messages preceded by either neutral cues or the cognitively appropriate semantic cues. Both verbal cues and messages were spoken in direct, phoneme-synthesized speech, and a training manipulation was included to determine the extent to which previous exposure to speech thus produced facilitates these messages' comprehension. Results are discussed in terms of the importance of human factors research in cockpit display design.

  5. Speech Enhancement

    DEFF Research Database (Denmark)

    Benesty, Jacob; Jensen, Jesper Rindom; Christensen, Mads Græsbøll;

    Speech enhancement is a classical problem in signal processing, yet still largely unsolved. Two of the conventional approaches for solving this problem are linear filtering, like the classical Wiener filter, and subspace methods. These approaches have traditionally been treated as different classes ... and their performance bounded and assessed in terms of noise reduction and speech distortion. The book shows how various filter designs can be obtained in this framework, including the maximum SNR, Wiener, LCMV, and MVDR filters, and how these can be applied in various contexts, like in single...
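
    For illustration, a minimal single-channel enhancement sketch in the same spirit: a Wiener-like gain applied in the STFT domain, with the noise spectrum estimated from an assumed noise-only lead-in. The frame length, gain floor, and noise-estimation assumption are illustrative choices, not the filter derivations developed in the book.

```python
import numpy as np
from scipy.signal import stft, istft

def wiener_enhance(noisy, fs, noise_seconds=0.25, nperseg=512):
    """Single-channel Wiener-style enhancement in the STFT domain.

    Assumes the first `noise_seconds` of the recording contain noise only,
    from which the noise power spectrum is estimated.
    """
    f, t, X = stft(noisy, fs=fs, nperseg=nperseg)
    hop = nperseg // 2
    n_noise_frames = max(1, int(noise_seconds * fs / hop))
    noise_psd = np.mean(np.abs(X[:, :n_noise_frames]) ** 2, axis=1, keepdims=True)

    snr_post = (np.abs(X) ** 2) / (noise_psd + 1e-12)      # a posteriori SNR
    gain = np.maximum(1.0 - 1.0 / snr_post, 0.05)          # Wiener-like gain, floored
    _, enhanced = istft(gain * X, fs=fs, nperseg=nperseg)
    return enhanced
```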

  6. Speech enhancement

    CERN Document Server

    Benesty, Jacob; Chen, Jingdong

    2006-01-01

    We live in a noisy world! In all applications (telecommunications, hands-free communications, recording, human-machine interfaces, etc.) that require at least one microphone, the signal of interest is usually contaminated by noise and reverberation. As a result, the microphone signal has to be "cleaned" with digital signal processing tools before it is played out, transmitted, or stored. This book is about speech enhancement. Different well-known and state-of-the-art methods for noise reduction, with one or multiple microphones, are discussed. By speech enhancement, we mean not only noise red

  7. Voice quality in affect cueing: does loudness matter?

    OpenAIRE

    Irena Yanushevskaya

    2013-01-01

    In emotional speech research, it has been suggested that loudness, along with other prosodic features, may be an important cue in communicating high activation affects. In earlier studies, we found different voice quality stimuli to be consistently associated with certain affective states. In these stimuli, as in typical human productions, the different voice qualities entailed differences in loudness. To examine the extent to which the loudness differences among these voice qualities might i...

  8. Sound frequency affects speech emotion perception: results from congenital amusia

    Science.gov (United States)

    Lolli, Sydney L.; Lewenstein, Ari D.; Basurto, Julian; Winnik, Sean; Loui, Psyche

    2015-01-01

    Congenital amusics, or “tone-deaf” individuals, show difficulty in perceiving and producing small pitch differences. While amusia has marked effects on music perception, its impact on speech perception is less clear. Here we test the hypothesis that individual differences in pitch perception affect judgment of emotion in speech, by applying low-pass filters to spoken statements of emotional speech. A norming study was first conducted on Mechanical Turk to ensure that the intended emotions from the Macquarie Battery for Evaluation of Prosody were reliably identifiable by US English speakers. The most reliably identified emotional speech samples were used in Experiment 1, in which subjects performed a psychophysical pitch discrimination task, and an emotion identification task under low-pass and unfiltered speech conditions. Results showed a significant correlation between pitch-discrimination threshold and emotion identification accuracy for low-pass filtered speech, with amusics (defined here as those with a pitch discrimination threshold >16 Hz) performing worse than controls. This relationship with pitch discrimination was not seen in unfiltered speech conditions. Given the dissociation between low-pass filtered and unfiltered speech conditions, we inferred that amusics may be compensating for poorer pitch perception by using speech cues that are filtered out in this manipulation. To assess this potential compensation, Experiment 2 was conducted using high-pass filtered speech samples intended to isolate non-pitch cues. No significant correlation was found between pitch discrimination and emotion identification accuracy for high-pass filtered speech. Results from these experiments suggest an influence of low frequency information in identifying emotional content of speech. PMID:26441718
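
    The filtering manipulation, retaining mainly low-frequency information by low-pass filtering the spoken statements, can be sketched as below; the 500 Hz cutoff, the filter order, and the file-based interface are placeholders rather than the study's exact parameters.

```python
from scipy.io import wavfile
from scipy.signal import butter, sosfiltfilt

def lowpass_speech(in_wav, out_wav, cutoff_hz=500.0, order=6):
    """Low-pass filter a speech file, retaining mostly low-frequency cues."""
    fs, x = wavfile.read(in_wav)
    sos = butter(order, cutoff_hz / (fs / 2), btype="low", output="sos")
    y = sosfiltfilt(sos, x.astype(float))
    wavfile.write(out_wav, fs, y.astype(x.dtype))
```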

  9. Sound frequency affects speech emotion perception: results from congenital amusia.

    Science.gov (United States)

    Lolli, Sydney L; Lewenstein, Ari D; Basurto, Julian; Winnik, Sean; Loui, Psyche

    2015-01-01

    Congenital amusics, or "tone-deaf" individuals, show difficulty in perceiving and producing small pitch differences. While amusia has marked effects on music perception, its impact on speech perception is less clear. Here we test the hypothesis that individual differences in pitch perception affect judgment of emotion in speech, by applying low-pass filters to spoken statements of emotional speech. A norming study was first conducted on Mechanical Turk to ensure that the intended emotions from the Macquarie Battery for Evaluation of Prosody were reliably identifiable by US English speakers. The most reliably identified emotional speech samples were used in Experiment 1, in which subjects performed a psychophysical pitch discrimination task, and an emotion identification task under low-pass and unfiltered speech conditions. Results showed a significant correlation between pitch-discrimination threshold and emotion identification accuracy for low-pass filtered speech, with amusics (defined here as those with a pitch discrimination threshold >16 Hz) performing worse than controls. This relationship with pitch discrimination was not seen in unfiltered speech conditions. Given the dissociation between low-pass filtered and unfiltered speech conditions, we inferred that amusics may be compensating for poorer pitch perception by using speech cues that are filtered out in this manipulation. To assess this potential compensation, Experiment 2 was conducted using high-pass filtered speech samples intended to isolate non-pitch cues. No significant correlation was found between pitch discrimination and emotion identification accuracy for high-pass filtered speech. Results from these experiments suggest an influence of low frequency information in identifying emotional content of speech. PMID:26441718

  10. Multisensor image cueing (MUSIC)

    Science.gov (United States)

    Rodvold, David; Patterson, Tim J.

    2002-07-01

    There have been many years of research and development in the Automatic Target Recognition (ATR) community. This development has resulted in numerous algorithms to perform target detection automatically. The morphing of the ATR acronym to Aided Target Recognition provides a succinct commentary regarding the success of the automatic target recognition research. Now that the goal is aided recognition, many of the algorithms which were not able to provide autonomous recognition may now provide valuable assistance in cueing a human analyst where to look in the images under consideration. This paper describes the MUSIC system being developed for the US Air Force to provide multisensor image cueing. The tool works across multiple image phenomenologies and fuses the evidence across the set of available imagery. MUSIC is designed to work with a wide variety of sensors and platforms, and provide cueing to an image analyst in an information-rich environment. The paper concentrates on the current integration of algorithms into an extensible infrastructure to allow cueing in multiple image types.

  11. Composition: Cue Wheel

    DEFF Research Database (Denmark)

    Bergstrøm-Nielsen, Carl

    2014-01-01

    Cue Rondo is an open composition to be realised by improvising musicians. See more about my composition practise in the entry "Composition - General Introduction". This work is licensed under a Creative Commons "by-nc" License. You may for non-commercial purposes use and distribute it, performance...

  12. Recognizing intentions in infant-directed speech: Evidence for universals

    OpenAIRE

    Bryant, GA; Barrett, HC

    2007-01-01

    In all languages studied to date, distinct prosodic contours characterize different intention categories of infant-directed (ID) speech. This vocal behavior likely exists universally as a species-typical trait, but little research has examined whether listeners can accurately recognize intentions in ID speech using only vocal cues, without access to semantic information. We recorded native-English-speaking mothers producing four intention categories of utterances (prohibition, approval, comfo...

  13. Performance evaluation of a motor-imagery-based EEG-Brain computer interface using a combined cue with heterogeneous training data in BCI-Naive subjects

    Directory of Open Access Journals (Sweden)

    Lee Youngbum

    2011-10-01

    Full Text Available Abstract Background The subjects in an EEG-Brain computer interface (BCI) system experience difficulties when attempting to obtain consistent performance of the actual movement by motor imagery alone. It is necessary to find the optimal conditions and stimulus combinations that affect the performance factors of the EEG-BCI system to guarantee equipment safety and trust through performance evaluation, using motor imagery characteristics that can be utilized in the EEG-BCI testing environment. Methods The experiment was carried out with 10 experienced subjects and 32 naive subjects on an EEG-BCI system. There were 3 experiments: the experienced homogeneous experiment, the naive homogeneous experiment and the naive heterogeneous experiment. Each experiment was compared in terms of the six audio-visual cue combinations and consisted of 50 trials. The EEG data of the naive subjects were classified using a least-squares linear classifier after common spatial pattern filtering. The accuracy was calculated using the training and test data sets. The p-value of the accuracy was obtained through a statistical significance test. Results In the case in which a naive subject was trained by a heterogeneous combined cue and tested by a visual cue, the result was not only the highest accuracy (p ... Conclusions We propose the use of this measuring methodology of a heterogeneous combined cue for training data and a visual cue for test data by the typical EEG-BCI algorithm on the EEG-BCI system to achieve effectiveness in terms of consistency, stability, cost, time, and resources management without the need for a trial-and-error process.
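
    The classification chain named in the Methods (common spatial pattern filtering followed by a least-squares linear classifier) can be sketched roughly as follows. The epoch array shapes, the number of retained filter pairs, and the log-variance features are standard choices assumed here, not details taken from the paper.

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(epochs_a, epochs_b, n_pairs=3):
    """Common spatial patterns for two classes of EEG epochs.

    epochs_*: arrays of shape (n_trials, n_channels, n_samples).
    Returns a (2*n_pairs, n_channels) spatial filter matrix.
    """
    def mean_cov(epochs):
        return np.mean([np.cov(trial) for trial in epochs], axis=0)

    Ca, Cb = mean_cov(epochs_a), mean_cov(epochs_b)
    # Generalized eigenvalue problem: Ca w = lambda (Ca + Cb) w
    vals, vecs = eigh(Ca, Ca + Cb)
    order = np.argsort(vals)
    picks = np.concatenate([order[:n_pairs], order[-n_pairs:]])
    return vecs[:, picks].T

def log_var_features(epochs, W):
    """Log-variance of CSP-filtered signals, one feature vector per trial."""
    feats = []
    for trial in epochs:
        s = W @ trial
        v = np.var(s, axis=1)
        feats.append(np.log(v / v.sum()))
    return np.array(feats)

def train_least_squares(X, y):
    """Least-squares linear classifier: sign(X_aug @ w) predicts the class."""
    X_aug = np.hstack([X, np.ones((len(X), 1))])          # add bias term
    w, *_ = np.linalg.lstsq(X_aug, np.where(y > 0, 1.0, -1.0), rcond=None)
    return w
```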

  14. Silent Speech Interfaces

    OpenAIRE

    Denby, B; Schultz, T.; Honda, K.; Hueber, T.; Gilbert, J.M.; Brumberg, J.S.

    2010-01-01

    Abstract The possibility of speech processing in the absence of an intelligible acoustic signal has given rise to the idea of a 'silent speech' interface, to be used as an aid for the speech handicapped, or as part of a communications system operating in silence-required or high-background-noise environments. The article first outlines the emergence of the silent speech interface from the fields of speech production, automatic speech processing, speech pathology research, and telec...

  15. Language and Speech Processing

    CERN Document Server

    Mariani, Joseph

    2008-01-01

    Speech processing addresses various scientific and technological areas. It includes speech analysis and variable rate coding, in order to store or transmit speech. It also covers speech synthesis, especially from text, speech recognition, including speaker and language identification, and spoken language understanding. This book covers the following topics: how to realize speech production and perception systems, how to synthesize and understand speech using state-of-the-art methods in signal processing, pattern recognition, stochastic modelling computational linguistics and human factor studi

  16. A Computational Model of Word Segmentation from Continuous Speech Using Transitional Probabilities of Atomic Acoustic Events

    Science.gov (United States)

    Rasanen, Okko

    2011-01-01

    Word segmentation from continuous speech is a difficult task that is faced by human infants when they start to learn their native language. Several studies indicate that infants might use several different cues to solve this problem, including intonation, linguistic stress, and transitional probabilities between subsequent speech sounds. In this…
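
    A minimal illustration of the transitional-probability idea, assuming syllable-level input: estimate P(next | current) from a corpus and posit word boundaries where the probability dips to a local minimum. This is a generic sketch of the cue, not the model described in the article.

```python
from collections import Counter

def transitional_probabilities(stream):
    """P(next | current) for adjacent syllables in a corpus (list of syllables)."""
    pair_counts = Counter(zip(stream, stream[1:]))
    unit_counts = Counter(stream[:-1])
    return {(a, b): c / unit_counts[a] for (a, b), c in pair_counts.items()}

def segment(stream, tps):
    """Posit a word boundary wherever TP dips below both neighbouring TPs."""
    tp = [tps.get((a, b), 0.0) for a, b in zip(stream, stream[1:])]
    words, current = [], [stream[0]]
    for i in range(1, len(stream)):
        at_minimum = (0 < i - 1 < len(tp) - 1
                      and tp[i - 1] < tp[i - 2] and tp[i - 1] < tp[i])
        if at_minimum:
            words.append(current)
            current = []
        current.append(stream[i])
    words.append(current)
    return words
```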

  17. Speech coding

    Science.gov (United States)

    Gersho, Allen

    1990-05-01

    Recent advances in algorithms and techniques for speech coding now permit high quality voice reproduction at remarkably low bit rates. The advent of powerful single-chip signal processors has made it cost effective to implement these new and sophisticated speech coding algorithms for many important applications in voice communication and storage. Some of the main ideas underlying the algorithms of major interest today are reviewed. The concept of removing redundancy by linear prediction is reviewed, first in the context of predictive quantization or DPCM. Then linear predictive coding, adaptive predictive coding, and vector quantization are discussed. The concepts of excitation coding via analysis-by-synthesis, vector sum excitation codebooks, and adaptive postfiltering are explained. The main ideas of vector excitation coding (VXC), or code-excited linear prediction (CELP), are presented. Finally, low-delay VXC coding and phonetic segmentation for VXC are described.
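
    The linear-prediction idea mentioned above can be illustrated with the standard autocorrelation method solved by the Levinson-Durbin recursion; frame-wise application and a typical order of 10 are assumed here, and this is a textbook sketch rather than the article's own formulation. In analysis-by-synthesis coders such as CELP, the residual left after this short-term prediction is then represented by entries from an excitation codebook.

```python
import numpy as np

def lpc(frame, order=10):
    """LPC coefficients via the autocorrelation method (Levinson-Durbin).

    Returns (a, err) where a[0] = 1 and the prediction is
    x[n] ~ -sum_k a[k] * x[n-k]; err is the residual energy.
    """
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]  # update previous coefficients
        a[i] = k
        err *= (1.0 - k * k)                # shrink residual energy
    return a, err
```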

  18. Hate speech

    OpenAIRE

    Anne Birgitta Nilsen

    2014-01-01

    The manifesto of the Norwegian terrorist Anders Behring Breivik is based on the “Eurabia” conspiracy theory. This theory is a key starting point for hate speech amongst many right-wing extremists in Europe, but also has ramifications beyond these environments. In brief, proponents of the Eurabia theory claim that Muslims are occupying Europe and destroying Western culture, with the assistance of the EU and European governments. By contrast, members of Al-Qaeda and other extreme Islamists prom...

  19. Mind your pricing cues.

    Science.gov (United States)

    Anderson, Eric; Simester, Duncan

    2003-09-01

    For most of the items they buy, consumers don't have an accurate sense of what the price should be. Ask them to guess how much a four-pack of 35-mm film costs, and you'll get a variety of wrong answers: Most people will underestimate; many will only shrug. Research shows that consumers' knowledge of the market is so far from perfect that it hardly deserves to be called knowledge at all. Yet people happily buy film and other products every day. Is this because they don't care what kind of deal they're getting? No. Remarkably, it's because they rely on retailers to tell them whether they're getting a good price. In subtle and not-so-subtle ways, retailers send signals to customers, telling them whether a given price is relatively high or low. In this article, the authors review several common pricing cues retailers use--"sale" signs, prices that end in 9, signpost items, and price-matching guarantees. They also offer some surprising facts about how--and how well--those cues work. For instance, the authors' tests with several mail-order catalogs reveal that including the word "sale" beside a price can increase demand by more than 50%. The practice of using a 9 at the end of a price to denote a bargain is so common, you'd think customers would be numb to it. Yet in a study the authors did involving a women's clothing catalog, they increased demand by a third just by changing the price of a dress from $34 to $39. Pricing cues are powerful tools for guiding customers' purchasing decisions, but they must be applied judiciously. Used inappropriately, the cues may breach customers' trust, reduce brand equity, and give rise to lawsuits. PMID:12964397

  20. Entraining with another person's speech rhythm: Evidence from healthy speakers and individuals with Parkinson's disease.

    Science.gov (United States)

    Späth, Mona; Aichert, Ingrid; Ceballos-Baumann, Andrés O; Wagner-Sonntag, Edith; Miller, Nick; Ziegler, Wolfram

    2016-01-01

    This study examines entrainment of speech timing and rhythm with a model speaker in healthy persons and individuals with Parkinson's. We asked whether participants coordinate their speech initiation and rhythm with the model speaker, and whether the regularity of metrical structure of sentences influences this behaviour. Ten native German speakers with hypokinetic dysarthria following Parkinson's and 10 healthy controls heard a sentence ('prime') and subsequently read aloud another sentence ('target'). Speech material comprised 32 metrically regular and irregular sentences, respectively. Turn-taking delays and alignment of speech rhythm were measured using speech wave analyses. Results showed that healthy participants initiated speech more closely in rhythm with the model speaker than patients. Metrically regular prime sentences induced anticipatory responses relative to metrically irregular primes. Entrainment of speech rhythm was greater in metrically regular targets, especially in individuals with Parkinson's. We conclude that individuals with Parkinson's may exploit metrically regular cues in speech. PMID:26786186

  1. Tactile perception by the profoundly deaf. Speech and environmental sounds.

    Science.gov (United States)

    Plant, G L

    1982-11-01

    Four subjects fitted with single-channel vibrotactile aids and provided with training in their use took part in a testing programme aimed at assessing their aided and unaided lipreading performance, their ability to detect segmental and suprasegmental features of speech, and the discrimination of common environmental sounds. The results showed that the vibrotactile aid provided very useful information as to speech and non-speech stimuli with the subjects performing best on those tasks where time/intensity cues provided sufficient information to enable identification. The implications of the study are discussed and a comparison made with those results reported for subjects using cochlear implants. PMID:6897619

  2. Speech and Communication Disorders

    Science.gov (United States)

    ... or understand speech. Causes include: hearing disorders and deafness; voice problems, such as dysphonia or those caused by cleft lip or palate; speech problems like stuttering; developmental disabilities; learning disorders; autism spectrum disorder; brain injury; stroke. Some speech and ...

  3. Speech disorders - children

    Science.gov (United States)

    ... of speech disorders may disappear on their own. Speech therapy may help with more severe symptoms or speech problems that do not improve. In therapy, the child will learn how to create certain sounds.

  4. Only pre-cueing but no retro-cueing effects emerge with masked arrow cues.

    Science.gov (United States)

    Janczyk, Markus; Reuss, Heiko

    2016-05-01

    The impact of masked stimulation on cognitive control processes is investigated with much interest. In many cases, masked stimulation suffices to initiate and employ control processes. Shifts of attention either happen in the external environment or internally, for example, in working memory. In the former, even masked cues (i.e., cues that are presented for a period too short to allow strategic use) were shown efficient for shifting attention to particular locations in pre-cue paradigms. Internal attention shifting can be investigated using retro-cues: long after encoding, a valid cue indicates the location to-be-tested via change detection, and this improves performance (retro-cue effect). In the present experiment, participants performed in both a pre- and a retro-cue task with masked and normally presented cues. While the masked cues benefitted performance in the pre-cue task, they did not in the retro-cue task. These results inform about limits of masked stimulation. PMID:26998561

  5. Anger Recognition in Speech Using Acoustic and Linguistic Cues

    OpenAIRE

    Polzehl, Tim; Alexander SCHMITT; Metze, Florian; Wagner, Michael

    2011-01-01

    Abstract The present study elaborates on the exploitation of both linguistic and acoustic feature modeling for anger classification. In terms of acoustic modeling we generate statistics from acoustic audio descriptors, e.g. pitch, loudness, spectral characteristics. Ranking our features, we see that loudness and MFCC seem most promising for all databases. For the English database pitch features are also important. In terms of linguistic modeling we apply probabilistic and entropy-b...

  6. Intelligibility for Binaural Speech with Discarded Low-SNR Speech Components.

    Science.gov (United States)

    Schoenmaker, Esther; van de Par, Steven

    2016-01-01

    Speech intelligibility in multitalker settings improves when the target speaker is spatially separated from the interfering speakers. A factor that may contribute to this improvement is the improved detectability of target-speech components due to binaural interaction in analogy to the Binaural Masking Level Difference (BMLD). This would allow listeners to hear target speech components within specific time-frequency intervals that have a negative SNR, similar to the improvement in the detectability of a tone in noise when these contain disparate interaural difference cues. To investigate whether these negative-SNR target-speech components indeed contribute to speech intelligibility, a stimulus manipulation was performed where all target components were removed when local SNRs were smaller than a certain criterion value. It can be expected that for sufficiently high criterion values target speech components will be removed that do contribute to speech intelligibility. For spatially separated speakers, assuming that a BMLD-like detection advantage contributes to intelligibility, degradation in intelligibility is expected already at criterion values below 0 dB SNR. However, for collocated speakers it is expected that higher criterion values can be applied without impairing speech intelligibility. Results show that degradation of intelligibility for separated speakers is only seen for criterion values of 0 dB and above, indicating a negligible contribution of a BMLD-like detection advantage in multitalker settings. These results show that the spatial benefit is related to a spatial separation of speech components at positive local SNRs rather than to a BMLD-like detection improvement for speech components at negative local SNRs. PMID:27080648
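
    The stimulus manipulation, discarding target speech components whose local SNR falls below a criterion, can be approximated with an STFT-domain binary mask as sketched below; the window length and the exact time-frequency resolution are assumptions, not the authors' processing parameters.

```python
import numpy as np
from scipy.signal import stft, istft

def remove_low_snr_components(target, masker, fs, criterion_db=0.0, nperseg=512):
    """Zero out target STFT bins whose local target-to-masker ratio is below criterion_db."""
    _, _, T = stft(target, fs=fs, nperseg=nperseg)
    _, _, M = stft(masker, fs=fs, nperseg=nperseg)
    local_snr_db = 10 * np.log10((np.abs(T) ** 2 + 1e-12) / (np.abs(M) ** 2 + 1e-12))
    mask = (local_snr_db >= criterion_db).astype(float)   # keep only bins above criterion
    _, pruned_target = istft(mask * T, fs=fs, nperseg=nperseg)
    return pruned_target
```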

  7. Visual Speech Perception in Children with Language Learning Impairments

    Science.gov (United States)

    Knowland, Victoria C. P.; Evans, Sam; Snell, Caroline; Rosen, Stuart

    2016-01-01

    Purpose: The purpose of the study was to assess the ability of children with developmental language learning impairments (LLIs) to use visual speech cues from the talking face. Method: In this cross-sectional study, 41 typically developing children (mean age: 8 years 0 months, range: 4 years 5 months to 11 years 10 months) and 27 children with…

  8. Cueing Visual Attention to Spatial Locations With Auditory Cues

    OpenAIRE

    Kean, Matthew; Crawford, Trevor J

    2008-01-01

    We investigated exogenous and endogenous orienting of visual attention to the spatial location of an auditory cue. In Experiment 1, significantly faster saccades were observed to visual targets appearing ipsilateral, compared to contralateral, to the peripherally-presented cue. This advantage was greatest in an 80% target-at-cue (TAC) condition but equivalent in 20% and 50% TAC conditions. In Experiment 2, participants maintained central fixation while making an elevation judgment of the pe...

  9. Speech recognition and understanding

    Energy Technology Data Exchange (ETDEWEB)

    Vintsyuk, T.K.

    1983-05-01

    This article discusses the automatic processing of speech signals with the aim of finding a sequence of words (speech recognition) or a concept (speech understanding) being transmitted by the speech signal. The goal of the research is to develop an automatic typewriter that will automatically edit and type text under voice control. A dynamic programming method is proposed in which reference signals for all possible classes are stored, after which the presented signal is compared to all the stored signals during the recognition phase. Topics considered include element-by-element recognition of words of speech, learning speech recognition, phoneme-by-phoneme speech recognition, the recognition of connected speech, understanding connected speech, and prospects for designing speech recognition and understanding systems. An application of the composition dynamic programming method for the solution of basic problems in the recognition and understanding of speech is presented.
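
    The dynamic-programming comparison of an incoming signal against stored reference signals corresponds, in modern terms, to dynamic time warping over frame-level feature sequences. The sketch below uses a Euclidean local distance and a basic step pattern; these are common choices, not necessarily the original formulation.

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two feature sequences.

    seq_a, seq_b: arrays of shape (n_frames, n_features), e.g. MFCC frames.
    """
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])   # local frame distance
            cost[i, j] = d + min(cost[i - 1, j],               # insertion
                                 cost[i, j - 1],               # deletion
                                 cost[i - 1, j - 1])           # match
    return cost[n, m]

def recognize(unknown, references):
    """Pick the label of the stored reference with the smallest DTW distance."""
    return min(references, key=lambda label: dtw_distance(unknown, references[label]))
```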

  10. Modeling the Contribution of Phonotactic Cues to the Problem of Word Segmentation

    Science.gov (United States)

    Blanchard, Daniel; Heinz, Jeffrey; Golinkoff, Roberta

    2010-01-01

    How do infants find the words in the speech stream? Computational models help us understand this feat by revealing the advantages and disadvantages of different strategies that infants might use. Here, we outline a computational model of word segmentation that aims both to incorporate cues proposed by language acquisition researchers and to…

  11. Weighting of Acoustic Cues to a Manner Distinction by Children with and without Hearing Loss

    Science.gov (United States)

    Nittrouer, Susan; Lowenstein, Joanna H.

    2015-01-01

    Purpose: Children must develop optimal perceptual weighting strategies for processing speech in their first language. Hearing loss can interfere with that development, especially if cochlear implants are required. The three goals of this study were to measure, for children with and without hearing loss: (a) cue weighting for a manner distinction,…

  12. Cross-Linguistic Differences in Prosodic Cues to Syntactic Disambiguation in German and English

    Science.gov (United States)

    O'Brien, Mary Grantham; Jackson, Carrie N.; Gardner, Christine E.

    2014-01-01

    This study examined whether late-learning English-German second language (L2) learners and late-learning German-English L2 learners use prosodic cues to disambiguate temporarily ambiguous first language and L2 sentences during speech production. Experiments 1a and 1b showed that English-German L2 learners and German-English L2 learners used a…

  13. Sound frequency affects speech emotion perception: Results from congenital amusia

    Directory of Open Access Journals (Sweden)

    Sydney Lolli

    2015-09-01

    Full Text Available Congenital amusics, or "tone-deaf" individuals, show difficulty in perceiving and producing small pitch differences. While amusia has marked effects on music perception, its impact on speech perception is less clear. Here we test the hypothesis that individual differences in pitch perception affect judgment of emotion in speech, by applying band-pass filters to spoken statements of emotional speech. A norming study was first conducted on Mechanical Turk to ensure that the intended emotions from the Macquarie Battery for Evaluation of Prosody (MBEP) were reliably identifiable by US English speakers. The most reliably identified emotional speech samples were used in Experiment 1, in which subjects performed a psychophysical pitch discrimination task and an emotion identification task under band-pass and unfiltered speech conditions. Results showed a significant correlation between pitch discrimination threshold and emotion identification accuracy for band-pass filtered speech, with amusics (defined here as those with a pitch discrimination threshold > 16 Hz) performing worse than controls. This relationship with pitch discrimination was not seen in unfiltered speech conditions. Given the dissociation between band-pass filtered and unfiltered speech conditions, we inferred that amusics may be compensating for poorer pitch perception by using speech cues that are filtered out in this manipulation.

  14. A Study between College English Autonomous Audio-visual Learning of Network and Cultivating the Ability of Listening Comprehension%大学英语网络自主视听学习与听力理解能力培养研究

    Institute of Scientific and Technical Information of China (English)

    柴春曼

    2014-01-01

    In the network environment, cultivating college students' autonomous learning ability is an important subject in college English teaching reform. Language acquisition is an autonomous process: foreign language learning must go through acquisition in order to achieve pragmatic use, and foreign language acquisition cannot do without the students' autonomous learning. In the process of autonomous audio-visual learning, learners tried a variety of learning strategies, developed their cognitive ability, and improved their scores on English listening tests.

  15. The Empirical Study between College Students'English Autonomous Audio-visual Learning under the Network Environment and Cultivating the Ability of Listening Comprehension%网络环境下大学生英语自主视听学习与听力理解能力培养的实证研究

    Institute of Scientific and Technical Information of China (English)

    刘建国

    2013-01-01

    In the network environment, cultivating college students' autonomous learning ability is an important subject in college English teaching reform. Language acquisition is an autonomous process: foreign language learning must go through acquisition in order to achieve pragmatic use, and foreign language acquisition cannot do without the students' autonomous learning. In the process of autonomous audio-visual learning, learners tried a variety of learning strategies, developed their cognitive ability, and improved their scores on English listening tests.

  16. Coding pitch differences in voiceless fricatives: Whispered relative to normal speech.

    Science.gov (United States)

    Heeren, Willemijn F L

    2015-12-01

    Intonation can be perceived in whispered speech despite the absence of the fundamental frequency. In the past, acoustic correlates of pitch in whisper have been sought in vowel content, but, recently, studies of normal speech demonstrated correlates of intonation in consonants as well. This study examined how consonants may contribute to the coding of intonation in whispered relative to normal speech. The acoustic characteristics of whispered, voiceless fricatives /s/ and /f/, produced at different pitch targets (low, mid, high), were investigated and compared to corresponding normal speech productions to assess if whisper contained secondary or compensatory pitch correlates. Furthermore, listener sensitivity to fricative cues to pitch in whisper was established, also relative to normal speech. Consistent with recent studies, acoustic correlates of whispered and normal speech fricatives systematically varied with pitch target. Comparable findings across speech modes showed that acoustic correlates were secondary. Discrimination of vowel-fricative-vowel stimuli was less accurate and slower in whispered than normal speech, which is attributed to differences in acoustic cues available. Perception of fricatives presented without their vowel contexts, however, revealed comparable processing speeds and response accuracies between speech modes, supporting the finding that within fricatives, acoustic correlates of pitch are similar across speech modes. PMID:26723300

  17. Music and speech prosody: A common rhythm

    Directory of Open Access Journals (Sweden)

    Maija Hausen

    2013-09-01

    Full Text Available Disorders of music and speech perception, known as amusia and aphasia, have traditionally been regarded as dissociated deficits based on studies of brain damaged patients. This has been taken as evidence that music and speech are perceived by largely separate and independent networks in the brain. However, recent studies of congenital amusia have broadened this view by showing that the deficit is associated with problems in perceiving speech prosody, especially intonation and emotional prosody. In the present study the association between the perception of music and speech prosody was investigated with healthy Finnish adults (n = 61) using an on-line music perception test including the Scale subtest of Montreal Battery of Evaluation of Amusia (MBEA) and Off-Beat and Out-of-key tasks as well as a prosodic verbal task that measures the perception of word stress. Regression analyses showed that there was a clear association between prosody perception and music perception, especially in the domain of rhythm perception. This association was evident after controlling for music education, age, pitch perception, visuospatial perception and working memory. Pitch perception was significantly associated with music perception but not with prosody perception. The association between music perception and visuospatial perception (measured using analogous tasks) was less clear. Overall, the pattern of results indicates that there is a robust link between music and speech perception and that this link can be mediated by rhythmic cues (time and stress).

  18. Music and speech prosody: a common rhythm.

    Science.gov (United States)

    Hausen, Maija; Torppa, Ritva; Salmela, Viljami R; Vainio, Martti; Särkämö, Teppo

    2013-01-01

    Disorders of music and speech perception, known as amusia and aphasia, have traditionally been regarded as dissociated deficits based on studies of brain damaged patients. This has been taken as evidence that music and speech are perceived by largely separate and independent networks in the brain. However, recent studies of congenital amusia have broadened this view by showing that the deficit is associated with problems in perceiving speech prosody, especially intonation and emotional prosody. In the present study the association between the perception of music and speech prosody was investigated with healthy Finnish adults (n = 61) using an on-line music perception test including the Scale subtest of Montreal Battery of Evaluation of Amusia (MBEA) and Off-Beat and Out-of-key tasks as well as a prosodic verbal task that measures the perception of word stress. Regression analyses showed that there was a clear association between prosody perception and music perception, especially in the domain of rhythm perception. This association was evident after controlling for music education, age, pitch perception, visuospatial perception, and working memory. Pitch perception was significantly associated with music perception but not with prosody perception. The association between music perception and visuospatial perception (measured using analogous tasks) was less clear. Overall, the pattern of results indicates that there is a robust link between music and speech perception and that this link can be mediated by rhythmic cues (time and stress). PMID:24032022

  19. Speech Repairs, Intonational Boundaries and Discourse Markers Modeling Speakers' Utterances in Spoken Dialog

    CERN Document Server

    Heeman, P A

    1999-01-01

    In this thesis, we present a statistical language model for resolving speech repairs, intonational boundaries and discourse markers. Rather than finding the best word interpretation for an acoustic signal, we redefine the speech recognition problem so that it also identifies the POS tags, discourse markers, speech repairs and intonational phrase endings (a major cue in determining utterance units). Adding these extra elements to the speech recognition problem actually allows it to better predict the words involved, since we are able to make use of the predictions of boundary tones, discourse markers and speech repairs to better account for what word will occur next. Furthermore, we can take advantage of acoustic information, such as silence information, which tends to co-occur with speech repairs and intonational phrase endings, that current language models can only regard as noise in the acoustic signal. The output of this language model is a much fuller account of the speaker's turn, with part-of-speech ...
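
    As a toy illustration of enriching the recognition units, the sketch below estimates add-one smoothed bigram probabilities over joint word/POS tokens, so that tag context can inform word prediction. The miniature corpus, tag set and smoothing choice are assumptions for illustration, not the thesis's actual model.

      from collections import Counter

      # Tiny annotated turn; "uh" marks a filled pause of the kind found in repairs.
      tagged = [("i", "PRP"), ("want", "VB"), ("uh", "UH"), ("i", "PRP"),
                ("want", "VB"), ("a", "DT"), ("ticket", "NN")]
      tokens = [f"{w}/{t}" for w, t in tagged]

      unigrams = Counter(tokens)
      bigrams = Counter(zip(tokens, tokens[1:]))
      vocab = len(unigrams)

      def bigram_prob(prev, cur):
          # Add-one smoothed P(cur | prev) over joint word/POS tokens.
          return (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab)

      print(bigram_prob("want/VB", "a/DT"))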

  20. Criteria for public speech planning : characteristics of language learning

    Directory of Open Access Journals (Sweden)

    Tomaž Petek

    2012-12-01

    Full Text Available Public speaking is understood as monological discourse production, directed at a wider or narrower public or group of people. The theoretical part of this article introduces the characteristics of effective public speaking; criteria were designed for the preparation of a public speech, and four main sections defined, i.e. (a) construction of the public speech (consideration of text type characteristics, appropriateness of the topic and selection of content, appropriateness of the mode of topic development, formation of a meaningful, comprehensible and integrated text); (b) integral mode of the public speech (fluent, natural and free speaking, clear diction); (c) verbal language (social genre, selection of words consistent with the speech, grammatical correctness, correct pronunciation, formal constructions, formal [dynamic] accent); and (d) non-verbal language (auditory non-verbal speech cues, visual non-verbal speech cues). The fulfilment of these criteria was tested in practice, namely on second and third year undergraduate students (prospective teachers; N = 211). On the whole, all the average marks of third year students were better than those of the second year students. The most common difficulty facing the students was fluent, natural and free speaking as well as appropriate topic development, whereas the most successfully fulfilled criteria were those of appropriate topic selection and consideration of text type characteristics.

  1. SynFace—Speech-Driven Facial Animation for Virtual Speech-Reading Support

    Directory of Open Access Journals (Sweden)

    Salvi Giampiero

    2009-01-01

    Full Text Available This paper describes SynFace, a supportive technology that aims at enhancing audio-based spoken communication in adverse acoustic conditions by providing the missing visual information in the form of an animated talking head. Firstly, we describe the system architecture, consisting of a 3D animated face model controlled from the speech input by a specifically optimised phonetic recogniser. Secondly, we report on speech intelligibility experiments with focus on multilinguality and robustness to audio quality. The system, already available for Swedish, English, and Flemish, was optimised for German and for Swedish wide-band speech quality available in TV, radio, and Internet communication. Lastly, the paper covers experiments with nonverbal motions driven from the speech signal. It is shown that turn-taking gestures can be used to affect the flow of human-human dialogues. We have focused specifically on two categories of cues that may be extracted from the acoustic signal: prominence/emphasis and interactional cues (turn-taking/back-channelling).

  2. Perception of aircraft Deviation Cues

    Science.gov (United States)

    Martin, Lynne; Azuma, Ronald; Fox, Jason; Verma, Savita; Lozito, Sandra

    2005-01-01

    To begin to address the need for new displays, required by a future airspace concept to support new roles that will be assigned to flight crews, a study of potentially informative display cues was undertaken. Two cues were tested on a simple plan display - aircraft trajectory and flight corridor. Of particular interest was the speed and accuracy with which participants could detect an aircraft deviating outside its flight corridor. Presence of the trajectory cue significantly reduced participants' reaction time to a deviation while the flight corridor cue did not. Although non-significant, the flight corridor cue seemed to have a relationship with the accuracy of participants' judgments rather than their speed. As this is the second of a series of studies, these issues will be addressed further in future studies.

  3. Frequency band-importance functions for auditory and auditory-visual speech recognition

    Science.gov (United States)

    Grant, Ken W.

    2005-04-01

    In many everyday listening environments, speech communication involves the integration of both acoustic and visual speech cues. This is especially true in noisy and reverberant environments where the speech signal is highly degraded, or when the listener has a hearing impairment. Understanding the mechanisms involved in auditory-visual integration is a primary interest of this work. Of particular interest is whether listeners are able to allocate their attention to various frequency regions of the speech signal differently under auditory-visual conditions and auditory-alone conditions. For auditory speech recognition, the most important frequency regions tend to be around 1500-3000 Hz, corresponding roughly to important acoustic cues for place of articulation. The purpose of this study is to determine the most important frequency region under auditory-visual speech conditions. Frequency band-importance functions for auditory and auditory-visual conditions were obtained by having subjects identify speech tokens under conditions where the speech-to-noise ratio of different parts of the speech spectrum is independently and randomly varied on every trial. Point biserial correlations were computed for each separate spectral region and the normalized correlations are interpreted as weights indicating the importance of each region. Relations among frequency-importance functions for auditory and auditory-visual conditions will be discussed.
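
    The weighting analysis described above can be sketched as follows: per-band SNR is randomised on each trial, and the point-biserial correlation between a band's SNR and response correctness serves as that band's importance weight. The band count, trial count and simulated listener below are assumptions, not the study's data.

      import numpy as np
      from scipy.stats import pointbiserialr

      rng = np.random.default_rng(1)
      n_trials, n_bands = 500, 6
      band_snr = rng.uniform(-12, 12, size=(n_trials, n_bands))   # dB per band, per trial
      # Simulated listener: bands 3 and 4 (a mid-frequency region) drive correctness.
      p_correct = 1 / (1 + np.exp(-(0.2 * band_snr[:, 3] + 0.2 * band_snr[:, 4])))
      correct = rng.random(n_trials) < p_correct

      weights = np.array([pointbiserialr(correct, band_snr[:, b])[0] for b in range(n_bands)])
      weights = np.clip(weights, 0, None)
      weights /= weights.sum()                                     # normalised importance
      print(weights.round(3))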

  4. Speech recognition: Acoustic phonetic and lexical knowledge representation

    Science.gov (United States)

    Zue, V. W.

    1984-02-01

    The purpose of this program is to develop a speech data base facility under which the acoustic characteristics of speech sounds in various contexts can be studied conveniently; investigate the phonological properties of a large lexicon of, say, 10,000 words and determine to what extent the phonotactic constraints can be utilized in speech recognition; study the acoustic cues that are used to mark word boundaries; develop a test bed in the form of a large-vocabulary, IWR system to study the interactions of acoustic, phonetic and lexical knowledge; and develop a limited continuous speech recognition system with the goal of recognizing any English word from its spelling in order to assess the interactions of higher-level knowledge sources.

  5. 无经营许可销售侵权音像复制品行为的刑法适用——以犯罪对象为切人点%No business license for the sale of the infringement audio-visual copies behavior of the Criminal Law applying --the object of crime as the starting point

    Institute of Scientific and Technical Information of China (English)

    王志

    2012-01-01

    Practitioners and scholars hold different views on how the Criminal Law applies to the unlicensed sale of infringing audio-visual copies. The main controversy is whether the crime of selling infringing copies and the crime of illegal business operation stand in a relation of concurrence and, if so, what form that concurrence takes. According to the judicial interpretations, infringing audio-visual copies are the object of the crime of selling infringing copies and are excluded from the crime of illegal business operation. The interests protected by the two offences are mutually exclusive, so the two crimes are not in concurrence. The conduct in question can therefore only be convicted and sentenced as the crime of selling infringing copies.

  6. Infants deploy selective attention to the mouth of a talking face when learning speech.

    Science.gov (United States)

    Lewkowicz, David J; Hansen-Tift, Amy M

    2012-01-31

    The mechanisms underlying the acquisition of speech-production ability in human infancy are not well understood. We tracked 4-12-mo-old English-learning infants' and adults' eye gaze while they watched and listened to a female reciting a monologue either in their native (English) or nonnative (Spanish) language. We found that infants shifted their attention from the eyes to the mouth between 4 and 8 mo of age regardless of language and then began a shift back to the eyes at 12 mo in response to native but not nonnative speech. We posit that the first shift enables infants to gain access to redundant audiovisual speech cues that enable them to learn their native speech forms and that the second shift reflects growing native-language expertise that frees them to shift attention to the eyes to gain access to social cues. On this account, 12-mo-old infants do not shift attention to the eyes when exposed to nonnative speech because increasing native-language expertise and perceptual narrowing make it more difficult to process nonnative speech and require them to continue to access redundant audiovisual cues. Overall, the current findings demonstrate that the development of speech production capacity relies on changes in selective audiovisual attention and that this depends critically on early experience. PMID:22307596

  7. a New Structure for Automatic Speech Recognition

    Science.gov (United States)

    Duchnowski, Paul

    Speech is a wideband signal with cues identifying a particular element distributed across frequency. To capture these cues, most ASR systems analyze the speech signal into spectral (or spectrally-derived) components prior to recognition. Traditionally, these components are integrated across frequency to form a vector of "acoustic evidence" on which a decision by the ASR system is based. This thesis develops an alternate approach, post-labeling integration. In this scheme, tentative decisions, or labels, of the identity of a given speech element are assigned in parallel by sub-recognizers, each operating on a band-limited portion of the speech waveform. Outputs of these independent channels are subsequently combined (integrated) to render the final decision. Remarkably good recognition of bandlimited nonsense syllables by humans leads to the consideration of this method. It also allows potentially more accurate parameterization of the speech waveform and simultaneously robust estimation of parameter probabilities. The algorithm also represents an attempt to make explicit use of redundancies in speech. Three basic methods of parameterizing the bandlimited input of the sub-recognizers were considered, focusing respectively on LPC and cepstrum coefficients, and parameters based on the autocorrelation function. Four sub-recognizers were implemented as discrete Hidden Markov Model (HMM) systems. A Maximum A Posteriori (MAP) hypothesis testing approach was applied to the problem of integrating the individual sub-recognizer decisions on a frame by frame basis. Final segmentation was achieved by a secondary HMM. Five methods of estimating the probabilities necessary for MAP integration were tested. The proposed structure was applied to the task of phonetic, speaker-independent, continuous speech recognition. Performance for several combinations of parameterization schemes and integration methods was measured. The best score of 58.5% on a 39 phone alphabet is roughly
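
    The post-labeling integration idea can be sketched as follows: each band-limited sub-recogniser emits a tentative label, and the integrated decision maximises P(phone) multiplied by the product over channels of P(label_k | phone), under an assumed conditional-independence model. The phone set and probability tables below are toy assumptions, not values from the thesis.

      import numpy as np

      phones = ["s", "sh", "f"]
      prior = {"s": 0.4, "sh": 0.3, "f": 0.3}
      # confusion[k][true_phone][emitted_label]: per-channel label likelihoods (assumed).
      confusion = [
          {"s": {"s": 0.7, "sh": 0.2, "f": 0.1},
           "sh": {"s": 0.2, "sh": 0.7, "f": 0.1},
           "f": {"s": 0.2, "sh": 0.1, "f": 0.7}},
          {"s": {"s": 0.6, "sh": 0.3, "f": 0.1},
           "sh": {"s": 0.1, "sh": 0.8, "f": 0.1},
           "f": {"s": 0.1, "sh": 0.2, "f": 0.7}},
      ]

      def map_integrate(labels):
          # Combine one tentative label per sub-recogniser into a final decision.
          def score(p):
              s = np.log(prior[p])
              for k, lab in enumerate(labels):
                  s += np.log(confusion[k][p][lab])
              return s
          return max(phones, key=score)

      print(map_integrate(["s", "sh"]))   # final frame label after integration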

  8. Speech and Language Impairments

    Science.gov (United States)

    ... easily be mistaken for other disabilities such as autism or learning disabilities, so it’s very important to ensure that the child receives a thorough evaluation by a certified speech-language pathologist. What Causes Speech ...

  9. Speech impairment (adult)

    Science.gov (United States)

    ... impairment; Impairment of speech; Inability to speak; Aphasia; Dysarthria; Slurred speech; Dysphonia voice disorders ... in others the condition does not get better. DYSARTHRIA With dysarthria, the person has ongoing difficulty expressing ...

  10. Speech perception as categorization

    OpenAIRE

    Holt, Lori L.; Lotto, Andrew J.

    2010-01-01

    Speech perception (SP) most commonly refers to the perceptual mapping from the highly variable acoustic speech signal to a linguistic representation, whether it be phonemes, diphones, syllables, or words. This is an example of categorization, in that potentially discriminable speech sounds are assigned to functionally equivalent classes. In this tutorial, we present some of the main challenges to our understanding of the categorization of speech sounds and the conceptualization of SP that has...

  11. Fishing for meaningful units in connected speech

    DEFF Research Database (Denmark)

    Henrichsen, Peter Juel; Christiansen, Thomas Ulrich

    2009-01-01

    In many branches of spoken language analysis including ASR, the set of smallest meaningful units of speech is taken to coincide with the set of phones or phonemes. However, fishing for phones is difficult, error-prone, and computationally expensive. We present an experiment, based on machine...... far lower than for phonemic recognition. Our findings show that it is possible to automatically characterize a linguistic message, without detailed spectral information or presumptions about the target units. Further, fishing for simple meaningful cues and enhancing these selectively would potentially...

  12. Relative cue encoding in the context of sophisticated models of categorization: Separating information from categorization

    Science.gov (United States)

    McMurray, Bob

    2014-01-01

    Traditional studies of human categorization often treat the processes of encoding features and cues as peripheral to the question of how stimuli are categorized. However, in domains where the features and cues are less transparent, how information is encoded prior to categorization may constrain our understanding of the architecture of categorization. This is particularly true in speech perception, where acoustic cues to phonological categories are ambiguous and influenced by multiple factors. Here, it is crucial to consider the joint contributions of the information in the input and the categorization architecture. We contrasted accounts that argue for raw acoustic information encoding with accounts that posit that cues are encoded relative to expectations, and investigated how two categorization architectures—exemplar models and back-propagation parallel distributed processing models—deal with each kind of information. Relative encoding, akin to predictive coding, is a form of noise reduction, so it can be expected to improve model accuracy; however, like predictive coding, the use of relative encoding in speech perception by humans is controversial, so results are compared to patterns of human performance, rather than on the basis of overall accuracy. We found that, for both classes of models, in the vast majority of parameter settings, relative cues greatly helped the models approximate human performance. This suggests that expectation-relative processing is a crucial precursor step in phoneme categorization, and that understanding the information content is essential to understanding categorization processes. PMID:25475048
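
    One simple way to picture expectation-relative encoding is to re-code each acoustic cue as its deviation from the value expected given the context (here, the talker), rather than feeding the raw value to the categoriser. The simulated voice-onset-time data and talker effect below are assumptions used only to illustrate the contrast with raw encoding.

      import numpy as np

      rng = np.random.default_rng(2)
      n = 200
      talker_offset = {"talker_a": 0.0, "talker_b": 15.0}          # e.g., a VOT shift in ms
      talkers = rng.choice(list(talker_offset), size=n)
      category = rng.integers(0, 2, size=n)                        # 0 = /b/, 1 = /p/
      raw_vot = (10 + 40 * category
                 + np.array([talker_offset[t] for t in talkers])
                 + rng.normal(0, 5, size=n))

      # Expectation-relative coding: subtract the cue value expected for each talker.
      expected = {t: raw_vot[talkers == t].mean() for t in talker_offset}
      relative_vot = raw_vot - np.array([expected[t] for t in talkers])

      def separation(cue):
          # Between-category mean difference in standard-deviation units.
          return abs(cue[category == 1].mean() - cue[category == 0].mean()) / cue.std()

      # The relative cue separates the categories better than the raw cue.
      print(separation(raw_vot), separation(relative_vot))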

  13. Relative cue encoding in the context of sophisticated models of categorization: Separating information from categorization.

    Science.gov (United States)

    Apfelbaum, Keith S; McMurray, Bob

    2015-08-01

    Traditional studies of human categorization often treat the processes of encoding features and cues as peripheral to the question of how stimuli are categorized. However, in domains where the features and cues are less transparent, how information is encoded prior to categorization may constrain our understanding of the architecture of categorization. This is particularly true in speech perception, where acoustic cues to phonological categories are ambiguous and influenced by multiple factors. Here, it is crucial to consider the joint contributions of the information in the input and the categorization architecture. We contrasted accounts that argue for raw acoustic information encoding with accounts that posit that cues are encoded relative to expectations, and investigated how two categorization architectures-exemplar models and back-propagation parallel distributed processing models-deal with each kind of information. Relative encoding, akin to predictive coding, is a form of noise reduction, so it can be expected to improve model accuracy; however, like predictive coding, the use of relative encoding in speech perception by humans is controversial, so results are compared to patterns of human performance, rather than on the basis of overall accuracy. We found that, for both classes of models, in the vast majority of parameter settings, relative cues greatly helped the models approximate human performance. This suggests that expectation-relative processing is a crucial precursor step in phoneme categorization, and that understanding the information content is essential to understanding categorization processes. PMID:25475048

  14. Talking Speech Input.

    Science.gov (United States)

    Berliss-Vincent, Jane; Whitford, Gigi

    2002-01-01

    This article presents both the factors involved in successful speech input use and the potential barriers that may suggest that other access technologies could be more appropriate for a given individual. Speech input options that are available are reviewed and strategies for optimizing use of speech recognition technology are discussed. (Contains…

  15. Speech-Language Pathologists

    Science.gov (United States)

    What Speech-Language Pathologists Do: Speech-language pathologists ...

  16. The effects of noise vocoding on speech quality perception.

    Science.gov (United States)

    Anderson, Melinda C; Arehart, Kathryn H; Kates, James M

    2014-03-01

    Speech perception depends on access to spectral and temporal acoustic cues. Temporal cues include slowly varying amplitude changes (i.e. temporal envelope, TE) and quickly varying amplitude changes associated with the center frequency of the auditory filter (i.e. temporal fine structure, TFS). This study quantifies the effects of TFS randomization through noise vocoding on the perception of speech quality by parametrically varying the amount of original TFS available above 1500Hz. The two research aims were: 1) to establish the role of TFS in quality perception, and 2) to determine if the role of TFS in quality perception differs between subjects with normal hearing and subjects with sensorineural hearing loss. Ratings were obtained from 20 subjects (10 with normal hearing and 10 with hearing loss) using an 11-point quality scale. Stimuli were processed in three different ways: 1) A 32-channel noise-excited vocoder with random envelope fluctuations in the noise carrier, 2) a 32-channel noise-excited vocoder with the noise-carrier envelope smoothed, and 3) removal of high-frequency bands. Stimuli were presented in quiet and in babble noise at 18dB and 12dB signal-to-noise ratios. TFS randomization had a measurable detrimental effect on quality ratings for speech in quiet and a smaller effect for speech in background babble. Subjects with normal hearing and subjects with sensorineural hearing loss provided similar quality ratings for noise-vocoded speech. PMID:24333929
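
    A minimal noise-excited vocoder sketch (far coarser than the 32-channel processing used in the study): split the signal into bands, extract each band's temporal envelope, use it to modulate band-limited noise, and sum the bands, which preserves TE while discarding the original TFS. The channel count, band edges and envelope cutoff below are illustrative assumptions.

      import numpy as np
      from scipy.signal import butter, sosfiltfilt, hilbert

      def vocode(speech, fs, band_edges, env_cutoff=50.0):
          rng = np.random.default_rng(0)
          env_sos = butter(2, env_cutoff, btype="lowpass", fs=fs, output="sos")
          out = np.zeros_like(speech)
          for lo, hi in zip(band_edges[:-1], band_edges[1:]):
              band_sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
              band = sosfiltfilt(band_sos, speech)
              envelope = sosfiltfilt(env_sos, np.abs(hilbert(band)))   # TE; TFS discarded
              carrier = sosfiltfilt(band_sos, rng.standard_normal(len(speech)))
              out += envelope * carrier                                # noise carries the TE
          return out

      fs = 16000
      t = np.arange(0, 0.5, 1.0 / fs)
      speech = np.sin(2 * np.pi * 150 * t) * (1 + 0.5 * np.sin(2 * np.pi * 4 * t))
      vocoded = vocode(speech, fs, band_edges=[100, 400, 1000, 2400, 6000])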

  17. The timing and effort of lexical access in natural and degraded speech

    Directory of Open Access Journals (Sweden)

    Anita Eva Wagner

    2016-03-01

    Full Text Available Understanding speech is effortless in ideal situations, and although adverse conditions, such as caused by hearing impairment, often render it an effortful task, they do not necessarily suspend speech comprehension. A prime example of this is speech perception by cochlear implant users, whose hearing prostheses transmit speech as a significantly degraded signal. It is yet unknown how mechanisms of speech processing deal with such degraded signals, and whether they are affected by effortful processing of speech. This paper compares the automatic process of lexical competition between natural and degraded speech, and combines gaze fixations, which capture the course of lexical disambiguation, with pupillometry, which quantifies the mental effort involved in processing speech. Listeners' ocular responses were recorded during disambiguation of lexical embeddings with matching and mismatching durational cues. Durational cues were selected due to their substantial role in listeners' quick limitation of the number of lexical candidates for lexical access in natural speech. Results showed that lexical competition increased mental effort in processing natural stimuli, in particular in the presence of mismatching cues. Signal degradation reduced listeners' ability to quickly integrate durational cues in lexical selection, and delayed and prolonged lexical competition. The effort of processing degraded speech was increased overall, and because it had its sources at the pre-lexical level this effect can be attributed to listening to degraded speech rather than to lexical disambiguation. In sum, the course of lexical competition was largely comparable for natural and degraded speech, but showed crucial shifts in timing, and different sources of increased mental effort. We argue that well-timed progress of information from sensory to pre-lexical and lexical stages of processing, which is the result of perceptual adaptation during speech development, is the reason why

  18. The Timing and Effort of Lexical Access in Natural and Degraded Speech.

    Science.gov (United States)

    Wagner, Anita E; Toffanin, Paolo; Başkent, Deniz

    2016-01-01

    Understanding speech is effortless in ideal situations, and although adverse conditions, such as caused by hearing impairment, often render it an effortful task, they do not necessarily suspend speech comprehension. A prime example of this is speech perception by cochlear implant users, whose hearing prostheses transmit speech as a significantly degraded signal. It is yet unknown how mechanisms of speech processing deal with such degraded signals, and whether they are affected by effortful processing of speech. This paper compares the automatic process of lexical competition between natural and degraded speech, and combines gaze fixations, which capture the course of lexical disambiguation, with pupillometry, which quantifies the mental effort involved in processing speech. Listeners' ocular responses were recorded during disambiguation of lexical embeddings with matching and mismatching durational cues. Durational cues were selected due to their substantial role in listeners' quick limitation of the number of lexical candidates for lexical access in natural speech. Results showed that lexical competition increased mental effort in processing natural stimuli in particular in presence of mismatching cues. Signal degradation reduced listeners' ability to quickly integrate durational cues in lexical selection, and delayed and prolonged lexical competition. The effort of processing degraded speech was increased overall, and because it had its sources at the pre-lexical level this effect can be attributed to listening to degraded speech rather than to lexical disambiguation. In sum, the course of lexical competition was largely comparable for natural and degraded speech, but showed crucial shifts in timing, and different sources of increased mental effort. We argue that well-timed progress of information from sensory to pre-lexical and lexical stages of processing, which is the result of perceptual adaptation during speech development, is the reason why in ideal

  19. Beat synchronization predicts neural speech encoding and reading readiness in preschoolers.

    Science.gov (United States)

    Woodruff Carr, Kali; White-Schwoch, Travis; Tierney, Adam T; Strait, Dana L; Kraus, Nina

    2014-10-01

    Temporal cues are important for discerning word boundaries and syllable segments in speech; their perception facilitates language acquisition and development. Beat synchronization and neural encoding of speech reflect precision in processing temporal cues and have been linked to reading skills. In poor readers, diminished neural precision may contribute to rhythmic and phonological deficits. Here we establish links between beat synchronization and speech processing in children who have not yet begun to read: preschoolers who can entrain to an external beat have more faithful neural encoding of temporal modulations in speech and score higher on tests of early language skills. In summary, we propose precise neural encoding of temporal modulations as a key mechanism underlying reading acquisition. Because beat synchronization abilities emerge at an early age, these findings may inform strategies for early detection of and intervention for language-based learning disabilities. PMID:25246562

  20. Lip movements entrain the observers' low-frequency brain oscillations to facilitate speech intelligibility

    OpenAIRE

    Park, Hyojin; Kayser, Christoph; Thut, Gregor; Gross, Joachim

    2016-01-01

    eLife digest People are able to communicate effectively with each other even in very noisy places where it is difficult to actually hear what others are saying. In a face-to-face conversation, people detect and respond to many physical cues – including body posture, facial expressions, head and eye movements and gestures – alongside the sound cues. Lip movements are particularly important and contain enough information to allow trained observers to understand speech even if they cannot hear the ...

  1. Effects of Within-Talker Variability on Speech Intelligibility in Mandarin-Speaking Adult and Pediatric Cochlear Implant Patients

    Science.gov (United States)

    Su, Qiaotong; Galvin, John J.; Zhang, Guoping; Li, Yongxin

    2016-01-01

    Cochlear implant (CI) speech performance is typically evaluated using well-enunciated speech produced at a normal rate by a single talker. CI users often have greater difficulty with variations in speech production encountered in everyday listening. Within a single talker, speaking rate, amplitude, duration, and voice pitch information may be quite variable, depending on the production context. The coarse spectral resolution afforded by the CI limits perception of voice pitch, which is an important cue for speech prosody and for tonal languages such as Mandarin Chinese. In this study, sentence recognition from the Mandarin speech perception database was measured in adult and pediatric Mandarin-speaking CI listeners for a variety of speaking styles: voiced speech produced at slow, normal, and fast speaking rates; whispered speech; voiced emotional speech; and voiced shouted speech. Recognition of Mandarin Hearing in Noise Test sentences was also measured. Results showed that performance was significantly poorer with whispered speech relative to the other speaking styles and that performance was significantly better with slow speech than with fast or emotional speech. Results also showed that adult and pediatric performance was significantly poorer with Mandarin Hearing in Noise Test than with Mandarin speech perception sentences at the normal rate. The results suggest that adult and pediatric Mandarin-speaking CI patients are highly susceptible to whispered speech, due to the lack of lexically important voice pitch cues and perhaps other qualities associated with whispered speech. The results also suggest that test materials may contribute to differences in performance observed between adult and pediatric CI users. PMID:27363714

  2. Using auditory-visual speech to probe the basis of noise-impaired consonant-vowel perception in dyslexia and auditory neuropathy

    Science.gov (United States)

    Ramirez, Joshua; Mann, Virginia

    2005-08-01

    Both dyslexics and auditory neuropathy (AN) subjects show inferior consonant-vowel (CV) perception in noise, relative to controls. To better understand these impairments, natural acoustic speech stimuli that were masked in speech-shaped noise at various intensities were presented to dyslexic, AN, and control subjects either in isolation or accompanied by visual articulatory cues. AN subjects were expected to benefit from the pairing of visual articulatory cues and auditory CV stimuli, provided that their speech perception impairment reflects a relatively peripheral auditory disorder. Assuming that dyslexia reflects a general impairment of speech processing rather than a disorder of audition, dyslexics were not expected to similarly benefit from an introduction of visual articulatory cues. The results revealed an increased effect of noise masking on the perception of isolated acoustic stimuli by both dyslexic and AN subjects. More importantly, dyslexics showed less effective use of visual articulatory cues in identifying masked speech stimuli and lower visual baseline performance relative to AN subjects and controls. Last, a significant positive correlation was found between reading ability and the ameliorating effect of visual articulatory cues on speech perception in noise. These results suggest that some reading impairments may stem from a central deficit of speech processing.

  3. Evaluating audio-visual and computer programs for classroom use.

    Science.gov (United States)

    Van Ort, S

    1989-01-01

    Appropriate faculty decisions regarding adoption of audiovisual and computer programs are critical to the classroom use of these learning materials. The author describes the decision-making process in one college of nursing and the adaptation of an evaluation tool for use by faculty in reviewing audiovisual and computer programs. PMID:2467237

  4. Joint Audio-Visual Tracking Using Particle Filters

    Directory of Open Access Journals (Sweden)

    Dmitry N. Zotkin

    2002-11-01

    Full Text Available It is often advantageous to track objects in a scene using multimodal information when such information is available. We use audio as a complementary modality to video data, which, in comparison to vision, can provide faster localization over a wider field of view. We present a particle-filter based tracking framework for performing multimodal sensor fusion for tracking people in a videoconferencing environment using multiple cameras and multiple microphone arrays. One advantage of our proposed tracker is its ability to seamlessly handle temporary absence of some measurements (e.g., camera occlusion or silence). Another advantage is the possibility of self-calibration of the joint system to compensate for imprecision in the knowledge of array or camera parameters by treating them as containing an unknown statistical component that can be determined using the particle filter framework during tracking. We implement the algorithm in the context of a videoconferencing and meeting recording system. The system also performs high-level semantic analysis of the scene by keeping participant tracks, recognizing turn-taking events and recording an annotated transcript of the meeting. Experimental results are presented. Our system operates in real-time and is shown to be robust and reliable.
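
    A minimal particle-filter sketch in the spirit of the tracker described: audio and video measurements of a speaker's (x, y) position are fused by multiplying in the likelihood of whichever modalities are available on a given frame, which is also how temporary dropouts (occlusion or silence) are absorbed. The motion model, noise levels and room coordinates below are illustrative assumptions.

      import numpy as np

      rng = np.random.default_rng(3)
      n_particles = 500
      particles = rng.uniform(0, 5, size=(n_particles, 2))      # room coordinates (m)
      weights = np.full(n_particles, 1.0 / n_particles)

      def step(particles, weights, audio_obs=None, video_obs=None,
               motion_std=0.1, audio_std=0.5, video_std=0.2):
          # Predict: random-walk motion model.
          particles = particles + rng.normal(0, motion_std, particles.shape)
          # Update: multiply in each available modality's likelihood (handles dropouts).
          for obs, std in ((audio_obs, audio_std), (video_obs, video_std)):
              if obs is not None:
                  d2 = np.sum((particles - obs) ** 2, axis=1)
                  weights = weights * np.exp(-0.5 * d2 / std ** 2)
          weights = weights / weights.sum()
          # Resample when the effective sample size collapses.
          if 1.0 / np.sum(weights ** 2) < n_particles / 2:
              idx = rng.choice(n_particles, size=n_particles, p=weights)
              particles, weights = particles[idx], np.full(n_particles, 1.0 / n_particles)
          return particles, weights

      particles, weights = step(particles, weights,
                                audio_obs=np.array([2.1, 3.0]),
                                video_obs=np.array([2.0, 2.9]))
      estimate = np.average(particles, axis=0, weights=weights)
      print(estimate)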

  5. Audio-Visual Equipment Depreciation. RDU-75-07.

    Science.gov (United States)

    Drake, Miriam A.; Baker, Martha

    A study was conducted at Purdue University to gather operational and budgetary planning data for the Libraries and Audiovisual Center. The objectives were: (1) to complete a current inventory of equipment including year of purchase, costs, and salvage value; (2) to determine useful life data for general classes of equipment; and (3) to determine…

  6. Automatic Identification used in Audio-Visual indexing and Analysis

    Directory of Open Access Journals (Sweden)

    A. Satish Chowdary

    2011-09-01

    Full Text Available Locating a video clip in large collections is very important for retrieval applications, especially for digital rights management. We attempt to provide a comprehensive and high-level review of audiovisual features that can be extracted from the standard compressed domains, such as MPEG-1 and MPEG-2. This paper presents a graph transformation and matching approach to identify occurrences of a query clip whose ordering or length may differ from the database copy owing to content editing. With a novel batch query algorithm to retrieve similar frames, the mapping relationship between the query and database video is first represented by a bipartite graph. The densely matched parts along the long sequence are then extracted, followed by a filter-and-refine search strategy to prune some irrelevant subsequences. During the filtering stage, Maximum Size Matching is deployed for each subgraph constructed by the query and candidate subsequence to obtain a smaller set of candidates. During the refinement stage, Sub-Maximum Similarity Matching is devised to identify the subsequence with the highest aggregate score from all candidates, according to a robust video similarity model that incorporates visual content, temporal order, and frame alignment information. This algorithm is based on dynamic programming and fully uses the temporal dimension to measure the similarity between two video sequences. A normalized chromaticity histogram, which is illumination invariant, is used as the frame feature. Dynamic programming is applied at the shot level to find the optimal nonlinear mapping between video sequences. Two new normalized distance measures are presented for video sequence matching. One measure is based on the normalization of the optimal path found by dynamic programming. The other measure combines both the visual features and the temporal information. The proposed distance measures are suitable for variable-length comparisons.
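
    The illumination-invariant frame feature mentioned above can be sketched as a normalised chromaticity histogram over (r, g) = (R, G)/(R+G+B), compared between frames with histogram intersection; uniformly scaling the lighting leaves the histogram nearly unchanged. The bin count and the synthetic frames below are assumptions for illustration.

      import numpy as np

      def chromaticity_histogram(frame, bins=16):
          # frame: H x W x 3 RGB array -> flattened, normalised 2-D (r, g) histogram.
          rgb = frame.reshape(-1, 3).astype(float) + 1e-9
          total = rgb.sum(axis=1)
          r, g = rgb[:, 0] / total, rgb[:, 1] / total
          hist, _, _ = np.histogram2d(r, g, bins=bins, range=[[0, 1], [0, 1]])
          return (hist / hist.sum()).ravel()

      def similarity(h1, h2):
          # Histogram intersection: 1.0 for identical distributions.
          return np.minimum(h1, h2).sum()

      rng = np.random.default_rng(4)
      frame = rng.integers(0, 256, size=(120, 160, 3))
      dimmed = (frame * 0.6).astype(int)            # same scene under weaker lighting
      print(similarity(chromaticity_histogram(frame), chromaticity_histogram(dimmed)))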

  7. Audio-visual Training for Lip–reading

    DEFF Research Database (Denmark)

    Gebert, Hermann; Bothe, Hans-Heinrich

    2011-01-01

    personalized learning process involving media rich content delivered via wireless networks to mobile devices. The main goal of this book is to provide innovative and creative ideas for improving the quality of learning and to explore all new learning- oriented technologies, devices and networks. The topics of...

  8. Audio-visual interactions in product sound design

    OpenAIRE

    Özcan, E.; Van Egmond, R.

    2010-01-01

    Consistent product experience requires congruity between product properties such as visual appearance and sound. Therefore, for designing appropriate product sounds by manipulating their spectral-temporal structure, product sounds should preferably not be considered in isolation but as an integral part of the main product concept. Because visual aspects of a product are considered to dominate the communication of the desired product concept, sound is usually expected to fit the visual charact...

  9. Uses and Abuses of Audio-Visual Aids in Reading.

    Science.gov (United States)

    Eggers, Edwin H.

    Audiovisual aids are properly used in reading when they "turn students on," and they are abused when they fail to do so or when they actually "turn students off." General guidelines one could use in sorting usable from unusable aids are (1) Has the teacher saved time by using an audiovisual aid? (2) Is the aid appropriate to the sophistication…

  10. A Joint Audio-Visual Approach to Audio Localization

    DEFF Research Database (Denmark)

    Jensen, Jesper Rindom; Christensen, Mads Græsbøll

    2015-01-01

    Localization of audio sources is an important research problem, e.g., to facilitate noise reduction. In the recent years, the problem has been tackled using distributed microphone arrays (DMA). A common approach is to apply direction-of-arrival (DOA) estimation on each array (denoted as nodes), and then map the DOA estimates to a location. In practice, however, the individual nodes contain few microphones, limiting the DOA estimation accuracy and, thereby, also the localization performance. We investigate a new approach, where range estimates are also obtained and utilized from each node, e.g., using time-of-flight cameras. Moreover, we propose an optimal method for weighting such DOA and range information for audio localization. Our experiments on both synthetic and real data show that there is a clear, potential advantage of using the joint audiovisual localization framework.
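
    A sketch of how DOA and range estimates from several nodes can be combined for localization: each residual is scaled by the inverse of its assumed measurement standard deviation and the source position is found by nonlinear least squares. The node geometry, noise levels and this particular weighting are illustrative assumptions, not the paper's proposed method.

      import numpy as np
      from scipy.optimize import least_squares

      nodes = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]])      # node positions (m)
      doa_std, range_std = np.radians(5.0), 0.3                    # assumed uncertainties

      true_src = np.array([2.5, 1.5])
      rng = np.random.default_rng(5)
      diff0 = true_src - nodes
      doas = np.arctan2(diff0[:, 1], diff0[:, 0]) + rng.normal(0, doa_std, len(nodes))
      ranges = np.linalg.norm(diff0, axis=1) + rng.normal(0, range_std, len(nodes))

      def residuals(p):
          diff = p - nodes
          pred_doa = np.arctan2(diff[:, 1], diff[:, 0])
          ang_err = np.angle(np.exp(1j * (pred_doa - doas)))       # wrap to [-pi, pi]
          rng_err = np.linalg.norm(diff, axis=1) - ranges
          return np.concatenate([ang_err / doa_std, rng_err / range_std])

      estimate = least_squares(residuals, x0=nodes.mean(axis=0)).x
      print(estimate)   # close to the true source position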

  11. Real-Time Audio-Visual Analysis for Multiperson Videoconferencing

    Directory of Open Access Journals (Sweden)

    Petr Motlicek

    2013-01-01

    Full Text Available We describe the design of a system consisting of several state-of-the-art real-time audio and video processing components enabling multimodal stream manipulation (e.g., automatic online editing) for multiparty videoconferencing applications in open, unconstrained environments. The underlying algorithms are designed to allow multiple people to enter, interact, and leave the observable scene with no constraints. They comprise continuous localisation of audio objects and its application for spatial audio object coding, detection, and tracking of faces, estimation of head poses and visual focus of attention, detection and localisation of verbal and paralinguistic events, and the association and fusion of these different events. Combined all together, they represent multimodal streams with audio objects and semantic video objects and provide semantic information for stream manipulation systems (like a virtual director). Various experiments have been performed to evaluate the performance of the system. The obtained results demonstrate the effectiveness of the proposed design, the various algorithms, and the benefit of fusing different modalities in this scenario.

  12. Audio-visual interactions in product sound design

    NARCIS (Netherlands)

    Özcan, E.; Van Egmond, R.

    2010-01-01

    Consistent product experience requires congruity between product properties such as visual appearance and sound. Therefore, for designing appropriate product sounds by manipulating their spectral-temporal structure, product sounds should preferably not be considered in isolation but as an integral p

  13. Preattentive processing of audio-visual emotional signals

    DEFF Research Database (Denmark)

    Föcker, J.; Gondan, Matthias; Röder, B.

    2011-01-01

    Previous research has shown that redundant information in faces and voices leads to faster emotional categorization compared to incongruent emotional information even when attending to only one modality. The aim of the present study was to test whether these crossmodal effects are predominantly due...... to a response conflict rather than interference at earlier, e.g. perceptual processing stages. In Experiment 1, participants had to categorize the valence and rate the intensity of happy, sad, angry and neutral unimodal or bimodal face-voice stimuli. They were asked to rate either the facial or vocal...... stimuli were more efficiently processed than unimodal visual stimuli. To study the role of a possible response conflict, Experiment 2 used a modified paradigm in which emotional and response conflicts were disentangled. Incongruency effects were significant even in the absence of response conflicts. The...

  14. Discovering Words in Fluent Speech: The Contribution of Two Kinds of Statistical Information

    OpenAIRE

    ErikDThiessen

    2013-01-01

    To efficiently segment fluent speech, infants must discover the predominant phonological form of words in the native language. In English, for example, content words typically begin with a stressed syllable. To discover this regularity, infants need to identify a set of words. We propose that statistical learning plays two roles in this process. First, it provides a cue that allows infants to segment words from fluent speech, even without language-specific phonological knowledge. Second, once...
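
    The first role of statistical learning mentioned above can be sketched with syllable-to-syllable transitional probabilities (TPs): within an artificial word the TP is high, across word boundaries it drops, so a boundary is posited wherever the TP falls below a threshold. The toy lexicon, the fixed threshold and the two-letter "syllables" below are assumptions for illustration.

      import random
      from collections import Counter

      random.seed(0)
      words = ["tupiro", "golabu", "bidaku"]                 # artificial lexicon
      stream = [random.choice(words) for _ in range(200)]    # continuous familiarisation stream
      syllables = [w[i:i + 2] for w in stream for i in range(0, 6, 2)]

      pair_counts = Counter(zip(syllables, syllables[1:]))
      first_counts = Counter(syllables[:-1])
      tp = {pair: c / first_counts[pair[0]] for pair, c in pair_counts.items()}

      segmented, current = [], syllables[0]
      for prev, nxt in zip(syllables, syllables[1:]):
          if tp[(prev, nxt)] < 0.5:                          # TP dip -> posit a word boundary
              segmented.append(current)
              current = nxt
          else:
              current += nxt
      segmented.append(current)
      print(segmented[:6])                                   # mostly recovers the three words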

  15. Digital speech processing using Matlab

    CERN Document Server

    Gopi, E S

    2014-01-01

    Digital Speech Processing Using Matlab deals with digital speech pattern recognition, speech production model, speech feature extraction, and speech compression. The book is written in a manner that is suitable for beginners pursuing basic research in digital speech processing. Matlab illustrations are provided for most topics to enable better understanding of concepts. This book also deals with the basic pattern recognition techniques (illustrated with speech signals using Matlab) such as PCA, LDA, ICA, SVM, HMM, GMM, BPN, and KSOM.

  16. Hypermnesia: the role of multiple retrieval cues.

    Science.gov (United States)

    Otani, H; Widner, R L; Whiteman, H L; St Louis, J P

    1999-09-01

    We demonstrate that encoding multiple cues enhances hypermnesia. College students were presented with 36 (Experiment 1) or 60 (Experiments 2 and 3) sets of words and were asked to encode the sets under single- or multiple-cue conditions. In the single-cue conditions, each set consisted of a cue and a target. In the multiple-cue conditions, each set consisted of three cues and a target. Following the presentation of the word sets, the participants received either three cued recall tests (Experiments 1 and 2) or three free recall tests (Experiment 3). With this manipulation, we observed greater hypermnesia in the multiple-cue conditions than in the single-cue conditions. Furthermore, the greater hypermnesic recall resulted from increased reminiscence rather than reduced intertest forgetting. The present findings support the hypothesis that the availability of multiple retrieval cues plays an important role in hypermnesia. PMID:10540821

  17. Aggression detection in speech using sensor and semantic information

    NARCIS (Netherlands)

    Lefter, I.; Rothkrantz, L.J.M.; Burghouts, G.J.

    2012-01-01

    By analyzing a multimodal (audio-visual) database with aggressive incidents in trains, we have observed that there are no trivial fusion algorithms to successfully predict multimodal aggression based on unimodal sensor inputs. We proposed a fusion framework that contains a set of intermediate level

  18. Indirect Speech Acts

    Institute of Scientific and Technical Information of China (English)

    李威

    2001-01-01

    Indirect speech acts are frequently used in verbal communication, so their interpretation is of great importance for developing students' communicative competence. This paper therefore presents Searle's account of indirect speech acts and explores how indirect speech acts are interpreted in accordance with two influential theories. It consists of four parts. Part one gives a general introduction to the notion of speech act theory. Part two elaborates on the conception of indirect speech acts proposed by Searle and his supplement and development of the theory of illocutionary acts. Part three deals with the interpretation of indirect speech acts. Part four draws implications from the previous study and also serves as the conclusion of the dissertation.

  19. Esophageal speeches modified by the Speech Enhancer Program®

    OpenAIRE

    Manochiopinig, Sriwimon; Boonpramuk, Panuthat

    2014-01-01

    Esophageal speech is usually the first choice of speech rehabilitation after laryngectomy. However, many laryngectomized speakers are unable to speak well. The aim of this study was to evaluate the post-modification speech quality of Thai esophageal speakers using the Speech Enhancer Program®. The method adopted was to ask five speech-language pathologists to assess the speech accuracy and intelligibility of the words and continuous speech of the seven laryngectomized speakers. A comparison study was conduc...

  20. Temporal Cortex Activation to Audiovisual Speech in Normal-Hearing and Cochlear Implant Users Measured with Functional Near-Infrared Spectroscopy

    Science.gov (United States)

    van de Rijt, Luuk P. H.; van Opstal, A. John; Mylanus, Emmanuel A. M.; Straatman, Louise V.; Hu, Hai Yin; Snik, Ad F. M.; van Wanrooij, Marc M.

    2016-01-01

    Background: Speech understanding may rely not only on auditory, but also on visual information. Non-invasive functional neuroimaging techniques can expose the neural processes underlying the integration of multisensory processes required for speech understanding in humans. Nevertheless, noise (from functional MRI, fMRI) limits the usefulness in auditory experiments, and electromagnetic artifacts caused by electronic implants worn by subjects can severely distort the scans (EEG, fMRI). Therefore, we assessed audio-visual activation of temporal cortex with a silent, optical neuroimaging technique: functional near-infrared spectroscopy (fNIRS). Methods: We studied temporal cortical activation as represented by concentration changes of oxy- and deoxy-hemoglobin in four, easy-to-apply fNIRS optical channels of 33 normal-hearing adult subjects and five post-lingually deaf cochlear implant (CI) users in response to supra-threshold unisensory auditory and visual, as well as to congruent auditory-visual speech stimuli. Results: Activation effects were not visible from single fNIRS channels. However, by discounting physiological noise through reference channel subtraction (RCS), auditory, visual and audiovisual (AV) speech stimuli evoked concentration changes for all sensory modalities in both cohorts (p < 0.001). Auditory stimulation evoked larger concentration changes than visual stimuli (p < 0.001). A saturation effect was observed for the AV condition. Conclusions: Physiological, systemic noise can be removed from fNIRS signals by RCS. The observed multisensory enhancement of an auditory cortical channel can be plausibly described by a simple addition of the auditory and visual signals with saturation. PMID:26903848
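
    The reference-channel subtraction (RCS) step can be sketched as a least-squares regression: the reference channel, assumed to carry mostly systemic physiology, is scaled and subtracted from the signal channel before further averaging. The simulated hemodynamics, sampling rate and noise levels below are assumptions, not the study's recordings.

      import numpy as np

      fs, dur = 10, 60
      t = np.arange(0, dur, 1.0 / fs)
      rng = np.random.default_rng(6)

      physio = 0.8 * np.sin(2 * np.pi * 0.12 * t) + 0.3 * np.sin(2 * np.pi * 1.0 * t)  # slow wave + pulse
      response = 0.5 * (t % 30 < 15)                                # toy block-design cortical response
      signal_ch = response + physio + rng.normal(0, 0.1, t.size)    # cortical + systemic components
      reference_ch = physio + rng.normal(0, 0.1, t.size)            # mostly systemic

      # Least-squares scaling of the reference channel, then subtraction (RCS).
      beta = np.dot(reference_ch, signal_ch) / np.dot(reference_ch, reference_ch)
      cleaned = signal_ch - beta * reference_ch
      print(round(beta, 2))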

  1. Speech Alarms Pilot Study

    Science.gov (United States)

    Sandor, Aniko; Moses, Haifa

    2016-01-01

    Speech alarms have been used extensively in aviation and included in International Building Codes (IBC) and National Fire Protection Association's (NFPA) Life Safety Code. However, they have not been implemented on space vehicles. Previous studies conducted at NASA JSC showed that speech alarms lead to faster identification and higher accuracy. This research evaluated updated speech and tone alerts in a laboratory environment and in the Human Exploration Research Analog (HERA) in a realistic setup.

  2. Context dependent speech recognition

    OpenAIRE

    Andersson, Sebastian

    2006-01-01

    Poor speech recognition is a problem when developing spoken dialogue systems, but several studies have shown that speech recognition can be improved by post-processing of recognition output that uses the dialogue context, acoustic properties of a user utterance and other available resources to train a statistical model to use as a filter between the speech recogniser and dialogue manager. In this thesis a corpus of logged interactions between users and a dialogue system was used...

  3. Speech input and output

    Science.gov (United States)

    Class, F.; Mangold, H.; Stall, D.; Zelinski, R.

    1981-12-01

    Possibilities for acoustical dialogs with electronic data processing equipment were investigated. Speech recognition is posed as recognizing word groups. An economical, multistage classifier for word string segmentation is presented and its reliability in dealing with continuous speech (problems of temporal normalization and context) is discussed. Speech synthesis is considered in terms of German linguistics and phonetics. Preprocessing algorithms for total synthesis of written texts were developed. A macrolanguage, MUSTER, is used to implement this processing in an acoustic data information system (ADES).

  4. Cue weight in the perception of Trique glottal consonants.

    Science.gov (United States)

    DiCanio, Christian

    2014-02-01

    This paper examines the perceptual weight of cues to the coda glottal consonant contrast in Trique (Oto-Manguean) with native listeners. The language contrasts words with no coda (/Vː/) from words with a coda glottal stop (/VɁ/) or breathy coda (/Vɦ/). The results from a speeded AX (same-different) lexical discrimination task show high accuracy in lexical identification for the /Vː/-/Vɦ/ contrast, but lower accuracy for the other contrasts. The second experiment consists of a labeling task where the three acoustic dimensions that distinguished the glottal consonant codas in production [duration, the amplitude difference between the first two harmonics (H1-H2), and F0] were modified orthogonally using step-wise resynthesis. This task determines the relative weight of each dimension in phonological categorization. The results show that duration was the strongest cue. Listeners were only sensitive to changes in H1-H2 for the /Vː/-/Vɦ/ and /Vː/-/VɁ/ contrasts when duration was ambiguous. Listeners were only sensitive to changes in F0 for the /Vː/-/Vɦ/ contrast when both duration and H1-H2 were ambiguous. The perceptual cue weighting for each contrast closely matches existing production data [DiCanio (2012 a). J. Phon. 40, 162-176] Cue weight differences in speech perception are explained by differences in step-interval size and the notion of adaptive plasticity [Francis et al. (2008). J. Acoust. Soc. Am. 124, 1234-1251; Holt and Lotto (2006). J. Acoust. Soc. Am. 119, 3059-3071]. PMID:25234896
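
    One common way to estimate relative cue weights of the kind reported here is to regress listeners' binary labelling responses on the standardised steps of each resynthesised cue dimension and compare coefficient magnitudes; whether this matches the paper's own analysis is not stated in the abstract, and the simulated responses and use of logistic regression below are assumptions.

      import numpy as np
      from sklearn.linear_model import LogisticRegression

      rng = np.random.default_rng(7)
      n = 600
      # Orthogonally varied, standardised cue steps: duration, H1-H2, F0.
      X = rng.uniform(-1, 1, size=(n, 3))
      # Simulated listener weighting duration most, then H1-H2, then F0.
      logit = 3.0 * X[:, 0] + 1.5 * X[:, 1] + 0.5 * X[:, 2]
      y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)   # 1 = "glottal coda" response

      model = LogisticRegression().fit(X, y)
      weights = np.abs(model.coef_[0]) / np.abs(model.coef_[0]).sum()
      print(dict(zip(["duration", "H1-H2", "F0"], weights.round(2))))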

  5. Increased pain intensity is associated with greater verbal communication difficulty and increased production of speech and co-speech gestures.

    Directory of Open Access Journals (Sweden)

    Samantha Rowbotham

    Full Text Available Effective pain communication is essential if adequate treatment and support are to be provided. Pain communication is often multimodal, with sufferers utilising speech, nonverbal behaviours (such as facial expressions), and co-speech gestures (bodily movements, primarily of the hands and arms, that accompany speech and can convey semantic information) to communicate their experience. Research suggests that the production of nonverbal pain behaviours is positively associated with pain intensity, but it is not known whether this is also the case for speech and co-speech gestures. The present study explored whether increased pain intensity is associated with greater speech and gesture production during face-to-face communication about acute, experimental pain. Participants (N = 26) were exposed to experimentally elicited pressure pain to the fingernail bed at high and low intensities and took part in video-recorded semi-structured interviews. Despite rating more intense pain as more difficult to communicate (t(25) = 2.21, p = .037), participants produced significantly longer verbal pain descriptions and more co-speech gestures in the high intensity pain condition (Words: t(25) = 3.57, p = .001; Gestures: t(25) = 3.66, p = .001). This suggests that spoken and gestural communication about pain is enhanced when pain is more intense. Thus, in addition to conveying detailed semantic information about pain, speech and co-speech gestures may provide a cue to pain intensity, with implications for the treatment and support received by pain sufferers. Future work should consider whether these findings are applicable within the context of clinical interactions about pain.

  6. Increased pain intensity is associated with greater verbal communication difficulty and increased production of speech and co-speech gestures.

    Science.gov (United States)

    Rowbotham, Samantha; Wardy, April J; Lloyd, Donna M; Wearden, Alison; Holler, Judith

    2014-01-01

    Effective pain communication is essential if adequate treatment and support are to be provided. Pain communication is often multimodal, with sufferers utilising speech, nonverbal behaviours (such as facial expressions), and co-speech gestures (bodily movements, primarily of the hands and arms that accompany speech and can convey semantic information) to communicate their experience. Research suggests that the production of nonverbal pain behaviours is positively associated with pain intensity, but it is not known whether this is also the case for speech and co-speech gestures. The present study explored whether increased pain intensity is associated with greater speech and gesture production during face-to-face communication about acute, experimental pain. Participants (N = 26) were exposed to experimentally elicited pressure pain to the fingernail bed at high and low intensities and took part in video-recorded semi-structured interviews. Despite rating more intense pain as more difficult to communicate (t(25)  = 2.21, p =  .037), participants produced significantly longer verbal pain descriptions and more co-speech gestures in the high intensity pain condition (Words: t(25)  = 3.57, p  = .001; Gestures: t(25)  = 3.66, p =  .001). This suggests that spoken and gestural communication about pain is enhanced when pain is more intense. Thus, in addition to conveying detailed semantic information about pain, speech and co-speech gestures may provide a cue to pain intensity, with implications for the treatment and support received by pain sufferers. Future work should consider whether these findings are applicable within the context of clinical interactions about pain. PMID:25343486

  7. Principles of speech coding

    CERN Document Server

    Ogunfunmi, Tokunbo

    2010-01-01

    It is becoming increasingly apparent that all forms of communication-including voice-will be transmitted through packet-switched networks based on the Internet Protocol (IP). Therefore, the design of modern devices that rely on speech interfaces, such as cell phones and PDAs, requires a complete and up-to-date understanding of the basics of speech coding. Outlines key signal processing algorithms used to mitigate impairments to speech quality in VoIP networks. Offering a detailed yet easily accessible introduction to the field, Principles of Speech Coding provides an in-depth examination of the

  8. Advances in Speech Recognition

    CERN Document Server

    Neustein, Amy

    2010-01-01

    This volume comprises contributions from eminent leaders in the speech industry, and presents a comprehensive and in-depth analysis of the progress of speech technology in the topical areas of mobile settings, healthcare and call centers. The material addresses the technical aspects of voice technology within the framework of societal needs, such as the use of speech recognition software to produce up-to-date electronic health records, notwithstanding patients making changes to health plans and physicians. Included will be discussion of speech engineering, linguistics, human factors ana

  9. Ear, Hearing and Speech

    DEFF Research Database (Denmark)

    Poulsen, Torben

    2000-01-01

    An introduction is given to the anatomy and the function of the ear, basic psychoacoustic matters (hearing threshold, loudness, masking), the speech signal and speech intelligibility. The lecture note is written for the course: Fundamentals of Acoustics and Noise Control (51001).

  10. Advances in speech processing

    Science.gov (United States)

    Ince, A. Nejat

    1992-10-01

    The field of speech processing is undergoing a rapid growth in terms of both performance and applications, and this is fueled by the advances being made in the areas of microelectronics, computation, and algorithm design. The use of voice for civil and military communications is discussed, considering advantages and disadvantages including the effects of environmental factors such as acoustic and electrical noise, interference, and propagation. The structure of the existing NATO communications network and the evolving Integrated Services Digital Network (ISDN) concept are briefly reviewed to show how they meet present and future requirements. The paper then deals with the fundamental subject of speech coding and compression. Recent advances in techniques and algorithms for speech coding now permit high quality voice reproduction at remarkably low bit rates. The subject of speech synthesis is treated next, where the principal objective is to produce natural quality synthetic speech from unrestricted text input. Speech recognition, where the ultimate objective is to produce a machine which would understand conversational speech with unrestricted vocabulary, from essentially any talker, is then discussed. Algorithms for speech recognition can be characterized broadly as pattern recognition approaches and acoustic phonetic approaches. To date, the greatest degree of success in speech recognition has been obtained using pattern recognition paradigms. It is for this reason that the paper is concerned primarily with this technique.

  11. [The voice and speech].

    Science.gov (United States)

    Pesák, J; Honová, J; Majtner, J; Vojtĕchovský, K

    1998-01-01

    Biophysics is the science comprising the sum of biophysical disciplines describing living systems. It also includes the biophysics of voice and speech, which deals with physiological acoustics, phonetics, phoniatry, and logopaedics. In connection with the problems of voice and speech, including the problems of teaching them, a common language appropriate to all the interested scientific branches is often sought. As a result of our efforts aimed at removing the existing barriers, we have tried to set up a University Society for the Study of Voice and Speech. Among its first activities was the production of a video film, On Voice and Speech. PMID:10803289

  12. Evaluation of multimodal ground cues

    DEFF Research Database (Denmark)

    Nordahl, Rolf; Lecuyer, Anatole; Serafin, Stefania; Turchet, Luca; Papetti, Stefano; Fontana, Federico; Visell, Yon

    This chapter presents an array of results on the perception of ground surfaces via multiple sensory modalities, with special attention to non-visual perceptual cues, notably those arising from audition and haptics, as well as interactions between them. It also reviews approaches to combining...

  13. Zebra finches can use positional and transitional cues to distinguish vocal element strings.

    Science.gov (United States)

    Chen, Jiani; Ten Cate, Carel

    2015-08-01

    Learning sequences is of great importance to humans and non-human animals. Many motor and mental actions, such as singing in birds and speech processing in humans, rely on sequential learning. At least two mechanisms are considered to be involved in such learning. The chaining theory proposes that learning of sequences relies on memorizing the transitions between adjacent items, while the positional theory suggests that learners encode the items according to their ordinal position in the sequence. Positional learning is assumed to dominate sequential learning. However, human infants exposed to a string of speech sounds can learn transitional (chaining) cues. So far, it is not clear whether birds, an increasingly important model for examining vocal processing, can do this. In this study we use a Go-Nogo design to examine whether zebra finches can use transitional cues to distinguish artificially constructed strings of song elements. Zebra finches were trained with sequences differing in transitional and positional information and next tested with novel strings sharing positional and transitional similarities with the training strings. The results show that they can attend to both transitional and positional cues and that their sequential coding strategies can be biased toward transitional cues depending on the learning context. This article is part of a Special Issue entitled: In Honor of Jerry Hogan. PMID:25217867

  14. Central Auditory Processing of Temporal and Spectral-Variance Cues in Cochlear Implant Listeners.

    Directory of Open Access Journals (Sweden)

    Carol Q Pham

    Full Text Available Cochlear implant (CI) listeners have difficulty understanding speech in complex listening environments. This deficit is thought to be largely due to peripheral encoding problems arising from current spread, which results in wide peripheral filters. In normal hearing (NH) listeners, central processing contributes to segregation of speech from competing sounds. We tested the hypothesis that basic central processing abilities are retained in post-lingually deaf CI listeners, but processing is hampered by degraded input from the periphery. In eight CI listeners, we measured auditory nerve compound action potentials to characterize peripheral filters. Then, we measured psychophysical detection thresholds in the presence of multi-electrode maskers placed either inside (peripheral masking) or outside (central masking) the peripheral filter. This was intended to distinguish peripheral from central contributions to signal detection. Introduction of temporal asynchrony between the signal and masker improved signal detection in both peripheral and central masking conditions for all CI listeners. Randomly varying components of the masker created spectral-variance cues, which seemed to benefit only two out of eight CI listeners. Contrastingly, the spectral-variance cues improved signal detection in all five NH listeners who listened to our CI simulation. Together these results indicate that widened peripheral filters significantly hamper central processing of spectral-variance cues but not of temporal cues in post-lingually deaf CI listeners. As indicated by two CI listeners in our study, however, post-lingually deaf CI listeners may retain some central processing abilities similar to NH listeners.

  15. Do We Perceive Others Better than Ourselves? A Perceptual Benefit for Noise-Vocoded Speech Produced by an Average Speaker.

    Directory of Open Access Journals (Sweden)

    William L Schuerman

    Full Text Available In different tasks involving action perception, performance has been found to be facilitated when the presented stimuli were produced by the participants themselves rather than by another participant. These results suggest that the same mental representations are accessed during both production and perception. However, with regard to spoken word perception, evidence also suggests that listeners' representations for speech reflect the input from their surrounding linguistic community rather than their own idiosyncratic productions. Furthermore, speech perception is heavily influenced by indexical cues that may lead listeners to frame their interpretations of incoming speech signals with regard to speaker identity. In order to determine whether word recognition evinces similar self-advantages as found in action perception, it was necessary to eliminate indexical cues from the speech signal. We therefore asked participants to identify noise-vocoded versions of Dutch words that were based on either their own recordings or those of a statistically average speaker. The majority of participants were more accurate for the average speaker than for themselves, even after taking into account differences in intelligibility. These results suggest that the speech representations accessed during perception of noise-vocoded speech are more reflective of the input of the speech community, and hence that speech perception is not necessarily based on representations of one's own speech.

  16. Processing of audio and visual speech for telecommunication systems

    Science.gov (United States)

    Shah, Druti; Marshall, Stephen

    1999-07-01

    Most verbal communications use cues from both the visual and acoustic modalities to convey messages. During the production of speech, the visible information provided by the external articulatory organs can influence the understanding of the language, by interpreting the combined information into meaningful linguistic expressions. The task of integrating speech and image data to emulate the bimodal human interaction system can be addressed by developing automated systems. These systems have a wide range of applications, such as videophone systems, where the interdependencies between image and speech signals can be exploited for data compression and for solving the task of lip synchronization, which has been a major problem. Therefore the objective of this work is to investigate and quantify this relationship, such that the knowledge gained will assist longer term multimedia and videophone research.

  17. Speech-Language Therapy (For Parents)

    Science.gov (United States)

    ... with speech and/or language disorders. Speech Disorders, Language Disorders, and Feeding Disorders A speech disorder refers ...

  18. Time-expanded speech and speech recognition in older adults.

    Science.gov (United States)

    Vaughan, Nancy E; Furukawa, Izumi; Balasingam, Nirmala; Mortz, Margaret; Fausti, Stephen A

    2002-01-01

    Speech understanding deficits are common in older adults. In addition to hearing sensitivity, changes in certain cognitive functions may affect speech recognition. One such change that may impact the ability to follow a rapidly changing speech signal is processing speed. When speakers slow the rate of their speech naturally in order to speak clearly, speech recognition is improved. The acoustic characteristics of naturally slowed speech are of interest in developing time-expansion algorithms to improve speech recognition for older listeners. In this study, we tested younger normally hearing, older normally hearing, and older hearing-impaired listeners on time-expanded speech using increased duration and increased intensity of unvoiced consonants. Although all groups performed best on unprocessed speech, performance with processed speech was better with the consonant gain feature without time expansion in the noise condition and better at the slowest time-expanded rate in the quiet condition. The effects of signal processing on speech recognition are discussed. PMID:17642020
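    As a rough illustration of the kind of processing compared in this study, the sketch below time-expands an utterance and applies a gain to unvoiced-consonant intervals. It is only a schematic stand-in, not the algorithm used in the study; the file name, stretch factor, gain and interval times are assumed placeholders.

    ```python
    import numpy as np
    import librosa
    import soundfile as sf

    # Placeholder inputs: a mono recording and hand-marked unvoiced-consonant
    # intervals (start, end) in seconds.
    y, sr = librosa.load("sentence.wav", sr=None, mono=True)
    unvoiced_intervals = [(0.42, 0.50), (1.10, 1.21)]  # hypothetical segment times
    consonant_gain_db = 6.0                            # hypothetical consonant gain
    expansion_factor = 1.25                            # 25% longer overall

    # 1) Boost the unvoiced consonants in the original time base.
    gain = 10.0 ** (consonant_gain_db / 20.0)
    boosted = y.copy()
    for start, end in unvoiced_intervals:
        boosted[int(start * sr):int(end * sr)] *= gain

    # 2) Time-expand the whole utterance (rate < 1 slows the signal down).
    expanded = librosa.effects.time_stretch(boosted, rate=1.0 / expansion_factor)

    # Normalize to avoid clipping introduced by the gain step, then save.
    expanded /= max(1.0, float(np.max(np.abs(expanded))))
    sf.write("sentence_expanded.wav", expanded, sr)
    ```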

  19. Cues for localization in the horizontal plane

    DEFF Research Database (Denmark)

    Jeppesen, Jakob; Møller, Henrik

    2005-01-01

    manipulated in HRTFs used for binaural synthesis of sound in the horizontal plane. The manipulation of cues resulted in HRTFs with cues ranging from correct combinations of spectral information and ITDs to combinations with severely conflicting cues. Both the ITD and the spectral information seem to be...

  20. Fragrances as Cues for Remembering Words

    Science.gov (United States)

    Eich, James Eric

    1978-01-01

    Results of this experiment suggest that specific encoding of a word is not a necessary condition for cue effectiveness. Results imply that the effect of a nominal fragrance cue arises through the mediation of a functional, implicitly generated semantic cue. (Author/SW)

  1. Cue salience influences the use of height cues in reorientation in pigeons (Columba livia).

    Science.gov (United States)

    Du, Yu; Mahdi, Nuha; Paul, Breanne; Spetch, Marcia L

    2016-07-01

    Although orienting ability has been examined with numerous types of cues, most research has focused only on cues from the horizontal plane. The current study investigated pigeons' use of wall height, a vertical cue, in an open-field task and compared it with their use of horizontal cues. Pigeons were trained to locate food in 2 diagonal corners of a rectangular enclosure with 2 opposite high walls as height cues. Before each trial, pigeons were rotated to disorient them. In training, pigeons could use either the horizontal cues from the rectangular enclosure or the height information from the walls to locate the food. In testing, the apparatus was modified to provide (a) horizontal cues only, (b) height cues only, and (c) both height and horizontal cues in conflict. In Experiment 1 the lower and high walls, respectively, were 40 and 80 cm, whereas in Experiment 2 they were made more perceptually salient by shortening them to 20 and 40 cm. Pigeons accurately located the goal corners with horizontal cues alone in both experiments, but they searched accurately with height cues alone only in Experiment 2. When the height cues conflicted with horizontal cues, pigeons preferred the horizontal cues over the height cues in Experiment 1 but not in Experiment 2, suggesting that perceptual salience influences the relative weighting of cues. PMID:27379717

  2. The temporal binding window for audiovisual speech: Children are like little adults.

    Science.gov (United States)

    Hillock-Dunn, Andrea; Grantham, D Wesley; Wallace, Mark T

    2016-07-29

    During a typical communication exchange, both auditory and visual cues contribute to speech comprehension. The influence of vision on speech perception can be measured behaviorally using a task where incongruent auditory and visual speech stimuli are paired to induce perception of a novel token reflective of multisensory integration (i.e., the McGurk effect). This effect is temporally constrained in adults, with illusion perception decreasing as the temporal offset between the auditory and visual stimuli increases. Here, we used the McGurk effect to investigate the development of the temporal characteristics of audiovisual speech binding in 7-24 year-olds. Surprisingly, results indicated that although older participants perceived the McGurk illusion more frequently, no age-dependent change in the temporal boundaries of audiovisual speech binding was observed. PMID:26920938

  3. The Exploitation of Subphonemic Acoustic Detail in L2 Speech Segmentation

    Science.gov (United States)

    Shoemaker, Ellenor

    2014-01-01

    The current study addresses an aspect of second language (L2) phonological acquisition that has received little attention to date--namely, the acquisition of allophonic variation as a word boundary cue. The role of subphonemic variation in the segmentation of speech by native speakers has been indisputably demonstrated; however, the acquisition of…

  4. Effects of First and Second Language on Segmentation of Non-Native Speech

    Science.gov (United States)

    Hanulikova, Adriana; Mitterer, Holger; McQueen, James M.

    2011-01-01

    Do Slovak-German bilinguals apply native Slovak phonological and lexical knowledge when segmenting German speech? When Slovaks listen to their native language, segmentation is impaired when fixed-stress cues are absent (Hanulikova, McQueen & Mitterer, 2010), and, following the Possible-Word Constraint (PWC; Norris, McQueen, Cutler & Butterfield,…

  5. Children's Responses to Computer-Synthesized Speech in Educational Media: Gender Consistency and Gender Similarity Effects

    Science.gov (United States)

    Lee, Kwan Min; Liao, Katharine; Ryu, Seoungho

    2007-01-01

    This study examines children's social responses to gender cues in synthesized speech in a computer-based instruction setting. Eighty 5th-grade elementary school children were randomly assigned to one of the conditions in a full-factorial 2 (participant gender) x 2 (voice gender) x 2 (content gender) experiment. Results show that children apply…

  6. Speech Compression for Noise-Corrupted Thai Expressive Speech

    Directory of Open Access Journals (Sweden)

    Suphattharachai Chomphan

    2011-01-01

    Full Text Available Problem statement: In speech communication, speech coding aims at preserving the speech quality with a lower coding bitrate. When considering the communication environment, various types of noise deteriorate the speech quality. Expressive speech with different speaking styles may yield different speech quality with the same coding method. Approach: This research proposed a study of speech compression for noise-corrupted Thai expressive speech by using two coding methods, CS-ACELP and MP-CELP. The speech material included a hundred male speech utterances and a hundred female speech utterances. Four speaking styles were included: enjoyable, sad, angry and reading styles. Five sentences of Thai speech were chosen. Three types of noise were included (train, car and air conditioner). Five levels of each type of noise were varied from 0-20 dB. The subjective test of mean opinion score was exploited in the evaluation process. Results: The experimental results showed that CS-ACELP gave better speech quality than MP-CELP at all three bitrates of 6000, 8600 and 12600 bps. When considering the levels of noise, the 20-dB noise gave the best speech quality, while 0-dB noise gave the worst speech quality. When considering speech gender, female speech gave better results than male speech. When considering the types of noise, the air-conditioner noise gave the best speech quality, while the train noise gave the worst speech quality. Conclusion: From the study, it can be seen that coding methods, types of noise, levels of noise and speech gender all influence the coded speech quality.

  7. Improving Alaryngeal Speech Intelligibility.

    Science.gov (United States)

    Christensen, John M.; Dwyer, Patricia E.

    1990-01-01

    Laryngectomized patients using esophageal speech or an electronic artificial larynx have difficulty producing correct voicing contrasts between homorganic consonants. This paper describes a therapy technique that emphasizes "pushing harder" on voiceless consonants to improve alaryngeal speech intelligibility and proposes focusing on the production…

  8. Speech Situations and TEFL

    Institute of Scientific and Technical Information of China (English)

    吴树奇; 高建国

    2008-01-01

    This paper deals with how speech situations, or rather speech implicatures, affect TEFL. As far as the writer is concerned, they have much influence on many aspects of language teaching. To illustrate this point explicitly, the writer focuses on the influence of speech situations upon pronunciation, intonation, lexical meanings, sentence comprehension and the grammatical study of the English language.

  9. Speech and Language Delay

    Science.gov (United States)

    ... child depends on the cause of the speech delay. Your doctor will tell you the cause of your child's problem and explain any treatments that might fix the problem or make it better. A speech and language pathologist might be helpful in making treatment plans. This ...

  10. Private Speech in Ballet

    Science.gov (United States)

    Johnston, Dale

    2006-01-01

    Authoritarian teaching practices in ballet inhibit the use of private speech. This paper highlights the critical importance of private speech in the cognitive development of young ballet students, within what is largely a non-verbal art form. It draws upon research by Russian psychologist Lev Vygotsky and contemporary socioculturalists, to…

  11. Speech processing standards

    Science.gov (United States)

    Ince, A. Nejat

    1990-05-01

    Speech processing standards are given for 64, 32, and 16 kb/s and lower rate speech and, more generally, speech-band signals, which are or will be promulgated by CCITT and NATO. The International Telegraph and Telephone Consultative Committee (CCITT) is the international body which deals, among other things, with speech processing within the context of ISDN. Within NATO there are also bodies promulgating standards which make interoperability possible without complex and expensive interfaces. Some of the applications for low-bit-rate voice are highlighted, along with the related work undertaken by the CCITT Study Groups which are responsible for developing standards in terms of encoding algorithms and codec design objectives, as well as standards on the assessment of speech quality.

  12. Visual cues for data mining

    Science.gov (United States)

    Rogowitz, Bernice E.; Rabenhorst, David A.; Gerth, John A.; Kalin, Edward B.

    1996-04-01

    This paper describes a set of visual techniques, based on principles of human perception and cognition, which can help users analyze and develop intuitions about tabular data. Collections of tabular data are widely available, including, for example, multivariate time series data, customer satisfaction data, stock market performance data, multivariate profiles of companies and individuals, and scientific measurements. In our approach, we show how visual cues can help users perform a number of data mining tasks, including identifying correlations and interaction effects, finding clusters and understanding the semantics of cluster membership, identifying anomalies and outliers, and discovering multivariate relationships among variables. These cues are derived from psychological studies on perceptual organization, visual search, perceptual scaling, and color perception. These visual techniques are presented as a complement to the statistical and algorithmic methods more commonly associated with these tasks, and provide an interactive interface for the human analyst.

  13. Application and design of audio-visual aids stomatology teaching in orthodontic non-stomatology students

    Institute of Scientific and Technical Information of China (English)

    李若萱; 吕亚林; 王晓庚

    2012-01-01

    Objective This study discusses the effects of audio-visual aids stomatology teaching in undergraduate orthodontic training, delivered in two class hours, for students majoring in preventive medicine. Methods We selected 85 students from the 2007 and 2008 matriculating classes of the preventive medicine department of Capital Medical University. Using the eight-year orthodontic textbook as our reference, we taught the theory through multimedia in the first class hour, and used situational role-playing in the practicum hour. After the course, a theory test and a follow-up questionnaire were used to evaluate the teaching effect and to obtain students' feedback on the combined teaching method. Results Students grasped the orthodontic theory well, and the majority of students understood the goal of the method and believed their interest in learning orthodontics was significantly enhanced. In fact, they became fascinated by orthodontics in the limited time of the study. Conclusions We concluded that the integration of object teaching with situational teaching is of great assistance to orthodontic training; however, the integration must be carefully prepared to ensure student participation, maximize the benefits of integration and improve the course from direct feedback.

  14. Multi-Cue Pedestrian Recognition

    OpenAIRE

    Munder, Stefan

    2007-01-01

    This thesis addresses the problem of detecting complex, deformable objects in an arbitrary, cluttered environment in sequences of video images. Often, no single best technique exists for such a challenging problem, as different approaches possess different characteristics with regard to detection accuracy, processing speed, or the kind of errors made. Therefore, multi-cue approaches are pursued in this thesis. By combining multiple detection methods, each utilizing a different aspect of the v...

  15. Discovering words in fluent speech: the contribution of two kinds of statistical information.

    Science.gov (United States)

    Thiessen, Erik D; Erickson, Lucy C

    2012-01-01

    To efficiently segment fluent speech, infants must discover the predominant phonological form of words in the native language. In English, for example, content words typically begin with a stressed syllable. To discover this regularity, infants need to identify a set of words. We propose that statistical learning plays two roles in this process. First, it provides a cue that allows infants to segment words from fluent speech, even without language-specific phonological knowledge. Second, once infants have identified a set of lexical forms, they can learn from the distribution of acoustic features across those word forms. The current experiments demonstrate both processes are available to 5-month-old infants. This demonstration of sensitivity to statistical structure in speech, weighted more heavily than phonological cues to segmentation at an early age, is consistent with theoretical accounts that claim statistical learning plays a role in helping infants to adapt to the structure of their native language from very early in life. PMID:23335903
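    The transitional (chaining) statistic invoked here is usually formalized as the conditional probability of one syllable given the previous one. The toy sketch below computes forward transitional probabilities over a made-up, Saffran-style syllable stream; the syllable inventory is invented purely for illustration.

    ```python
    from collections import Counter

    # Toy stream: three nonsense "words" (tupiro, golabu, bidaku) concatenated
    # in varying order with no pauses, as in statistical-learning experiments.
    stream = ("tu pi ro go la bu bi da ku go la bu "
              "tu pi ro bi da ku tu pi ro go la bu").split()

    bigrams = Counter(zip(stream, stream[1:]))
    unigrams = Counter(stream[:-1])

    # Forward transitional probability: P(next syllable | current syllable).
    tp = {(a, b): n / unigrams[a] for (a, b), n in bigrams.items()}

    for pair, p in sorted(tp.items(), key=lambda kv: -kv[1]):
        print(pair, round(p, 2))
    # Within-word transitions (e.g. ('tu', 'pi')) come out at 1.0, while
    # across-word transitions (e.g. ('ro', 'go')) are lower -- the dip that
    # statistical-learning accounts treat as a cue to a word boundary.
    ```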

  16. Listeners' attitudes: speech supplementation strategies for improving effectiveness of speakers with mixed dysarthria as a result of motor neuron disease.

    Science.gov (United States)

    Toy, Natalie; Joubert, Karin

    2008-01-01

    This study examined unfamiliar and familiar listener attitudes towards the use of combined alphabet-topic cues and a control condition (habitual speech with no cues) associated with the speech of three individuals with severe mixed dysarthria. Two listener groups (N = 36) were shown experimentally imposed visual images of the combined alphabet-topic cue strategy in conjunction with recorded auditory presentations with the habitual speech of three individuals with mixed dysarthria. Using a 7-point Likert scale, listeners were asked to rate how effective they thought the speakers communicated; how comfortable they were communicating with the speakers; and how persistent they were in trying to understand the speakers. The results revealed that there were no significant differences in the attitude ratings of familiar listeners as compared to unfamiliar listeners. However, results revealed that rating of communicative effectiveness, comfort communicating with speakers and listener persistence were each more favourable when using the combined cue condition than purely habitual speech. The results suggest that augmentative and alternative communication strategies providing frequent and specific cues regarding the content and constituent words of a message may enhance the attitudes of listeners. PMID:19485070

  17. Speech Acts In President Barack Obama Victory Speech 2012

    OpenAIRE

    Januarini, Erna

    2016-01-01

    In this thesis, entitled Speech Acts In President Barack Obama's Victory Speech 2012, the author analyzes the illocutionary acts and the direct and indirect speech acts used by Barack Obama as a speaker, classified as representative, directive, expressive, commissive, and declaration. The purpose of this thesis is to identify the types of illocutionary acts and of direct and indirect speech acts in Barack Obama's 2012 victory speech. In writing this thesis, the author uses a qualitative method from Huberman...

  18. 78 FR 49693 - Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services...

    Science.gov (United States)

    2013-08-15

    ... Rulemaking, published at 73 FR 47120, August 13, 2008 (2008 STS NPRM). The Commission sought comment on... Abbreviated Dialing Arrangements, CC Docket No. 92-105, Report and Order, published at 65 FR 54799, September... COMMISSION 47 CFR Part 64 Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech...

  19. 78 FR 49717 - Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services...

    Science.gov (United States)

    2013-08-15

    ..., Report and Order and Further Notice of Proposed Rulemaking, published at 77 FR 25609, May 1, 2012 (VRS... Nos. 03-123 and 08-15, Notice of Proposed Rulemaking, published at 73 FR 47120, August 13, 2008 (2008... COMMISSION 47 CFR Part 64 Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech...

  20. Going to a Speech Therapist

    Science.gov (United States)

    ... What's in this article? What Do Speech Therapists Help With? Who Needs Speech Therapy? What's It Like? How Long Will Treatment Last? Some kids have trouble saying certain sounds or words. This can be frustrating ... speech therapists (also called speech-language pathologists ). What ...

  1. Survey On Speech Synthesis

    Directory of Open Access Journals (Sweden)

    A. Indumathi

    2012-12-01

    Full Text Available The primary goal of this paper is to provide an overview of existing Text-To-Speech (TTS) techniques by highlighting their usage and advantages. First-generation techniques include formant synthesis and articulatory synthesis. Formant synthesis works by using individually controllable formant filters, which can be set to produce accurate estimations of the vocal-tract transfer function. Articulatory synthesis produces speech by direct modeling of human articulator behavior. Second-generation techniques comprise concatenative synthesis and sinusoidal synthesis. Concatenative synthesis generates speech output by concatenating segments of recorded speech and generally produces natural-sounding synthesized speech. Sinusoidal synthesis uses a harmonic model and decomposes each frame into a set of harmonics of an estimated fundamental frequency. The model parameters are the amplitudes and periods of the harmonics. With these, the value of the fundamental can be changed while keeping the same basic spectral envelope. In addition, the third generation includes Hidden Markov Model (HMM) synthesis and unit selection synthesis. HMM synthesis trains the parameter module and produces high-quality speech. Finally, unit selection operates by selecting the best sequence of units from a large speech database which matches the specification.
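    As a concrete illustration of the sinusoidal/harmonic model mentioned above, the sketch below resynthesizes a single frame as a sum of harmonics of an estimated fundamental. The F0 value and per-harmonic amplitudes are made-up numbers standing in for the output of an analysis stage, not part of any particular system described in the survey.

    ```python
    import numpy as np

    sr = 16000          # sample rate in Hz
    frame_len = 0.030   # 30 ms frame
    f0 = 120.0          # assumed fundamental frequency for this frame
    amps = [0.8, 0.5, 0.3, 0.2, 0.12, 0.08]  # assumed harmonic amplitudes

    t = np.arange(int(sr * frame_len)) / sr

    def harmonic_frame(f0, amps, t):
        """Sum of sinusoids at integer multiples of f0 with given amplitudes."""
        return sum(a * np.sin(2 * np.pi * k * f0 * t)
                   for k, a in enumerate(amps, start=1))

    frame = harmonic_frame(f0, amps, t)

    # Because the model is parametric, pitch can be changed while keeping the
    # same amplitude pattern: rebuild the frame with a scaled fundamental.
    frame_higher_pitch = harmonic_frame(1.5 * f0, amps, t)
    ```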

  2. An Exploration of Rhythmic Grouping of Speech Sequences by French- and German-Learning Infants.

    Science.gov (United States)

    Abboub, Nawal; Boll-Avetisyan, Natalie; Bhatara, Anjali; Höhle, Barbara; Nazzi, Thierry

    2016-01-01

    Rhythm in music and speech can be characterized by a constellation of several acoustic cues. Individually, these cues have different effects on rhythmic perception: sequences of sounds alternating in duration are perceived as short-long pairs (weak-strong/iambic pattern), whereas sequences of sounds alternating in intensity or pitch are perceived as loud-soft, or high-low pairs (strong-weak/trochaic pattern). This perceptual bias-called the Iambic-Trochaic Law (ITL)-has been claimed to be a universal property of the auditory system applying in both the music and the language domains. Recent studies have shown that language experience can modulate the effects of the ITL on rhythmic perception of both speech and non-speech sequences in adults, and of non-speech sequences in 7.5-month-old infants. The goal of the present study was to explore whether language experience also modulates infants' grouping of speech. To do so, we presented sequences of syllables to monolingual French- and German-learning 7.5-month-olds. Using the Headturn Preference Procedure (HPP), we examined whether they were able to perceive a rhythmic structure in sequences of syllables that alternated in duration, pitch, or intensity. Our findings show that both French- and German-learning infants perceived a rhythmic structure when it was cued by duration or pitch but not intensity. Our findings also show differences in how these infants use duration and pitch cues to group syllable sequences, suggesting that pitch cues were the easier ones to use. Moreover, performance did not differ across languages, failing to reveal early language effects on rhythmic perception. These results contribute to our understanding of the origin of rhythmic perception and perceptual mechanisms shared across music and speech, which may bootstrap language acquisition. PMID:27378887

  3. Global Freedom of Speech

    DEFF Research Database (Denmark)

    Binderup, Lars Grassme

    2007-01-01

    opposed to a legal norm, that curbs exercises of the right to free speech that offend the feelings or beliefs of members from other cultural groups. The paper rejects the suggestion that acceptance of such a norm is in line with liberal egalitarian thinking. Following a review of the classical liberal egalitarian reasons for free speech - reasons from overall welfare, from autonomy and from respect for the equality of citizens - it is argued that these reasons outweigh the proposed reasons for curbing culturally offensive speech. Currently controversial cases such as that of the Danish Cartoon Controversy...

  4. Quit interest influences smoking cue-reactivity.

    Science.gov (United States)

    Veilleux, Jennifer C; Skinner, Kayla D; Pollert, Garrett A

    2016-12-01

    Interest in quitting smoking is important to model in cue-reactivity studies, because the craving elicited by cue exposure likely requires different self-regulation efforts for smokers who are interested in quitting compared to those without any quit interest. The objective of the current study was to evaluate the role of quit interest in how cigarette cue exposure influences self-control efforts. Smokers interested in quitting (n=37) and smokers with no interest in quitting (n=53) were randomly assigned to a cigarette or neutral cue exposure task. Following the cue exposure, all participants completed two self-control tasks, a measure of risky gambling (the Iowa Gambling Task) and a cold pressor tolerance task. Results indicated that smokers interested in quitting had worse performance on the gambling task when exposed to a cigarette cue compared to neutral cue exposure. We also found that people interested in quitting tolerated the cold pressor task for a shorter amount of time than people not interested in quitting. Finally, we found that for people interested in quitting, exposure to a cigarette cue was associated with increased motivation to take steps toward decreasing use. Overall these results suggest that including quit interest in studies of cue reactivity is valuable, as quit interest influenced smoking cue-reactivity responses. PMID:27487082

  5. Prosodic cues to word order: what level of representation?

    Directory of Open Access Journals (Sweden)

    Carline eBernard

    2012-10-01

    Full Text Available Within language, systematic correlations exist between syntactic structure and prosody. Prosodic prominence, for instance, falls on the complement and not the head of syntactic phrases, and its realization depends on the phrasal position of the prominent element. Thus, in Japanese, a functor-final language, prominence is phrase-initial and realized as increased pitch (^Tōkyō ni 'Tokyo to'), whereas in French, English or Italian, functor-initial languages, it manifests itself as phrase-final lengthening (to Rome). Prosody is readily available in the linguistic signal even to the youngest infants. It has, therefore, been proposed that young learners might be able to exploit its correlations with syntax to bootstrap language structure. In this study, we tested this hypothesis, investigating how 8-month-old monolingual French infants processed an artificial grammar manipulating the relative position of prosodic prominence and word frequency. In Condition 1, we created a speech stream in which the two cues, prosody and frequency, were aligned, frequent words being prosodically non-prominent and infrequent ones being prominent, as is the case in natural language (functors are prosodically minimal compared to content words). In Condition 2, the two cues were misaligned, with frequent words carrying prosodic prominence, unlike in natural language. After familiarization with the aligned or the misaligned stream in a headturn preference procedure, we tested infants' preference for test items having a frequent-word-initial or a frequent-word-final word order. We found that infants familiarized with the aligned stream showed the expected preference for the frequent-word-initial test items, mimicking the functor-initial word order of French. Infants in the misaligned condition showed no preference. These results suggest that infants are able to use word frequency and prosody as early cues to word order and they integrate them into a coherent

  6. Speech and Swallowing

    Science.gov (United States)

    ... Speech and Swallowing Problems ... How do I know if I have a swallowing problem? I have recently lost weight without trying. ...

  7. Speech disorders - children

    Science.gov (United States)

    ... this page: //medlineplus.gov/ency/article/001430.htm Speech disorders - children ... PA: Elsevier Saunders; 2011:chap 32. Related topics: Autism spectrum disorder, Cerebral palsy, Hearing loss, Intellectual disability ...

  8. Speech impairment (adult)

    Science.gov (United States)

    ... ALS or Lou Gehrig disease), cerebral palsy, myasthenia gravis, or multiple sclerosis (MS) Facial trauma Facial weakness, ... provider will likely ask about the speech impairment. Questions may include when the problem developed, whether there ...

  9. Computer-generated speech

    Energy Technology Data Exchange (ETDEWEB)

    Aimthikul, Y.

    1981-12-01

    This thesis reviews the essential aspects of speech synthesis and distinguishes between the two prevailing techniques: compressed digital speech and phonemic synthesis. It then presents the hardware details of the five speech modules evaluated. FORTRAN programs were written to facilitate message creation and retrieval with four of the modules driven by a PDP-11 minicomputer. The fifth module was driven directly by a computer terminal. The compressed digital speech modules (T.I. 990/306, T.S.I. Series 3D and N.S. Digitalker) each contain a limited vocabulary produced by the manufacturers while both the phonemic synthesizers made by Votrax permit an almost unlimited set of sounds and words. A text-to-phoneme rules program was adapted for the PDP-11 (running under the RSX-11M operating system) to drive the Votrax Speech Pac module. However, the Votrax Type'N Talk unit has its own built-in translator. Comparison of these modules revealed that the compressed digital speech modules were superior in pronouncing words on an individual basis but lacked the inflection capability that permitted the phonemic synthesizers to generate more coherent phrases. These findings were necessarily highly subjective and dependent on the specific words and phrases studied. In addition, the rapid introduction of new modules by manufacturers will necessitate new comparisons. However, the results of this research verified that all of the modules studied do possess reasonable quality of speech that is suitable for man-machine applications. Furthermore, the development tools are now in place to permit the addition of computer speech output in such applications.

  10. The effects of auditory and visual vowel training on speech reading performance

    Science.gov (United States)

    Richie, Carolyn; Kewley-Port, Diane

    2003-10-01

    Speech reading, the use of visual cues to understand speech, may provide a substantial benefit for normal-hearing listeners in noisy environments and for hearing-impaired listeners in everyday communication. However, there exists great individual variability in speech reading ability, and studies have shown that only a modest improvement in speech reading ability is achieved with training. The purpose of this investigation was to determine the effects of a novel approach to speech reading training on word and sentence identification tasks. In contrast to previous research, which involved training on consonant recognition, this study focused on vowels. Two groups of normal-hearing adults participated in auditory-visual (AV) conditions with added background noise. The first group of listeners received training on the recognition of 14 English vowels in isolated words, while the second group of listeners received no training. All listeners performed speech reading pre- and post-tests, on words and sentences. Results are discussed in terms of differences between groups, dependent upon whether training was administered, and a comparison is made between this and other speech reading training methods. Finally, the potential benefit of this vowel-based speech reading training method for the rehabilitation of hearing-impaired listeners is discussed. [Work supported by NIHDCD-02229.]

  11. Kin-informative recognition cues in ants

    DEFF Research Database (Denmark)

    Nehring, Volker; Evison, Sophie E F; Santorelli, Lorenzo A;

    2011-01-01

    found little or no kin information in recognition cues. Here, we test the hypothesis that social insects do not have kin-informative recognition cues by investigating the recognition cues and relatedness of workers from four colonies of the ant Acromyrmex octospinosus. Contrary to the theoretical prediction, we show that the cuticular hydrocarbons of ant workers in all four colonies are informative enough to allow full-sisters to be distinguished from half-sisters with a high accuracy. These results contradict the hypothesis of non-heritable recognition cues and suggest that there is more potential...

  12. Visual cues for landmine detection

    Science.gov (United States)

    Staszewski, James J.; Davison, Alan D.; Tischuk, Julia A.; Dippel, David J.

    2007-04-01

    Can human vision supplement the information that handheld landmine detection equipment provides its operators to increase detection rates and reduce the hazard of the task? Contradictory viewpoints exist regarding the viability of visual detection of landmines. Assuming both positions are credible, this work aims to reconcile them by exploring the visual information produced by landmine burial and how any visible signatures change as a function of time in a natural environment. Its objective is to acquire objective, foundational knowledge on which training could be based and subsequently evaluated. A representative set of demilitarized landmines were buried at a field site with bare soil and vegetated surfaces using doctrinal procedures. High resolution photographs of the ground surface were taken for approximately one month starting in April 2006. Photos taken immediately after burial show clearly visible surface signatures. Their features change with time and weather exposure, but the patterns they define persist, as photos taken a month later show. An analysis exploiting the perceptual sensitivity of expert observers showed signature photos to domain experts with instructions to identify the cues and patterns that defined the signatures. Analysis of experts' verbal descriptions identified a small set of easily communicable cues that characterize signatures and their changes over the duration of observation. Findings suggest that visual detection training is viable and has potential to enhance detection capabilities. The photos and descriptions generated offer materials for designing such training and testing its utility. Plans for investigating the generality of the findings, especially potential limiting conditions, are discussed.

  13. SPEECH DISORDERS ENCOUNTERED DURING SPEECH THERAPY AND THERAPY TECHNIQUES

    Directory of Open Access Journals (Sweden)

    İlhan ERDEM

    2013-06-01

    Full Text Available Speech is a physical and mental process in which agreed signs and sounds are used to turn a message formed in the mind into meaning. To identify the sounds of speech, it is essential to know the structure and function of the various organs that make conversation possible. Because speech is both a physical and a mental process, many factors can lead to speech disorders. A speech disorder can be related to language acquisition, and it can also be caused by many medical and psychological factors. Speaking is the collective work of many organs, like an orchestra, and speech is a very complex skill with a mental dimension, so it must be determined which of these obstacles inhibits conversation. A speech disorder is a defect in speech flow, rhythm, pitch, stress, composition or vocalization. This study considers speech disorders such as articulation disorders, stuttering, aphasia, dysarthria, local dialect speech, tongue and lip laziness, and rapid speech as defects in language skills. The causes of these speech disorders were investigated and suggestions for their remediation were presented and discussed.

  14. Practical speech user interface design

    CERN Document Server

    Lewis, James R

    2010-01-01

    Although speech is the most natural form of communication between humans, most people find using speech to communicate with machines anything but natural. Drawing from psychology, human-computer interaction, linguistics, and communication theory, Practical Speech User Interface Design provides a comprehensive yet concise survey of practical speech user interface (SUI) design. It offers practice-based and research-based guidance on how to design effective, efficient, and pleasant speech applications that people can really use. Focusing on the design of speech user interfaces for IVR application

  15. Cueing Animations: Dynamic Signaling Aids Information Extraction and Comprehension

    Science.gov (United States)

    Boucheix, Jean-Michel; Lowe, Richard K.; Putri, Dian K.; Groff, Jonathan

    2013-01-01

    The effectiveness of animations containing two novel forms of animation cueing that target relations between event units rather than individual entities was compared with that of animations containing conventional entity-based cueing or no cues. These relational event unit cues ("progressive path" and "local coordinated" cues) were specifically…

  16. The influence of masker type on early reflection processing and speech intelligibility (L)

    DEFF Research Database (Denmark)

    Arweiler, Iris; Buchholz, Jörg M.; Dau, Torsten

    2013-01-01

    Arweiler and Buchholz [J. Acoust. Soc. Am. 130, 996-1005 (2011)] showed that, while the energy of early reflections (ERs) in a room improves speech intelligibility, the benefit is smaller than that provided by the energy of the direct sound (DS). In terms of integration of ERs and DS, binaural listening did not provide a benefit from ERs apart from a binaural energy summation, such that monaural auditory processing could account for the data. However, a diffuse speech shaped noise (SSN) was used in the speech intelligibility experiments, which does not provide distinct binaural cues to the auditory system. In the present study, the monaural and binaural benefit from ERs for speech intelligibility was investigated using three directional maskers presented from 90° azimuth: a SSN, a multi-talker babble, and a reversed two-talker masker. For normal-hearing as well as hearing-impaired listeners...

  17. Seeing the talker’s face supports executive processing of speech in steady state noise

    Directory of Open Access Journals (Sweden)

    Sushmit eMishra

    2013-11-01

    Full Text Available Listening to speech in noise depletes cognitive resources, affecting speech processing. The present study investigated how remaining resources or cognitive spare capacity (CSC) can be deployed by young adults with normal hearing. We administered a test of CSC (CSCT, Mishra et al., 2013) along with a battery of established cognitive tests to 20 participants with normal hearing. In the CSCT, lists of two-digit numbers were presented with and without visual cues in quiet, as well as in steady-state and speech-like noise at a high intelligibility level. In low load conditions, two numbers were recalled according to instructions inducing executive processing (updating, inhibition), and in high load conditions the participants were additionally instructed to recall one extra number, which was always the first item in the list. In line with previous findings, results showed that CSC was sensitive to memory load and executive function but generally not related to working memory capacity. Furthermore, CSCT scores in quiet were lowered by visual cues, probably due to distraction. In steady-state noise, the presence of visual cues improved CSCT scores, probably by enabling better encoding. Contrary to our expectation, CSCT performance was disrupted more in steady-state than speech-like noise, although only without visual cues, possibly because selective attention could be used to ignore the speech-like background and provide an enriched representation of target items in working memory similar to that obtained in quiet. This interpretation is supported by a consistent association between CSCT scores and updating skills.

  18. Improved perception of speech in noise and Mandarin tones with acoustic simulations of harmonic coding for cochlear implants.

    Science.gov (United States)

    Li, Xing; Nie, Kaibao; Imennov, Nikita S; Won, Jong Ho; Drennan, Ward R; Rubinstein, Jay T; Atlas, Les E

    2012-11-01

    Harmonic and temporal fine structure (TFS) information are important cues for speech perception in noise and music perception. However, due to the inherently coarse spectral and temporal resolution in electric hearing, the question of how to deliver harmonic and TFS information to cochlear implant (CI) users remains unresolved. A harmonic-single-sideband-encoder [(HSSE); Nie et al. (2008). Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing; Lie et al., (2010). Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing] strategy has been proposed that explicitly tracks the harmonics in speech and transforms them into modulators conveying both amplitude modulation and fundamental frequency information. For unvoiced speech, HSSE transforms the TFS into a slowly varying yet still noise-like signal. To investigate its potential, four- and eight-channel vocoder simulations of HSSE and the continuous-interleaved-sampling (CIS) strategy were implemented, respectively. Using these vocoders, five normal-hearing subjects' speech recognition performance was evaluated under different masking conditions; another five normal-hearing subjects' Mandarin tone identification performance was also evaluated. Additionally, the neural discharge patterns evoked by HSSE- and CIS-encoded Mandarin tone stimuli were simulated using an auditory nerve model. All subjects scored significantly higher with HSSE than with CIS vocoders. The modeling analysis demonstrated that HSSE can convey temporal pitch cues better than CIS. Overall, the results suggest that HSSE is a promising strategy to enhance speech perception with CIs. PMID:23145619
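    For readers unfamiliar with what an "acoustic simulation" of a CI strategy involves, the sketch below implements a generic noise-excited channel vocoder in the spirit of the CIS-style simulations described (it is not the HSSE strategy and not the authors' code); the band edges, envelope cutoff and channel count are assumptions.

    ```python
    import numpy as np
    from scipy.signal import butter, sosfiltfilt

    def noise_vocode(x, sr, n_channels=4, lo=200.0, hi=7000.0, env_cut=50.0):
        """Generic n-channel noise vocoder (CIS-style acoustic simulation)."""
        edges = np.geomspace(lo, hi, n_channels + 1)  # assumed log-spaced bands
        rng = np.random.default_rng(0)
        out = np.zeros_like(x, dtype=float)
        env_sos = butter(4, env_cut, btype="lowpass", fs=sr, output="sos")
        for low, high in zip(edges[:-1], edges[1:]):
            band_sos = butter(4, [low, high], btype="bandpass", fs=sr, output="sos")
            band = sosfiltfilt(band_sos, x)
            # Envelope extraction: rectify, then low-pass filter; this discards
            # the temporal fine structure within the band.
            env = np.maximum(sosfiltfilt(env_sos, np.abs(band)), 0.0)
            # Modulate a band-limited noise carrier with the envelope.
            carrier = sosfiltfilt(band_sos, rng.standard_normal(len(x)))
            out += env * carrier
        return out / np.max(np.abs(out))

    # Usage (the file name is a placeholder):
    # import soundfile as sf
    # x, sr = sf.read("speech.wav")
    # sf.write("speech_vocoded_8ch.wav", noise_vocode(x, sr, n_channels=8), sr)
    ```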

  19. Gaze in Visual Search Is Guided More Efficiently by Positive Cues than by Negative Cues.

    Directory of Open Access Journals (Sweden)

    Günter Kugler

    Full Text Available Visual search can be accelerated when properties of the target are known. Such knowledge allows the searcher to direct attention to items sharing these properties. Recent work indicates that information about properties of non-targets (i.e., negative cues) can also guide search. In the present study, we examine whether negative cues lead to different search behavior compared to positive cues. We asked observers to search for a target defined by a certain shape singleton (broken line among solid lines). Each line was embedded in a colored disk. In "positive cue" blocks, participants were informed about possible colors of the target item. In "negative cue" blocks, the participants were informed about colors that could not contain the target. Search displays were designed such that with both the positive and negative cues, the same number of items could potentially contain the broken line ("relevant items"). Thus, both cues were equally informative. We measured response times and eye movements. Participants exhibited longer response times when provided with negative cues compared to positive cues. Although negative cues did guide the eyes to relevant items, there were marked differences in eye movements. Negative cues resulted in smaller proportions of fixations on relevant items, longer duration of fixations and in higher rates of fixations per item as compared to positive cues. The effectiveness of both cue types, as measured by fixations on relevant items, increased over the course of each search. In sum, a negative color cue can guide attention to relevant items, but it is less efficient than a positive cue of the same informational value.

  20. HUMAN SPEECH EMOTION RECOGNITION

    Directory of Open Access Journals (Sweden)

    Maheshwari Selvaraj

    2016-02-01

    Full Text Available Emotions play an extremely important role in human mental life. They are a medium for expressing one's perspective or mental state to others. Speech Emotion Recognition (SER) can be defined as the extraction of the emotional state of the speaker from his or her speech signal. There are a few universal emotions, including Neutral, Anger, Happiness, and Sadness, which any intelligent system with finite computational resources can be trained to identify or synthesize as required. In this work spectral and prosodic features are used for speech emotion recognition because both of these features contain the emotional information. Mel-frequency cepstral coefficients (MFCC) are one of the spectral features. Fundamental frequency, loudness, pitch, speech intensity and glottal parameters are the prosodic features used to model different emotions. The potential features are extracted from each utterance for the computational mapping between emotions and speech patterns. Pitch can be detected from the selected features and used to classify gender. A Support Vector Machine (SVM) is used to classify gender in this work. A Radial Basis Function network and a Back Propagation Network are used to recognize the emotions based on the selected features; the radial basis function network produced more accurate results for emotion recognition than the back propagation network.
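
    To make the pipeline above concrete, here is a minimal sketch pairing MFCC features with an RBF-kernel SVM using librosa and scikit-learn; the file names and labels are hypothetical, and the original work additionally used prosodic features and radial basis function / back propagation networks for the emotion stage.

        # Sketch of an MFCC + SVM classifier (hypothetical file paths and labels).
        import numpy as np
        import librosa
        from sklearn.model_selection import train_test_split
        from sklearn.svm import SVC

        def mfcc_vector(path, n_mfcc=13):
            """Return the per-utterance mean MFCC vector as a fixed-length feature."""
            y, sr = librosa.load(path, sr=16000)
            return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

        # Hypothetical corpus of (wav file, label) pairs.
        corpus = [("utt_001.wav", "anger"), ("utt_002.wav", "neutral")]  # ...

        X = np.vstack([mfcc_vector(path) for path, _ in corpus])
        labels = np.array([label for _, label in corpus])

        X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, random_state=0)
        clf = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)   # RBF-kernel SVM classifier
        print("held-out accuracy:", clf.score(X_te, y_te))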

  1. Robust Speech/Non-Speech Classification in Heterogeneous Multimedia Content

    NARCIS (Netherlands)

    Huijbregts, Marijn; Jong, de Franciska

    2011-01-01

    In this paper we present a speech/non-speech classification method that allows high quality classification without the need to know in advance what kinds of audible non-speech events are present in an audio recording and that does not require a single parameter to be tuned on in-domain data. Because

  2. Children's recognition of emotions from vocal cues

    NARCIS (Netherlands)

    D.A. Sauter; C. Panattoni; F. Happé

    2013-01-01

    Emotional cues contain important information about the intentions and feelings of others. Despite a wealth of research into children's understanding of facial signals of emotions, little research has investigated the developmental trajectory of interpreting affective cues in the voice. In this study

  3. Stereotyping in Fear of Success Cues.

    Science.gov (United States)

    Juran, Shelley

    Prior studies suggest that sex-role stereotypes influence responses to Horner's fear of success cue. This study investigates stereotypes about both sex roles and achievement settings. One hundred sixty college males and females wrote stories to different cues, then rated the masculinity-femininity of their story characters. Both "John" and "Anne"…

  4. Synchronization by the hand: The sight of gestures modulates low-frequency activity in brain responses to continuous speech

    Directory of Open Access Journals (Sweden)

    Emmanuel eBiau

    2015-09-01

    Full Text Available During social interactions, speakers often produce spontaneous gestures to accompany their speech. These coordinated body movements convey communicative intentions, and modulate how listeners perceive the message in a subtle, but important way. In the present perspective, we put the focus on the role that congruent non-verbal information from beat gestures may play in the neural responses to speech. Whilst delta-theta oscillatory brain responses reflect the time-frequency structure of the speech signal, we argue that beat gestures promote phase resetting at relevant word onsets. This mechanism may facilitate the anticipation of associated acoustic cues relevant for prosodic/syllabic-based segmentation in speech perception. We report recently published data supporting this hypothesis, and discuss the potential of beats (and gestures in general) for further studies investigating continuous AV speech processing through low-frequency oscillations.

  5. Synchronization by the hand: the sight of gestures modulates low-frequency activity in brain responses to continuous speech.

    Science.gov (United States)

    Biau, Emmanuel; Soto-Faraco, Salvador

    2015-01-01

    During social interactions, speakers often produce spontaneous gestures to accompany their speech. These coordinated body movements convey communicative intentions, and modulate how listeners perceive the message in a subtle, but important way. In the present perspective, we put the focus on the role that congruent non-verbal information from beat gestures may play in the neural responses to speech. Whilst delta-theta oscillatory brain responses reflect the time-frequency structure of the speech signal, we argue that beat gestures promote phase resetting at relevant word onsets. This mechanism may facilitate the anticipation of associated acoustic cues relevant for prosodic/syllabic-based segmentation in speech perception. We report recently published data supporting this hypothesis, and discuss the potential of beats (and gestures in general) for further studies investigating continuous AV speech processing through low-frequency oscillations. PMID:26441618

  6. Guiding Attention by Cooperative Cues

    Institute of Scientific and Technical Information of China (English)

    KangWoo Lee

    2008-01-01

    A common assumption in visual attention is based on the rationale of "limited capacity of information processing". From this viewpoint there is little consideration of how different information channels or modules cooperate, because cells in processing stages are forced to compete for the limited resource. To examine the mechanism behind the cooperative behavior of information channels, a computational model of selective attention is implemented based on two hypotheses. Unlike the traditional view of visual attention, the cooperative behavior is assumed to be a dynamic integration process between bottom-up and top-down information. Furthermore, top-down information is assumed to provide a contextual cue during the selection process and to guide the attentional allocation among many bottom-up candidates. The results from a series of simulations with still and video images showed some interesting properties that could not be explained by the competitive aspect of selective attention alone.

  7. Denial Denied: Freedom of Speech

    Directory of Open Access Journals (Sweden)

    Glen Newey

    2009-12-01

    Full Text Available Free speech is a widely held principle. This is in some ways surprising, since formal and informal censorship of speech is widespread, and rather different issues seem to arise depending on whether the censorship concerns who speaks, what content is spoken or how it is spoken. I argue that despite these facts, free speech can indeed be seen as a unitary principle. On my analysis, the core of the free speech principle is the denial of the denial of speech, whether to a speaker, to a proposition, or to a mode of expression. Underlying free speech is the principle of freedom of association, according to which speech is both a precondition of future association (e.g. as a medium for negotiation) and a mode of association in its own right. I conclude by applying this account briefly to two contentious issues: hate speech and pornography.

  8. Speech spectrogram expert

    Energy Technology Data Exchange (ETDEWEB)

    Johannsen, J.; Macallister, J.; Michalek, T.; Ross, S.

    1983-01-01

    Various authors have pointed out that humans can become quite adept at deriving phonetic transcriptions from speech spectrograms (as good as 90 percent accuracy at the phoneme level). The authors describe an expert system which attempts to simulate this performance. The speech spectrogram expert (spex) is actually a society made up of three experts: a 2-dimensional vision expert, an acoustic-phonetic expert, and a phonetics expert. The visual reasoning expert finds important visual features of the spectrogram. The acoustic-phonetic expert reasons about how visual features relate to phonemes, and about how phonemes change visually in different contexts. The phonetics expert reasons about allowable phoneme sequences and transformations, and deduces an English spelling for phoneme strings. The speech spectrogram expert is highly interactive, allowing users to investigate hypotheses and edit rules. 10 references.

  9. Punctuation in Quoted Speech

    CERN Document Server

    Doran, C F

    1996-01-01

    Quoted speech is often set off by punctuation marks, in particular quotation marks. Thus, it might seem that the quotation marks would be extremely useful in identifying these structures in texts. Unfortunately, the situation is not quite so clear. In this work, I will argue that quotation marks are not adequate for either identifying or constraining the syntax of quoted speech. More useful information comes from the presence of a quoting verb, which is either a verb of saying or a punctual verb, and the presence of other punctuation marks, usually commas. Using a lexicalized grammar, we can license most quoting clauses as text adjuncts. A distinction will be made not between direct and indirect quoted speech, but rather between adjunct and non-adjunct quoting clauses.

  10. Protection limits on free speech

    Institute of Scientific and Technical Information of China (English)

    李敏

    2014-01-01

    Freedom of speech is one of the basic rights of citizens and should receive broad protection, but in the actual context of China it is worth considering what kinds of speech can be protected and which can be restricted, and how to draw the limit between state power and free speech. People tend to ignore freedom of speech and its function, so that some arguments cannot be demonstrated in open debate.

  11. Speech characteristics in depression.

    Science.gov (United States)

    Stassen, H H; Bomben, G; Günther, E

    1991-01-01

    This study examined the relationship between speech characteristics and psychopathology throughout the course of affective disturbances. Our sample comprised 20 depressive, hospitalized patients who had been selected according to the following criteria: (1) first admission; (2) long-term patient; (3) early entry into study; (4) late entry into study; (5) low scorer; (6) high scorer, and (7) distinct retarded-depressive symptomatology. Since our principal goal was to model the course of affective disturbances in terms of speech parameters, a total of 6 repeated measurements had been carried out over a 2-week period, including 3 different psychopathological instruments and speech recordings from automatic speech as well as from reading out loud. It turned out that neither applicability nor efficiency of single-parameter models depend in any way on the given, clinically defined subgroups. On the other hand, however, no significant differences between the clinically defined subgroups showed up with regard to basic speech parameters, except for the fact that low scorers seemed to take their time when producing utterances (this in contrast to all other patients who, on the average, had a considerably shorter recording time). As to the relationship between psychopathology and speech parameters over time, we found significant correlations: (1) in 60% of cases between the apathic syndrome and energy/dynamics; (2) in 50% of cases between the retarded-depressive syndrome and energy/dynamics; (3) in 45% of cases between the apathic syndrome and mean vocal pitch, and (4) in 71% of low scorers between the somatic-depressive syndrome and time duration of pauses. All in all, single parameter models turned out to cover only specific aspects of the individual courses of affective disturbances, thus speaking against a simple approach which applies in general. PMID:1886971

  12. Awareness of rhythm patterns in speech and music in children with specific language impairments

    Directory of Open Access Journals (Sweden)

    Ruth eCumming

    2015-12-01

    Full Text Available Children with specific language impairments (SLIs) show impaired perception and production of language, and also show impairments in perceiving auditory cues to rhythm (amplitude rise time [ART] and sound duration) and in tapping to a rhythmic beat. Here we explore potential links between language development and rhythm perception in 45 children with SLI and 50 age-matched controls. We administered three rhythmic tasks, a musical beat detection task, a tapping-to-music task, and a novel music/speech task, which varied rhythm and pitch cues independently or together in both speech and music. Via low-pass filtering, the music sounded as though it was played from a low-quality radio and the speech sounded as though it was muffled (heard "behind the door"). We report data for all of the SLI children (N = 45, IQ varying), as well as for two independent subgroupings with intact IQ. One subgroup, Pure SLI, had intact phonology and reading (N = 16), the other, SLI PPR (N = 15), had impaired phonology and reading. When IQ varied (all SLI children), we found significant group differences in all the rhythmic tasks. For the Pure SLI group, there were rhythmic impairments in the tapping task only. For children with SLI and poor phonology (SLI PPR), group differences were found in all of the filtered speech/music AXB tasks. We conclude that difficulties with rhythmic cues in both speech and music are present in children with SLIs, but that some rhythmic measures are more sensitive than others. The data are interpreted within a 'prosodic phrasing' hypothesis, and we discuss the potential utility of rhythmic and musical interventions in remediating speech and language difficulties in children.

  13. Awareness of Rhythm Patterns in Speech and Music in Children with Specific Language Impairments.

    Science.gov (United States)

    Cumming, Ruth; Wilson, Angela; Leong, Victoria; Colling, Lincoln J; Goswami, Usha

    2015-01-01

    Children with specific language impairments (SLIs) show impaired perception and production of language, and also show impairments in perceiving auditory cues to rhythm [amplitude rise time (ART) and sound duration] and in tapping to a rhythmic beat. Here we explore potential links between language development and rhythm perception in 45 children with SLI and 50 age-matched controls. We administered three rhythmic tasks, a musical beat detection task, a tapping-to-music task, and a novel music/speech task, which varied rhythm and pitch cues independently or together in both speech and music. Via low-pass filtering, the music sounded as though it was played from a low-quality radio and the speech sounded as though it was muffled (heard "behind the door"). We report data for all of the SLI children (N = 45, IQ varying), as well as for two independent subgroupings with intact IQ. One subgroup, "Pure SLI," had intact phonology and reading (N = 16), the other, "SLI PPR" (N = 15), had impaired phonology and reading. When IQ varied (all SLI children), we found significant group differences in all the rhythmic tasks. For the Pure SLI group, there were rhythmic impairments in the tapping task only. For children with SLI and poor phonology (SLI PPR), group differences were found in all of the filtered speech/music AXB tasks. We conclude that difficulties with rhythmic cues in both speech and music are present in children with SLIs, but that some rhythmic measures are more sensitive than others. The data are interpreted within a "prosodic phrasing" hypothesis, and we discuss the potential utility of rhythmic and musical interventions in remediating speech and language difficulties in children. PMID:26733848

  14. An Exploration of Rhythmic Grouping of Speech Sequences by French- and German-Learning Infants

    OpenAIRE

    Abboub, Nawal; Boll-Avetisyan, Natalie; Bhatara, Anjali; Höhle, Barbara; Nazzi, Thierry

    2016-01-01

    Rhythm in music and speech can be characterized by a constellation of several acoustic cues. Individually, these cues have different effects on rhythmic perception: sequences of sounds alternating in duration are perceived as short-long pairs (weak-strong/iambic pattern), whereas sequences of sounds alternating in intensity or pitch are perceived as loud-soft, or high-low pairs (strong-weak/trochaic pattern). This perceptual bias—called the Iambic-Trochaic Law (ITL)–has been claimed to be an ...

  15. The University and Free Speech

    OpenAIRE

    Grcic, Joseph

    2014-01-01

    Free speech is a necessary condition for the growth of knowledge and the implementation of real and rational democracy. Educational institutions play a central role in socializing individuals to function within their society. Academic freedom is the right to free speech in the context of the university and tenure, properly interpreted, is a necessary component of protecting academic freedom and free speech.

  16. Speech in Parkinson's disease

    OpenAIRE

    Širca, Patricija

    2012-01-01

    The thesis presents an analysis of the speech of four male subjects diagnosed with Parkinson's disease associated with dementia. The analysis was performed on a recording of each subject's picture description. All subjects were asked to describe the scene in the picture taken from the Boston test for aphasia entitled "The Cookie Theft". The descriptions were captured with a recorder and then transcribed. With the help of a pre-prepared checklist, the speech was then evaluated. Each w...

  17. Hemispheric Asymmetry of Endogenous Neural Oscillations in Young Children: Implications for Hearing Speech In Noise.

    Science.gov (United States)

    Thompson, Elaine C; Woodruff Carr, Kali; White-Schwoch, Travis; Tierney, Adam; Nicol, Trent; Kraus, Nina

    2016-01-01

    Speech signals contain information in hierarchical time scales, ranging from short-duration (e.g., phonemes) to long-duration cues (e.g., syllables, prosody). A theoretical framework to understand how the brain processes this hierarchy suggests that hemispheric lateralization enables specialized tracking of acoustic cues at different time scales, with the left and right hemispheres sampling at short (25 ms; 40 Hz) and long (200 ms; 5 Hz) periods, respectively. In adults, both speech-evoked and endogenous cortical rhythms are asymmetrical: low-frequency rhythms predominate in right auditory cortex, and high-frequency rhythms in left auditory cortex. It is unknown, however, whether endogenous resting state oscillations are similarly lateralized in children. We investigated cortical oscillations in children (3-5 years; N = 65) at rest and tested our hypotheses that this temporal asymmetry is evident early in life and facilitates recognition of speech in noise. We found a systematic pattern of increasing leftward asymmetry for higher frequency oscillations; this pattern was more pronounced in children who better perceived words in noise. The observed connection between left-biased cortical oscillations in phoneme-relevant frequencies and speech-in-noise perception suggests hemispheric specialization of endogenous oscillatory activity may support speech processing in challenging listening environments, and that this infrastructure is present during early childhood. PMID:26804355

  18. Cross-modal cueing in audiovisual spatial attention

    OpenAIRE

    Blurton, Steven Paul; Mark W Greenlee; Gondan, Matthias

    2015-01-01

    Visual processing is most effective at the location of our attentional focus. It has long been known that various spatial cues can direct visuospatial attention and influence the detection of auditory targets. Cross-modal cueing, however, seems to depend on the type of the visual cue: facilitation effects have been reported for endogenous visual cues while exogenous cues seem to be mostly ineffective. In three experiments, we investigated cueing effects on the processing of audiovisual signal...

  19. Infants with Williams syndrome detect statistical regularities in continuous speech.

    Science.gov (United States)

    Cashon, Cara H; Ha, Oh-Ryeong; Graf Estes, Katharine; Saffran, Jenny R; Mervis, Carolyn B

    2016-09-01

    Williams syndrome (WS) is a rare genetic disorder associated with delays in language and cognitive development. The reasons for the language delay are unknown. Statistical learning is a domain-general mechanism recruited for early language acquisition. In the present study, we investigated whether infants with WS were able to detect the statistical structure in continuous speech. Eighteen 8- to 20-month-olds with WS were familiarized with 2 min of a continuous stream of synthesized nonsense words; the statistical structure of the speech was the only cue to word boundaries. They were tested on their ability to discriminate statistically-defined "words" and "part-words" (which crossed word boundaries) in the artificial language. Despite significant cognitive and language delays, infants with WS were able to detect the statistical regularities in the speech stream. These findings suggest that an inability to track the statistical properties of speech is unlikely to be the primary basis for the delays in the onset of language observed in infants with WS. These results provide the first evidence of statistical learning by infants with developmental delays. PMID:27299804
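
    The statistical regularity exploited in such studies is usually the transitional probability between adjacent syllables, which is high within words and drops at word boundaries. A self-contained sketch with an invented three-word nonsense language:

        # Sketch: transitional probabilities in a continuous syllable stream.
        # Word boundaries are where the probability of the next syllable dips.
        import random
        from collections import Counter

        random.seed(0)
        words = ["pabiku", "tibudo", "golatu"]             # invented nonsense words
        stream = [random.choice(words) for _ in range(300)]
        syllables = [w[i:i + 2] for w in stream for i in range(0, 6, 2)]

        pair_counts = Counter(zip(syllables, syllables[1:]))
        first_counts = Counter(syllables[:-1])

        def transitional_prob(a, b):
            """Estimated probability that syllable b follows syllable a."""
            return pair_counts[(a, b)] / first_counts[a]

        print(transitional_prob("pa", "bi"))   # within-word pair: close to 1.0
        print(transitional_prob("ku", "ti"))   # across a word boundary: about 1/3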

  20. Learning to perceptually organize speech signals in native fashion.

    Science.gov (United States)

    Nittrouer, Susan; Lowenstein, Joanna H

    2010-03-01

    The ability to recognize speech involves sensory, perceptual, and cognitive processes. For much of the history of speech perception research, investigators have focused on the first and third of these, asking how much and what kinds of sensory information are used by normal and impaired listeners, as well as how effective amounts of that information are altered by "top-down" cognitive processes. This experiment focused on perceptual processes, asking what accounts for how the sensory information in the speech signal gets organized. Two types of speech signals processed to remove properties that could be considered traditional acoustic cues (amplitude envelopes and sine wave replicas) were presented to 100 listeners in five groups: native English-speaking (L1) adults, 7-, 5-, and 3-year-olds, and native Mandarin-speaking adults who were excellent second-language (L2) users of English. The L2 adults performed more poorly than L1 adults with both kinds of signals. Children performed more poorly than L1 adults but showed disproportionately better performance for the sine waves than for the amplitude envelopes compared to both groups of adults. Sentence context had similar effects across groups, so variability in recognition was attributed to differences in perceptual organization of the sensory information, presumed to arise from native language experience. PMID:20329861

  1. Social orienting of children with autism to facial expressions and speech: a study with a wearable eye-tracker in naturalistic settings

    OpenAIRE

    Silvia Magrelli; Patrick Jermann; Jaqueline Nadel; Francois Ansermet

    2013-01-01

    This study investigates attention orienting to social stimuli in children with Autism Spectrum Conditions (ASC) during dyadic social interactions taking place in real-life settings. We study the effect of social cues that differ in complexity and distinguish between social cues produced by facial expressions of emotion and those produced during speech. We record the children's gazes using a head-mounted eye-tracking device and report on a detailed and quantitative analysis of the mo...

  2. Perception of Emotion in Conversational Speech by Younger and Older Listeners

    Science.gov (United States)

    Schmidt, Juliane; Janse, Esther; Scharenborg, Odette

    2016-01-01

    This study investigated whether age and/or differences in hearing sensitivity influence the perception of the emotion dimensions arousal (calm vs. aroused) and valence (positive vs. negative attitude) in conversational speech. To that end, this study specifically focused on the relationship between participants’ ratings of short affective utterances and the utterances’ acoustic parameters (pitch, intensity, and articulation rate) known to be associated with the emotion dimensions arousal and valence. Stimuli consisted of short utterances taken from a corpus of conversational speech. In two rating tasks, younger and older adults either rated arousal or valence using a 5-point scale. Mean intensity was found to be the main cue participants used in the arousal task (i.e., higher mean intensity cueing higher levels of arousal) while mean F0 was the main cue in the valence task (i.e., higher mean F0 being interpreted as more negative). Even though there were no overall age group differences in arousal or valence ratings, compared to younger adults, older adults responded less strongly to mean intensity differences cueing arousal and responded more strongly to differences in mean F0 cueing valence. Individual hearing sensitivity among the older adults did not modify the use of mean intensity as an arousal cue. However, individual hearing sensitivity generally affected valence ratings and modified the use of mean F0. We conclude that age differences in the interpretation of mean F0 as a cue for valence are likely due to age-related hearing loss, whereas age differences in rating arousal do not seem to be driven by hearing sensitivity differences between age groups (as measured by pure-tone audiometry). PMID:27303340

  3. Action experience changes attention to kinematic cues

    Directory of Open Access Journals (Sweden)

    Courtney eFilippi

    2016-02-01

    Full Text Available The current study used remote corneal reflection eye-tracking to examine the relationship between motor experience and action anticipation in 13-month-old infants. To measure online anticipation of actions infants watched videos where the actor’s hand provided kinematic information (in its orientation) about the type of object that the actor was going to reach for. The actor’s hand orientation either matched the orientation of a rod (congruent cue) or did not match the orientation of the rod (incongruent cue). To examine relations between motor experience and action anticipation, we used a 2 (reach first vs. observe first) x 2 (congruent kinematic cue vs. incongruent kinematic cue) between-subjects design. We show that 13-month-old infants in the observe first condition spontaneously generate rapid online visual predictions to congruent hand orientation cues and do not visually anticipate when presented incongruent cues. We further demonstrate that the speed that these infants generate predictions to congruent motor cues is correlated with their own ability to pre-shape their hands. Finally, we demonstrate that following reaching experience, infants generate rapid predictions to both congruent and incongruent hand shape cues—suggesting that short-term experience changes attention to kinematics.

  4. Evaluation of cleft palate speech.

    Science.gov (United States)

    Smith, Bonnie; Guyette, Thomas W

    2004-04-01

    Children born with palatal clefts are at risk for speech/language delay and speech problems related to palatal insufficiency. These individuals require regular speech evaluations, starting in the first year of life and often continuing into adulthood. The primary role of the speech pathologist on the cleft palate/craniofacial team is to evaluate whether deviations in oral cavity structures, such as the velopharynx, negatively impact speech production. This article focuses on the assessment of velopharyngeal function before and after palatal surgery. PMID:15145667

  5. Denial Denied: Freedom of Speech

    OpenAIRE

    Glen Newey

    2009-01-01

    Free speech is a widely held principle. This is in some ways surprising, since formal and informal censorship of speech is widespread, and rather different issues seem to arise depending on whether the censorship concerns who speaks, what content is spoken or how it is spoken. I argue that despite these facts, free speech can indeed be seen as a unitary principle. On my analysis, the core of the free speech principle is the denial of the denial of speech, whether to a speaker, to a propositio...

  6. Packet speech systems technology

    Science.gov (United States)

    Weinstein, C. J.; Blankenship, P. E.

    1982-09-01

    The long-range objectives of the Packet Speech Systems Technology Program are to develop and demonstrate techniques for efficient digital speech communications on networks suitable for both voice and data, and to investigate and develop techniques for integrated voice and data communication in packetized networks, including wideband common-user satellite links. Specific areas of concern are: the concentration of statistically fluctuating volumes of voice traffic, the adaptation of communication strategies to varying conditions of network links and traffic volume, and the interconnection of wideband satellite networks to terrestrial systems. Previous efforts in this area have led to new vocoder structures for improved narrowband voice performance and multiple-rate transmission, and to demonstrations of conversational speech and conferencing on the ARPANET and the Atlantic Packet Satellite Network. The current program has two major thrusts: i.e., the development and refinement of practical low-cost, robust, narrowband, and variable-rate speech algorithms and voice terminal structures; and the establishment of an experimental wideband satellite network to serve as a unique facility for the realistic investigation of voice/data networking strategies.

  7. Black History Speech

    Science.gov (United States)

    Noldon, Carl

    2007-01-01

    The author argues in this speech that one cannot expect students in the school system to know and understand the genius of Black history if the curriculum is Eurocentric, which is a residue of racism. He states that his comments are designed for the enlightenment of those who suffer from a school system that "hypocritically manipulates Black…

  8. Hearing speech in music

    Directory of Open Access Journals (Sweden)

    Seth-Reino Ekström

    2011-01-01

    Full Text Available The masking effect of a piano composition, played at different speeds and in different octaves, on speech-perception thresholds was investigated in 15 normal-hearing and 14 moderately-hearing-impaired subjects. Running speech (just follow conversation, JFC) testing and use of hearing aids increased the everyday validity of the findings. A comparison was made with standard audiometric noises [International Collegium of Rehabilitative Audiology (ICRA) noise and speech spectrum-filtered noise (SPN)]. All masking sounds, music or noise, were presented at the same equivalent sound level (50 dBA). The results showed a significant effect of piano performance speed and octave (P<.01). Low octave and fast tempo had the largest effect; and high octave and slow tempo, the smallest. Music had a lower masking effect than did ICRA noise with two or six speakers at normal vocal effort (P<.01) and SPN (P<.05). Subjects with hearing loss had higher masked thresholds than the normal-hearing subjects (P<.01), but there were smaller differences between masking conditions (P<.01). It is pointed out that music offers an interesting opportunity for studying masking under realistic conditions, where spectral and temporal features can be varied independently. The results have implications for composing music with vocal parts, designing acoustic environments and creating a balance between speech perception and privacy in social settings.

  9. Free Speech Yearbook 1979.

    Science.gov (United States)

    Kane, Peter E., Ed.

    The seven articles in this collection deal with theoretical and practical freedom of speech issues. Topics covered are: the United States Supreme Court, motion picture censorship, and the color line; judicial decision making; the established scientific community's suppression of the ideas of Immanuel Velikovsky; the problems of avant-garde jazz,…

  10. Charisma in business speeches

    DEFF Research Database (Denmark)

    Niebuhr, Oliver; Brem, Alexander; Novák-Tót, Eszter;

    2016-01-01

    Charisma is a key component of spoken language interaction; and it is probably for this reason that charismatic speech has been the subject of intensive research for centuries. However, what is still largely missing is a quantitative and objective line of research that, firstly, involves analyses...

  11. Hearing speech in music.

    Science.gov (United States)

    Ekström, Seth-Reino; Borg, Erik

    2011-01-01

    The masking effect of a piano composition, played at different speeds and in different octaves, on speech-perception thresholds was investigated in 15 normal-hearing and 14 moderately-hearing-impaired subjects. Running speech (just follow conversation, JFC) testing and use of hearing aids increased the everyday validity of the findings. A comparison was made with standard audiometric noises [International Collegium of Rehabilitative Audiology (ICRA) noise and speech spectrum-filtered noise (SPN)]. All masking sounds, music or noise, were presented at the same equivalent sound level (50 dBA). The results showed a significant effect of piano performance speed and octave (P<.01). Low octave and fast tempo had the largest effect; and high octave and slow tempo, the smallest. Music had a lower masking effect than did ICRA noise with two or six speakers at normal vocal effort (P<.01) and SPN (P<.05). Subjects with hearing loss had higher masked thresholds than the normal-hearing subjects (P<.01), but there were smaller differences between masking conditions (P<.01). It is pointed out that music offers an interesting opportunity for studying masking under realistic conditions, where spectral and temporal features can be varied independently. The results have implications for composing music with vocal parts, designing acoustic environments and creating a balance between speech perception and privacy in social settings. PMID:21768731

  12. 1984 Newbery Acceptance Speech.

    Science.gov (United States)

    Cleary, Beverly

    1984-01-01

    This acceptance speech for an award honoring "Dear Mr. Henshaw," a book about feelings of a lonely child of divorce intended for eight-, nine-, and ten-year-olds, highlights children's letters to author. Changes in society that affect children, the inception of "Dear Mr. Henshaw," and children's reactions to books are highlighted. (EJS)

  13. Cues of maternal condition influence offspring selfishness.

    Directory of Open Access Journals (Sweden)

    Janine W Y Wong

    Full Text Available The evolution of parent-offspring communication was mostly studied from the perspective of parents responding to begging signals conveying information about offspring condition. Parents should respond to begging because of the differential fitness returns obtained from their investment in offspring that differ in condition. For analogous reasons, offspring should adjust their behavior to cues/signals of parental condition: parents that differ in condition pay differential costs of care and, hence, should provide different amounts of food. In this study, we experimentally tested in the European earwig (Forficula auricularia) if cues of maternal condition affect offspring behavior in terms of sibling cannibalism. We experimentally manipulated female condition by providing them with different amounts of food, kept nymph condition constant, allowed for nymph exposure to chemical maternal cues over extended time, quantified nymph survival (deaths being due to cannibalism) and extracted and analyzed the females' cuticular hydrocarbons (CHC). Nymph survival was significantly affected by chemical cues of maternal condition, and this effect depended on the timing of breeding. Cues of poor maternal condition enhanced nymph survival in early broods, but reduced nymph survival in late broods, and vice versa for cues of good condition. Furthermore, female condition affected the quantitative composition of their CHC profile which in turn predicted nymph survival patterns. Thus, earwig offspring are sensitive to chemical cues of maternal condition and nymphs from early and late broods show opposite reactions to the same chemical cues. Together with former evidence on maternal sensitivities to condition-dependent nymph chemical cues, our study shows context-dependent reciprocal information exchange about condition between earwig mothers and their offspring, potentially mediated by cuticular hydrocarbons.

  14. Cues of maternal condition influence offspring selfishness.

    Science.gov (United States)

    Wong, Janine W Y; Lucas, Christophe; Kölliker, Mathias

    2014-01-01

    The evolution of parent-offspring communication was mostly studied from the perspective of parents responding to begging signals conveying information about offspring condition. Parents should respond to begging because of the differential fitness returns obtained from their investment in offspring that differ in condition. For analogous reasons, offspring should adjust their behavior to cues/signals of parental condition: parents that differ in condition pay differential costs of care and, hence, should provide different amounts of food. In this study, we experimentally tested in the European earwig (Forficula auricularia) if cues of maternal condition affect offspring behavior in terms of sibling cannibalism. We experimentally manipulated female condition by providing them with different amounts of food, kept nymph condition constant, allowed for nymph exposure to chemical maternal cues over extended time, quantified nymph survival (deaths being due to cannibalism) and extracted and analyzed the females' cuticular hydrocarbons (CHC). Nymph survival was significantly affected by chemical cues of maternal condition, and this effect depended on the timing of breeding. Cues of poor maternal condition enhanced nymph survival in early broods, but reduced nymph survival in late broods, and vice versa for cues of good condition. Furthermore, female condition affected the quantitative composition of their CHC profile which in turn predicted nymph survival patterns. Thus, earwig offspring are sensitive to chemical cues of maternal condition and nymphs from early and late broods show opposite reactions to the same chemical cues. Together with former evidence on maternal sensitivities to condition-dependent nymph chemical cues, our study shows context-dependent reciprocal information exchange about condition between earwig mothers and their offspring, potentially mediated by cuticular hydrocarbons. PMID:24498046

  15. Metaheuristic applications to speech enhancement

    CERN Document Server

    Kunche, Prajna

    2016-01-01

    This book serves as a basic reference for those interested in the application of metaheuristics to speech enhancement. The major goal of the book is to explain the basic concepts of optimization methods and their use in heuristic optimization in speech enhancement to scientists, practicing engineers, and academic researchers in speech processing. The authors discuss why it has been a challenging problem for researchers to develop new enhancement algorithms that aid in the quality and intelligibility of degraded speech. They present powerful optimization methods to speech enhancement that can help to solve the noise reduction problems. Readers will be able to understand the fundamentals of speech processing as well as the optimization techniques, how the speech enhancement algorithms are implemented by utilizing optimization methods, and will be given the tools to develop new algorithms. The authors also provide a comprehensive literature survey regarding the topic.

  16. Learning foreign sounds in an alien world: videogame training improves non-native speech categorization.

    Science.gov (United States)

    Lim, Sung-joo; Holt, Lori L

    2011-01-01

    Although speech categories are defined by multiple acoustic dimensions, some are perceptually weighted more than others and there are residual effects of native-language weightings in non-native speech perception. Recent research on nonlinguistic sound category learning suggests that the distribution characteristics of experienced sounds influence perceptual cue weights: Increasing variability across a dimension leads listeners to rely upon it less in subsequent category learning (Holt & Lotto, 2006). The present experiment investigated the implications of this among native Japanese learning English /r/-/l/ categories. Training was accomplished using a videogame paradigm that emphasizes associations among sound categories, visual information, and players' responses to videogame characters rather than overt categorization or explicit feedback. Subjects who played the game for 2.5h across 5 days exhibited improvements in /r/-/l/ perception on par with 2-4 weeks of explicit categorization training in previous research and exhibited a shift toward more native-like perceptual cue weights. PMID:21827533

  17. Fully Automated Assessment of the Severity of Parkinson's Disease from Speech.

    Science.gov (United States)

    Bayestehtashk, Alireza; Asgari, Meysam; Shafran, Izhak; McNames, James

    2015-01-01

    For several decades now, there has been sporadic interest in automatically characterizing the speech impairment due to Parkinson's disease (PD). Most early studies were confined to quantifying a few speech features that were easy to compute. More recent studies have adopted a machine learning approach where a large number of potential features are extracted and the models are learned automatically from the data. In the same vein, here we characterize the disease using a relatively large cohort of 168 subjects, collected from multiple (three) clinics. We elicited speech using three tasks - the sustained phonation task, the diadochokinetic task and a reading task, all within a time budget of 4 minutes, prompted by a portable device. From these recordings, we extracted 1582 features for each subject using openSMILE, a standard feature extraction tool. We compared the effectiveness of three strategies for learning a regularized regression and find that ridge regression performs better than lasso and support vector regression for our task. We refine the feature extraction to capture pitch-related cues, including jitter and shimmer, more accurately using a time-varying harmonic model of speech. Our results show that the severity of the disease can be inferred from speech with a mean absolute error of about 5.5, explaining 61% of the variance and consistently well-above chance across all clinics. Of the three speech elicitation tasks, we find that the reading task is significantly better at capturing cues than diadochokinetic or sustained phonation task. In all, we have demonstrated that the data collection and inference can be fully automated, and the results show that speech-based assessment has promising practical application in PD. The techniques reported here are more widely applicable to other paralinguistic tasks in clinical domain. PMID:25382935
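
    As a rough sketch of the regularized-regression stage described above, the snippet below fits ridge regression with cross-validated mean absolute error using scikit-learn; the random matrices merely stand in for the 1582-dimensional openSMILE feature vectors and the clinical severity scores used in the study.

        # Sketch: ridge regression from acoustic features to a severity score,
        # scored by cross-validated mean absolute error (synthetic placeholder data).
        import numpy as np
        from sklearn.linear_model import RidgeCV
        from sklearn.model_selection import cross_val_score
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler

        rng = np.random.default_rng(0)
        X = rng.normal(size=(168, 1582))      # stand-in for openSMILE feature vectors
        y = rng.uniform(0, 100, size=168)     # stand-in for clinical severity scores

        model = make_pipeline(StandardScaler(),
                              RidgeCV(alphas=np.logspace(-3, 3, 13)))  # L2 penalty chosen by CV
        mae = -cross_val_score(model, X, y, cv=5,
                               scoring="neg_mean_absolute_error").mean()
        print(f"cross-validated MAE: {mae:.2f}")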

  18. The cortical representation of the speech envelope is earlier for audiovisual speech than audio speech.

    Science.gov (United States)

    Crosse, Michael J; Lalor, Edmund C

    2014-04-01

    Visual speech can greatly enhance a listener's comprehension of auditory speech when they are presented simultaneously. Efforts to determine the neural underpinnings of this phenomenon have been hampered by the limited temporal resolution of hemodynamic imaging and the fact that EEG and magnetoencephalographic data are usually analyzed in response to simple, discrete stimuli. Recent research has shown that neuronal activity in human auditory cortex tracks the envelope of natural speech. Here, we exploit this finding by estimating a linear forward-mapping between the speech envelope and EEG data and show that the latency at which the envelope of natural speech is represented in cortex is shortened by >10 ms when continuous audiovisual speech is presented compared with audio-only speech. In addition, we use a reverse-mapping approach to reconstruct an estimate of the speech stimulus from the EEG data and, by comparing the bimodal estimate with the sum of the unimodal estimates, find no evidence of any nonlinear additive effects in the audiovisual speech condition. These findings point to an underlying mechanism that could account for enhanced comprehension during audiovisual speech. Specifically, we hypothesize that low-level acoustic features that are temporally coherent with the preceding visual stream may be synthesized into a speech object at an earlier latency, which may provide an extended period of low-level processing before extraction of semantic information. PMID:24401714
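
    The linear forward-mapping mentioned here amounts to a lagged, regularized regression from the stimulus envelope to each EEG channel (a temporal response function). Below is a minimal sketch of that idea on synthetic data; the sampling rate, lag range, and ridge parameter are illustrative choices, not those of the study.

        # Sketch: estimate a temporal response function (TRF) mapping a stimulus
        # envelope to one EEG channel via ridge-regularized lagged regression.
        import numpy as np

        fs = 128                                 # assumed EEG sampling rate (Hz)
        lags = np.arange(0, int(0.4 * fs))       # 0-400 ms of stimulus lags
        rng = np.random.default_rng(0)

        envelope = rng.normal(size=fs * 60)      # 60 s stand-in "speech envelope"
        true_trf = np.exp(-lags / 10.0)          # made-up impulse response
        eeg = np.convolve(envelope, true_trf)[:envelope.size] + rng.normal(size=envelope.size)

        # Design matrix: column k holds the envelope delayed by lags[k] samples.
        X = np.zeros((envelope.size, lags.size))
        for k, lag in enumerate(lags):
            X[lag:, k] = envelope[:envelope.size - lag]

        ridge = 100.0
        trf = np.linalg.solve(X.T @ X + ridge * np.eye(lags.size), X.T @ eeg)
        print("estimated TRF peak latency:", 1000 * np.argmax(trf) / fs, "ms")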

  19. The Perceptual Cues that Reshape Expert Reasoning

    Science.gov (United States)

    Harré, Michael; Bossomaier, Terry; Snyder, Allan

    2012-07-01

    The earliest stages in our perception of the world have a subtle but powerful influence on later thought processes; they provide the contextual cues within which our thoughts are framed and they adapt to many different environments throughout our lives. Understanding the changes in these cues is crucial to understanding how our perceptual ability develops, but these changes are often difficult to quantify in sufficiently complex tasks where objective measures of development are available. Here we simulate perceptual learning using neural networks and demonstrate fundamental changes in these cues as a function of skill. These cues are cognitively grouped together to form perceptual templates that enable rapid 'whole scene' categorisation of complex stimuli. Such categories reduce the computational load on our capacity limited thought processes, they inform our higher cognitive processes and they suggest a framework of perceptual pre-processing that captures the central role of perception in expertise.

  20. Compensation for complete assimilation in speech perception: The case of Korean labial-to-velar assimilation

    OpenAIRE

    Mitterer, H.; Kim, S.; Cho, T.

    2013-01-01

    In connected speech, phonological assimilation to neighboring words can lead to pronunciation variants (e.g., 'garden bench'→ "gardem bench"). A large body of literature suggests that listeners use the phonetic context to reconstruct the intended word for assimilation types that often lead to incomplete assimilations (e.g., a pronunciation of "garden" that carries cues for both a labial [m] and an alveolar [n]). In the current paper, we show that a similar context effect is observed for an as...

  1. SPEECH PROCESSING – AN OVERVIEW

    Directory of Open Access Journals (Sweden)

    A.INDUMATHI

    2012-06-01

    Full Text Available One of the earliest goals of speech processing was coding speech for efficient transmission. Later, the research spread into various areas such as Automatic Speech Recognition (ASR), Speech Synthesis (TTS), Speech Enhancement, and Automatic Language Translation (ALT). Initially, ASR was used to recognize single words from a small vocabulary; later, many products were developed for continuous speech with large vocabularies. Speech Synthesis is used for synthesizing the speech corresponding to a given text and provides a way to communicate for persons unable to speak. When Speech Synthesis is used together with ASR, it allows a complete two-way spoken interaction between humans and machines. Speech Enhancement techniques are applied to improve the quality of the speech signal. Automatic Language Translation helps to convert one language into another. Basic concepts of speech processing are provided for beginners.

  2. Predicting speech intelligibility in conditions with nonlinearly processed noisy speech

    DEFF Research Database (Denmark)

    Jørgensen, Søren; Dau, Torsten

    2013-01-01

    The speech-based envelope power spectrum model (sEPSM; [1]) was proposed in order to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII). The sEPSM applies the signal-to-noise ratio in the envelope domain (SNRenv), which was demonstrated to successfully predict speech intelligibility in conditions with nonlinearly processed noisy speech, such as processing with spectral subtraction. Moreover, a multiresolution version (mr-sEPSM) was demonstrated to account for speech intelligibility in various conditions with stationary and... The results are... that a measure of the across audio-frequency variance at the output of the modulation-frequency selective process in the model is sufficient to account for the phase jitter distortion. Thus, a joint spectro-temporal modulation analysis, as proposed in [3], does not seem to be required.
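
    As a rough, illustrative rendering of the envelope-domain SNR idea (not the published sEPSM, which operates per gammatone channel and per modulation filter), one can compare the normalized envelope power of the noisy mixture with that of the noise alone:

        # Sketch of the envelope-domain SNR concept on synthetic signals.
        import numpy as np
        from scipy.signal import hilbert

        fs = 16000
        t = np.arange(0, 2.0, 1.0 / fs)
        rng = np.random.default_rng(0)

        speech = (1.0 + np.sin(2 * np.pi * 4 * t)) * rng.normal(size=t.size)  # 4 Hz modulated noise carrier
        noise = rng.normal(size=t.size)
        mixture = speech + noise

        def env_power(x):
            """AC power of the Hilbert envelope, normalized by its squared mean."""
            env = np.abs(hilbert(x))
            return np.var(env) / np.mean(env) ** 2

        # Envelope-domain SNR: excess envelope power of the mixture over the noise floor.
        snr_env = max(env_power(mixture) - env_power(noise), 1e-6) / env_power(noise)
        print(f"SNRenv ~ {10 * np.log10(snr_env):.1f} dB")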

  3. Perception of health from facial cues.

    Science.gov (United States)

    Henderson, Audrey J; Holzleitner, Iris J; Talamas, Sean N; Perrett, David I

    2016-05-01

    Impressions of health are integral to social interactions, yet poorly understood. A review of the literature reveals multiple facial characteristics that potentially act as cues to health judgements. The cues vary in their stability across time: structural shape cues including symmetry and sexual dimorphism alter slowly across the lifespan and have been found to have weak links to actual health, but show inconsistent effects on perceived health. Facial adiposity changes over a medium time course and is associated with both perceived and actual health. Skin colour alters over a short time and has strong effects on perceived health, yet links to health outcomes have barely been evaluated. Reviewing suggested an additional influence of demeanour as a perceptual cue to health. We, therefore, investigated the association of health judgements with multiple facial cues measured objectively from two-dimensional and three-dimensional facial images. We found evidence for independent contributions of face shape and skin colour cues to perceived health. Our empirical findings: (i) reinforce the role of skin yellowness; (ii) demonstrate the utility of global face shape measures of adiposity; and (iii) emphasize the role of affect in facial images with nominally neutral expression in impressions of health. PMID:27069057

  4. Direct and Indirect Cues to Knowledge States during Word Learning

    Science.gov (United States)

    Saylor, Megan M.; Carroll, C. Brooke

    2009-01-01

    The present study investigated three-year-olds' sensitivity to direct and indirect cues to others' knowledge states for word learning purposes. Children were given either direct, physical cues to knowledge or indirect, verbal cues to knowledge. Preschoolers revealed a better ability to learn words from a speaker following direct, physical cues to…

  5. Hate Speech: Power in the Marketplace.

    Science.gov (United States)

    Harrison, Jack B.

    1994-01-01

    A discussion of hate speech and freedom of speech on college campuses examines the difference between hate speech from normal, objectionable interpersonal comments and looks at Supreme Court decisions on the limits of student free speech. Two cases specifically concerning regulation of hate speech on campus are considered: Chaplinsky v. New…

  6. Variation and Synthetic Speech

    CERN Document Server

    Miller, C; Massey, N; Miller, Corey; Karaali, Orhan; Massey, Noel

    1997-01-01

    We describe the approach to linguistic variation taken by the Motorola speech synthesizer. A pan-dialectal pronunciation dictionary is described, which serves as the training data for a neural network based letter-to-sound converter. Subsequent to dictionary retrieval or letter-to-sound generation, pronunciations are submitted to a neural network based postlexical module. The postlexical module has been trained on aligned dictionary pronunciations and hand-labeled narrow phonetic transcriptions. This architecture permits the learning of individual postlexical variation, and can be retrained for each speaker whose voice is being modeled for synthesis. Learning variation in this way can result in greater naturalness for the synthetic speech that is produced by the system.

  7. [Improving speech comprehension using a new cochlear implant speech processor].

    Science.gov (United States)

    Müller-Deile, J; Kortmann, T; Hoppe, U; Hessel, H; Morsnowski, A

    2009-06-01

    The aim of this multicenter clinical field study was to assess the benefits of the new Freedom 24 sound processor for cochlear implant (CI) users implanted with the Nucleus 24 cochlear implant system. The study included 48 postlingually profoundly deaf experienced CI users who demonstrated speech comprehension performance with their current speech processor on the Oldenburg sentence test (OLSA) in quiet conditions of at least 80% correct scores and who were able to perform adaptive speech threshold testing using the OLSA in noisy conditions. Following baseline measures of speech comprehension performance with their current speech processor, subjects were upgraded to the Freedom 24 speech processor. After a take-home trial period of at least 2 weeks, subject performance was evaluated by measuring the speech reception threshold with the Freiburg multisyllabic word test and speech intelligibility with the Freiburg monosyllabic word test at 50 dB and 70 dB in the sound field. The results demonstrated highly significant benefits for speech comprehension with the new speech processor. Significant benefits for speech comprehension were also demonstrated with the new speech processor when tested in competing background noise.In contrast, use of the Abbreviated Profile of Hearing Aid Benefit (APHAB) did not prove to be a suitably sensitive assessment tool for comparative subjective self-assessment of hearing benefits with each processor. Use of the preprocessing algorithm known as adaptive dynamic range optimization (ADRO) in the Freedom 24 led to additional improvements over the standard upgrade map for speech comprehension in quiet and showed equivalent performance in noise. Through use of the preprocessing beam-forming algorithm BEAM, subjects demonstrated a highly significant improved signal-to-noise ratio for speech comprehension thresholds (i.e., signal-to-noise ratio for 50% speech comprehension scores) when tested with an adaptive procedure using the Oldenburg

  8. On Speech Act Theory

    Institute of Scientific and Technical Information of China (English)

    邓仁毅

    2009-01-01

    Speech act theory has developed from the work of linguistic philosophers and originates in Austin's observation and study. It was the particular search for the constative, utterances which describe something outside the text and can therefore be judged true or false, that prompted John L. Austin to direct his attention to the distinction with so-called performatives. The two representative linguists are Austin and Searle.

  9. HATE SPEECH AS COMMUNICATION

    OpenAIRE

    Gladilin Aleksey Vladimirovich

    2012-01-01

    The purpose of the paper is a theoretical comprehension of hate speech from the point of view of communication, on the one hand, and from the point of view of prejudice, stereotypes and discrimination, on the other. Such a comprehension is called for by the need to develop an objective forensic-linguistics methodology for analyzing texts that are supposedly extremist. The method of analysis and synthesis is basic to the investigation. Approach to functions and other elements of communication theory is based o...

  10. Predicting Speech Intelligibility

    OpenAIRE

    HINES, ANDREW

    2012-01-01

    Hearing impairment, and specifically sensorineural hearing loss, is an increasingly prevalent condition, especially amongst the ageing population. It occurs primarily as a result of damage to hair cells that act as sound receptors in the inner ear and causes a variety of hearing perception problems, most notably a reduction in speech intelligibility. Accurate diagnosis of hearing impairments is a time consuming process and is complicated by the reliance on indirect measurements based on patie...

  11. Regulating hate speech online

    OpenAIRE

    Banks, James

    2010-01-01

    The exponential growth in the Internet as a means of communication has been emulated by an increase in far-right and extremist web sites and hate based activity in cyberspace. The anonymity and mobility afforded by the Internet has made harassment and expressions of hate effortless in a landscape that is abstract and beyond the realms of traditional law enforcement. This paper examines the complexities of regulating hate speech on the Internet through legal and technological frameworks. It ex...

  12. Speech rhythm: a metaphor?

    Science.gov (United States)

    Nolan, Francis; Jeon, Hae-Sung

    2014-12-19

    Is speech rhythmic? In the absence of evidence for a traditional view that languages strive to coordinate either syllables or stress-feet with regular time intervals, we consider the alternative that languages exhibit contrastive rhythm subsisting merely in the alternation of stronger and weaker elements. This is initially plausible, particularly for languages with a steep 'prominence gradient', i.e. a large disparity between stronger and weaker elements; but we point out that alternation is poorly achieved even by a 'stress-timed' language such as English, and, historically, languages have conspicuously failed to adopt simple phonological remedies that would ensure alternation. Languages seem more concerned to allow 'syntagmatic contrast' between successive units and to use durational effects to support linguistic functions than to facilitate rhythm. Furthermore, some languages (e.g. Tamil, Korean) lack the lexical prominence which would most straightforwardly underpin prominence of alternation. We conclude that speech is not incontestably rhythmic, and may even be antirhythmic. However, its linguistic structure and patterning allow the metaphorical extension of rhythm in varying degrees and in different ways depending on the language, and it is this analogical process which allows speech to be matched to external rhythms. PMID:25385774

  13. Speech and the Right Hemisphere

    Directory of Open Access Journals (Sweden)

    E. M. R. Critchley

    1991-01-01

    Full Text Available Two facts are well recognized: the location of the speech centre with respect to handedness and early brain damage, and the involvement of the right hemisphere in certain cognitive functions including verbal humour, metaphor interpretation, spatial reasoning and abstract concepts. The importance of the right hemisphere in speech is suggested by pathological studies, blood flow parameters and analysis of learning strategies. An insult to the right hemisphere following left hemisphere damage can affect residual language abilities and may activate non-propositional inner speech. The prosody of speech comprehension even more so than of speech production—identifying the voice, its affective components, gestural interpretation and monitoring one's own speech—may be an essentially right hemisphere task. Errors of a visuospatial type may occur in the learning process. Ease of learning by actors and when learning foreign languages is achieved by marrying speech with gesture and intonation, thereby adopting a right hemisphere strategy.

  14. Effects of Verbal Cues versus Pictorial Cues on the Transfer of Stimulus Control for Children with Autism

    Science.gov (United States)

    West, Elizabeth Anne

    2008-01-01

    The author examined the transfer of stimulus control from instructor assistance to verbal cues and pictorial cues. The intent was to determine whether it is easier to transfer stimulus control to one form of cue or the other. No studies have conducted such comparisons to date; however, literature exists to suggest that visual cues may be…

  15. Preschoolers' Learning of Brand Names from Visual Cues.

    OpenAIRE

    Macklin, M Carole

    1996-01-01

    This research addresses the question of how perceptual cues affect preschoolers' learning of brand names. It is found that when visual cues are provided in addition to brand names that are prior-associated in children's memory structures, children better remember the brand names. Although two cues (a picture and a color) improve memory over the imposition of a single cue, extensive visual cues may overtax young children's processing abilities. The study contributes to our understanding of how...

  16. Speech recognition in university classrooms

    OpenAIRE

    Wald, Mike; Bain, Keith; Basson, Sara H

    2002-01-01

    The LIBERATED LEARNING PROJECT (LLP) is an applied research project studying two core questions: 1) Can speech recognition (SR) technology successfully digitize lectures to display spoken words as text in university classrooms? 2) Can speech recognition technology be used successfully as an alternative to traditional classroom notetaking for persons with disabilities? This paper addresses these intriguing questions and explores the underlying complex relationship between speech recognition te...

  17. Visualizing structures of speech expressiveness

    OpenAIRE

    Herbelin, Bruno; Jensen, Karl Kristoffer; Graugaard, Lars

    2008-01-01

    Speech is both beautiful and informative. In this work, a conceptual study of speech, through investigation of the tower of Babel, the archetypal phonemes, and a study of the reasons for the use of language, is undertaken in order to create an artistic work investigating the nature of speech. The Babel myth speaks about the distance created when aspiring to heaven as the reason for language division. Meanwhile, Locquin states through thorough investigations that only a few phonemes are present thro...

  18. Lecturer’s Speech Competence

    OpenAIRE

    Svetlana Viktorovna Panina; Svetlana Yurievna Zalutskaya; Galina Egorovna Zhondorova

    2014-01-01

    The analysis of the issue of lecturer’s speech competence is presented. Lecturer’s speech competence is the main component of professional image and an indicator of communicative culture, having a great impact on the quality of pedagogical activity. Research objective: to define the main drawbacks in the speech competence of lecturers of North-Eastern Federal University named after M. K. Ammosov (NEFU) (Russia, Yakutsk) and to suggest ways of correcting these drawbacks in terms of multilingual education...

  19. Speech Recognition Technology: Applications & Future

    OpenAIRE

    Pankaj Pathak

    2010-01-01

    Voice or speech recognition is "the technology by which sounds, words or phrases spoken by humans are converted into electrical signals, and these signals are transformed into coding patterns to which meaning has been assigned". It is a technology that needs a combination of improved artificial intelligence technology and a more sophisticated speech-recognition engine. Initially, a primitive device that could recognize speech was developed by AT&T Bell Laboratories in the 1940s. According to...

  20. Motor Equivalence in Speech Production

    OpenAIRE

    Perrier, Pascal; Fuchs, Susanne

    2015-01-01

    The first section provides a description of the concepts of “motor equivalence” and “degrees of freedom”. It is illustrated with a few examples of motor tasks in general and of speech production tasks in particular. In the second section, the methodology used to experimentally investigate motor equivalence phenomena in speech production is presented. It is mainly based on paradigms that perturb the perception-action loop during on-going speech, either by limiting the...

  1. Speech therapy for Parkinson's disease.

    OpenAIRE

    Scott, S; Caird, F I

    1983-01-01

    Twenty-six patients with the speech disorder of Parkinson's disease received daily speech therapy (prosodic exercises) at home for 2 to 3 weeks. There were significant improvements in speech as assessed by scores for prosodic abnormality and intelligibility, and these were maintained in part for up to 3 months. The degree of improvement was clinically and psychologically important, and relatives commented on the social benefits. The use of a visual reinforcement device produced limited benefi...

  2. Somatosensory basis of speech production.

    Science.gov (United States)

    Tremblay, Stéphanie; Shiller, Douglas M; Ostry, David J

    2003-06-19

    The hypothesis that speech goals are defined acoustically and maintained by auditory feedback is a central idea in speech production research. An alternative proposal is that speech production is organized in terms of control signals that subserve movements and associated vocal-tract configurations. Indeed, the capacity for intelligible speech by deaf speakers suggests that somatosensory inputs related to movement play a role in speech production, but studies that might have documented a somatosensory component have been equivocal. For example, mechanical perturbations that have altered somatosensory feedback have simultaneously altered acoustics. Hence, any adaptation observed under these conditions may have been a consequence of acoustic change. Here we show that somatosensory information on its own is fundamental to the achievement of speech movements. This demonstration involves a dissociation of somatosensory and auditory feedback during speech production. Over time, subjects correct for the effects of a complex mechanical load that alters jaw movements (and hence somatosensory feedback), but which has no measurable or perceptible effect on acoustic output. The findings indicate that the positions of speech articulators and associated somatosensory inputs constitute a goal of speech movements that is wholly separate from the sounds produced. PMID:12815431

  3. Differential Gaze Patterns on Eyes and Mouth During Audiovisual Speech Segmentation.

    Science.gov (United States)

    Lusk, Laina G; Mitchel, Aaron D

    2016-01-01

    Speech is inextricably multisensory: both auditory and visual components provide critical information for all aspects of speech processing, including speech segmentation, the visual components of which have been the target of a growing number of studies. In particular, a recent study (Mitchel and Weiss, 2014) established that adults can utilize facial cues (i.e., visual prosody) to identify word boundaries in fluent speech. The current study expanded upon these results, using an eye tracker to identify highly attended facial features of the audiovisual display used in Mitchel and Weiss (2014). Subjects spent the most time watching the eyes and mouth. A significant trend in gaze durations was found with the longest gaze duration on the mouth, followed by the eyes and then the nose. In addition, eye-gaze patterns changed across familiarization as subjects learned the word boundaries, showing decreased attention to the mouth in later blocks while attention on other facial features remained consistent. These findings highlight the importance of the visual component of speech processing and suggest that the mouth may play a critical role in visual speech segmentation. PMID:26869959

  4. A Maximum Likelihood Estimation of Vocal-Tract-Related Filter Characteristics for Single Channel Speech Separation

    Directory of Open Access Journals (Sweden)

    Dansereau Richard M

    2007-01-01

    Full Text Available We present a new technique for separating two speech signals from a single recording. The proposed method bridges the gap between underdetermined blind source separation techniques and those techniques that model the human auditory system, that is, computational auditory scene analysis (CASA). For this purpose, we decompose the speech signal into the excitation signal and the vocal-tract-related filter and then estimate the components from the mixed speech using a hybrid model. We first express the probability density function (PDF) of the mixed speech's log spectral vectors in terms of the PDFs of the underlying speech signal's vocal-tract-related filters. Then, the mean vectors of the PDFs of the vocal-tract-related filters are obtained using a maximum likelihood estimator given the mixed signal. Finally, the estimated vocal-tract-related filters along with the extracted fundamental frequencies are used to reconstruct estimates of the individual speech signals. The proposed technique effectively adds vocal-tract-related filter characteristics as a new cue to CASA models using a new grouping technique based on an underdetermined blind source separation. We compare our model with both an underdetermined blind source separation and a CASA method. The experimental results show that our model outperforms both techniques in terms of SNR improvement and the percentage of crosstalk suppression.

  5. A Maximum Likelihood Estimation of Vocal-Tract-Related Filter Characteristics for Single Channel Speech Separation

    Directory of Open Access Journals (Sweden)

    Mohammad H. Radfar

    2006-11-01

    Full Text Available We present a new technique for separating two speech signals from a single recording. The proposed method bridges the gap between underdetermined blind source separation techniques and those techniques that model the human auditory system, that is, computational auditory scene analysis (CASA). For this purpose, we decompose the speech signal into the excitation signal and the vocal-tract-related filter and then estimate the components from the mixed speech using a hybrid model. We first express the probability density function (PDF) of the mixed speech's log spectral vectors in terms of the PDFs of the underlying speech signal's vocal-tract-related filters. Then, the mean vectors of the PDFs of the vocal-tract-related filters are obtained using a maximum likelihood estimator given the mixed signal. Finally, the estimated vocal-tract-related filters along with the extracted fundamental frequencies are used to reconstruct estimates of the individual speech signals. The proposed technique effectively adds vocal-tract-related filter characteristics as a new cue to CASA models using a new grouping technique based on an underdetermined blind source separation. We compare our model with both an underdetermined blind source separation and a CASA method. The experimental results show that our model outperforms both techniques in terms of SNR improvement and the percentage of crosstalk suppression.
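
    As an illustration of the excitation/vocal-tract split that both of the records above build on, the following is a minimal sketch of a plain LPC source-filter decomposition of a single frame. It is not the authors' hybrid ML/CASA separation model; the frame length, LPC order and synthetic input are assumptions made purely for illustration.

    ```python
    # Minimal sketch: source-filter decomposition of one speech frame via LPC.
    # Illustrates the "excitation + vocal-tract-related filter" split mentioned
    # above; it is NOT the paper's ML/CASA separation model. Frame length and
    # LPC order are illustrative assumptions.
    import numpy as np
    from scipy.linalg import solve_toeplitz
    from scipy.signal import lfilter

    def lpc_decompose(frame, order=16):
        """Return (lpc_coeffs, excitation) for a single windowed frame."""
        frame = frame * np.hamming(len(frame))
        # Autocorrelation method: solve the Yule-Walker normal equations.
        r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
        a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
        coeffs = np.concatenate(([1.0], -a))        # A(z) = 1 - sum(a_k z^-k)
        excitation = lfilter(coeffs, [1.0], frame)  # inverse filtering
        return coeffs, excitation

    # Toy usage with a synthetic "voiced" frame.
    fs = 8000
    t = np.arange(0, 0.02, 1 / fs)
    frame = np.sin(2 * np.pi * 300 * t) + 0.5 * np.sin(2 * np.pi * 900 * t)
    coeffs, excitation = lpc_decompose(frame)
    # The vocal-tract-related filter is 1/A(z); its log-magnitude response is the
    # spectral envelope whose PDF the method above models for separation.
    envelope_db = -20 * np.log10(np.abs(np.fft.rfft(coeffs, 512)) + 1e-12)
    ```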

  6. What Is Language? What Is Speech?

    Science.gov (United States)

    Kelly's 4-year-old son, Tommy, has speech and language problems. Friends and family have a hard time ...

  7. Counterconditioning reduces cue-induced craving and actual cue-elicited consumption.

    NARCIS (Netherlands)

    D. van Gucht; F. Baeyens; D. Vansteenwegen; D. Hermans; T. Beckers

    2010-01-01

    Cue-induced craving is not easily reduced by an extinction or exposure procedure and may constitute an important route toward relapse in addictive behavior after treatment. In the present study, we investigated the effectiveness of counterconditioning as an alternative procedure to reduce cue-induced...

  8. Cues for Better Writing: Empirical Assessment of a Word Counter and Cueing Application's Effectiveness

    Science.gov (United States)

    Vijayasarathy, Leo R.; Gould, Susan Martin; Gould, Michael

    2015-01-01

    Written clarity and conciseness are desired by employers and emphasized in business communication courses. We developed and tested the efficacy of a cueing tool--Scribe Bene--to help students reduce their use of imprecise and ambiguous words and wordy phrases. Effectiveness was measured by comparing cue word usage between a treatment group given…

  9. Behavioral Signal Processing: Deriving Human Behavioral Informatics From Speech and Language

    Science.gov (United States)

    Narayanan, Shrikanth; Georgiou, Panayiotis G.

    2013-01-01

    The expression and experience of human behavior are complex and multimodal and characterized by individual and contextual heterogeneity and variability. Speech and spoken language communication cues offer an important means for measuring and modeling human behavior. Observational research and practice across a variety of domains from commerce to healthcare rely on speech- and language-based informatics for crucial assessment and diagnostic information and for planning and tracking response to an intervention. In this paper, we describe some of the opportunities as well as emerging methodologies and applications of human behavioral signal processing (BSP) technology and algorithms for quantitatively understanding and modeling typical, atypical, and distressed human behavior with a specific focus on speech- and language-based communicative, affective, and social behavior. We describe the three important BSP components of acquiring behavioral data in an ecologically valid manner across laboratory to real-world settings, extracting and analyzing behavioral cues from measured data, and developing models offering predictive and decision-making support. We highlight both the foundational speech and language processing building blocks as well as the novel processing and modeling opportunities. Using examples drawn from specific real-world applications ranging from literacy assessment and autism diagnostics to psychotherapy for addiction and marital well being, we illustrate behavioral informatics applications of these signal processing techniques that contribute to quantifying higher level, often subjectively described, human behavior in a domain-sensitive fashion. PMID:24039277

  10. Parameter masks for close talk speech segregation using deep neural networks

    Directory of Open Access Journals (Sweden)

    Jiang Yi

    2015-01-01

    Full Text Available A deep neural network (DNN)-based close-talk speech segregation algorithm is introduced. One nearby microphone is used to collect the target speech, as "close talk" indicates, and another microphone is used to capture the noise in the environment. The time and energy difference between the two microphones' signals is used as the segregation cue. A DNN estimator on each frequency channel is used to calculate the parameter masks. The parameter masks represent the target speech energy in each time-frequency (T-F) unit. Experimental results show the good performance of the proposed system. The signal-to-noise ratio (SNR) improvement is 8.1 dB in the 0 dB noise environment.
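
    A minimal sketch of the mask-estimation idea described above: a per-T-F energy-difference feature between the close-talk and noise microphones is mapped to a target-energy mask by a small regressor per frequency channel. The STFT settings, network size, feature definition and training targets are assumptions for illustration, not the authors' configuration.

    ```python
    # Sketch of DNN-estimated parameter masks from a two-microphone close-talk
    # setup. Feature = per-T-F log energy ratio between the close microphone and
    # the noise microphone; target = fraction of target-speech energy per T-F
    # unit. All settings here are illustrative assumptions.
    import numpy as np
    from scipy.signal import stft
    from sklearn.neural_network import MLPRegressor

    def tf_features(close_sig, noise_sig, fs=16000, nfft=512):
        _, _, X = stft(close_sig, fs=fs, nperseg=nfft)
        _, _, N = stft(noise_sig, fs=fs, nperseg=nfft)
        # Log energy ratio per time-frequency unit.
        return np.log((np.abs(X) ** 2 + 1e-12) / (np.abs(N) ** 2 + 1e-12))

    def train_channel_estimators(close_sig, noise_sig, clean_sig, fs=16000):
        """One small regressor per frequency channel, mirroring the per-channel
        DNN estimators described above (slow, but only a sketch)."""
        feats = tf_features(close_sig, noise_sig, fs)
        _, _, S = stft(clean_sig, fs=fs, nperseg=512)
        _, _, X = stft(close_sig, fs=fs, nperseg=512)
        target = np.clip(np.abs(S) ** 2 / (np.abs(X) ** 2 + 1e-12), 0.0, 1.0)
        estimators = []
        for ch in range(feats.shape[0]):
            net = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=500)
            net.fit(feats[ch].reshape(-1, 1), target[ch])
            estimators.append(net)
        return estimators
    ```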

  11. An Improved Speech Enhancement Method based on Teager Energy Operator and Perceptual Wavelet Packet Decomposition

    Directory of Open Access Journals (Sweden)

    Huan Zhao

    2011-06-01

    Full Text Available According to the distribution characteristics of noise and clean speech signals in the frequency domain, a new speech enhancement method based on the Teager energy operator (TEO) and perceptual wavelet packet decomposition (PWPD) is proposed. Firstly, a modified mask construction method is used to protect the acoustic cues at low frequencies. Then a level-dependent parameter is introduced to further adjust the thresholds in light of the noise distribution. Finally, the sub-bands which have very little influence are set directly to 0 to improve the signal-to-noise ratio (SNR) and reduce the computational load. Simulation results show that, under different kinds of noise environments, this new method not only improves the SNR and the perceptual evaluation of speech quality (PESQ) score, but also reduces the computational load, which is very advantageous for real-time implementation.
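
    The Teager energy operator used in the method above has a simple discrete form, psi[x(n)] = x(n)^2 - x(n-1)x(n+1). A minimal sketch of just that stage; the perceptual wavelet packet decomposition, mask construction and thresholding are not reproduced, and the edge handling is an assumption.

    ```python
    # Discrete Teager energy operator (TEO): psi[x(n)] = x(n)^2 - x(n-1)*x(n+1).
    # Only the TEO stage of the method above is sketched here.
    import numpy as np

    def teager_energy(x):
        x = np.asarray(x, dtype=float)
        psi = np.empty_like(x)
        psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
        psi[0], psi[-1] = psi[1], psi[-2]   # simple edge handling (assumption)
        return psi

    # Example: TEO of a 440 Hz tone tracks its (amplitude x frequency) energy.
    fs = 8000
    t = np.arange(0, 0.05, 1 / fs)
    print(teager_energy(np.sin(2 * np.pi * 440 * t))[:5])
    ```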

  12. ERP correlates of involuntary attention capture by prosodic salience in speech.

    Science.gov (United States)

    Wang, Jingtian; Friedman, David; Ritter, Walter; Bersick, Michael

    2005-01-01

    This study addressed whether temporally salient (e.g., word onset) or prosodically salient (e.g., stressed syllables) information serves as a cue to capture attention in speech sound analysis. In an auditory oddball paradigm, 16 native English speakers were asked to ignore binaurally presented disyllabic speech sounds and watch a silent movie while ERPs were recorded. Four types of phonetic deviants were employed: a deviant syllable that was either stressed or unstressed and that occurred in either the first or second temporal position. The nature of the phonetic change (a change from a voiced consonant to its corresponding unvoiced consonant) was kept constant. MMNs were observed for all deviants. In contrast, the P3a was only seen when the deviance occurred on stressed syllables. The sensitivity of the P3a to the stress manipulation suggests that prosodic rather than temporal salience captures attention in unattended speech sounds. PMID:15720580

  13. Discovering Words in Fluent Speech: The Contribution of Two Kinds of Statistical Information

    Directory of Open Access Journals (Sweden)

    Erik D Thiessen

    2013-01-01

    Full Text Available To efficiently segment fluent speech, infants must discover the predominant phonological form of words in the native language. In English, for example, content words typically begin with a stressed syllable. To discover this regularity, infants need to identify a set of words. We propose that statistical learning plays two roles in this process. First, it provides a cue that allows infants to segment words from fluent speech, even without language-specific phonological knowledge. Second, once infants have identified a set of lexical forms, they can learn from the distribution of acoustic features across those word forms. The current experiments demonstrate that both processes are available to 5-month-old infants. This is an earlier age than prior demonstrations of sensitivity to statistical structure in speech, and it is consistent with theoretical accounts claiming that statistical learning plays a role in helping infants adapt to the structure of their native language from very early in life.
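
    The statistical cue referred to here is usually operationalized as the transitional probability TP(A->B) = count(AB) / count(A) over a syllable stream, with low-TP syllable pairs treated as candidate word boundaries. A minimal sketch; the toy syllable stream and the boundary threshold are made up for illustration.

    ```python
    # Minimal sketch of the transitional-probability (TP) cue:
    # TP(A -> B) = count(A followed by B) / count(A). Low-TP pairs are candidate
    # word boundaries. The syllable stream and threshold are illustrative.
    from collections import Counter

    def transitional_probabilities(syllables):
        pair_counts = Counter(zip(syllables, syllables[1:]))
        syll_counts = Counter(syllables[:-1])
        return {(a, b): c / syll_counts[a] for (a, b), c in pair_counts.items()}

    stream = "ba bi do ga tu pa ba bi do pa ga tu ba bi do".split()
    tps = transitional_probabilities(stream)
    boundaries = [pair for pair, p in tps.items() if p < 0.5]
    print(sorted(tps.items(), key=lambda kv: kv[1]))
    ```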

  14. Perception of speech rhythm in second language: the case of rhythmically similar L1 and L2.

    Science.gov (United States)

    Ordin, Mikhail; Polyanskaya, Leona

    2015-01-01

    We investigated the perception of developmental changes in timing patterns that happen in the course of second language (L2) acquisition, provided that the native and the target languages of the learner are rhythmically similar (German and English). It was found that speech rhythm in L2 English produced by German learners becomes increasingly stress-timed as acquisition progresses. This development is captured by the tempo-normalized rhythm measures of durational variability. Advanced learners also deliver speech at a faster rate. However, when native speakers have to classify the timing patterns characteristic of L2 English of German learners at different proficiency levels, they attend to speech rate cues and ignore the differences in speech rhythm. PMID:25859228
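
    One widely used tempo-normalized measure of durational variability of the kind mentioned above is the normalized Pairwise Variability Index (nPVI). A minimal sketch; the vowel durations are invented for illustration, and this is not necessarily the exact measure used in the study.

    ```python
    # Normalized Pairwise Variability Index (nPVI), a common tempo-normalized
    # rhythm measure of the type referred to above. Higher values indicate more
    # "stress-timed" durational variability. The duration list is invented.
    def npvi(durations):
        pairs = list(zip(durations, durations[1:]))
        terms = [abs(a - b) / ((a + b) / 2.0) for a, b in pairs]
        return 100.0 * sum(terms) / len(terms)

    vowel_durations_ms = [62, 145, 70, 160, 55, 150]   # illustrative only
    print(round(npvi(vowel_durations_ms), 1))
    ```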

  15. Neurocognitive mechanisms of audiovisual speech perception

    OpenAIRE

    Ojanen, Ville

    2005-01-01

    Face-to-face communication involves both hearing and seeing speech. Heard and seen speech inputs interact during audiovisual speech perception. Specifically, seeing the speaker's mouth and lip movements improves identification of acoustic speech stimuli, especially in noisy conditions. In addition, visual speech may even change the auditory percept. This occurs when mismatching auditory speech is dubbed onto visual articulation. Research on the brain mechanisms of audiovisual perception a...

  16. Effect of stimuli presentation method on perception of room size using only acoustic cues

    Science.gov (United States)

    Hunt, Jeffrey Barnabas

    People listen to music and speech in a large variety of rooms, and many room parameters, including the size of the room, can drastically affect how well the speech is understood or the music enjoyed. While multi-modal (typically hearing and sight) tests may be more realistic, a listening-only test is conducted here in order to isolate what acoustic cues listeners use to determine the size of a room. Nearly all of the studies to date on the perception of room volume using acoustic cues have presented the stimuli only over headphones, and these studies have reported that, in most cases, the perceived room volume is more highly correlated with the perceived reverberation (reverberance) than with actual room volume. While reverberance may be a salient acoustic cue used for the determination of room size, the actual sound field in a room is not accurately reproduced when presented over headphones, and it is thought that some of the complexities of the sound field that relate to perception of geometric volume, specifically directional information of reflections, may be lost. It is possible that the importance of reverberance is overemphasized when only headphones are used to present stimuli, so a comparison of room-size perception is proposed where the sound field (from modeled and recorded impulse responses) is presented both over headphones and also over a surround system using higher-order ambisonics to more accurately reproduce directional sound information. The major results are that, in this study, no difference could be seen between the two presentation methods, and that reverberation time is highly correlated with room-size perception while actual room size is not.

  17. Deciphering faces: quantifiable visual cues to weight.

    Science.gov (United States)

    Coetzee, Vinet; Chen, Jingying; Perrett, David I; Stephen, Ian D

    2010-01-01

    Body weight plays a crucial role in mate choice, as weight is related to both attractiveness and health. People are quite accurate at judging weight in faces, but the cues used to make these judgments have not been defined. This study consisted of two parts. First, we wanted to identify quantifiable facial cues that are related to body weight, as defined by body mass index (BMI). Second, we wanted to test whether people use these cues to judge weight. In study 1, we recruited two groups of Caucasian and two groups of African participants, determined their BMI and measured their 2-D facial images for: width-to-height ratio, perimeter-to-area ratio, and cheek-to-jaw-width ratio. All three measures were significantly related to BMI in males, while the width-to-height and cheek-to-jaw-width ratios were significantly related to BMI in females. In study 2, these images were rated for perceived weight by Caucasian observers. We showed that these observers use all three cues to judge weight in African and Caucasian faces of both sexes. These three facial cues, width-to-height ratio, perimeter-to-area ratio, and cheek-to-jaw-width ratio, are therefore not only related to actual weight but provide a basis for perceptual attributes as well. PMID:20301846
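
    A minimal sketch of how the three ratios named above could be computed from 2-D facial landmarks. The landmark names, coordinates and the polygonal face outline are hypothetical and do not reproduce the authors' measurement protocol.

    ```python
    # Sketch: the three facial cues related to BMI in the study above, computed
    # from hypothetical 2-D landmarks (pixel coordinates). Landmark names and
    # values are made up; the face outline is approximated by a polygon.
    import numpy as np

    def polygon_perimeter_area(points):
        pts = np.asarray(points, dtype=float)
        nxt = np.roll(pts, -1, axis=0)
        perimeter = np.sum(np.linalg.norm(nxt - pts, axis=1))
        area = 0.5 * abs(np.sum(pts[:, 0] * nxt[:, 1] - nxt[:, 0] * pts[:, 1]))
        return perimeter, area

    def facial_ratios(lm):
        width = abs(lm["right_cheek"][0] - lm["left_cheek"][0])
        height = abs(lm["brow_mid"][1] - lm["upper_lip"][1])
        jaw = abs(lm["right_jaw"][0] - lm["left_jaw"][0])
        perim, area = polygon_perimeter_area(lm["outline"])
        return {"width_to_height": width / height,
                "perimeter_to_area": perim / area,
                "cheek_to_jaw_width": width / jaw}

    landmarks = {"left_cheek": (40, 120), "right_cheek": (160, 120),
                 "left_jaw": (60, 180), "right_jaw": (140, 180),
                 "brow_mid": (100, 60), "upper_lip": (100, 150),
                 "outline": [(40, 60), (160, 60), (170, 140), (100, 200), (30, 140)]}
    print(facial_ratios(landmarks))
    ```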

  18. Enhancing Manual Scan Registration Using Audio Cues

    Science.gov (United States)

    Ntsoko, T.; Sithole, G.

    2014-04-01

    Indoor mapping and modelling requires that acquired data be processed by editing, fusing and formatting the data, amongst other operations. Currently, the user's manual interaction with the point cloud (the data) during processing is visual. Visual interaction does have limitations, however. One way of dealing with these limitations is to augment point cloud processing with audio. Audio augmentation entails associating points of interest in the point cloud with audio objects. In coarse scan registration, reverberation, intensity and frequency audio cues were exploited to help the user estimate the depth and occupancy of space of points of interest. Depth estimates were made reliably when intensity and frequency were both used as depth cues, and coarse changes of depth could be estimated in this manner. The depth between surfaces can therefore be estimated with the aid of the audio objects. Sound reflections of an audio object provided reliable information about the object's surroundings in some instances. For a point or area of interest in the point cloud, these reflections can be used to determine the unseen events around that point or area of interest. Other processing techniques could benefit from this, while other information is estimated using further audio cues like binaural cues and head-related transfer functions. These other cues could be used in position estimation of audio objects to aid in problems such as indoor navigation.

  19. Scene-based contextual cueing in pigeons.

    Science.gov (United States)

    Wasserman, Edward A; Teng, Yuejia; Brooks, Daniel I

    2014-10-01

    Repeated pairings of a particular visual context with a specific location of a target stimulus facilitate target search in humans. We explored an animal model of such contextual cueing. Pigeons had to peck a target, which could appear in 1 of 4 locations on color photographs of real-world scenes. On half of the trials, each of 4 scenes was consistently paired with 1 of 4 possible target locations; on the other half of the trials, each of 4 different scenes was randomly paired with the same 4 possible target locations. In Experiments 1 and 2, pigeons exhibited robust contextual cueing when the context preceded the target by 1 s to 8 s, with reaction times to the target being shorter on predictive-scene trials than on random-scene trials. Pigeons also responded more frequently during the delay on predictive-scene trials than on random-scene trials; indeed, during the delay on predictive-scene trials, pigeons predominately pecked toward the location of the upcoming target, suggesting that attentional guidance contributes to contextual cueing. In Experiment 3, involving left-right and top-bottom scene reversals, pigeons exhibited stronger control by global than by local scene cues. These results attest to the robustness and associative basis of contextual cueing in pigeons. PMID:25546098

  20. Enhancing Peer Feedback and Speech Preparation: The Speech Video Activity

    Science.gov (United States)

    Opt, Susan

    2012-01-01

    In the typical public speaking course, instructors or assistants videotape or digitally record at least one of the term's speeches in class or lab to offer students additional presentation feedback. Students often watch and self-critique their speeches on their own. Peers often give only written feedback on classroom presentations or completed…

  1. Preschool children use linguistic form class and pragmatic cues to interpret generics.

    Science.gov (United States)

    Gelman, Susan A; Raman, Lakshmi

    2003-01-01

    Generic noun phrases (e.g., "Bats live in caves") are important for expressing knowledge about abstract kinds. Past work has found that parents frequently use generic noun phrases in their speech to young children. However, little is known regarding how children understand these expressions, nor which cues signal generic meaning. The present set of 5 studies examined the influence of linguistic form class (e.g., "What color are dogs?" [generic] versus "What color are the dogs?" [nongeneric]) and of pragmatic context (e.g., "What color are they?" in the presence of either a single exemplar [generic] or multiple exemplars [nongeneric]). Participants were 2-year-olds (N = 42), 3-year-olds (N = 40), 4-year-olds (N = 40), and adults (N = 51). The data indicate that by 2 years of age, children use linguistic form class, and by 3 years of age, children use pragmatic context. These findings demonstrate that young children have begun to understand the distinction between generic and nongeneric noun phrases from a very young age, and that identification of generics is signaled not by formal or pragmatic cues alone, but by a combination of information from both language form and pragmatic context. It is suggested that children make use of multiple linguistic and conceptual cues to acquire and interpret generics. PMID:12625452

  2. Auditory detection of non-speech and speech stimuli in noise: Native speech advantage.

    Science.gov (United States)

    Huo, Shuting; Tao, Sha; Wang, Wenjing; Li, Mingshuang; Dong, Qi; Liu, Chang

    2016-05-01

    Detection thresholds of Chinese vowels, Korean vowels, and a complex tone, with harmonic and noise carriers were measured in noise for Mandarin Chinese-native listeners. The harmonic index was calculated as the difference between detection thresholds of the stimuli with harmonic carriers and those with noise carriers. The harmonic index for Chinese vowels was significantly greater than that for Korean vowels and the complex tone. Moreover, native speech sounds were rated significantly more native-like than non-native speech and non-speech sounds. The results indicate that native speech has an advantage over other sounds in simple auditory tasks like sound detection. PMID:27250202

  3. Phonetic recognition of natural speech by nonstationary Markov models

    Science.gov (United States)

    Falaschi, Alessandro

    1988-04-01

    A speech recognition system based on statistical decision theory is outlined, viewing the problem as the classical design of a decoder in a communication-system framework. Statistical properties of the language are used to characterize the allowable phonetic sequences inside words, while allophonic phoneme features are captured in functional-dependent acoustical models with the aim of utilizing them as word segmentation cues. Experiments demonstrate the utility of explicitly modeling the intrinsic nonstationarity of speech in a statistically based speech recognition system. The nonstationarity of phonetic chain statistics and acoustical transition probabilities can easily be taken into account, yielding recognition improvements. The use of within-syllable position-dependent phonetic models does not improve recognition performance, and the iterative Viterbi training algorithm seems unable to take full advantage of this kind of acoustical modeling. As a direct consequence of the system design, the recognized phonetic sequence exhibits word boundary marks even in the absence of pauses between words, thus giving anchor points to the higher-level parsing algorithms needed in a complete recognition system.
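
    The decoding step that the Viterbi training mentioned above iterates can be sketched for a small discrete-observation HMM as follows. The transition and emission values are illustrative, and the paper's nonstationary (position-dependent) statistics are not reproduced.

    ```python
    # Minimal Viterbi decoder for a discrete HMM, sketching the decoding step
    # that the training procedure above iterates. Model parameters are
    # illustrative; the nonstationary extensions are not reproduced.
    import numpy as np

    def viterbi(obs, log_pi, log_A, log_B):
        """obs: observation indices; returns the most likely state sequence."""
        n_states = log_pi.shape[0]
        T = len(obs)
        delta = np.full((T, n_states), -np.inf)
        back = np.zeros((T, n_states), dtype=int)
        delta[0] = log_pi + log_B[:, obs[0]]
        for t in range(1, T):
            scores = delta[t - 1][:, None] + log_A          # (from, to)
            back[t] = np.argmax(scores, axis=0)
            delta[t] = scores[back[t], np.arange(n_states)] + log_B[:, obs[t]]
        path = [int(np.argmax(delta[-1]))]
        for t in range(T - 1, 0, -1):
            path.append(back[t, path[-1]])
        return path[::-1]

    pi = np.array([0.6, 0.4])
    A = np.array([[0.7, 0.3], [0.4, 0.6]])
    B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
    print(viterbi([0, 1, 2, 2], np.log(pi), np.log(A), np.log(B)))
    ```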

  4. Feeling backwards? How temporal order in speech affects the time course of vocal emotion recognition

    Directory of Open Access Journals (Sweden)

    SimonRigoulot

    2013-06-01

    Full Text Available Recent studies suggest that the time course for recognizing vocal expressions of basic emotion in speech varies significantly by emotion type, implying that listeners uncover acoustic evidence about emotions at different rates in speech (e.g., fear is recognized most quickly whereas happiness and disgust are recognized relatively slowly; Pell and Kotz, 2011). To investigate whether vocal emotion recognition is largely dictated by the amount of time listeners are exposed to speech or the position of critical emotional cues in the utterance, 40 English participants judged the meaning of emotionally-inflected pseudo-utterances presented in a gating paradigm, where utterances were gated as a function of their syllable structure in segments of increasing duration from the end of the utterance (i.e., gated ‘backwards’). Accuracy for detecting six target emotions in each gate condition and the mean identification point for each emotion in milliseconds were analyzed and compared to results from Pell and Kotz (2011). We again found significant emotion-specific differences in the time needed to accurately recognize emotions from speech prosody, and new evidence that utterance-final syllables tended to facilitate listeners’ accuracy in many conditions when compared to utterance-initial syllables. The time needed to recognize fear, anger, sadness, and neutral from speech cues was not influenced by how utterances were gated, although happiness and disgust were recognized significantly faster when listeners heard the end of utterances first. Our data provide new clues about the relative time course for recognizing vocally-expressed emotions within the 400-1200 millisecond time window, while highlighting that emotion recognition from prosody can be shaped by the temporal properties of speech.

  5. Distracted by cues for suppressed memories.

    Science.gov (United States)

    Hertel, Paula T; Hayes, Jeffrey A

    2015-06-01

    We examined the potential cost of practicing suppression of negative thoughts on subsequent performance in an unrelated task. Cues for previously suppressed and unsuppressed (baseline) responses in a think/no-think procedure were displayed as irrelevant flankers for neutral words to be judged for emotional valence. These critical flankers were homographs with one negative meaning denoted by their paired response during learning. Responses to the targets were delayed when suppression cues (compared with baseline cues and new negative homographs) were used as flankers, but only following direct-suppression instructions and not when benign substitutes had been provided to aid suppression. On a final recall test, suppression-induced forgetting following direct suppression and the flanker task was positively correlated with the flanker effect. Experiment 2 replicated these findings. Finally, valence ratings of neutral targets were influenced by the valence of the flankers but not by the prior role of the negative flankers. PMID:25904596

  6. Perceptual learning in speech

    OpenAIRE

    D. Norris; McQueen, J; Cutler, A.

    2003-01-01

    This study demonstrates that listeners use lexical knowledge in perceptual learning of speech sounds. Dutch listeners first made lexical decisions on Dutch words and nonwords. The final fricative of 20 critical words had been replaced by an ambiguous sound, between [f] and [s]. One group of listeners heard ambiguous [f]-final words (e.g., [WI tlo?], from witlof, chicory) and unambiguous [s]-final words (e.g., naaldbos, pine forest). Another group heard the reverse (e.g., ambiguous [na:ldbo?],...

  7. Taking a Stand for Speech.

    Science.gov (United States)

    Moore, Wayne D.

    1995-01-01

    Asserts that freedom of speech issues were among the first major confrontations in U.S. constitutional law. Maintains that lessons from the controversies surrounding the Sedition Act of 1798 have continuing practical relevance. Describes and discusses the significance of freedom of speech to the U.S. political system. (CFR)

  8. Speech Prosody in Cerebellar Ataxia

    Science.gov (United States)

    Casper, Maureen A.; Raphael, Lawrence J.; Harris, Katherine S.; Geibel, Jennifer M.

    2007-01-01

    Persons with cerebellar ataxia exhibit changes in physical coordination and speech and voice production. Previously, these alterations of speech and voice production were described primarily via perceptual coordinates. In this study, the spatial-temporal properties of syllable production were examined in 12 speakers, six of whom were healthy…

  9. Coordinated sensor cueing for chemical plume detection

    Science.gov (United States)

    Abraham, Nathan J.; Jensenius, Andrea M.; Watkins, Adam S.; Hawthorne, R. Chad; Stepnitz, Brian J.

    2011-05-01

    This paper describes an organic data fusion and sensor cueing approach for Chemical, Biological, Radiological, and Nuclear (CBRN) sensors. The Joint Warning and Reporting Network (JWARN) uses a hardware component referred to as the JWARN Component Interface Device (JCID). The Edgewood Chemical and Biological Center (ECBC) has developed a small-footprint, open-architecture solution for the JCID capability called JCID-on-a-Chip (JoaC). The JoaC program aims to reduce the cost and complexity of the JCID by shrinking the necessary functionality down to a small single board computer. This effort focused on development of a fusion and cueing algorithm organic to the JoaC hardware. By embedding this capability in the JoaC, sensors have the ability to receive and process cues from other sensors without the use of a complex and costly centralized infrastructure. Additionally, the JoaC software is hardware agnostic, as evidenced by its drop-in inclusion in two different system-on-a-chip platforms, including Windows CE and LINUX environments. In this effort, a partnership between JPM-CA, JHU/APL, and ECBC implemented and demonstrated a new algorithm for cooperative detection and localization of a chemical agent plume. The experiment used a pair of mobile Joint Services Lightweight Standoff Chemical Agent Detector (JSLSCAD) units which were controlled by fusion and cueing algorithms hosted on a JoaC. The algorithms embedded in the JoaC enabled the two sensor systems to perform cross cueing and cooperatively form a higher fidelity estimate of chemical releases by combining sensor readings. Additionally, each JSLSCAD had the ability to focus its search on smaller regions than those required by a single sensor system by using the cross-cue information from the other sensor.

  10. Quality Estimation of Alaryngeal Speech

    Directory of Open Access Journals (Sweden)

    R.Dhivya

    2014-01-01

    Full Text Available Quality assessment can be done using subjective listening tests or using objective quality measures. Objective measures quantify quality. The sentence material is chosen from the IEEE corpus. Real-world noise data were taken from the noisy speech corpus NOIZEUS. An alaryngeal speaker's voice (alaryngeal speech) is recorded. To enhance the quality of speech produced by the prosthetic device, four classes of enhancement methods encompassing four algorithms are used: a multiband (mband) spectral subtraction algorithm, a Karhunen–Loève transform (KLT) subspace algorithm, a MASK statistical-model-based algorithm, and a Wavelet Threshold-Wiener algorithm. The enhanced speech signals obtained from the four classes of algorithms are evaluated using the Perceptual Evaluation of Speech Quality (PESQ). Spectrograms of these enhanced signals are also plotted.
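
    Of the four algorithm families listed above, spectral subtraction is the simplest to sketch. The following is a basic single-band version (the cited method is multiband, with band-specific over-subtraction factors); the noise estimate assumes the leading frames are speech-free, and the parameter values are illustrative.

    ```python
    # Sketch of basic spectral subtraction, a simplified single-band version of
    # the multiband ("mband") algorithm listed above. Noise is estimated from
    # the first few frames, assumed speech-free; over-subtraction factor and
    # spectral floor are illustrative choices.
    import numpy as np
    from scipy.signal import stft, istft

    def spectral_subtraction(noisy, fs, noise_frames=6, alpha=2.0, floor=0.02):
        f, t, X = stft(noisy, fs=fs, nperseg=512)
        mag, phase = np.abs(X), np.angle(X)
        noise_psd = np.mean(mag[:, :noise_frames] ** 2, axis=1, keepdims=True)
        clean_psd = mag ** 2 - alpha * noise_psd
        clean_psd = np.maximum(clean_psd, floor * noise_psd)   # spectral floor
        S = np.sqrt(clean_psd) * np.exp(1j * phase)
        _, enhanced = istft(S, fs=fs, nperseg=512)
        return enhanced
    ```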

  11. Spatial localization of speech segments

    DEFF Research Database (Denmark)

    Karlsen, Brian Lykkegaard

    1999-01-01

    Much is known about human localization of simple stimuli like sinusoids, clicks, broadband noise and narrowband noise in quiet. Less is known about human localization in noise, and even less is known about localization of speech; very few previous studies have reported data on localization of speech in noise. This study attempts to answer the question: "Are there certain features of speech which have an impact on the human ability to determine the spatial location of a speaker in the horizontal plane under adverse noise conditions?". The study consists of an extensive literature survey and ... a model giving a distribution of which azimuth angle the target is likely to have originated from. The model is trained on the experimental data. On the basis of the experimental results, it is concluded that the human ability to localize speech segments in adverse noise depends on the speech segment as well as its point of ...

  12. Speech Compression Using Multecirculerletet Transform

    Directory of Open Access Journals (Sweden)

    Sulaiman Murtadha

    2012-01-01

    Full Text Available Compressing speech reduces data storage requirements, thereby reducing the time needed to transmit digitized speech over long-haul links like the Internet. To obtain the best performance in speech compression, wavelet transforms require filters that combine a number of desirable properties, such as orthogonality and symmetry. The MCT basis functions are derived from the GHM basis functions using 2D linear convolution. The fast computation algorithm methods introduced here add desirable features to the current transform. We further assess the performance of the MCT in a speech compression application. This paper discusses the effect of using the DWT and the MCT (in one and two dimensions) on speech compression. DWT and MCT performance in terms of compression ratio (CR), mean square error (MSE) and peak signal-to-noise ratio (PSNR) is assessed. Computer simulation results indicate that the two-dimensional MCT offers a better compression ratio, MSE and PSNR than the DWT.

  13. Techniques for automatic speech recognition

    Science.gov (United States)

    Moore, R. K.

    1983-05-01

    A brief insight into some of the algorithms that lie behind current automatic speech recognition systems is provided. Early phonetically based approaches were not particularly successful, due mainly to a lack of appreciation of the problems involved. These problems are summarized, and various recognition techniques are reviewed in the context of the solutions that they provide. It is pointed out that the majority of currently available speech recognition equipment employs a "whole-word" pattern-matching approach which, although relatively simple, has proved particularly successful in its ability to recognize speech. The concept of time normalization plays a central role in this type of recognition process, and a family of such algorithms is described in detail. The technique of dynamic time warping is not only capable of providing good performance for isolated-word recognition, but it can also be extended to the recognition of connected speech (thereby removing one of the most severe limitations of early speech recognition equipment).
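
    The time normalization at the heart of the whole-word matching described above is dynamic time warping. The following is a textbook sketch over feature-vector sequences, with Euclidean local distance and no slope constraints, rather than any particular recognizer's implementation.

    ```python
    # Minimal dynamic time warping (DTW) between two feature-vector sequences,
    # the time-normalization step behind whole-word pattern matching. Euclidean
    # local distance, no slope constraints; a textbook sketch only.
    import numpy as np

    def dtw_distance(ref, test):
        ref = np.asarray(ref, dtype=float)
        test = np.asarray(test, dtype=float)
        n, m = len(ref), len(test)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = np.linalg.norm(ref[i - 1] - test[j - 1])
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[n, m] / (n + m)   # length-normalized alignment cost

    # Toy usage: the same "word" spoken at different rates aligns cheaply.
    a = np.array([[0.0], [1.0], [2.0], [3.0]])
    b = np.array([[0.0], [0.0], [1.0], [1.0], [2.0], [3.0]])
    print(dtw_distance(a, b))
    ```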

  14. Hammerstein Model for Speech Coding

    Directory of Open Access Journals (Sweden)

    Turunen Jari

    2003-01-01

    Full Text Available A nonlinear Hammerstein model is proposed for coding speech signals. Using Tsay's nonlinearity test, we first show that the great majority of speech frames contain nonlinearities (over 80% in our test data when using 20-millisecond speech frames). Frame length correlates with the level of nonlinearity: the longer the frames, the higher the percentage of nonlinear frames. Motivated by this result, we present a nonlinear structure using a frame-by-frame adaptive identification of the Hammerstein model parameters for speech coding. Finally, the proposed structure is compared with the LPC coding scheme for three phonemes /a/, /s/, and /k/ by calculating the Akaike information criterion of the corresponding residual signals. The tests show clearly that the residual of the nonlinear model presented in this paper contains significantly less information compared to that of the LPC scheme. The presented method is a potential tool to shape the residual signal into an encoding-efficient form in speech coding.
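
    A sketch of frame-by-frame Hammerstein identification in the spirit described above: a static polynomial nonlinearity followed by an FIR filter, estimated jointly by least squares over polynomial expansions of delayed input samples. The polynomial order, filter length and synthetic signals are assumptions, not the paper's configuration.

    ```python
    # Frame-by-frame identification sketch of a Hammerstein model (static
    # polynomial nonlinearity followed by an FIR filter). The model is linear in
    # its parameters after expanding delayed input samples into powers. Orders
    # and the synthetic frame are illustrative assumptions.
    import numpy as np

    def hammerstein_fit(u, y, fir_len=8, poly_order=3):
        """Least-squares fit of theta[k, p] ~ h_k * c_p for one frame."""
        rows = []
        for n in range(fir_len, len(u)):
            delayed = u[n - fir_len + 1:n + 1][::-1]   # u(n), ..., u(n-K+1)
            rows.append(np.concatenate([delayed ** p
                                        for p in range(1, poly_order + 1)]))
        Phi = np.array(rows)
        theta, *_ = np.linalg.lstsq(Phi, y[fir_len:], rcond=None)
        return theta

    def hammerstein_predict(u, theta, fir_len=8, poly_order=3):
        y_hat = np.zeros(len(u))
        for n in range(fir_len, len(u)):
            delayed = u[n - fir_len + 1:n + 1][::-1]
            phi = np.concatenate([delayed ** p for p in range(1, poly_order + 1)])
            y_hat[n] = phi @ theta
        return y_hat

    # Toy check: recover a known nonlinearity + filter from synthetic data.
    rng = np.random.default_rng(0)
    u = rng.standard_normal(320)                  # one 20 ms frame at 16 kHz
    f_u = u + 0.3 * u ** 2 - 0.1 * u ** 3         # static nonlinearity
    h = np.array([1.0, 0.5, -0.2, 0.1])
    y = np.convolve(f_u, h)[:len(u)]
    theta = hammerstein_fit(u, y)
    resid = y[8:] - hammerstein_predict(u, theta)[8:]
    print(float(np.mean(resid ** 2)))
    ```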

  15. Early syllabic segmentation of fluent speech by infants acquiring French.

    Directory of Open Access Journals (Sweden)

    Louise Goyet

    Full Text Available Word form segmentation abilities emerge during the first year of life, and it has been proposed that infants initially rely on two types of cues to extract words from fluent speech: Transitional Probabilities (TPs) and rhythmic units. The main goal of the present study was to use the behavioral method of the Headturn Preference Procedure (HPP) to investigate again rhythmic segmentation of syllabic units by French-learning infants at the onset of segmentation abilities (around 8 months), given repeated failure to find syllabic segmentation at such a young age. The second goal was to explore the interaction between the use of TPs and syllabic units for segmentation by French-learning infants. The rationale was that decreasing TP cues around target syllables embedded in bisyllabic words would block bisyllabic word segmentation and facilitate the observation of syllabic segmentation. In Experiments 1 and 2, infants were tested in a condition of moderate TP decrease; no evidence of either syllabic or bisyllabic word segmentation was found. In Experiment 3, infants were tested in a condition of more marked TP decrease, and a novelty syllabic segmentation effect was observed. Therefore, the present study first establishes early syllabic segmentation in French-learning infants, bringing support from a syllable-based language to the proposal that rhythmic units are used at the onset of segmentation abilities. Second, it confirms that French-learning infants are sensitive to TP cues. Third, it demonstrates that they are sensitive to the relative weight of TP and rhythmic cues, explaining why effects of syllabic segmentation are not observed in context of high TPs. These findings are discussed in relation to theories of word segmentation bootstrapping, and the larger debate about statistically- versus prosodically-based accounts of early language acquisition.

  16. Early syllabic segmentation of fluent speech by infants acquiring French.

    Science.gov (United States)

    Goyet, Louise; Nishibayashi, Léo-Lyuki; Nazzi, Thierry

    2013-01-01

    Word form segmentation abilities emerge during the first year of life, and it has been proposed that infants initially rely on two types of cues to extract words from fluent speech: Transitional Probabilities (TPs) and rhythmic units. The main goal of the present study was to use the behavioral method of the Headturn Preference Procedure (HPP) to investigate again rhythmic segmentation of syllabic units by French-learning infants at the onset of segmentation abilities (around 8 months) given repeated failure to find syllabic segmentation at such a young age. The second goal was to explore the interaction between the use of TPs and syllabic units for segmentation by French-learning infants. The rationale was that decreasing TP cues around target syllables embedded in bisyllabic words would block bisyllabic word segmentation and facilitate the observation of syllabic segmentation. In Experiments 1 and 2, infants were tested in a condition of moderate TP decrease; no evidence of either syllabic or bisyllabic word segmentation was found. In Experiment 3, infants were tested in a condition of more marked TP decrease, and a novelty syllabic segmentation effect was observed. Therefore, the present study first establishes early syllabic segmentation in French-learning infants, bringing support from a syllable-based language to the proposal that rhythmic units are used at the onset of segmentation abilities. Second, it confirms that French-learning infants are sensitive to TP cues. Third, it demonstrates that they are sensitive to the relative weight of TP and rhythmic cues, explaining why effects of syllabic segmentation are not observed in context of high TPs. These findings are discussed in relation to theories of word segmentation bootstrapping, and the larger debate about statistically- versus prosodically-based accounts of early language acquisition. PMID:24244536

  17. The proactive bilingual brain: Using interlocutor identity to generate predictions for language processing.

    Science.gov (United States)

    Martin, Clara D; Molnar, Monika; Carreiras, Manuel

    2016-01-01

    The present study investigated the proactive nature of the human brain in language perception. Specifically, we examined whether early proficient bilinguals can use interlocutor identity as a cue for language prediction, using an event-related potentials (ERP) paradigm. Participants were first familiarized, through video segments, with six novel interlocutors who were either monolingual or bilingual. Then, the participants completed an audio-visual lexical decision task in which all the interlocutors uttered words and pseudo-words. Critically, the speech onset started about 350 ms after the beginning of the video. ERP waves between the onset of the visual presentation of the interlocutors and the onset of their speech significantly differed for trials where the language was not predictable (bilingual interlocutors) and trials where the language was predictable (monolingual interlocutors), revealing that visual interlocutor identity can in fact function as a cue for language prediction, even before the onset of the auditory-linguistic signal. PMID:27173937

  18. PCA-Based Speech Enhancement for Distorted Speech Recognition

    Directory of Open Access Journals (Sweden)

    Tetsuya Takiguchi

    2007-09-01

    Full Text Available We investigated a robust speech feature extraction method using kernel PCA (Principal Component Analysis) for distorted speech recognition. Kernel PCA has been suggested for various image processing tasks requiring an image model, such as denoising, where a noise-free image is constructed from a noisy input image. Much research on robust speech feature extraction has been done, but it remains difficult to completely remove additive or convolution noise (distortion). The most commonly used noise-removal techniques are based on spectral-domain operations, after which, for speech recognition, the MFCC (Mel Frequency Cepstral Coefficient) is computed, where the DCT (Discrete Cosine Transform) is applied to the mel-scale filter bank output. This paper describes a new PCA-based speech enhancement algorithm using kernel PCA instead of the DCT, where the main speech element is projected onto low-order features, while the noise or distortion element is projected onto high-order features. Its effectiveness is confirmed by word recognition experiments on distorted speech.
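
    A sketch of kernel-PCA feature denoising in the spirit of the method above, using scikit-learn's KernelPCA with its approximate pre-image (inverse transform). The feature dimensions, kernel settings and synthetic data are placeholders rather than the paper's setup.

    ```python
    # Sketch of kernel-PCA denoising of speech features: project noisy features
    # onto the leading kernel principal components (the "main speech element")
    # and reconstruct an approximate pre-image, discarding high-order components
    # that carry noise/distortion. All settings here are placeholders.
    import numpy as np
    from sklearn.decomposition import KernelPCA

    rng = np.random.default_rng(0)
    clean = rng.standard_normal((500, 24))        # stand-in for clean feature frames
    noisy = clean + 0.3 * rng.standard_normal(clean.shape)

    kpca = KernelPCA(n_components=8, kernel="rbf", gamma=0.05,
                     fit_inverse_transform=True, alpha=0.1)
    kpca.fit(clean)                               # learn the clean-speech manifold

    low_order = kpca.transform(noisy)             # low-order (speech) components
    denoised = kpca.inverse_transform(low_order)  # approximate pre-image
    print(float(np.mean((denoised - clean) ** 2)),
          float(np.mean((noisy - clean) ** 2)))
    ```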

  19. Hate Speech or Free Speech: Can Broad Campus Speech Regulations Survive Current Judicial Reasoning?

    Science.gov (United States)

    Heiser, Gregory M.; Rossow, Lawrence F.

    1993-01-01

    Federal courts have found speech regulations overbroad in suits against the University of Michigan and the University of Wisconsin System. Attempts to assess the theoretical justification and probable fate of broad speech regulations that have not been explicitly rejected by the courts. Concludes that strong arguments for broader regulation will…

  20. Cue reactivity in virtual reality: the role of context.

    Science.gov (United States)

    Paris, Megan M; Carter, Brian L; Traylor, Amy C; Bordnick, Patrick S; Day, Susan X; Armsworth, Mary W; Cinciripini, Paul M

    2011-07-01

    Cigarette smokers in laboratory experiments readily respond to smoking stimuli with increased craving. An alternative to traditional cue-reactivity methods (e.g., exposure to cigarette photos), virtual reality (VR) has been shown to be a viable cue presentation method for eliciting and assessing cigarette craving within complex virtual environments. However, it remains poorly understood whether contextual cues from the environment contribute to craving increases in addition to specific cues, like cigarettes. This study examined the role of contextual cues in a VR environment in evoking craving. Smokers were exposed to a virtual convenience store devoid of any specific cigarette cues, followed by exposure to the same convenience store with specific cigarette cues added. Smokers reported increased craving following exposure to the virtual convenience store without specific cues, and significantly greater craving following the convenience store with cigarette cues added. However, the increased craving recorded after the second convenience store may have been due to the pre-exposure to the first convenience store. This study offers evidence that an environmental context where cigarette cues are normally present (but are not) elicits significant craving in the absence of specific cigarette cues. This finding suggests that VR may have stronger ecological validity than traditional cue-reactivity exposure methods by exposing smokers to the full range of cigarette-related environmental stimuli, in addition to specific cigarette cues, that smokers typically experience in their daily lives. PMID:21349649

  1. Effects of similarity on environmental context cueing.

    Science.gov (United States)

    Smith, Steven M; Handy, Justin D; Angello, Genna; Manzano, Isabel

    2014-01-01

    Three experiments examined the prediction that context cues which are similar to study contexts can facilitate episodic recall, even if those cues are never seen before the recall test. Environmental context cueing effects have typically produced such small effect sizes that influences of moderating factors, such as the similarity between encoding and retrieval contexts, would be difficult to observe experimentally. Videos of environmental contexts, however, can be used to produce powerful context-dependent memory effects, particularly when only one memory target is associated with each video context, intentional item-context encoding is encouraged, and free recall tests are used. Experiment 1 showed that a not previously viewed video of the study context provided an effective recall cue, although it was not as effective as the originally viewed video context. Experiments 2 and 3 showed that videos of environments that were conceptually similar to encoding contexts (e.g., both were videos of ball field games) also cued recall, but not as well if the encoding contexts were given specific labels (e.g., "home run") incompatible with test contexts (e.g., a soccer scene). A fourth experiment that used incidental item-context encoding showed that video context reinstatement has a robust effect on paired associate memory, indicating that the video context reinstatement effect does not depend on interactive item-context encoding or free recall testing. PMID:23721293

  2. Object Cueing System For Infrared Images

    Science.gov (United States)

    Ranganath, H. S.; McIngvale, Pat; Speigle, Scott

    1987-09-01

    This paper considers the design of an object cueing system as a rule-based expert system. The architecture is modular and the control strategy permits dynamic scheduling of tasks. In this approach, results of several algorithms and many object recognition heuristics are combined to achieve better performance levels. Importance of spatial knowledge representation is also discussed.

  3. Verbal Cueing as a Behavior Change Instrument.

    Science.gov (United States)

    Prieto, Alfonso G.; Rutherford, Robert B., Jr.

    A study involving four boys (9 to 14 years old) labeled as emotionally handicapped was conducted to examine the effect of a verbal cueing technique (involving an illogical statement which evokes psychological reactance) on behaviorally disordered children. Illogical statements made by the teacher produced positive change in target behaviors (such…

  4. Hate Speech/Free Speech: Using Feminist Perspectives To Foster On-Campus Dialogue.

    Science.gov (United States)

    Cornwell, Nancy; Orbe, Mark P.; Warren, Kiesha

    1999-01-01

    Explores the complex issues inherent in the tension between hate speech and free speech, focusing on the phenomenon of hate speech on college campuses. Describes the challenges to hate speech made by critical race theorists and explains how a feminist critique can reorient the parameters of hate speech. (SLD)

  5. Effect of context, rebinding and noise, on audiovisual speech fusion

    OpenAIRE

    Attigodu, Ganesh; Berthommier, Frédéric; Nahorna, Olha; Schwartz, Jean-Luc

    2013-01-01

    In a previous set of experiments we showed that audio-visual fusion during the McGurk effect may be modulated by context. A short context (2 to 4 syllables) composed of incoherent auditory and visual material significantly decreases the McGurk effect. We interpreted this as showing the existence of an audiovisual "binding" stage controlling the fusion process, and we also showed the existence of a "rebinding" process when an incoherent material is followed by a short coherent material. In thi...

  6. The ability of left- and right-hemisphere damaged individuals to produce prosodic cues to disambiguate Korean idiomatic sentences

    Directory of Open Access Journals (Sweden)

    Seung-Yun Yang

    2014-05-01

    Three speech-language pathologists with training in phonetics participated as raters of vocal qualities. Nasality was a significantly salient vocal quality of idiomatic utterances. Conclusion: The findings support that (1) LHD negatively affected the production of durational cues and RHD negatively affected the production of fundamental frequency cues in idiomatic-literal contrasts; (2) healthy listeners successfully identified idiomatic and literal versions of ambiguous sentences produced by healthy speakers but not by RHD speakers; (3) productions in brain-damaged participants approximated HC’s measures in the repetition tasks, but not in the elicitation tasks; (4) nasal voice quality was judged to be associated with idiomatic utterances in all groups of participants. Findings agree with previous studies indicating HC’s abilities to discriminate literal versus idiomatic meanings in ditropically ambiguous idioms, as well as deficient processing of pitch production and impaired pragmatic ability in RHD.

  7. Who wants to be a blabbermouth? Prosodic cues to correct answers in the WWTBAM quiz show scenario

    DEFF Research Database (Denmark)

    Niebuhr, Oliver

    Starting from previous research on the prosodic patterns of emotion, psychological stress and deceptive speech, the paper investigates whether quizmasters convey telltale cues to correct answers in the popular four alternatives (a/b/c/d) framework of "Who Wants to Be a Millionaire?" (WWTBAM). We...... telltale signs of correct answers. These telltale signs were consistent across all quizmasters, but complex insofar as they differed across question positions (a/b/c/d) could not be found in the introductory letters. Cues to correct answers involved timing and range of F0 and intensity patterns, speaking...... rate, and degree of final lengthening; pause durations between answers and introductory letters were irrelevant. The results are discussed with respect to their implications for real quizshows and the elicitation of emotions and stress in the lab....

  8. The Stylistic Analysis of Public Speech

    Institute of Scientific and Technical Information of China (English)

    李龙

    2011-01-01

    Public speech is an important part of our daily life. The ability to deliver a good public speech is something we need to learn and to have, especially in the service sector. This paper attempts to analyze the style of public speech, in the hope of providing inspiration for whenever we deliver such a speech.

  9. Phonetic Recalibration Only Occurs in Speech Mode

    Science.gov (United States)

    Vroomen, Jean; Baart, Martijn

    2009-01-01

    Upon hearing an ambiguous speech sound dubbed onto lipread speech, listeners adjust their phonetic categories in accordance with the lipread information (recalibration) that tells what the phoneme should be. Here we used sine wave speech (SWS) to show that this tuning effect occurs if the SWS sounds are perceived as speech, but not if the sounds…

  10. Integranting prosodic information into a speech recogniser

    OpenAIRE

    López Soto, María Teresa

    2001-01-01

    In the last decade there has been an increasing tendency to incorporate language engineering strategies into speech technology. This technique combines linguistic and mathematical information in different applications: machine translation, natural language processing, speech synthesis and automatic speech recognition (ASR). In the field of speech synthesis, this hybrid approach (linguistic and mathematical/statistical) has led to the design of efficient models for reproducin...

  11. Infant Perception of Atypical Speech Signals

    Science.gov (United States)

    Vouloumanos, Athena; Gelfand, Hanna M.

    2013-01-01

    The ability to decode atypical and degraded speech signals as intelligible is a hallmark of speech perception. Human adults can perceive sounds as speech even when they are generated by a variety of nonhuman sources including computers and parrots. We examined how infants perceive the speech-like vocalizations of a parrot. Further, we examined how…

  12. From data to speech: a general approach

    NARCIS (Netherlands)

    Theune, M.; Klabbers, E.A.M.; Pijper, de J.R.; Krahmer, E.; Odijk, J.; Boguraev, B.; Tait, J.; Jacquemin, C.

    2001-01-01

    We present a data-to-speech system called D2S, which can be used for the creation of data-to-speech systems in different languages and domains. The most important characteristic of a data-to-speech system is that it combines language and speech generation: language generation is used to produce a na

  13. Automated Speech Rate Measurement in Dysarthria

    Science.gov (United States)

    Martens, Heidi; Dekens, Tomas; Van Nuffelen, Gwen; Latacz, Lukas; Verhelst, Werner; De Bodt, Marc

    2015-01-01

    Purpose: In this study, a new algorithm for automated determination of speech rate (SR) in dysarthric speech is evaluated. We investigated how reliably the algorithm calculates the SR of dysarthric speech samples when compared with calculation performed by speech-language pathologists. Method: The new algorithm was trained and tested using Dutch…

  14. The (un)clear effects of invalid retro-cues.

    Directory of Open Access Journals (Sweden)

    Marcel eGressmann

    2016-03-01

    Studies with the retro-cue paradigm have shown that validly cueing objects in visual working memory long after encoding can still benefit performance on subsequent change detection tasks. With regard to the effects of invalid cues, the literature is less clear. Some studies reported costs, others did not. We here revisit two recent studies that made interesting suggestions concerning invalid retro-cues: one study suggested that costs only occur for larger set sizes, and another study suggested that inclusion of invalid retro-cues diminishes the retro-cue benefit. New data from one experiment and a reanalysis of published data are provided to address these conclusions. The new data clearly show costs (and benefits) that were independent of set size, and the reanalysis suggests no influence of the inclusion of invalid retro-cues on the retro-cue benefit. Thus, previous interpretations may be taken with some caution at present.

  15. A pilot evaluation of two G-seat cueing schemes

    Science.gov (United States)

    Showalter, T. W.

    1978-01-01

    A comparison was made of two contrasting G-seat cueing schemes. The G-seat, an aircraft simulation subsystem, creates aircraft acceleration cues via seat contour changes. Of the two cueing schemes tested, one was designed to create skin pressure cues and the other was designed to create body position cues. Each cueing scheme was tested and evaluated subjectively by five pilots regarding its ability to cue the appropriate accelerations in each of four simple maneuvers: a pullout, a pushover, an S-turn maneuver, and a thrusting maneuver. A divergence of pilot opinion occurred, revealing that the perception and acceptance of G-seat stimuli is a highly individualistic phenomenon. The creation of one acceptable G-seat cueing scheme was, therefore, deemed to be quite difficult.

  16. Speech recognition from spectral dynamics

    Indian Academy of Sciences (India)

    Hynek Hermansky

    2011-10-01

    Information is carried in changes of a signal. The paper starts with revisiting Dudley’s concept of the carrier nature of speech. It points to its close connection to modulation spectra of speech and argues against short-term spectral envelopes as dominant carriers of the linguistic information in speech. The history of spectral representations of speech is briefly discussed. Some of the history of gradual infusion of the modulation spectrum concept into Automatic recognition of speech (ASR) comes next, pointing to the relationship of modulation spectrum processing to well-accepted ASR techniques such as dynamic speech features or RelAtive SpecTrAl (RASTA) filtering. Next, the frequency domain perceptual linear prediction technique for deriving autoregressive models of temporal trajectories of spectral power in individual frequency bands is reviewed. Finally, posterior-based features, which allow for straightforward application of modulation frequency domain information, are described. The paper is tutorial in nature, aims at a historical global overview of attempts for using spectral dynamics in machine recognition of speech, and does not always provide enough detail of the described techniques. However, extensive references to earlier work are provided to compensate for the lack of detail in the paper.
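
    As one concrete illustration of emphasizing spectral dynamics, the RelAtive SpecTrAl (RASTA) idea can be sketched as a band-pass filter applied to each band's temporal trajectory of log spectral power. The coefficients below correspond to one commonly cited form of the RASTA filter and are given for illustration only; they are not taken from this paper.

    ```python
    # Hedged sketch: RASTA-style band-pass filtering of log spectral trajectories.
    import numpy as np
    from scipy.signal import lfilter

    def rasta_filter(log_spectra):
        """log_spectra: array of shape (frames, bands); returns filtered trajectories."""
        # Commonly cited RASTA transfer function:
        #   H(z) ~ 0.1 * (2 + z^-1 - z^-3 - 2 z^-4) / (1 - 0.98 z^-1)
        b = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])
        a = np.array([1.0, -0.98])
        # Filtering along time suppresses slowly varying (e.g., convolutive channel)
        # components and very fast fluctuations, keeping modulation-rate dynamics.
        return lfilter(b, a, log_spectra, axis=0)
    ```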

  17. INTEGRATING MACHINE TRANSLATION AND SPEECH SYNTHESIS COMPONENT FOR ENGLISH TO DRAVIDIAN LANGUAGE SPEECH TO SPEECH TRANSLATION SYSTEM

    Directory of Open Access Journals (Sweden)

    J. SANGEETHA

    2015-02-01

    This paper presents an interface between the machine translation and speech synthesis components of an English-to-Tamil speech-to-speech translation system. The speech translation system consists of three modules: automatic speech recognition, machine translation, and text-to-speech synthesis. Many procedures for integrating speech recognition and machine translation have been proposed, but the speech synthesis component has not yet been evaluated in the same way. In this paper, we focus on the integration of machine translation and speech synthesis, and report a subjective evaluation to investigate the impact of the speech synthesis, the machine translation, and the integration of the machine translation and speech synthesis components. Here we implement a hybrid machine translation system (a combination of rule-based and statistical machine translation) and a concatenative, syllable-based speech synthesis technique. In order to retain the naturalness and intelligibility of the synthesized speech, Auto-Associative Neural Network (AANN) prosody prediction is used in this work. The results of this investigation demonstrate that the naturalness and intelligibility of the synthesized speech are strongly influenced by the fluency and correctness of the translated text.
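
    The three-module architecture described above can be summarised in a purely illustrative skeleton; the class and method names below are hypothetical placeholders standing in for the ASR, hybrid MT, and syllable-based synthesis components, not the authors' actual interfaces.

    ```python
    # Hypothetical skeleton of the speech-to-speech translation pipeline.
    from dataclasses import dataclass

    @dataclass
    class SpeechToSpeechTranslator:
        asr: object   # automatic speech recognition: English audio -> English text
        mt: object    # hybrid (rule-based + statistical) machine translation
        tts: object   # concatenative syllable-based synthesis with a prosody model

        def translate(self, english_audio):
            english_text = self.asr.transcribe(english_audio)
            tamil_text = self.mt.translate(english_text)
            # Prosody (e.g., syllable durations, F0) is predicted before synthesis,
            # in the spirit of the AANN prosody prediction mentioned above.
            prosody = self.tts.predict_prosody(tamil_text)
            return self.tts.synthesize(tamil_text, prosody)
    ```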

  18. Extinction of Drug Cue Reactivity in Methamphetamine-Dependent Individuals

    OpenAIRE

    Price, Kimber L.; Saladin, Michael E.; Baker, Nathaniel L.; Tolliver, Bryan K.; DeSantis, Stacia M.; McRae-Clark, Aimee L.; Brady, Kathleen T.

    2010-01-01

    Conditioned responses to drug-related environmental cues (such as craving) play a critical role in relapse to drug use. Animal models demonstrate that repeated exposure to drug-associated cues in the absence of drug administration leads to the extinction of conditioned responses, but the few existing clinical trials focused on extinction of conditioned responses to drug-related cues in drug-dependent individuals show equivocal results. The current study examined drug-related cue reactivity an...

  19. Cue Reactivity in Virtual Reality: The Role of Context

    OpenAIRE

    Paris, Megan M.; Carter, Brian L.; Traylor, Amy C.; Bordnick, Patrick S.; Day, Susan X.; Armsworth, Mary W.; Cinciripini, Paul M.

    2011-01-01

    Cigarette smokers in laboratory experiments readily respond to smoking stimuli with increased craving. An alternative to traditional cue-reactivity methods (e.g., exposure to cigarette photos), virtual reality (VR) has been shown to be a viable cue presentation method to elicit and assess cigarette craving within complex virtual environments. However, it remains poorly understood whether contextual cues from the environment contribute to craving increases in addition to specific cues, like ci...

  20. On the Motivational Properties of Reward Cues: Individual Differences

    OpenAIRE

    ROBINSON, TERRY E.; Yager, Lindsay M.; Cogan, Elizabeth S.; Saunders, Benjamin T.

    2013-01-01

    Cues associated with rewards, such as food or drugs of abuse, can themselves acquire motivational properties. Acting as incentive stimuli, such cues can exert powerful control over motivated behavior, and in the case of cues associated with drugs, they can goad continued drug-seeking behavior and relapse. However, recent studies reviewed here suggest that there are large individual differences in the extent to which food and drug cues are attributed with incentive salience. Rats prone to appr...

  1. Reactivity to Cannabis Cues in Virtual Reality Environments†

    OpenAIRE

    Bordnick, Patrick S.; Copp, Hilary L.; Traylor, Amy; Graap, Ken M.; Carter, Brian L.; Walton, Alicia; Ferrer, Mirtha

    2009-01-01

    Virtual reality (VR) cue environments have been developed and successfully tested in nicotine, cocaine, and alcohol abusers. Aims in the current article include the development and testing of a novel VR cannabis cue reactivity assessment system. It was hypothesized that subjective craving levels and attention to cannabis cues would be higher in VR environments with cannabis cues compared to VR neutral environments. Twenty nontreatment-seeking current cannabis smokers participated in th...

  2. Flexible cue use in food-caching birds.

    Science.gov (United States)

    LaDage, Lara D; Roth, Timothy C; Fox, Rebecca A; Pravosudov, Vladimir V

    2009-05-01

    An animal's memory may be limited in capacity, which may result in competition among available memory cues. If such competition exists, natural selection may favor prioritization of different memory cues based on cue reliability and on associated differences in the environment and life history. Food-caching birds store numerous food items and appear to rely on memory to retrieve caches. Previous studies suggested that caching species should always prioritize spatial cues over non-spatial cues when both are available, because non-spatial cues may be unreliable in a changing environment; however, it remains unclear whether non-spatial cues should always be ignored when spatial cues are available. We tested whether mountain chickadees (Poecile gambeli), a food-caching species, prioritize memory for spatial cues over color cues when relocating previously found food in an associative learning task. In training trials, birds were exposed to food in a feeder where both spatial location and color were associated. During subsequent unrewarded test trials, color was dissociated from spatial location. Chickadees showed a significant pattern of inspecting feeders associated with correct color first, prior to visiting correct spatial locations. Our findings argue against the hypothesis that the memory of spatial cues should always take priority over any non-spatial cues, including color cues, in food-caching species, because in our experiment mountain chickadees chose color over spatial cues. Our results thus suggest that caching species may be more flexible in cue use than previously thought, possibly dependent upon the environment and complexity of available cues. PMID:19050946

  3. Discriminating native from non-native speech using fusion of visual cues

    NARCIS (Netherlands)

    Georgakis, Christos; Petridis, Stavros; Pantic, Maja

    2014-01-01

    The task of classifying accent, as belonging to a native language speaker or a foreign language speaker, has been so far addressed by means of the audio modality only. However, features extracted from the visual modality have been successfully used to extend or substitute audio-only approaches devel

  4. Predicting the Attitude Flow in Dialogue Based on Multi-Modal Speech Cues

    DEFF Research Database (Denmark)

    Juel Henrichsen, Peter; Allwood, Jens

    We present our experiments on attitude detection based on annotated multi-modal dialogue data. Our long-term goal is to establish a computational model able to predict the attitudinal patterns in human-human dialogue. We believe such prediction algorithms are useful tools in the pursuit of reali...

  5. Speech Presentation Cues Moderate Frontal EEG Asymmetry in Socially Withdrawn Young Adults

    OpenAIRE

    Cole, Claire; Zapp, Daniel J.; Nelson, S. Katherine; Pérez-Edgar, Koraly

    2011-01-01

    Socially withdrawn individuals display solitary behavior across wide contexts with both unfamiliar and familiar peers. This tendency to withdraw may be driven by either past or anticipated negative social encounters. In addition, socially withdrawn individuals often exhibit right frontal electroencephalogram (EEG) asymmetry at baseline and when under stress. In the current study we examined shifts in frontal EEG activity in young adults (N=41) at baseline, as they viewed either an anxiety-pro...

  6. Speech Presentation Cues Moderate Frontal EEG Asymmetry in Socially Withdrawn Young Adults

    Science.gov (United States)

    Cole, Claire; Zapp, Daniel J.; Nelson, S. Katherine; Perez-Edgar, Koraly

    2012-01-01

    Socially withdrawn individuals display solitary behavior across wide contexts with both unfamiliar and familiar peers. This tendency to withdraw may be driven by either past or anticipated negative social encounters. In addition, socially withdrawn individuals often exhibit right frontal electroencephalogram (EEG) asymmetry at baseline and when…

  7. Visualizing structures of speech expressiveness

    DEFF Research Database (Denmark)

    Herbelin, Bruno; Jensen, Karl Kristoffer; Graugaard, Lars

    2008-01-01

    Speech is both beautiful and informative. In this work, a conceptual study of the speech, through investigation of the tower of Babel, the archetypal phonemes, and a study of the reasons of uses of language is undertaken in order to create an artistic work investigating the nature of speech. The...... Babel myth speaks about distance created when aspiring to the heaven as the reason for language division. Meanwhile, Locquin states through thorough investigations that only a few phonemes are present throughout history. Our interpretation is that a system able to recognize archetypal phonemes through...

  8. Speech enhancement theory and practice

    CERN Document Server

    Loizou, Philipos C

    2013-01-01

    With the proliferation of mobile devices and hearing devices, including hearing aids and cochlear implants, there is a growing and pressing need to design algorithms that can improve speech intelligibility without sacrificing quality. Responding to this need, Speech Enhancement: Theory and Practice, Second Edition introduces readers to the basic problems of speech enhancement and the various algorithms proposed to solve these problems. Updated and expanded, this second edition of the bestselling textbook broadens its scope to include evaluation measures and enhancement algorithms aimed at impr

  9. Speech recovery device

    Energy Technology Data Exchange (ETDEWEB)

    Frankle, Christen M.

    2004-04-20

    There is provided an apparatus and method for assisting speech recovery in people with inability to speak due to aphasia, apraxia or another condition with similar effect. A hollow, rigid, thin-walled tube with semi-circular or semi-elliptical cut out shapes at each open end is positioned such that one end mates with the throat/voice box area of the neck of the assistor and the other end mates with the throat/voice box area of the assisted. The speaking person (assistor) makes sounds that produce standing wave vibrations at the same frequency in the vocal cords of the assisted person. Driving the assisted person's vocal cords with the assisted person being able to hear the correct tone enables the assisted person to speak by simply amplifying the vibration of membranes in their throat.

  10. Speech Enhancement via EMD

    Directory of Open Access Journals (Sweden)

    Monia Turki-Hadj Alouane

    2008-06-01

    In this study, two new approaches for speech signal noise reduction based on the empirical mode decomposition (EMD), recently introduced by Huang et al. (1998), are proposed. Based on the EMD, both reduction schemes are fully data-driven approaches. The noisy signal is decomposed adaptively into oscillatory components called intrinsic mode functions (IMFs), using a temporal decomposition called the sifting process. Two strategies for noise reduction are proposed: filtering and thresholding. The basic principle of these two methods is signal reconstruction with IMFs previously filtered, using the minimum mean-squared error (MMSE) filter introduced by I. Y. Soon et al. (1998), or thresholded using a shrinkage function. The performance of these methods is analyzed and compared with those of the MMSE filter and wavelet shrinkage. The study is limited to signals corrupted by additive white Gaussian noise. The obtained results show that the proposed denoising schemes perform better than the MMSE filter and the wavelet approach.
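
    A minimal sketch of the thresholding variant, assuming the third-party PyEMD package for the decomposition, might look like the following; the universal-threshold shrinkage rule used here is a generic choice for illustration, not necessarily the shrinkage function evaluated in the study.

    ```python
    # Hedged sketch: EMD-based denoising by soft-thresholding the IMFs.
    import numpy as np
    from PyEMD import EMD  # pip install EMD-signal

    def emd_denoise(noisy, keep_last=1):
        """noisy: 1-D float array. Returns the reconstruction from shrunken IMFs."""
        imfs = EMD()(noisy)                    # IMFs ordered from fast to slow oscillations
        out = np.zeros_like(noisy, dtype=float)
        for i, imf in enumerate(imfs):
            if i >= len(imfs) - keep_last:
                out += imf                     # leave the slowest components untouched
                continue
            sigma = np.median(np.abs(imf)) / 0.6745        # robust per-IMF noise estimate
            thr = sigma * np.sqrt(2.0 * np.log(len(imf)))  # universal threshold
            out += np.sign(imf) * np.maximum(np.abs(imf) - thr, 0.0)  # soft shrinkage
        return out
    ```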

  11. Silog: Speech Input Logon

    Science.gov (United States)

    Grau, Sergio; Allen, Tony; Sherkat, Nasser

    Silog is a biometric authentication system that extends the conventional PC logon process using voice verification. Users enter their ID and password using a conventional Windows logon procedure but then the biometric authentication stage makes a Voice over IP (VoIP) call to a VoiceXML (VXML) server. User interaction with this speech-enabled component then allows the user's voice characteristics to be extracted as part of a simple user/system spoken dialogue. If the captured voice characteristics match those of a previously registered voice profile, then network access is granted. If no match is possible, then a potential unauthorised system access has been detected and the logon process is aborted.
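
    The two-stage logon sequence described above can be summarised in a hypothetical control-flow sketch; every function below is a placeholder for the Windows logon, VoIP/VXML dialogue, and voice-verification components rather than Silog's real API.

    ```python
    # Hypothetical outline of a Silog-style logon flow (placeholder backend methods).
    def voice_verified_logon(user_id, password, backend):
        # Stage 1: conventional Windows-style ID/password check.
        if not backend.check_credentials(user_id, password):
            return False
        # Stage 2: VoIP call to the VoiceXML server for a short spoken dialogue.
        dialogue = backend.start_vxml_call(user_id)
        features = dialogue.capture_voice_characteristics()
        # Grant network access only if the voice matches the enrolled profile.
        if backend.matches_enrolled_profile(user_id, features):
            return True
        return False   # potential unauthorised access: abort the logon
    ```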

  12. Speech recovery device

    Energy Technology Data Exchange (ETDEWEB)

    Frankle, Christen M.

    2000-10-19

    There is provided an apparatus and method for assisting speech recovery in people with inability to speak due to aphasia, apraxia or another condition with similar effect. A hollow, rigid, thin-walled tube with semi-circular or semi-elliptical cut out shapes at each open end is positioned such that one end mates with the throat/voice box area of the neck of the assistor and the other end mates with the throat/voice box area of the assisted. The speaking person (assistor) makes sounds that produce standing wave vibrations at the same frequency in the vocal cords of the assisted person. Driving the assisted person's vocal cords with the assisted person being able to hear the correct tone enables the assisted person to speak by simply amplifying the vibration of membranes in their throat.

  13. Speech is Golden

    DEFF Research Database (Denmark)

    Juel Henrichsen, Peter

    2014-01-01

    on the supply side. The present article reports on a new public action strategy which has taken shape in the course of 2013-14. While Denmark is a small language area, our public sector is well organised and has considerable purchasing power. Across this past year, Danish local authorities have...... organised around the speech technology challenge, they have formulated a number of joint questions and new requirements to be met by suppliers and have deliberately worked towards formulating tendering material which will allow fair competition. Public researchers have contributed to this work, including...... the author of the present article, in the role of economically neutral advisers. The aim of the initiative is to pave the way for the first profitable contract in the field - which we hope to see in 2014 - an event which would precisely break the present deadlock and open up a billion EUR market for...

  14. Neural encoding of the speech envelope by children with developmental dyslexia.

    Science.gov (United States)

    Power, Alan J; Colling, Lincoln J; Mead, Natasha; Barnes, Lisa; Goswami, Usha

    2016-09-01

    Developmental dyslexia is consistently associated with difficulties in processing phonology (linguistic sound structure) across languages. One view is that dyslexia is characterised by a cognitive impairment in the "phonological representation" of word forms, which arises long before the child presents with a reading problem. Here we investigate a possible neural basis for developmental phonological impairments. We assess the neural quality of speech encoding in children with dyslexia by measuring the accuracy of low-frequency speech envelope encoding using EEG. We tested children with dyslexia and chronological age-matched (CA) and reading-level matched (RL) younger children. Participants listened to semantically-unpredictable sentences in a word report task. The sentences were noise-vocoded to increase reliance on envelope cues. Envelope reconstruction for envelopes between 0 and 10 Hz showed that the children with dyslexia had significantly poorer speech encoding in the 0-2 Hz band compared to both CA and RL controls. These data suggest that impaired neural encoding of low frequency speech envelopes, related to speech prosody, may underpin the phonological deficit that causes dyslexia across languages. PMID:27433986
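
    As a rough illustration of the signal such reconstruction analyses target, the low-frequency speech envelope can be obtained from a Hilbert-transform envelope followed by a low-pass filter; the filter order and cutoff below are assumptions for illustration, not the authors' exact preprocessing. Decoding accuracy is then typically the correlation between this envelope and the one reconstructed from the EEG.

    ```python
    # Hedged sketch: extracting the 0-2 Hz speech envelope targeted by the analysis.
    import numpy as np
    from scipy.signal import hilbert, butter, filtfilt

    def low_freq_envelope(speech, fs, cutoff_hz=2.0, order=3):
        envelope = np.abs(hilbert(speech))                   # broadband amplitude envelope
        b, a = butter(order, cutoff_hz / (fs / 2.0), btype='low')
        return filtfilt(b, a, envelope)                      # keep only the slow (<= 2 Hz) band
    ```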

  15. The Effects of Overt and Covert Cues on Written Syntax.

    Science.gov (United States)

    Combs, Warren E.; Smith, William L.

    1980-01-01

    Experiments conducted with freshman composition students suggested that (1) the repeated use of a control stimulus passage does not result in increased syntactic complexity; (2) both overt and covert cues elicit more complex writing than do no-cue situations; and (3) the effect of overt cues seems to be retained, at least across a short duration.…

  16. Microphone Array Speech Recognition : Experiments on Overlapping Speech in Meetings

    OpenAIRE

    Moore, Darren; McCowan, Iain A.

    2002-01-01

    This paper investigates the use of microphone arrays to acquire and recognise speech in meetings. Meetings pose several interesting problems for speech processing, as they consist of multiple competing speakers within a small space, typically around a table. Due to their ability to provide hands-free acquisition and directional discrimination, microphone arrays present a potential alternative to close-talking microphones in such an application. We first propose an appropriate microphone array...
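
    The directional discrimination that motivates microphone arrays can be illustrated with a toy delay-and-sum beamformer; the geometry handling and integer-sample steering below are simplifications for illustration, not the front-end actually evaluated in the paper.

    ```python
    # Toy delay-and-sum beamformer steering the array toward a given source direction.
    import numpy as np

    def delay_and_sum(signals, mic_positions, source_direction, fs, c=343.0):
        """signals: (n_mics, n_samples); mic_positions: (n_mics, 3) in metres;
        source_direction: vector pointing from the array toward the source."""
        u = source_direction / np.linalg.norm(source_direction)
        delays = mic_positions @ u / c        # mics nearer the source receive the wave earlier
        delays -= delays.min()                # non-negative alignment delays
        out = np.zeros(signals.shape[1])
        for m in range(signals.shape[0]):
            shift = int(round(delays[m] * fs))
            out += np.roll(signals[m], shift)  # circular shift is acceptable for this toy example
        return out / signals.shape[0]
    ```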

  17. Delayed Speech or Language Development

    Science.gov (United States)

    ... distinction between the two: Speech is the verbal expression of language and includes articulation, which is the ... sounds or words repeatedly and can't use oral language to communicate more than his or her ...

  18. Emotion Recognition using Speech Features

    CERN Document Server

    Rao, K Sreenivasa

    2013-01-01

    “Emotion Recognition Using Speech Features” covers emotion-specific features present in speech and discussion of suitable models for capturing emotion-specific information for distinguishing different emotions.  The content of this book is important for designing and developing  natural and sophisticated speech systems. Drs. Rao and Koolagudi lead a discussion of how emotion-specific information is embedded in speech and how to acquire emotion-specific knowledge using appropriate statistical models. Additionally, the authors provide information about using evidence derived from various features and models. The acquired emotion-specific knowledge is useful for synthesizing emotions. Discussion includes global and local prosodic features at syllable, word and phrase levels, helpful for capturing emotion-discriminative information; use of complementary evidences obtained from excitation sources, vocal tract systems and prosodic features in order to enhance the emotion recognition performance;  and pro...
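
    The global prosodic features discussed above (pitch and energy statistics over an utterance) can be sketched with standard tooling; the feature set and parameters below assume librosa and are illustrative rather than the book's exact recipe.

    ```python
    # Hedged sketch: utterance-level prosodic statistics for emotion recognition.
    import numpy as np
    import librosa

    def global_prosodic_features(wav_path):
        y, sr = librosa.load(wav_path, sr=None)
        f0, voiced_flag, _ = librosa.pyin(y, fmin=librosa.note_to_hz('C2'),
                                          fmax=librosa.note_to_hz('C7'), sr=sr)
        f0 = f0[~np.isnan(f0)]                       # keep pitch estimates for voiced frames
        energy = librosa.feature.rms(y=y)[0]
        return {
            'f0_mean': float(np.mean(f0)) if f0.size else 0.0,
            'f0_range': float(np.ptp(f0)) if f0.size else 0.0,
            'energy_mean': float(np.mean(energy)),
            'energy_std': float(np.std(energy)),
            'duration_s': len(y) / sr,
        }
    ```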

  19. Speech and Language Developmental Milestones

    Science.gov (United States)

    ... of “brain plasticity”—the ways in which the brain is influenced by health conditions or life experiences—and how it can be used to develop learning strategies that encourage healthy language and speech development in ...

  20. Introspective responses to cues and motivation to reduce cigarette smoking influence state and behavioral responses to cue exposure.

    Science.gov (United States)

    Veilleux, Jennifer C; Skinner, Kayla D

    2016-09-01

    In the current study, we aimed to extend smoking cue-reactivity research by evaluating delay discounting as an outcome of cigarette cue exposure. We also separated introspection in response to cues (e.g., self-reporting craving and affect) from cue exposure alone, to determine if introspection changes behavioral responses to cigarette cues. Finally, we included measures of quit motivation and resistance to smoking to assess motivational influences on cue exposure. Smokers were invited to participate in an online cue-reactivity study. Participants were randomly assigned to view smoking images or neutral images, and were randomized to respond to cues with either craving and affect questions (e.g., introspection) or filler questions. Following cue exposure, participants completed a delay discounting task and then reported state affect, craving, and resistance to smoking, as well as an assessment of quit motivation. We found that after controlling for trait impulsivity, participants who introspected on craving and affect showed higher delay discounting, irrespective of cue type, but we found no effect of response condition on subsequent craving (e.g., craving reactivity). We also found that motivation to quit interacted with experimental conditions to predict state craving and state resistance to smoking. Although asking about craving during cue exposure did not increase later craving, it resulted in greater delaying of discounted rewards. Overall, our findings suggest the need to further assess the implications of introspection and motivation on behavioral outcomes of cue exposure. PMID:27115733