WorldWideScience

Sample records for integrate speech cues

  1. Speech cues contribute to audiovisual spatial integration.

    Directory of Open Access Journals (Sweden)

    Christopher W Bishop

    Full Text Available Speech is the most important form of human communication, but ambient sounds and competing talkers often degrade its acoustics. Fortunately, the brain can use visual information, especially its highly precise spatial information, to improve speech comprehension in noisy environments. Previous studies have demonstrated that audiovisual integration depends strongly on spatiotemporal factors. However, some integrative phenomena such as McGurk interference persist even with gross spatial disparities, suggesting that spatial alignment is not necessary for robust integration of audiovisual place-of-articulation cues. It is therefore unclear how speech cues interact with audiovisual spatial integration mechanisms. Here, we combine two well-established psychophysical phenomena, the McGurk effect and the ventriloquist's illusion, to explore this dependency. Our results demonstrate that conflicting spatial cues may not interfere with audiovisual integration of speech, but conflicting speech cues can impede integration in space. This suggests a direct but asymmetrical influence between ventral 'what' and dorsal 'where' pathways.

  2. Modeling the Development of Audiovisual Cue Integration in Speech Perception.

    Science.gov (United States)

    Getz, Laura M; Nordeen, Elke R; Vrabic, Sarah C; Toscano, Joseph C

    2017-03-21

    Adult speech perception is generally enhanced when information is provided from multiple modalities. In contrast, infants do not appear to benefit from combining auditory and visual speech information early in development. This is true despite the fact that both modalities are important to speech comprehension even at early stages of language acquisition. How then do listeners learn how to process auditory and visual information as part of a unified signal? In the auditory domain, statistical learning processes provide an excellent mechanism for acquiring phonological categories. Is this also true for the more complex problem of acquiring audiovisual correspondences, which require the learner to integrate information from multiple modalities? In this paper, we present simulations using Gaussian mixture models (GMMs) that learn cue weights and combine cues on the basis of their distributional statistics. First, we simulate the developmental process of acquiring phonological categories from auditory and visual cues, asking whether simple statistical learning approaches are sufficient for learning multi-modal representations. Second, we use this time course information to explain audiovisual speech perception in adult perceivers, including cases where auditory and visual input are mismatched. Overall, we find that domain-general statistical learning techniques allow us to model the developmental trajectory of audiovisual cue integration in speech, and in turn, allow us to better understand the mechanisms that give rise to unified percepts based on multiple cues.
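    The abstract describes unsupervised distributional learning over joint auditory-visual cue distributions. A minimal sketch of that idea follows, assuming two synthetic cue dimensions (an auditory VOT-like cue and a visual lip-opening cue) drawn from two made-up phonological categories; it uses scikit-learn's GaussianMixture rather than the authors' own implementation, and all numbers are illustrative.

        import numpy as np
        from sklearn.mixture import GaussianMixture

        rng = np.random.default_rng(0)

        # Synthetic audiovisual tokens: column 0 = auditory cue, column 1 = visual cue.
        # Category means and covariances are illustrative assumptions, not the paper's values.
        cat_b = rng.multivariate_normal([10.0, 0.2], [[25.0, 0.3], [0.3, 0.01]], size=500)
        cat_p = rng.multivariate_normal([60.0, 0.6], [[100.0, 0.5], [0.5, 0.02]], size=500)
        tokens = np.vstack([cat_b, cat_p])

        # Unsupervised learning of two categories from the joint cue distribution.
        gmm = GaussianMixture(n_components=2, covariance_type='full', random_state=0)
        gmm.fit(tokens)

        # Posterior category probabilities for a new audiovisual token: a weighting of
        # both cues by their learned distributional statistics falls out of the model.
        print(gmm.predict_proba([[35.0, 0.4]]))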

  3. What Information Is Necessary for Speech Categorization? Harnessing Variability in the Speech Signal by Integrating Cues Computed Relative to Expectations

    Science.gov (United States)

    McMurray, Bob; Jongman, Allard

    2011-01-01

    Most theories of categorization emphasize how continuous perceptual information is mapped to categories. However, equally important are the informational assumptions of a model, the type of information subserving this mapping. This is crucial in speech perception where the signal is variable and context dependent. This study assessed the…

  4. Preschoolers Benefit from Visually Salient Speech Cues

    Science.gov (United States)

    Lalonde, Kaylah; Holt, Rachael Frush

    2015-01-01

    Purpose: This study explored visual speech influence in preschoolers using 3 developmentally appropriate tasks that vary in perceptual difficulty and task demands. It also examined developmental differences in the ability to use visually salient speech cues and visual phonological knowledge. Method: Twelve adults and 27 typically developing 3-…

  5. Audiovisual Cues and Perceptual Learning of Spectrally Distorted Speech

    Science.gov (United States)

    Pilling, Michael; Thomas, Sharon

    2011-01-01

    Two experiments investigate the effectiveness of audiovisual (AV) speech cues (cues derived from both seeing and hearing a talker speak) in facilitating perceptual learning of spectrally distorted speech. Speech was distorted through an eight channel noise-vocoder which shifted the spectral envelope of the speech signal to simulate the properties…
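    Several records in this listing rely on noise-vocoded speech, so a minimal noise-vocoder sketch may be useful here. It implements plain envelope vocoding with log-spaced analysis bands and does not reproduce the spectral-envelope shifting used in this particular study; the band edges, filter order, and Hilbert-envelope extraction are all assumptions.

        import numpy as np
        from scipy.signal import butter, sosfiltfilt, hilbert

        def noise_vocode(x, fs, n_bands=8, lo=100.0, hi=8000.0):
            # x: float waveform array; fs assumed >= 16 kHz so the top band edge stays below Nyquist.
            edges = np.geomspace(lo, hi, n_bands + 1)            # log-spaced band edges (assumption)
            noise = np.random.default_rng(0).standard_normal(len(x))
            out = np.zeros_like(x)
            for f1, f2 in zip(edges[:-1], edges[1:]):
                sos = butter(4, [f1, f2], btype='bandpass', fs=fs, output='sos')
                band_env = np.abs(hilbert(sosfiltfilt(sos, x)))   # speech envelope in this band
                carrier = sosfiltfilt(sos, noise)                 # band-limited noise carrier
                out += band_env * carrier                         # envelope re-imposed on noise
            return out / (np.max(np.abs(out)) + 1e-9)             # simple peak normalization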

  6. Acoustic cues identifying phonetic transitions for speech segmentation

    CSIR Research Space (South Africa)

    Van Niekerk, DR

    2008-11-01

    Full Text Available The quality of corpus-based text-to-speech (TTS) systems depends strongly on the consistency of boundary placements during phonetic alignments. Expert human transcribers use visually represented acoustic cues in order to consistently place...

  7. Segmentation cues in conversational speech: Robust semantics and fragile phonotactics

    Directory of Open Access Journals (Sweden)

    Laurence White

    2012-10-01

    Full Text Available Multiple cues influence listeners’ segmentation of connected speech into words, but most previous studies have used stimuli elicited in careful readings rather than natural conversation. Discerning word boundaries in conversational speech may differ from the laboratory setting. In particular, a speaker’s articulatory effort – hyperarticulation vs hypoarticulation (H&H) – may vary according to communicative demands, suggesting a compensatory relationship whereby acoustic-phonetic cues are attenuated when other information sources strongly guide segmentation. We examined how listeners’ interpretation of segmentation cues is affected by speech style (spontaneous conversation vs read), using cross-modal identity priming. To elicit spontaneous stimuli, we used a map task in which speakers discussed routes around stylised landmarks. These landmarks were two-word phrases in which the strength of potential segmentation cues – semantic likelihood and cross-boundary diphone phonotactics – was systematically varied. Landmark-carrying utterances were transcribed and later re-recorded as read speech. Independent of speech style, we found an interaction between cue valence (favourable/unfavourable) and cue type (phonotactics/semantics). Thus, there was an effect of semantic plausibility, but no effect of cross-boundary phonotactics, indicating that the importance of phonotactic segmentation may have been overstated in studies where lexical information was artificially suppressed. These patterns were unaffected by whether the stimuli were elicited in a spontaneous or read context, even though the difference in speech styles was evident in a main effect. Durational analyses suggested speaker-driven cue trade-offs congruent with an H&H account, but these modulations did not impact on listener behaviour. We conclude that previous research exploiting read speech is reliable in indicating the primacy of lexically-based cues in the segmentation of natural

  8. The Neural Basis of Speech Perception through Lipreading and Manual Cues: Evidence from Deaf Native Users of Cued Speech

    Science.gov (United States)

    Aparicio, Mario; Peigneux, Philippe; Charlier, Brigitte; Balériaux, Danielle; Kavec, Martin; Leybaert, Jacqueline

    2017-01-01

    We present here the first neuroimaging data for perception of Cued Speech (CS) by deaf adults who are native users of CS. CS is a visual mode of communicating a spoken language through a set of manual cues which accompany lipreading and disambiguate it. With CS, sublexical units of the oral language are conveyed clearly and completely through the visual modality without requiring hearing. The comparison of neural processing of CS in deaf individuals with processing of audiovisual (AV) speech in normally hearing individuals represents a unique opportunity to explore the similarities and differences in neural processing of an oral language delivered in a visuo-manual vs. an AV modality. The study included deaf adult participants who were early CS users and native hearing users of French who process speech audiovisually. Words were presented in an event-related fMRI design. Three conditions were presented to each group of participants. The deaf participants saw CS words (manual + lipread), words presented as manual cues alone, and words presented to be lipread without manual cues. The hearing group saw AV spoken words, audio-alone and lipread-alone. Three findings are highlighted. First, the middle and superior temporal gyrus (excluding Heschl’s gyrus) and left inferior frontal gyrus pars triangularis constituted a common, amodal neural basis for AV and CS perception. Second, integration was inferred in posterior parts of superior temporal sulcus for audio and lipread information in AV speech, but in the occipito-temporal junction, including MT/V5, for the manual cues and lipreading in CS. Third, the perception of manual cues showed a much greater overlap with the regions activated by CS (manual + lipreading) than lipreading alone did. This supports the notion that manual cues play a larger role than lipreading for CS processing. The present study contributes to a better understanding of the role of manual cues as support of visual speech perception in the framework

  9. The contribution of dynamic visual cues to audiovisual speech perception.

    Science.gov (United States)

    Jaekl, Philip; Pesquita, Ana; Alsius, Agnes; Munhall, Kevin; Soto-Faraco, Salvador

    2015-08-01

    Seeing a speaker's facial gestures can significantly improve speech comprehension, especially in noisy environments. However, the nature of the visual information from the speaker's facial movements that is relevant for this enhancement is still unclear. Like auditory speech signals, visual speech signals unfold over time and contain both dynamic configural information and luminance-defined local motion cues; two information sources that are thought to engage anatomically and functionally separate visual systems. Whereas some past studies have highlighted the importance of local, luminance-defined motion cues in audiovisual speech perception, the contribution of dynamic configural information signalling changes in form over time has not yet been assessed. We therefore attempted to single out the contribution of dynamic configural information to audiovisual speech processing. To this aim, we measured word identification performance in noise using unimodal auditory stimuli, and with audiovisual stimuli. In the audiovisual condition, speaking faces were presented as point light displays achieved via motion capture of the original talker. Point light displays could be isoluminant, to minimise the contribution of effective luminance-defined local motion information, or with added luminance contrast, allowing the combined effect of dynamic configural cues and local motion cues. Audiovisual enhancement was found in both the isoluminant and contrast-based luminance conditions compared to an auditory-only condition, demonstrating, for the first time, the specific contribution of dynamic configural cues to audiovisual speech improvement. These findings imply that globally processed changes in a speaker's facial shape contribute significantly towards the perception of articulatory gestures and the analysis of audiovisual speech. Copyright © 2015 Elsevier Ltd. All rights reserved.

  10. Cross-language differences in cue use for speech segmentation

    NARCIS (Netherlands)

    Tyler, M.D.; Cutler, A.

    2009-01-01

    Two artificial-language learning experiments directly compared English, French, and Dutch listeners' use of suprasegmental cues for continuous-speech segmentation. In both experiments, listeners heard unbroken sequences of consonant-vowel syllables, composed of recurring three- and four-syllable

  11. Should visual speech cues (speechreading) be considered when fitting hearing aids?

    Science.gov (United States)

    Grant, Ken

    2002-05-01

    When talker and listener are face-to-face, visual speech cues become an important part of the communication environment, and yet, these cues are seldom considered when designing hearing aids. Models of auditory-visual speech recognition highlight the importance of complementary versus redundant speech information for predicting auditory-visual recognition performance. Thus, for hearing aids to work optimally when visual speech cues are present, it is important to know whether the cues provided by amplification and the cues provided by speechreading complement each other. In this talk, data will be reviewed that show nonmonotonicity between auditory-alone speech recognition and auditory-visual speech recognition, suggesting that efforts designed solely to improve auditory-alone recognition may not always result in improved auditory-visual recognition. Data will also be presented showing that one of the most important speech cues for enhancing auditory-visual speech recognition performance, voicing, is often the cue that benefits least from amplification.

  12. Discriminating individually considerate and authoritarian leaders by speech activity cues

    OpenAIRE

    Feese, Sebastian; Muaremi, Amir; Arnrich, Bert; Tröster, Gerhard; Meyer, Bertolt; Jonas, Klaus

    2011-01-01

    Effective leadership can increase team performance; however, the influence of specific micro-level behavioral patterns on team performance remains unclear. At the same time, current behavior observation methods in social psychology mostly rely on manual video annotations that impede research. In our work, we follow a sensor-based approach to automatically extract speech activity cues to discriminate individually considerate from authoritarian leadership. On a subset of 35 selected...

  13. Suprasegmental lexical stress cues in visual speech can guide spoken-word recognition

    NARCIS (Netherlands)

    Jesse, A.; McQueen, J.M.

    2014-01-01

    Visual cues to the individual segments of speech and to sentence prosody guide speech recognition. The present study tested whether visual suprasegmental cues to the stress patterns of words can also constrain recognition. Dutch listeners use acoustic suprasegmental cues to lexical stress (changes

  14. Use of amplitude modulation cues recovered from frequency modulation for cochlear implant users when original speech cues are severely degraded.

    Science.gov (United States)

    Won, Jong Ho; Shim, Hyun Joon; Lorenzi, Christian; Rubinstein, Jay T

    2014-06-01

    Won et al. (J Acoust Soc Am 132:1113-1119, 2012) reported that cochlear implant (CI) speech processors generate amplitude-modulation (AM) cues recovered from broadband speech frequency modulation (FM) and that CI users can use these cues for speech identification in quiet. The present study was designed to extend this finding for a wide range of listening conditions, where the original speech cues were severely degraded by manipulating either the acoustic signals or the speech processor. The manipulation of the acoustic signals included the presentation of background noise, simulation of reverberation, and amplitude compression. The manipulation of the speech processor included changing the input dynamic range and the number of channels. For each of these conditions, multiple levels of speech degradation were tested. Speech identification was measured for CI users and compared for stimuli having both AM and FM information (intact condition) or FM information only (FM condition). Each manipulation degraded speech identification performance for both intact and FM conditions. Performance for the intact and FM conditions became similar for stimuli having the most severe degradations. Identification performance generally overlapped for the intact and FM conditions. Moreover, identification performance for the FM condition was better than chance performance even at the maximum level of distortion. Finally, significant correlations were found between speech identification scores for the intact and FM conditions. Altogether, these results suggest that despite poor frequency selectivity, CI users can make efficient use of AM cues recovered from speech FM in difficult listening situations.

  15. Experience with speech sounds is not necessary for cue trading by budgerigars (Melopsittacus undulatus).

    Directory of Open Access Journals (Sweden)

    Mary Flaherty

    Full Text Available The influence of experience with human speech sounds on speech perception in budgerigars, vocal mimics whose speech exposure can be tightly controlled in a laboratory setting, was measured. Budgerigars were divided into groups that differed in auditory exposure and then tested on a cue-trading identification paradigm with synthetic speech. Phonetic cue trading is a perceptual phenomenon observed when changes on one cue dimension are offset by changes in another cue dimension while still maintaining the same phonetic percept. The current study examined whether budgerigars would trade the cues of voice onset time (VOT) and the first formant onset frequency when identifying syllable-initial stop consonants and if this would be influenced by exposure to speech sounds. There were a total of four different exposure groups: No speech exposure (completely isolated), Passive speech exposure (regular exposure to human speech), and two Speech-trained groups. After the exposure period, all budgerigars were tested for phonetic cue trading using operant conditioning procedures. Birds were trained to peck keys in response to different synthetic speech sounds that began with "d" or "t" and varied in VOT and frequency of the first formant at voicing onset. Once training performance criteria were met, budgerigars were presented with the entire intermediate series, including ambiguous sounds. Responses on these trials were used to determine which speech cues were used, if a trading relation between VOT and the onset frequency of the first formant was present, and whether speech exposure had an influence on perception. Cue trading was found in all birds and these results were largely similar to those of a group of humans. Results indicated that prior speech experience was not a requirement for cue trading by budgerigars. The results are consistent with theories that explain phonetic cue trading in terms of a rich auditory encoding of the speech signal.

  16. Suprasegmental lexical stress cues in visual speech can guide spoken-word recognition

    OpenAIRE

    Jesse, A.; McQueen, J.

    2014-01-01

    Visual cues to the individual segments of speech and to sentence prosody guide speech recognition. The present study tested whether visual suprasegmental cues to the stress patterns of words can also constrain recognition. Dutch listeners use acoustic suprasegmental cues to lexical stress (changes in duration, amplitude, and pitch) in spoken-word recognition. We asked here whether they can also use visual suprasegmental cues. In two categorization experiments, Dutch participants saw a speaker...

  17. Crossmodal and incremental perception of audiovisual cues to emotional speech.

    Science.gov (United States)

    Barkhuysen, Pashiera; Krahmer, Emiel; Swerts, Marc

    2010-01-01

    In this article we report on two experiments about the perception of audiovisual cues to emotional speech. The article addresses two questions: (1) how do visual cues from a speaker's face to emotion relate to auditory cues, and (2) what is the recognition speed for various facial cues to emotion? Both experiments reported below are based on tests with video clips of emotional utterances collected via a variant of the well-known Velten method. More specifically, we recorded speakers who displayed positive or negative emotions, which were congruent or incongruent with the (emotional) lexical content of the uttered sentence. In order to test this, we conducted two experiments. The first experiment is a perception experiment in which Czech participants, who do not speak Dutch, rate the perceived emotional state of Dutch speakers in a bimodal (audiovisual) or a unimodal (audio- or vision-only) condition. It was found that incongruent emotional speech leads to significantly more extreme perceived emotion scores than congruent emotional speech, where the difference between congruent and incongruent emotional speech is larger for the negative than for the positive conditions. Interestingly, the largest overall differences between congruent and incongruent emotions were found for the audio-only condition, which suggests that posing an incongruent emotion has a particularly strong effect on the spoken realization of emotions. The second experiment uses a gating paradigm to test the recognition speed for various emotional expressions from a speaker's face. In this experiment participants were presented with the same clips as in Experiment 1, but this time presented vision-only. The clips were shown in successive segments (gates) of increasing duration. Results show that participants are surprisingly accurate in their recognition of the various emotions, as they already reach high recognition scores in the first gate (after only 160 ms). Interestingly, the recognition scores

  18. Learning to Match Auditory and Visual Speech Cues: Social Influences on Acquisition of Phonological Categories

    Science.gov (United States)

    Altvater-Mackensen, Nicole; Grossmann, Tobias

    2015-01-01

    Infants' language exposure largely involves face-to-face interactions providing acoustic and visual speech cues but also social cues that might foster language learning. Yet, both audiovisual speech information and social information have so far received little attention in research on infants' early language development. Using a preferential…

  19. Perception of Speech Modulation Cues by 6-Month-Old Infants

    Science.gov (United States)

    Cabrera, Laurianne; Bertoncini, Josiane; Lorenzi, Christian

    2013-01-01

    Purpose: The capacity of 6-month-old infants to discriminate a voicing contrast (/aba/--/apa/) on the basis of "amplitude modulation (AM) cues" and "frequency modulation (FM) cues" was evaluated. Method: Several vocoded speech conditions were designed to either degrade FM cues in 4 or 32 bands or degrade AM in 32 bands. Infants…

  20. [Intermodal timing cues for audio-visual speech recognition].

    Science.gov (United States)

    Hashimoto, Masahiro; Kumashiro, Masaharu

    2004-06-01

    The purpose of this study was to investigate the limitations of lip-reading advantages for Japanese young adults by desynchronizing visual and auditory information in speech. In the experiment, audio-visual speech stimuli were presented under six test conditions: audio-alone, and audio-visually with either 0, 60, 120, 240 or 480 ms of audio delay. The stimuli were video recordings of the face of a female Japanese speaker producing long and short Japanese sentences. The intelligibility of the audio-visual stimuli was measured as a function of audio delay in sixteen untrained young subjects. Speech intelligibility under the audio-delay conditions of less than 120 ms was significantly better than that under the audio-alone condition. On the other hand, the delay of 120 ms corresponded to the mean mora duration measured for the audio stimuli. The results implied that audio delays of up to 120 ms would not disrupt the lip-reading advantage, because visual and auditory information in speech seemed to be integrated on a syllabic time scale. Potential applications of this research include noisy workplaces in which a worker must extract relevant speech from competing noise.

  1. When Meaning Is Not Enough: Distributional and Semantic Cues to Word Categorization in Child Directed Speech.

    Science.gov (United States)

    Feijoo, Sara; Muñoz, Carmen; Amadó, Anna; Serrat, Elisabet

    2017-01-01

    One of the most important tasks in first language development is assigning words to their grammatical category. The Semantic Bootstrapping Hypothesis postulates that, in order to accomplish this task, children are guided by a neat correspondence between semantic and grammatical categories, since nouns typically refer to objects and verbs to actions. It is this correspondence that guides children's initial word categorization. Other approaches, on the other hand, suggest that children might make use of distributional cues and word contexts to accomplish the word categorization task. According to such approaches, the Semantic Bootstrapping assumption faces an important limitation, as it might not be true that all the nouns that children hear refer to specific objects or people. To explore this, we carried out two studies based on analyses of children's linguistic input. We analyzed child-directed speech addressed to four children under the age of 2;6, taken from the CHILDES database. The corpora were selected from the Manchester corpus. The corpora from the four selected children contained a total of 10,681 word types and 364,196 word tokens. In our first study, discriminant analyses were performed using semantic cues alone. The results show that many of the nouns found in parents' speech do not relate to specific objects and that semantic information alone might not be sufficient for successful word categorization. Given that there must be an additional source of information which, alongside semantics, might assist young learners in word categorization, our second study explores the availability of both distributional and semantic cues in child-directed speech. Our results confirm that this combination might yield better results for word categorization. These results are in line with theories that suggest the need for an integration of multiple cues from different sources in language development.

  2. The role of reverberation-related binaural cues in the externalization of speech

    DEFF Research Database (Denmark)

    Catic, Jasmina; Santurette, Sébastien; Dau, Torsten

    2015-01-01

    The perception of externalization of speech sounds was investigated with respect to the monaural and binaural cues available at the listeners’ ears in a reverberant environment. Individualized binaural room impulse responses (BRIRs) were used to simulate externalized sound sources via headphones. ... The measured BRIRs were subsequently modified such that the proportion of the response containing binaural vs monaural information was varied. Normal-hearing listeners were presented with speech sounds convolved with such modified BRIRs. Monaural reverberation cues were found to be sufficient...

  3. Treatment of Speech Anxiety by Cue-Controlled Relaxation and Desensitization with Professional and Paraprofessional Counselors

    Science.gov (United States)

    Russell, Richard K.; Wise, Fred

    1976-01-01

    This investigation compared the relative effectiveness of group-administered cue-controlled relaxation and group systematic desensitization in the treatment of speech anxiety. Also examined was the role of professional versus paraprofessional counselors in implementing the treatment program. A description of the cue-controlled relaxation technique…

  4. Training the Brain to Weight Speech Cues Differently: A Study of Finnish Second-language Users of English

    Science.gov (United States)

    Ylinen, Sari; Uther, Maria; Latvala, Antti; Vepsalainen, Sara; Iverson, Paul; Akahane-Yamada, Reiko; Naatanen, Risto

    2010-01-01

    Foreign-language learning is a prime example of a task that entails perceptual learning. The correct comprehension of foreign-language speech requires the correct recognition of speech sounds. The most difficult speech-sound contrasts for foreign-language learners often are the ones that have multiple phonetic cues, especially if the cues are…

  5. Phonetic Category Cues in Adult-Directed Speech: Evidence from Three Languages with Distinct Vowel Characteristics

    Science.gov (United States)

    Pons, Ferran; Biesanz, Jeremy C.; Kajikawa, Sachiyo; Fais, Laurel; Narayan, Chandan R.; Amano, Shigeaki; Werker, Janet F.

    2012-01-01

    Using an artificial language learning manipulation, Maye, Werker, and Gerken (2002) demonstrated that infants' speech sound categories change as a function of the distributional properties of the input. In a recent study, Werker et al. (2007) showed that Infant-directed Speech (IDS) input contains reliable acoustic cues that support distributional…

  6. The role of continuous low-frequency harmonicity cues for interrupted speech perception in bimodal hearing.

    Science.gov (United States)

    Oh, Soo Hee; Donaldson, Gail S; Kong, Ying-Yee

    2016-04-01

    Low-frequency acoustic cues have been shown to enhance speech perception by cochlear-implant users, particularly when target speech occurs in a competing background. The present study examined the extent to which a continuous representation of low-frequency harmonicity cues contributes to bimodal benefit in simulated bimodal listeners. Experiment 1 examined the benefit of restoring a continuous temporal envelope to the low-frequency ear while the vocoder ear received a temporally interrupted stimulus. Experiment 2 examined the effect of providing continuous harmonicity cues in the low-frequency ear as compared to restoring a continuous temporal envelope in the vocoder ear. Findings indicate that bimodal benefit for temporally interrupted speech increases when continuity is restored to either or both ears. The primary benefit appears to stem from the continuous temporal envelope in the low-frequency region providing additional phonetic cues related to manner and F1 frequency; a secondary contribution is provided by low-frequency harmonicity cues when a continuous representation of the temporal envelope is present in the low-frequency ear or in both ears. The continuous temporal envelope and harmonicity cues of low-frequency speech are thought to support bimodal benefit by facilitating identification of word and syllable boundaries, and by restoring partial phonetic cues that occur during gaps in the temporally interrupted stimulus.

  7. Audiovisual cues benefit recognition of accented speech in noise but not perceptual adaptation.

    Science.gov (United States)

    Banks, Briony; Gowen, Emma; Munro, Kevin J; Adank, Patti

    2015-01-01

    Perceptual adaptation allows humans to recognize different varieties of accented speech. We investigated whether perceptual adaptation to accented speech is facilitated if listeners can see a speaker's facial and mouth movements. In Study 1, participants listened to sentences in a novel accent and underwent a period of training with audiovisual or audio-only speech cues, presented in quiet or in background noise. A control group also underwent training with visual-only (speech-reading) cues. We observed no significant difference in perceptual adaptation between any of the groups. To address a number of remaining questions, we carried out a second study using a different accent, speaker and experimental design, in which participants listened to sentences in a non-native (Japanese) accent with audiovisual or audio-only cues, without separate training. Participants' eye gaze was recorded to verify that they looked at the speaker's face during audiovisual trials. Recognition accuracy was significantly better for audiovisual than for audio-only stimuli; however, no statistical difference in perceptual adaptation was observed between the two modalities. Furthermore, Bayesian analysis suggested that the data supported the null hypothesis. Our results suggest that although the availability of visual speech cues may be immediately beneficial for recognition of unfamiliar accented speech in noise, it does not improve perceptual adaptation.

  8. Integration of auditory and visual speech information

    NARCIS (Netherlands)

    Hall, M.; Smeele, P.M.T.; Kuhl, P.K.

    1998-01-01

    The integration of auditory and visual speech is observed when modes specify different places of articulation. Influences of auditory variation on integration were examined using consonant identification, plus quality and similarity ratings. Auditory identification predicted auditory-visual

  9. Spectral integration in speech and non-speech sounds

    Science.gov (United States)

    Jacewicz, Ewa

    2005-04-01

    Spectral integration (or formant averaging) was proposed in vowel perception research to account for the observation that a reduction of the intensity of one of two closely spaced formants (as in /u/) produced a predictable shift in vowel quality [Delattre et al., Word 8, 195-210 (1952)]. A related observation was reported in psychoacoustics, indicating that when the components of a two-tone periodic complex differ in amplitude and frequency, its perceived pitch is shifted toward that of the more intense tone [Helmholtz, App. XIV (1875/1948)]. Subsequent research in both fields focused on the frequency interval that separates these two spectral components, in an attempt to determine the size of the bandwidth for spectral integration to occur. This talk will review the accumulated evidence for and against spectral integration within the hypothesized limit of 3.5 Bark for static and dynamic signals in speech perception and psychoacoustics. Based on similarities in the processing of speech and non-speech sounds, it is suggested that spectral integration may reflect a general property of the auditory system. A larger frequency bandwidth, possibly close to 3.5 Bark, may be utilized in integrating acoustic information, including speech, complex signals, or sound quality of a violin.
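    As a point of reference for the 3.5 Bark criterion discussed above, the sketch below uses Traunmüller's (1990) approximation of the Hz-to-Bark conversion; the formant values are rough illustrative figures, not data from the studies cited.

        def hz_to_bark(f):
            # Traunmüller (1990) approximation of the Bark scale.
            return 26.81 * f / (1960.0 + f) - 0.53

        def within_integration_band(f1_hz, f2_hz, limit_bark=3.5):
            # True if two spectral components fall within the hypothesized 3.5 Bark limit.
            return abs(hz_to_bark(f1_hz) - hz_to_bark(f2_hz)) <= limit_bark

        # Closely spaced formants (e.g., a back rounded vowel) vs widely spaced ones (e.g., /i/).
        print(within_integration_band(300, 650))    # True  -> candidates for spectral averaging
        print(within_integration_band(270, 2290))   # False -> perceived as separate peaks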

  10. The role of reverberation-related binaural cues in the externalization of speech.

    Science.gov (United States)

    Catic, Jasmina; Santurette, Sébastien; Dau, Torsten

    2015-08-01

    The perception of externalization of speech sounds was investigated with respect to the monaural and binaural cues available at the listeners' ears in a reverberant environment. Individualized binaural room impulse responses (BRIRs) were used to simulate externalized sound sources via headphones. The measured BRIRs were subsequently modified such that the proportion of the response containing binaural vs monaural information was varied. Normal-hearing listeners were presented with speech sounds convolved with such modified BRIRs. Monaural reverberation cues were found to be sufficient for the externalization of a lateral sound source. In contrast, for a frontal source, an increased amount of binaural cues from reflections was required in order to obtain well externalized sound images. It was demonstrated that the interaction between the interaural cues of the direct sound and the reverberation strongly affects the perception of externalization. An analysis of the short-term binaural cues showed that the amount of fluctuations of the binaural cues corresponded well to the externalization ratings obtained in the listening tests. The results further suggested that the precedence effect is involved in the auditory processing of the dynamic binaural cues that are utilized for externalization perception.

  11. Psychoacoustic cues to emotion in speech prosody and music.

    Science.gov (United States)

    Coutinho, Eduardo; Dibben, Nicola

    2013-01-01

    There is strong evidence of shared acoustic profiles common to the expression of emotions in music and speech, yet relatively limited understanding of the specific psychoacoustic features involved. This study combined a controlled experiment and computational modelling to investigate the perceptual codes associated with the expression of emotion in the acoustic domain. The empirical stage of the study provided continuous human ratings of emotions perceived in excerpts of film music and natural speech samples. The computational stage created a computer model that retrieves the relevant information from the acoustic stimuli and makes predictions about the emotional expressiveness of speech and music that are close to the responses of human subjects. We show that a significant part of the listeners' second-by-second reported emotions to music and speech prosody can be predicted from a set of seven psychoacoustic features: loudness, tempo/speech rate, melody/prosody contour, spectral centroid, spectral flux, sharpness, and roughness. The implications of these results are discussed in the context of cross-modal similarities in the communication of emotion in the acoustic domain.
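    A hedged sketch of the kind of mapping the computational stage describes, assuming the seven psychoacoustic features have already been extracted frame by frame; the abstract does not specify the model class, so a simple cross-validated ridge regression stands in for the authors' model, and all data below are synthetic.

        import numpy as np
        from sklearn.linear_model import Ridge
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(1)

        # Hypothetical per-frame features: loudness, tempo/speech rate, melody/prosody
        # contour, spectral centroid, spectral flux, sharpness, roughness.
        n_frames = 600
        X = rng.normal(size=(n_frames, 7))

        # Hypothetical continuous arousal ratings reported second by second by listeners.
        y = X @ np.array([0.6, 0.4, 0.1, 0.5, 0.3, 0.2, 0.3]) + rng.normal(scale=0.5, size=n_frames)

        model = Ridge(alpha=1.0)
        print(cross_val_score(model, X, y, cv=5, scoring="r2"))  # predictive fit per fold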

  12. Neural networks supporting audiovisual integration for speech: A large-scale lesion study.

    Science.gov (United States)

    Hickok, Gregory; Rogalsky, Corianne; Matchin, William; Basilakos, Alexandra; Cai, Julia; Pillay, Sara; Ferrill, Michelle; Mickelsen, Soren; Anderson, Steven W; Love, Tracy; Binder, Jeffrey; Fridriksson, Julius

    2018-06-01

    Auditory and visual speech information are often strongly integrated, resulting in perceptual enhancements for audiovisual (AV) speech over audio alone and sometimes yielding compelling illusory fusion percepts when AV cues are mismatched, the McGurk-MacDonald effect. Previous research has identified three candidate regions thought to be critical for AV speech integration: the posterior superior temporal sulcus (STS), early auditory cortex, and the posterior inferior frontal gyrus. We assess the causal involvement of these regions (and others) in the first large-scale (N = 100) lesion-based study of AV speech integration. Two primary findings emerged. First, behavioral performance and lesion maps for AV enhancement and illusory fusion measures indicate that classic metrics of AV speech integration are not necessarily measuring the same process. Second, lesions involving superior temporal auditory, lateral occipital visual, and multisensory zones in the STS are the most disruptive to AV speech integration. Further, when AV speech integration fails, the nature of the failure (auditory vs visual capture) can be predicted from the location of the lesions. These findings show that AV speech processing is supported by unimodal auditory and visual cortices as well as multimodal regions such as the STS at their boundary. Motor-related frontal regions do not appear to play a role in AV speech integration. Copyright © 2018 Elsevier Ltd. All rights reserved.

  13. Common cues to emotion in the dynamic facial expressions of speech and song.

    Science.gov (United States)

    Livingstone, Steven R; Thompson, William F; Wanderley, Marcelo M; Palmer, Caroline

    2015-01-01

    Speech and song are universal forms of vocalization that may share aspects of emotional expression. Research has focused on parallels in acoustic features, overlooking facial cues to emotion. In three experiments, we compared moving facial expressions in speech and song. In Experiment 1, vocalists spoke and sang statements each with five emotions. Vocalists exhibited emotion-dependent movements of the eyebrows and lip corners that transcended speech-song differences. Vocalists' jaw movements were coupled to their acoustic intensity, exhibiting differences across emotion and speech-song. Vocalists' emotional movements extended beyond vocal sound to include large sustained expressions, suggesting a communicative function. In Experiment 2, viewers judged silent videos of vocalists' facial expressions prior to, during, and following vocalization. Emotional intentions were identified accurately for movements during and after vocalization, suggesting that these movements support the acoustic message. Experiment 3 compared emotional identification in voice-only, face-only, and face-and-voice recordings. Emotion judgements for voice-only singing were poorly identified, yet were accurate for all other conditions, confirming that facial expressions conveyed emotion more accurately than the voice in song, yet were equivalent in speech. Collectively, these findings highlight broad commonalities in the facial cues to emotion in speech and song, yet highlight differences in perception and acoustic-motor production.

  14. Speech-specificity of two audiovisual integration effects

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Tuomainen, Jyrki; Andersen, Tobias

    2010-01-01

    Seeing the talker’s articulatory mouth movements can influence the auditory speech percept both in speech identification and detection tasks. Here we show that these audiovisual integration effects also occur for sine wave speech (SWS), which is an impoverished speech signal that naïve observers ... often fail to perceive as speech. While audiovisual integration in the identification task only occurred when observers were informed of the speech-like nature of SWS, integration occurred in the detection task both for informed and naïve observers. This shows that both speech-specific and general ... mechanisms underlie audiovisual integration of speech...

  15. Visual-vestibular cue integration for heading perception: applications of optimal cue integration theory.

    Science.gov (United States)

    Fetsch, Christopher R; Deangelis, Gregory C; Angelaki, Dora E

    2010-05-01

    The perception of self-motion is crucial for navigation, spatial orientation and motor control. In particular, estimation of one's direction of translation, or heading, relies heavily on multisensory integration in most natural situations. Visual and nonvisual (e.g., vestibular) information can be used to judge heading, but each modality alone is often insufficient for accurate performance. It is not surprising, then, that visual and vestibular signals converge frequently in the nervous system, and that these signals interact in powerful ways at the level of behavior and perception. Early behavioral studies of visual-vestibular interactions consisted mainly of descriptive accounts of perceptual illusions and qualitative estimation tasks, often with conflicting results. In contrast, cue integration research in other modalities has benefited from the application of rigorous psychophysical techniques, guided by normative models that rest on the foundation of ideal-observer analysis and Bayesian decision theory. Here we review recent experiments that have attempted to harness these so-called optimal cue integration models for the study of self-motion perception. Some of these studies used nonhuman primate subjects, enabling direct comparisons between behavioral performance and simultaneously recorded neuronal activity. The results indicate that humans and monkeys can integrate visual and vestibular heading cues in a manner consistent with optimal integration theory, and that single neurons in the dorsal medial superior temporal area show striking correlates of the behavioral effects. This line of research and other applications of normative cue combination models should continue to shed light on mechanisms of self-motion perception and the neuronal basis of multisensory integration.
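    The 'optimal integration' referred to above is usually formalized as reliability-weighted averaging of the single-cue estimates. A minimal sketch with made-up visual and vestibular heading estimates follows; the numbers are purely illustrative.

        def optimal_combination(est_a, var_a, est_b, var_b):
            # Maximum-likelihood (reliability-weighted) combination of two cues.
            # Weights are inverse variances; the combined variance is never larger
            # than that of the more reliable single cue.
            w_a = (1.0 / var_a) / (1.0 / var_a + 1.0 / var_b)
            combined_est = w_a * est_a + (1.0 - w_a) * est_b
            combined_var = (var_a * var_b) / (var_a + var_b)
            return combined_est, combined_var

        # Illustrative heading estimates in degrees; the visual cue is noisier here.
        print(optimal_combination(est_a=4.0, var_a=9.0, est_b=1.0, var_b=3.0))
        # -> (1.75, 2.25): the combined estimate is pulled toward the more reliable cue.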

  16. When speaker identity is unavoidable: Neural processing of speaker identity cues in natural speech.

    Science.gov (United States)

    Tuninetti, Alba; Chládková, Kateřina; Peter, Varghese; Schiller, Niels O; Escudero, Paola

    2017-11-01

    Speech sound acoustic properties vary largely across speakers and accents. When perceiving speech, adult listeners normally disregard non-linguistic variation caused by speaker or accent differences, in order to comprehend the linguistic message, e.g. to correctly identify a speech sound or a word. Here we tested whether the process of normalizing speaker and accent differences, facilitating the recognition of linguistic information, is found at the level of neural processing, and whether it is modulated by the listeners' native language. In a multi-deviant oddball paradigm, native and nonnative speakers of Dutch were exposed to naturally-produced Dutch vowels varying in speaker, sex, accent, and phoneme identity. Unexpectedly, the analysis of mismatch negativity (MMN) amplitudes elicited by each type of change shows a large degree of early perceptual sensitivity to non-linguistic cues. This finding on perception of naturally-produced stimuli contrasts with previous studies examining the perception of synthetic stimuli wherein adult listeners automatically disregard acoustic cues to speaker identity. The present finding bears relevance to speech normalization theories, suggesting that at an unattended level of processing, listeners are indeed sensitive to changes in fundamental frequency in natural speech tokens. Copyright © 2017 Elsevier Inc. All rights reserved.

  17. Brain networks engaged in audiovisual integration during speech perception revealed by persistent homology-based network filtration.

    Science.gov (United States)

    Kim, Heejung; Hahm, Jarang; Lee, Hyekyoung; Kang, Eunjoo; Kang, Hyejin; Lee, Dong Soo

    2015-05-01

    The human brain naturally integrates audiovisual information to improve speech perception. However, in noisy environments, understanding speech is difficult and may require much effort. Although the brain network is supposed to be engaged in speech perception, it is unclear how speech-related brain regions are connected during natural bimodal audiovisual or unimodal speech perception with counterpart irrelevant noise. To investigate the topological changes of speech-related brain networks at all possible thresholds, we used a persistent homological framework through hierarchical clustering (single linkage distance) to analyze the connected components of the functional network during speech perception using functional magnetic resonance imaging. For speech perception, bimodal (audio-visual speech cue) or unimodal speech cues with counterpart irrelevant noise (auditory white-noise or visual gum-chewing) were delivered to 15 subjects. In terms of positive relationships, similar connected components were observed in bimodal and unimodal speech conditions during filtration. However, during speech perception of congruent audiovisual stimuli, tighter couplings of a left anterior temporal gyrus-anterior insula component and of right premotor-visual components were observed than in the auditory or visual speech cue conditions, respectively. Interestingly, visual speech perceived under white noise showed tight negative coupling among the left inferior frontal region, right anterior cingulate, left anterior insula, and bilateral visual regions, including right middle temporal gyrus and right fusiform components. In conclusion, the speech brain network is tightly positively or negatively connected, and can reflect efficient or effortful processes during natural audiovisual integration or lip-reading, respectively, in speech perception.
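    A minimal sketch of the zero-dimensional persistent-homology filtration described above: single-linkage merge heights over a correlation-derived distance matrix track how connected components of the functional network appear and merge as the threshold is relaxed. The random data stand in for real region-by-region fMRI time series.

        import numpy as np
        from scipy.cluster.hierarchy import linkage
        from scipy.spatial.distance import squareform

        rng = np.random.default_rng(2)

        # Placeholder functional connectivity: correlations among 20 regions of interest.
        time_series = rng.normal(size=(20, 200))
        corr = np.corrcoef(time_series)

        # Convert correlations to distances and run a single-linkage filtration.
        dist = 1.0 - corr
        np.fill_diagonal(dist, 0.0)
        merges = linkage(squareform(dist, checks=False), method="single")

        # Column 2 holds the merge heights: the filtration values at which connected
        # components of the thresholded network fuse (the deaths of the H0 barcode).
        print(np.sort(merges[:, 2]))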

  18. Audiovisual integration in speech perception: a multi-stage process

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Tuomainen, Jyrki; Andersen, Tobias

    2011-01-01

    investigate whether the integration of auditory and visual speech observed in these two audiovisual integration effects are specific traits of speech perception. We further ask whether audiovisual integration is undertaken in a single processing stage or multiple processing stages....

  19. Discrimination and streaming of speech sounds based on differences in interaural and spectral cues.

    Science.gov (United States)

    David, Marion; Lavandier, Mathieu; Grimault, Nicolas; Oxenham, Andrew J

    2017-09-01

    Differences in spatial cues, including interaural time differences (ITDs), interaural level differences (ILDs) and spectral cues, can lead to stream segregation of alternating noise bursts. It is unknown how effective such cues are for streaming sounds with realistic spectro-temporal variations. In particular, it is not known whether the high-frequency spectral cues associated with elevation remain sufficiently robust under such conditions. To answer these questions, sequences of consonant-vowel tokens were generated and filtered by non-individualized head-related transfer functions to simulate the cues associated with different positions in the horizontal and median planes. A discrimination task showed that listeners could discriminate changes in interaural cues both when the stimulus remained constant and when it varied between presentations. However, discrimination of changes in spectral cues was much poorer in the presence of stimulus variability. A streaming task, based on the detection of repeated syllables in the presence of interfering syllables, revealed that listeners can use both interaural and spectral cues to segregate alternating syllable sequences, despite the large spectro-temporal differences between stimuli. However, only the full complement of spatial cues (ILDs, ITDs, and spectral cues) resulted in obligatory streaming in a task that encouraged listeners to integrate the tokens into a single stream.

  20. Integration of speech and gesture in aphasia.

    Science.gov (United States)

    Cocks, Naomi; Byrne, Suzanne; Pritchard, Madeleine; Morgan, Gary; Dipper, Lucy

    2018-02-07

    Information from speech and gesture is often integrated to comprehend a message. This integration process requires the appropriate allocation of cognitive resources to both the gesture and speech modalities. People with aphasia are likely to find integration of gesture and speech difficult. This is due to a reduction in cognitive resources, a difficulty with resource allocation or a combination of the two. Despite it being likely that people who have aphasia will have difficulty with integration, empirical evidence describing this difficulty is limited. Such a difficulty was found in a single case study by Cocks et al. in 2009, and is replicated here with a greater number of participants. To determine whether individuals with aphasia have difficulties understanding messages in which they have to integrate speech and gesture. Thirty-one participants with aphasia (PWA) and 30 control participants watched videos of an actor communicating a message in three different conditions: verbal only, gesture only, and verbal and gesture message combined. The message related to an action in which the name of the action (e.g., 'eat') was provided verbally and the manner of the action (e.g., hands in a position as though eating a burger) was provided gesturally. Participants then selected a picture that 'best matched' the message conveyed from a choice of four pictures which represented a gesture match only (G match), a verbal match only (V match), an integrated verbal-gesture match (Target) and an unrelated foil (UR). To determine the gain that participants obtained from integrating gesture and speech, a measure of multimodal gain (MMG) was calculated. The PWA were less able to integrate gesture and speech than the control participants and had significantly lower MMG scores. When the PWA had difficulty integrating, they more frequently selected the verbal match. The findings suggest that people with aphasia can have difficulty integrating speech and gesture in order to obtain

  1. Suprasegmental lexical stress cues in visual speech can guide spoken-word recognition.

    Science.gov (United States)

    Jesse, Alexandra; McQueen, James M

    2014-01-01

    Visual cues to the individual segments of speech and to sentence prosody guide speech recognition. The present study tested whether visual suprasegmental cues to the stress patterns of words can also constrain recognition. Dutch listeners use acoustic suprasegmental cues to lexical stress (changes in duration, amplitude, and pitch) in spoken-word recognition. We asked here whether they can also use visual suprasegmental cues. In two categorization experiments, Dutch participants saw a speaker say fragments of word pairs that were segmentally identical but differed in their stress realization (e.g., 'ca-vi from cavia "guinea pig" vs. 'ka-vi from kaviaar "caviar"). Participants were able to distinguish between these pairs from seeing a speaker alone. Only the presence of primary stress in the fragment, not its absence, was informative. Participants were able to distinguish visually primary from secondary stress on first syllables, but only when the fragment-bearing target word carried phrase-level emphasis. Furthermore, participants distinguished fragments with primary stress on their second syllable from those with secondary stress on their first syllable (e.g., pro-'jec from projector "projector" vs. 'pro-jec from projectiel "projectile"), independently of phrase-level emphasis. Seeing a speaker thus contributes to spoken-word recognition by providing suprasegmental information about the presence of primary lexical stress.

  2. Comparison of Cue-Controlled Desensitization, Rational Restructuring, and a Credible Placebo in the Treatment of Speech Anxiety.

    Science.gov (United States)

    Lent, Robert W.; And Others

    1981-01-01

    The efficacy of cue-controlled desensitization and systematic rational restructuring was compared with a placebo method and a waiting-list control in reducing public speaking and nontargeted anxieties. Cue-controlled desensitization was generally more effective than the other groups in reducing subjective speech anxiety. (Author)

  3. Multistage audiovisual integration of speech: dissociating identification and detection.

    Science.gov (United States)

    Eskelund, Kasper; Tuomainen, Jyrki; Andersen, Tobias S

    2011-02-01

    Speech perception integrates auditory and visual information. This is evidenced by the McGurk illusion where seeing the talking face influences the auditory phonetic percept and by the audiovisual detection advantage where seeing the talking face influences the detectability of the acoustic speech signal. Here, we show that identification of phonetic content and detection can be dissociated as speech-specific and non-specific audiovisual integration effects. To this end, we employed synthetically modified stimuli, sine wave speech (SWS), which is an impoverished speech signal that only observers informed of its speech-like nature recognize as speech. While the McGurk illusion only occurred for informed observers, the audiovisual detection advantage occurred for naïve observers as well. This finding supports a multistage account of audiovisual integration of speech in which the many attributes of the audiovisual speech signal are integrated by separate integration processes.

  4. INTEGRATING MACHINE TRANSLATION AND SPEECH SYNTHESIS COMPONENT FOR ENGLISH TO DRAVIDIAN LANGUAGE SPEECH TO SPEECH TRANSLATION SYSTEM

    Directory of Open Access Journals (Sweden)

    J. SANGEETHA

    2015-02-01

    Full Text Available This paper provides an interface between the machine translation and speech synthesis components for converting English speech to Tamil text in an English-to-Tamil speech-to-speech translation system. The speech translation system consists of three modules: automatic speech recognition, machine translation and text-to-speech synthesis. Many procedures for integrating speech recognition and machine translation have been proposed, but the speech synthesis component has not yet been considered. In this paper, we focus on the integration of machine translation and speech synthesis, and report a subjective evaluation to investigate the impact of speech synthesis, machine translation and the integration of the machine translation and speech synthesis components. Here we implement a hybrid machine translation (a combination of rule-based and statistical machine translation) and a concatenative syllable-based speech synthesis technique. In order to retain the naturalness and intelligibility of the synthesized speech, Auto Associative Neural Network (AANN) prosody prediction is used in this work. The results of this system investigation demonstrate that the naturalness and intelligibility of the synthesized speech are strongly influenced by the fluency and correctness of the translated text.

  5. Cue integration vs. exemplar-based reasoning in multi-attribute decisions from memory: A matter of cue representation

    OpenAIRE

    Arndt Broeder; Ben R. Newell; Christine Platzer

    2010-01-01

    Inferences about target variables can be achieved by deliberate integration of probabilistic cues or by retrieving similar cue-patterns (exemplars) from memory. In tasks with cue information presented in on-screen displays, rule-based strategies tend to dominate unless the abstraction of cue-target relations is unfeasible. This dominance has also been demonstrated, surprisingly, in experiments that demanded the retrieval of cue values from memory (M. Persson & J. Rieskamp, 2009). In th...

  6. A chimpanzee recognizes synthetic speech with significantly reduced acoustic cues to phonetic content.

    Science.gov (United States)

    Heimbauer, Lisa A; Beran, Michael J; Owren, Michael J

    2011-07-26

    A long-standing debate concerns whether humans are specialized for speech perception, which some researchers argue is demonstrated by the ability to understand synthetic speech with significantly reduced acoustic cues to phonetic content. We tested a chimpanzee (Pan troglodytes) that recognizes 128 spoken words, asking whether she could understand such speech. Three experiments presented 48 individual words, with the animal selecting a corresponding visuographic symbol from among four alternatives. Experiment 1 tested spectrally reduced, noise-vocoded (NV) synthesis, originally developed to simulate input received by human cochlear-implant users. Experiment 2 tested "impossibly unspeechlike" sine-wave (SW) synthesis, which reduces speech to just three moving tones. Although receiving only intermittent and noncontingent reward, the chimpanzee performed well above chance level, including when hearing synthetic versions for the first time. Recognition of SW words was least accurate but improved in experiment 3 when natural words in the same session were rewarded. The chimpanzee was more accurate with NV than SW versions, as were 32 human participants hearing these items. The chimpanzee's ability to spontaneously recognize acoustically reduced synthetic words suggests that experience rather than specialization is critical for speech-perception capabilities that some have suggested are uniquely human. Copyright © 2011 Elsevier Ltd. All rights reserved.
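
    To make the stimulus manipulations above concrete, here is a minimal sketch of sine-wave speech synthesis: the speech signal is replaced by a few tones that follow the formant frequency tracks. The frame rate, formant tracks, amplitudes, and the toy /ba/-like transition are invented for illustration and are not the study's stimuli.

    ```python
    # A minimal sketch of sine-wave speech (SWS) synthesis, assuming formant
    # tracks (F1-F3, in Hz) have already been estimated for each analysis frame.
    import numpy as np

    def sws_from_formant_tracks(tracks_hz, amps, frame_rate=100, fs=16000):
        """Replace speech with tones that follow the formant tracks.

        tracks_hz : array (n_frames, 3), formant frequencies per frame
        amps      : array (n_frames, 3), linear amplitudes per frame
        """
        n_frames, n_tones = tracks_hz.shape
        n_samples = int(n_frames * fs / frame_rate)
        t_frames = np.arange(n_frames) / frame_rate
        t = np.arange(n_samples) / fs
        out = np.zeros(n_samples)
        for k in range(n_tones):
            # Interpolate frequency and amplitude up to the sample rate
            f = np.interp(t, t_frames, tracks_hz[:, k])
            a = np.interp(t, t_frames, amps[:, k])
            # Integrate instantaneous frequency to obtain phase
            phase = 2 * np.pi * np.cumsum(f) / fs
            out += a * np.sin(phase)
        return out / np.max(np.abs(out))

    # Toy example: a rising F2 transition with constant F1 and F3
    frames = 30
    tracks = np.stack([np.full(frames, 500.0),
                       np.linspace(1100.0, 1700.0, frames),
                       np.full(frames, 2500.0)], axis=1)
    signal = sws_from_formant_tracks(tracks, np.ones_like(tracks))
    ```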

  7. Assessment of Spectral and Temporal Resolution in Cochlear Implant Users Using Psychoacoustic Discrimination and Speech Cue Categorization.

    Science.gov (United States)

    Winn, Matthew B; Won, Jong Ho; Moon, Il Joon

    This study was conducted to measure auditory perception by cochlear implant users in the spectral and temporal domains, using tests of either categorization (using speech-based cues) or discrimination (using conventional psychoacoustic tests). The authors hypothesized that traditional nonlinguistic tests assessing spectral and temporal auditory resolution would correspond to speech-based measures assessing specific aspects of phonetic categorization assumed to depend on spectral and temporal auditory resolution. The authors further hypothesized that speech-based categorization performance would ultimately be a superior predictor of speech recognition performance, because of the fundamental nature of speech recognition as categorization. Nineteen cochlear implant listeners and 10 listeners with normal hearing participated in a suite of tasks that included spectral ripple discrimination, temporal modulation detection, and syllable categorization, which was split into a spectral cue-based task (targeting the /ba/-/da/ contrast) and a timing cue-based task (targeting the /b/-/p/ and /d/-/t/ contrasts). Speech sounds were manipulated to contain specific spectral or temporal modulations (formant transitions or voice onset time, respectively) that could be categorized. Categorization responses were quantified using logistic regression to assess perceptual sensitivity to acoustic phonetic cues. Word recognition testing was also conducted for cochlear implant listeners. Cochlear implant users were generally less successful at utilizing both spectral and temporal cues for categorization compared with listeners with normal hearing. For the cochlear implant listener group, spectral ripple discrimination was significantly correlated with the categorization of formant transitions; both were correlated with better word recognition. Temporal modulation detection using 100- and 10-Hz-modulated noise was not correlated either with the cochlear implant subjects' categorization of
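
    The abstract's use of logistic regression to quantify sensitivity to an acoustic-phonetic cue can be sketched as follows. The voice-onset-time continuum, trial counts, and simulated responses are assumptions for illustration, not the study's data or analysis code.

    ```python
    # Fit a logistic regression to per-trial categorization responses along a
    # single cue continuum; the slope indexes cue sensitivity, the crossover
    # indexes the category boundary. Data are simulated.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    vot_ms = np.repeat(np.linspace(0, 60, 7), 20)        # 7-step VOT continuum, 20 trials each
    p_voiceless = 1 / (1 + np.exp(-(vot_ms - 30) / 6))    # simulated listener
    resp = rng.binomial(1, p_voiceless)                   # 1 = "pa"/"ta", 0 = "ba"/"da"

    X = sm.add_constant(vot_ms)
    fit = sm.Logit(resp, X).fit(disp=False)
    slope_per_ms = fit.params[1]                  # steeper slope = sharper categorization
    boundary_ms = -fit.params[0] / fit.params[1]  # VOT at the 50% crossover
    print(f"slope = {slope_per_ms:.3f} per ms, boundary = {boundary_ms:.1f} ms")
    ```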

  8. Electrophysiological evidence for speech-specific audiovisual integration

    NARCIS (Netherlands)

    Baart, M.; Stekelenburg, J.J.; Vroomen, J.

    2014-01-01

    Lip-read speech is integrated with heard speech at various neural levels. Here, we investigated the extent to which lip-read induced modulations of the auditory N1 and P2 (measured with EEG) are indicative of speech-specific audiovisual integration, and we explored to what extent the ERPs were modulated by phonetic audiovisual congruency.

  9. Electrophysiological evidence for speech-specific audiovisual integration.

    Science.gov (United States)

    Baart, Martijn; Stekelenburg, Jeroen J; Vroomen, Jean

    2014-01-01

    Lip-read speech is integrated with heard speech at various neural levels. Here, we investigated the extent to which lip-read induced modulations of the auditory N1 and P2 (measured with EEG) are indicative of speech-specific audiovisual integration, and we explored to what extent the ERPs were modulated by phonetic audiovisual congruency. In order to disentangle speech-specific (phonetic) integration from non-speech integration, we used Sine-Wave Speech (SWS) that was perceived as speech by half of the participants (they were in speech-mode), while the other half was in non-speech mode. Results showed that the N1 obtained with audiovisual stimuli peaked earlier than the N1 evoked by auditory-only stimuli. This lip-read induced speeding up of the N1 occurred for listeners in speech and non-speech mode. In contrast, if listeners were in speech-mode, lip-read speech also modulated the auditory P2, but not if listeners were in non-speech mode, thus revealing speech-specific audiovisual binding. Comparing ERPs for phonetically congruent audiovisual stimuli with ERPs for incongruent stimuli revealed an effect of phonetic stimulus congruency that started at ~200 ms after (in)congruence became apparent. Critically, akin to the P2 suppression, congruency effects were only observed if listeners were in speech mode, and not if they were in non-speech mode. Using identical stimuli, we thus confirm that audiovisual binding involves (partially) different neural mechanisms for sound processing in speech and non-speech mode. © 2013 Published by Elsevier Ltd.

  10. Multistage audiovisual integration of speech: dissociating identification and detection

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Tuomainen, Jyrki; Andersen, Tobias

    2011-01-01

    Speech perception integrates auditory and visual information. This is evidenced by the McGurk illusion where seeing the talking face influences the auditory phonetic percept and by the audiovisual detection advantage where seeing the talking face influences the detectability of the acoustic speech signal. Here we show that identification of phonetic content and detection can be dissociated as speech-specific and non-specific audiovisual integration effects. To this end, we employed synthetically modified stimuli, sine wave speech (SWS), which is an impoverished speech signal that only observers informed of its speech-like nature recognize as speech. While the McGurk illusion only occurred for informed observers, the audiovisual detection advantage occurred for naïve observers as well. This finding supports a multi-stage account of audiovisual integration of speech in which the many attributes...

  11. Atypical audiovisual speech integration in infants at risk for autism.

    Directory of Open Access Journals (Sweden)

    Jeanne A Guiraud

    Full Text Available The language difficulties often seen in individuals with autism might stem from an inability to integrate audiovisual information, a skill important for language development. We investigated whether 9-month-old siblings of older children with autism, who are at an increased risk of developing autism, are able to integrate audiovisual speech cues. We used an eye-tracker to record where infants looked when shown a screen displaying two faces of the same model, where one face is articulating /ba/ and the other /ga/, with one face congruent with the syllable sound being presented simultaneously, the other face incongruent. This method was successful in showing that infants at low risk can integrate audiovisual speech: they looked for the same amount of time at the mouths in both the fusible visual /ga/ - audio /ba/ and the congruent visual /ba/ - audio /ba/ displays, indicating that the auditory and visual streams fuse into a McGurk-type of syllabic percept in the incongruent condition. It also showed that low-risk infants could perceive a mismatch between auditory and visual cues: they looked longer at the mouth in the mismatched, non-fusible visual /ba/ - audio /ga/ display compared with the congruent visual /ga/ - audio /ga/ display, demonstrating that they perceive an uncommon, and therefore interesting, speech-like percept when looking at the incongruent mouth (repeated ANOVA: displays x fusion/mismatch conditions interaction: F(1,16) = 17.153, p = 0.001). The looking behaviour of high-risk infants did not differ according to the type of display, suggesting difficulties in matching auditory and visual information (repeated ANOVA, displays x conditions interaction: F(1,25) = 0.09, p = 0.767), in contrast to low-risk infants (repeated ANOVA: displays x conditions x low/high-risk groups interaction: F(1,41) = 4.466, p = 0.041). In some cases this reduced ability might lead to the poor communication skills characteristic of autism.

  12. Comparison of Gated Audiovisual Speech Identification in Elderly Hearing Aid Users and Elderly Normal-Hearing Individuals: Effects of Adding Visual Cues to Auditory Speech Stimuli.

    Science.gov (United States)

    Moradi, Shahram; Lidestam, Björn; Rönnberg, Jerker

    2016-06-17

    The present study compared elderly hearing aid (EHA) users (n = 20) with elderly normal-hearing (ENH) listeners (n = 20) in terms of isolation points (IPs, the shortest time required for correct identification of a speech stimulus) and accuracy of audiovisual gated speech stimuli (consonants, words, and final words in highly and less predictable sentences) presented in silence. In addition, we compared the IPs of audiovisual speech stimuli from the present study with auditory ones extracted from a previous study, to determine the impact of the addition of visual cues. Both participant groups achieved ceiling levels in terms of accuracy in the audiovisual identification of gated speech stimuli; however, the EHA group needed longer IPs for the audiovisual identification of consonants and words. The benefit of adding visual cues to auditory speech stimuli was more evident in the EHA group, as audiovisual presentation significantly shortened the IPs for consonants, words, and final words in less predictable sentences; in the ENH group, audiovisual presentation only shortened the IPs for consonants and words. In conclusion, although the audiovisual benefit was greater for the EHA group, this group had inferior performance compared with the ENH group in terms of IPs when supportive semantic context was lacking. Consequently, EHA users needed the initial part of the audiovisual speech signal to be longer than did their counterparts with normal hearing to reach the same level of accuracy in the absence of a semantic context. © The Author(s) 2016.

  13. Multisensory integration of speech sounds with letters vs. visual speech: only visual speech induces the mismatch negativity

    NARCIS (Netherlands)

    Stekelenburg, J.J.; Keetels, M.N.; Vroomen, J.H.M.

    2018-01-01

    Numerous studies have demonstrated that the vision of lip movements can alter the perception of auditory speech syllables (McGurk effect). While there is ample evidence for integration of text and auditory speech, there are only a few studies on the orthographic equivalent of the McGurk effect.

  14. Effects of noise and audiovisual cues on speech processing in adults with and without ADHD.

    Science.gov (United States)

    Michalek, Anne M P; Watson, Silvana M; Ash, Ivan; Ringleb, Stacie; Raymer, Anastasia

    2014-03-01

    This study examined the interplay among internal (e.g. attention, working memory abilities) and external (e.g. background noise, visual information) factors in individuals with and without ADHD. A 2 × 2 × 6 mixed design with correlational analyses was used to compare participant results on a standardized listening-in-noise sentence repetition task (QuickSIN; Killion et al., 2004), presented in an auditory and an audiovisual condition as the signal-to-noise ratio (SNR) varied from 25 to 0 dB, and to determine individual differences in working memory capacity and short-term recall. Participants were thirty-eight young adults without ADHD and twenty-five young adults with ADHD. Diagnosis, modality, and signal-to-noise ratio all affected the ability to process speech in noise. The interaction between the diagnosis of ADHD, the presence of visual cues, and the level of noise had an effect on a person's ability to process speech in noise. In conclusion, young adults with ADHD benefited less from visual information during noise than young adults without ADHD, an effect influenced by working memory abilities.

  15. A psychophysical imaging method evidencing auditory cue extraction during speech perception: a group analysis of auditory classification images.

    Science.gov (United States)

    Varnet, Léo; Knoblauch, Kenneth; Serniclaes, Willy; Meunier, Fanny; Hoen, Michel

    2015-01-01

    Although there is a large consensus regarding the involvement of specific acoustic cues in speech perception, the precise mechanisms underlying the transformation from continuous acoustical properties into discrete perceptual units remain undetermined. This gap in knowledge is partially due to the lack of a turnkey solution for isolating critical speech cues from natural stimuli. In this paper, we describe a psychoacoustic imaging method known as the Auditory Classification Image technique that allows experimenters to estimate the relative importance of time-frequency regions in categorizing natural speech utterances in noise. Importantly, this technique enables the testing of hypotheses on the listening strategies of participants at the group level. We exemplify this approach by identifying the acoustic cues involved in da/ga categorization with two phonetic contexts, Al- or Ar-. The application of Auditory Classification Images to our group of 16 participants revealed significant critical regions on the second and third formant onsets, as predicted by the literature, as well as an unexpected temporal cue on the first formant. Finally, through a cluster-based nonparametric test, we demonstrate that this method is sufficiently sensitive to detect fine modifications of the classification strategies between different utterances of the same phoneme.
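
    A rough sketch of the logic behind an auditory classification image follows: the trial-by-trial noise in a coarse time-frequency grid is regressed against the listener's binary responses, so the fitted weights indicate which regions drive categorization. Real analyses use penalized GLMs with smoothness priors; a plain ridge-penalized logistic regression and simulated data stand in for them here.

    ```python
    # Estimate a toy classification image from simulated trial noise and responses.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    n_trials, n_time, n_freq = 2000, 8, 6
    noise = rng.normal(size=(n_trials, n_time, n_freq))    # noise field added on each trial

    # Simulated listener: responses are driven by one "critical" T-F region
    template = np.zeros((n_time, n_freq))
    template[2, 3] = 1.0                                    # e.g., an F2-onset region
    drive = (noise * template).sum(axis=(1, 2))
    resp = (drive + rng.normal(scale=1.0, size=n_trials) > 0).astype(int)

    glm = LogisticRegression(C=0.1).fit(noise.reshape(n_trials, -1), resp)
    aci = glm.coef_.reshape(n_time, n_freq)   # weights ~ perceptual importance per T-F bin
    print(np.unravel_index(np.abs(aci).argmax(), aci.shape))  # recovers the critical region
    ```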

  16. A psychophysical imaging method evidencing auditory cue extraction during speech perception: a group analysis of auditory classification images.

    Directory of Open Access Journals (Sweden)

    Léo Varnet

    Full Text Available Although there is a large consensus regarding the involvement of specific acoustic cues in speech perception, the precise mechanisms underlying the transformation from continuous acoustical properties into discrete perceptual units remain undetermined. This gap in knowledge is partially due to the lack of a turnkey solution for isolating critical speech cues from natural stimuli. In this paper, we describe a psychoacoustic imaging method known as the Auditory Classification Image technique that allows experimenters to estimate the relative importance of time-frequency regions in categorizing natural speech utterances in noise. Importantly, this technique enables the testing of hypotheses on the listening strategies of participants at the group level. We exemplify this approach by identifying the acoustic cues involved in da/ga categorization with two phonetic contexts, Al- or Ar-. The application of Auditory Classification Images to our group of 16 participants revealed significant critical regions on the second and third formant onsets, as predicted by the literature, as well as an unexpected temporal cue on the first formant. Finally, through a cluster-based nonparametric test, we demonstrate that this method is sufficiently sensitive to detect fine modifications of the classification strategies between different utterances of the same phoneme.

  17. Cueing musical emotions: An empirical analysis of 24-piece sets by Bach and Chopin documents parallels with emotional speech

    Directory of Open Access Journals (Sweden)

    Matthew ePoon

    2015-11-01

    Full Text Available Acoustic cues such as pitch height and timing are effective at communicating emotion in both music and speech. Numerous experiments altering musical passages have shown that higher and faster melodies generally sound happier than lower and slower melodies, findings consistent with corpus analyses of emotional speech. However, equivalent corpus analyses of complex time-varying cues in music are less common, due in part to the challenges of assembling an appropriate corpus. Here we describe a novel, score-based exploration of the use of pitch height and timing in a set of balanced major and minor key compositions. Our corpus contained all 24 Preludes and 24 Fugues from Bach's Well-Tempered Clavier (book 1), as well as all 24 of Chopin's Preludes for piano. These three sets are balanced with respect to both modality (major/minor) and key chroma (A, B, C, etc.). Consistent with predictions derived from speech, we found major-key (nominally happy) pieces to be two semitones higher in pitch height and 29% faster than minor-key (nominally sad) pieces. This demonstrates that our balanced corpus of major and minor key pieces uses low-level acoustic cues for emotion in a manner consistent with speech. A series of post-hoc analyses illustrate interesting trade-offs, with sets featuring greater emphasis on timing distinctions between modalities exhibiting the least pitch distinction, and vice versa.

  18. Cueing musical emotions: An empirical analysis of 24-piece sets by Bach and Chopin documents parallels with emotional speech.

    Science.gov (United States)

    Poon, Matthew; Schutz, Michael

    2015-01-01

    Acoustic cues such as pitch height and timing are effective at communicating emotion in both music and speech. Numerous experiments altering musical passages have shown that higher and faster melodies generally sound "happier" than lower and slower melodies, findings consistent with corpus analyses of emotional speech. However, equivalent corpus analyses of complex time-varying cues in music are less common, due in part to the challenges of assembling an appropriate corpus. Here, we describe a novel, score-based exploration of the use of pitch height and timing in a set of "balanced" major and minor key compositions. Our analysis included all 24 Preludes and 24 Fugues from Bach's Well-Tempered Clavier (book 1), as well as all 24 of Chopin's Preludes for piano. These three sets are balanced with respect to both modality (major/minor) and key chroma ("A," "B," "C," etc.). Consistent with predictions derived from speech, we found major-key (nominally "happy") pieces to be two semitones higher in pitch height and 29% faster than minor-key (nominally "sad") pieces. This demonstrates that our balanced corpus of major and minor key pieces uses low-level acoustic cues for emotion in a manner consistent with speech. A series of post hoc analyses illustrate interesting trade-offs, with sets featuring greater emphasis on timing distinctions between modalities exhibiting the least pitch distinction, and vice-versa. We discuss these findings in the broader context of speech-music research, as well as recent scholarship exploring the historical evolution of cue use in Western music.
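
    The corpus comparison described above reduces to simple descriptive statistics per modality. The sketch below uses invented toy values (not the Bach/Chopin data) to show the arithmetic of comparing mean pitch height and event rate between major- and minor-key pieces.

    ```python
    # Compare mean pitch height (MIDI note numbers, i.e., semitone units) and
    # event rate (attacks per second) across modes; all numbers are placeholders.
    import numpy as np

    pieces = [("major", 67.2, 5.1), ("major", 65.8, 4.6), ("major", 66.5, 4.9),
              ("minor", 64.9, 3.7), ("minor", 63.8, 3.9), ("minor", 64.6, 3.5)]

    stats = {m: (np.mean([p for mo, p, r in pieces if mo == m]),
                 np.mean([r for mo, p, r in pieces if mo == m]))
             for m in ("major", "minor")}
    pitch_diff = stats["major"][0] - stats["minor"][0]       # semitones higher
    tempo_ratio = stats["major"][1] / stats["minor"][1] - 1   # proportional speed-up
    print(f"major pieces are {pitch_diff:.1f} semitones higher "
          f"and {100 * tempo_ratio:.0f}% faster than minor pieces")
    ```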

  19. PRACTICING SPEECH THERAPY INTERVENTION FOR SOCIAL INTEGRATION OF CHILDREN WITH SPEECH DISORDERS

    Directory of Open Access Journals (Sweden)

    Martin Ofelia POPESCU

    2016-11-01

    Full Text Available The article presents a concise speech correction intervention program for dyslalia, combined with the development of intrapersonal, interpersonal, and social integration capacities in children with speech disorders. The program's main objectives are: increasing the potential for individual social integration by correcting speech disorders in conjunction with intra- and interpersonal capacities, and increasing the potential of children and community groups for social integration by optimizing the socio-relational context of children with speech disorders. The program included 60 children/students with dyslalic speech disorders (monomorphic and polymorphic dyslalia) from 11 educational institutions (6 kindergartens and 5 schools/secondary schools) affiliated with the inter-school logopedic centre (CLI) of Targu Jiu city and areas of Gorj district. The program was implemented under the assumption that therapeutic-formative intervention to correct speech disorders, combined with support for intra- and interpersonal capacities, would optimize the social integration of children with speech disorders. The results confirm the hypothesis and provide evidence of the intervention program's efficiency.

  20. Audiovisual integration of speech in a patient with Broca's Aphasia

    Science.gov (United States)

    Andersen, Tobias S.; Starrfelt, Randi

    2015-01-01

    Lesions to Broca's area cause aphasia characterized by a severe impairment of the ability to speak, with comparatively intact speech perception. However, some studies have found effects on speech perception under adverse listening conditions, indicating that Broca's area is also involved in speech perception. While these studies have focused on auditory speech perception other studies have shown that Broca's area is activated by visual speech perception. Furthermore, one preliminary report found that a patient with Broca's aphasia did not experience the McGurk illusion suggesting that an intact Broca's area is necessary for audiovisual integration of speech. Here we describe a patient with Broca's aphasia who experienced the McGurk illusion. This indicates that an intact Broca's area is not necessary for audiovisual integration of speech. The McGurk illusions this patient experienced were atypical, which could be due to Broca's area having a more subtle role in audiovisual integration of speech. The McGurk illusions of a control subject with Wernicke's aphasia were, however, also atypical. This indicates that the atypical McGurk illusions were due to deficits in speech processing that are not specific to Broca's aphasia. PMID:25972819

  1. Visual-Auditory Integration during Speech Imitation in Autism

    Science.gov (United States)

    Williams, Justin H. G.; Massaro, Dominic W.; Peel, Natalie J.; Bosseler, Alexis; Suddendorf, Thomas

    2004-01-01

    Children with autistic spectrum disorder (ASD) may have poor audio-visual integration, possibly reflecting dysfunctional "mirror neuron" systems which have been hypothesised to be at the core of the condition. In the present study, a computer program, utilizing speech synthesizer software and a "virtual" head (Baldi), delivered speech stimuli for…

  2. Music expertise shapes audiovisual temporal integration windows for speech, sinewave speech and music

    Directory of Open Access Journals (Sweden)

    Hwee Ling eLee

    2014-08-01

    Full Text Available This psychophysics study used musicians as a model to investigate whether musical expertise shapes the temporal integration window for audiovisual speech, sinewave speech or music. Musicians and non-musicians judged the audiovisual synchrony of speech, sinewave analogues of speech, and music stimuli at 13 audiovisual stimulus onset asynchronies (±360, ±300, ±240, ±180, ±120, ±60, and 0 ms). Further, we manipulated the duration of the stimuli by presenting sentences/melodies or syllables/tones. Critically, musicians relative to non-musicians exhibited significantly narrower temporal integration windows for both music and sinewave speech. Further, the temporal integration window for music decreased with the amount of music practice, but not with age of acquisition. In other words, the more musicians practiced piano in the past three years, the more sensitive they became to the temporal misalignment of visual and auditory signals. Collectively, our findings demonstrate that music practicing fine-tunes the audiovisual temporal integration window to various extents depending on the stimulus class. While the effect of piano practicing was most pronounced for music, it also generalized to other stimulus classes such as sinewave speech and to a marginally significant degree to natural speech.
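
    One common way to quantify a temporal integration window from synchrony judgments like those described above is to fit a Gaussian to the proportion of "synchronous" responses across SOAs and read off its width. The sketch below uses simulated group data; the fitting choice and parameter values are assumptions, not the study's analysis.

    ```python
    # Fit a Gaussian synchrony-judgment curve per group and compare window widths.
    import numpy as np
    from scipy.optimize import curve_fit

    soas = np.array([-360, -300, -240, -180, -120, -60, 0, 60, 120, 180, 240, 300, 360], float)

    def gauss(soa, amp, mu, sigma):
        return amp * np.exp(-0.5 * ((soa - mu) / sigma) ** 2)

    # Simulated group means of the proportion of "synchronous" judgments
    p_musician = gauss(soas, 0.95, 20, 110) + 0.02
    p_control = gauss(soas, 0.95, 40, 170) + 0.02

    for label, p in [("musicians", p_musician), ("non-musicians", p_control)]:
        (amp, mu, sigma), _ = curve_fit(gauss, soas, p, p0=[1.0, 0.0, 150.0])
        print(f"{label}: window centre {mu:.0f} ms, width (SD) {sigma:.0f} ms")
    ```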

  3. Electrophysiological assessment of audiovisual integration in speech perception

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Dau, Torsten

    Speech perception integrates signals from ear and eye. This is witnessed by a wide range of audiovisual integration effects, such as ventriloquism and the McGurk illusion. Some behavioral evidence suggests that audiovisual integration of specific aspects is special for speech perception. However, our knowledge of such bimodal integration would be strengthened if the phenomena could be investigated by objective, neurally based methods. One key question of the present work is whether perceptual processing of audiovisual speech can be gauged with a specific signature of neurophysiological activity... on the auditory speech percept. In two experiments, which both combine behavioral and neurophysiological measures, an uncovering of the relation between perception of faces and of audiovisual integration is attempted. Behavioral findings suggest a strong effect of face perception, whereas the MMN results are less...

  4. Sonority's Effect as a Surface Cue on Lexical Speech Perception of Children With Cochlear Implants.

    Science.gov (United States)

    Hamza, Yasmeen; Okalidou, Areti; Kyriafinis, George; van Wieringen, Astrid

    2018-03-06

    Sonority is the relative perceptual prominence/loudness of speech sounds of the same length, stress, and pitch. Children with cochlear implants (CIs), with restored audibility and relatively intact temporal processing, are expected to benefit from the perceptual prominence cues of highly sonorous sounds. Sonority also influences lexical access through the sonority-sequencing principle (SSP), a grammatical phonotactic rule, which facilitates the recognition and segmentation of syllables within speech. The more nonsonorous the onset of a syllable is, the larger is the degree of sonority rise to the nucleus, and the more optimal the SSP. Children with CIs may experience hindered or delayed development of the language-learning rule SSP, as a result of their deprived/degraded auditory experience. The purpose of the study was to explore sonority's role in speech perception and lexical access of prelingually deafened children with CIs. A case-control study with 15 children with CIs, 25 normal-hearing children (NHC), and 50 normal-hearing adults was conducted, using a lexical identification task of novel, nonreal CV-CV words taught via fast mapping. The CV-CV words were constructed according to four sonority conditions, entailing syllables with sonorous onsets/less optimal SSP (SS) and nonsonorous onsets/optimal SSP (NS) in all combinations, that is, SS-SS, SS-NS, NS-SS, and NS-NS. Outcome measures were accuracy and reaction times (RTs). A subgroup analysis of 12 children with CIs pair matched to 12 NHC on hearing age aimed to study the effect of oral-language exposure period on the sonority-related performance. The two groups of children showed similar accuracy performance, overall and across all the sonority conditions. However, within-group comparisons showed that the children with CIs scored more accurately on the SS-SS condition relative to the NS-NS and NS-SS conditions, while the NHC performed equally well across all conditions. Additionally, adult-comparable accuracy

  5. Audiovisual integration in children listening to spectrally degraded speech.

    Science.gov (United States)

    Maidment, David W; Kang, Hi Jee; Stewart, Hannah J; Amitay, Sygal

    2015-02-01

    The study explored whether visual information improves speech identification in typically developing children with normal hearing when the auditory signal is spectrally degraded. Children (n=69) and adults (n=15) were presented with noise-vocoded sentences from the Children's Co-ordinate Response Measure (Rosen, 2011) in auditory-only or audiovisual conditions. The number of bands was adaptively varied to modulate the degradation of the auditory signal, with the number of bands required for approximately 79% correct identification calculated as the threshold. The youngest children (4- to 5-year-olds) did not benefit from accompanying visual information, in comparison to 6- to 11-year-old children and adults. Audiovisual gain also increased with age in the child sample. The current data suggest that children younger than 6 years of age do not fully utilize visual speech cues to enhance speech perception when the auditory signal is degraded. This evidence not only has implications for understanding the development of speech perception skills in children with normal hearing but may also inform the development of new treatment and intervention strategies that aim to remediate speech perception difficulties in pediatric cochlear implant users.
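
    Noise-vocoded sentences of the kind used above can be approximated with a simple channel vocoder: filter the signal into a few bands, extract each band's amplitude envelope, and use it to modulate band-limited noise. The band count, filter design, and placeholder input below are assumptions, not the study's exact processing.

    ```python
    # A minimal noise-vocoder sketch.
    import numpy as np
    from scipy.signal import butter, sosfiltfilt, hilbert

    def noise_vocode(x, fs, n_bands=8, f_lo=100.0, f_hi=7000.0):
        edges = np.geomspace(f_lo, f_hi, n_bands + 1)     # log-spaced band edges
        rng = np.random.default_rng(0)
        out = np.zeros_like(x)
        for lo, hi in zip(edges[:-1], edges[1:]):
            sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
            band = sosfiltfilt(sos, x)
            env = np.abs(hilbert(band))                    # amplitude envelope
            carrier = sosfiltfilt(sos, rng.standard_normal(len(x)))
            out += env * carrier                           # envelope-modulated noise band
        return out / np.max(np.abs(out))

    fs = 16000
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * 220 * t) * (1 + 0.5 * np.sin(2 * np.pi * 3 * t))  # placeholder signal
    vocoded = noise_vocode(x, fs, n_bands=8)
    ```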

  6. Audiovisual integration for speech during mid-childhood: Electrophysiological evidence

    Science.gov (United States)

    Kaganovich, Natalya; Schumaker, Jennifer

    2014-01-01

    Previous studies have demonstrated that the presence of visual speech cues reduces the amplitude and latency of the N1 and P2 event-related potential (ERP) components elicited by speech stimuli. However, the developmental trajectory of this effect is not yet fully mapped. We examined ERP responses to auditory, visual, and audiovisual speech in two groups of school-age children (7–8-year-olds and 10–11-year-olds) and in adults. Audiovisual speech led to the attenuation of the N1 and P2 components in all groups of participants, suggesting that the neural mechanisms underlying these effects are functional by early school years. Additionally, while the reduction in N1 was largest over the right scalp, the P2 attenuation was largest over the left and midline scalp. The difference in the hemispheric distribution of the N1 and P2 attenuation supports the idea that these components index at least somewhat disparate neural processes within the context of audiovisual speech perception. PMID:25463815

  7. Aging and Spectro-Temporal Integration of Speech

    Directory of Open Access Journals (Sweden)

    John H. Grose

    2016-10-01

    Full Text Available The purpose of this study was to determine the effects of age on the spectro-temporal integration of speech. The hypothesis was that the integration of speech fragments distributed over frequency, time, and ear of presentation is reduced in older listeners, even for those with good audiometric hearing. Younger, middle-aged, and older listeners (10 per group) with good audiometric hearing participated. They were each tested under seven conditions that encompassed combinations of spectral, temporal, and binaural integration. Sentences were filtered into two bands centered at 500 Hz and 2500 Hz, with criterion bandwidth tailored for each participant. In some conditions, the speech bands were individually square-wave interrupted at a rate of 10 Hz. Configurations of uninterrupted, synchronously interrupted, and asynchronously interrupted frequency bands were constructed that constituted speech fragments distributed across frequency, time, and ear of presentation. The over-arching finding was that, for most configurations, performance was not differentially affected by listener age. Although speech intelligibility varied across condition, there was no evidence of performance deficits in older listeners in any condition. This study indicates that age, per se, does not necessarily undermine the ability to integrate fragments of speech dispersed across frequency and time.
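
    A simplified version of the stimulus construction described above is sketched below: two speech bands centred near 500 Hz and 2500 Hz, each gated by a 10-Hz square wave either in phase (synchronous) or in antiphase (asynchronous). Filter bandwidths, orders, and the placeholder input are assumptions, not the study's criterion bandwidths.

    ```python
    # Band-filter a signal and gate the bands with a 10-Hz square wave.
    import numpy as np
    from scipy.signal import butter, sosfiltfilt, square

    def band(x, fs, centre, octaves=0.7):
        lo, hi = centre * 2 ** (-octaves / 2), centre * 2 ** (octaves / 2)
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        return sosfiltfilt(sos, x)

    def interrupted_bands(x, fs, rate_hz=10.0, asynchronous=False):
        t = np.arange(len(x)) / fs
        gate = (square(2 * np.pi * rate_hz * t) + 1) / 2      # 0/1 gating, 50% duty cycle
        low = band(x, fs, 500.0) * gate
        high_gate = 1 - gate if asynchronous else gate
        high = band(x, fs, 2500.0) * high_gate
        return low + high

    fs = 16000
    x = np.random.default_rng(2).standard_normal(fs)          # placeholder for a sentence
    y_sync = interrupted_bands(x, fs, asynchronous=False)
    y_async = interrupted_bands(x, fs, asynchronous=True)
    ```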

  8. Integrated Phoneme Subspace Method for Speech Feature Extraction

    Directory of Open Access Journals (Sweden)

    Park Hyunsin

    2009-01-01

    Full Text Available Speech feature extraction has been a key focus in robust speech recognition research. In this work, we discuss data-driven linear feature transformations applied to feature vectors in the logarithmic mel-frequency filter bank domain. Transformations are based on principal component analysis (PCA), independent component analysis (ICA), and linear discriminant analysis (LDA). Furthermore, this paper introduces a new feature extraction technique that collects the correlation information among phoneme subspaces and reconstructs feature space for representing phonemic information efficiently. The proposed speech feature vector is generated by projecting an observed vector onto an integrated phoneme subspace (IPS) based on PCA or ICA. The performance of the new feature was evaluated for isolated word speech recognition. The proposed method provided higher recognition accuracy than conventional methods in clean and reverberant environments.
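
    As a generic illustration of the data-driven linear transforms mentioned above (not the paper's IPS implementation), the following projects log mel-filterbank feature vectors onto a PCA subspace; an ICA or LDA transform would slot into the same place. The feature values are random placeholders.

    ```python
    # Project log-mel feature vectors onto a reduced, decorrelated PCA subspace.
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(3)
    n_frames, n_mel = 5000, 40
    logmel = rng.normal(size=(n_frames, n_mel))   # stand-in for real log-mel features

    pca = PCA(n_components=13, whiten=True).fit(logmel)
    features = pca.transform(logmel)              # transformed feature vectors per frame
    print(features.shape, pca.explained_variance_ratio_[:3])
    ```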

  9. Suppression of the µ rhythm during speech and non-speech discrimination revealed by independent component analysis: implications for sensorimotor integration in speech processing.

    Science.gov (United States)

    Bowers, Andrew; Saltuklaroglu, Tim; Harkrider, Ashley; Cuellar, Megan

    2013-01-01

    Constructivist theories propose that articulatory hypotheses about incoming phonetic targets may function to enhance perception by limiting the possibilities for sensory analysis. To provide evidence for this proposal, it is necessary to map ongoing, high-temporal resolution changes in sensorimotor activity (i.e., the sensorimotor μ rhythm) to accurate speech and non-speech discrimination performance (i.e., correct trials). Sixteen participants (15 female and 1 male) were asked to passively listen to or actively identify speech and tone-sweeps in a two-alternative forced-choice discrimination task while the electroencephalogram (EEG) was recorded from 32 channels. The stimuli were presented at signal-to-noise ratios (SNRs) in which discrimination accuracy was high (i.e., 80-100%) and low SNRs producing discrimination performance at chance. EEG data were decomposed using independent component analysis and clustered across participants using principal component methods in EEGLAB. ICA revealed left and right sensorimotor µ components for 14/16 and 13/16 participants respectively that were identified on the basis of scalp topography, spectral peaks, and localization to the precentral and postcentral gyri. Time-frequency analysis of left and right lateralized µ component clusters revealed significant (pFDR-corrected) µ suppression in correct speech discrimination trials relative to chance trials following stimulus offset. Findings are consistent with constructivist, internal model theories proposing that early forward motor models generate predictions about likely phonemic units that are then synthesized with incoming sensory cues during active as opposed to passive processing. Future directions and possible translational value for clinical populations in which sensorimotor integration may play a functional role are discussed.

  10. Patients with hippocampal amnesia successfully integrate gesture and speech.

    Science.gov (United States)

    Hilverman, Caitlin; Clough, Sharice; Duff, Melissa C; Cook, Susan Wagner

    2018-06-19

    During conversation, people integrate information from co-speech hand gestures with information in spoken language. For example, after hearing the sentence, "A piece of the log flew up and hit Carl in the face" while viewing a gesture directed at the nose, people tend to later report that the log hit Carl in the nose (information only in gesture) rather than in the face (information in speech). The cognitive and neural mechanisms that support the integration of gesture with speech are unclear. One possibility is that the hippocampus - known for its role in relational memory and information integration - is necessary for integrating gesture and speech. To test this possibility, we examined how patients with hippocampal amnesia and healthy and brain-damaged comparison participants express information from gesture in a narrative retelling task. Participants watched videos of an experimenter telling narratives that included hand gestures that contained supplementary information. Participants were asked to retell the narratives and their spoken retellings were assessed for the presence of information from gesture. For features that had been accompanied by supplementary gesture, patients with amnesia retold fewer of these features overall and fewer retellings that matched the speech from the narrative. Yet their retellings included features that contained information that had been present uniquely in gesture in amounts that were not reliably different from comparison groups. Thus, a functioning hippocampus is not necessary for gesture-speech integration over short timescales. Providing unique information in gesture may enhance communication for individuals with declarative memory impairment, possibly via non-declarative memory mechanisms. Copyright © 2018. Published by Elsevier Ltd.

  11. Toward Speech and Nonverbal Behaviors Integration for Humanoid Robot

    Directory of Open Access Journals (Sweden)

    Wei Wang

    2012-09-01

    Full Text Available It is essential for a humanoid robot to integrate speech and nonverbal behaviors in human-robot interaction. This paper presents an approach that uses a multi-objective genetic algorithm to match speech and behaviors automatically. Firstly, based on the humanoid robot's emotional state, we construct a hierarchical structure linking voice characteristics and nonverbal behaviors. Secondly, the behaviors corresponding to the speech are matched and integrated into an action sequence using a genetic algorithm, so that the robot can speak and perform emotional behaviors consistently. Our approach draws on relevant knowledge from psychology and nonverbal communication research. Experimental results suggest that our ultimate goal, implementing an affective robot that acts and speaks with partners vividly and fluently, can be achieved.
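
    The matching step can be illustrated with a deliberately simplified, single-objective genetic algorithm that assigns one gesture to each utterance segment, scoring assignments by emotion match and duration fit. The encoding, fitness function, and data below are invented for illustration and are much simpler than the multi-objective formulation the paper describes.

    ```python
    # Toy GA: evolve gesture-to-segment assignments toward a higher fitness.
    import random
    random.seed(0)

    segments = [("greeting", "happy", 1.2), ("news", "sad", 2.0), ("farewell", "happy", 1.0)]
    gestures = [("wave", "happy", 1.1), ("bow", "neutral", 1.5),
                ("head_shake", "sad", 1.8), ("nod", "happy", 0.8)]

    def fitness(assignment):
        score = 0.0
        for (name, emotion, dur), g in zip(segments, assignment):
            g_name, g_emotion, g_dur = gestures[g]
            score += (1.0 if g_emotion == emotion else 0.0) - 0.5 * abs(g_dur - dur)
        return score

    def evolve(pop_size=30, generations=50, mut_rate=0.2):
        pop = [[random.randrange(len(gestures)) for _ in segments] for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=fitness, reverse=True)
            parents = pop[: pop_size // 2]
            children = []
            while len(children) < pop_size - len(parents):
                a, b = random.sample(parents, 2)
                cut = random.randrange(1, len(segments))
                child = a[:cut] + b[cut:]                       # one-point crossover
                if random.random() < mut_rate:                  # mutation
                    child[random.randrange(len(segments))] = random.randrange(len(gestures))
                children.append(child)
            pop = parents + children
        return max(pop, key=fitness)

    best = evolve()
    print([(s[0], gestures[g][0]) for s, g in zip(segments, best)])
    ```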

  12. Gesture-speech integration in children with specific language impairment.

    Science.gov (United States)

    Mainela-Arnold, Elina; Alibali, Martha W; Hostetter, Autumn B; Evans, Julia L

    2014-11-01

    Previous research suggests that speakers are especially likely to produce manual communicative gestures when they have relative ease in thinking about the spatial elements of what they are describing, paired with relative difficulty organizing those elements into appropriate spoken language. Children with specific language impairment (SLI) exhibit poor expressive language abilities together with within-normal-range nonverbal IQs. This study investigated whether weak spoken language abilities in children with SLI influence their reliance on gestures to express information. We hypothesized that these children would rely on communicative gestures to express information more often than their age-matched typically developing (TD) peers, and that they would sometimes express information in gestures that they do not express in the accompanying speech. Participants were 15 children with SLI (aged 5;6-10;0) and 18 age-matched TD controls. Children viewed a wordless cartoon and retold the story to a listener unfamiliar with the story. Children's gestures were identified and coded for meaning using a previously established system. Speech-gesture combinations were coded as redundant if the information conveyed in speech and gesture was the same, and non-redundant if the information conveyed in speech was different from the information conveyed in gesture. Children with SLI produced more gestures than children in the TD group; however, the likelihood that speech-gesture combinations were non-redundant did not differ significantly across the SLI and TD groups. In both groups, younger children were significantly more likely to produce non-redundant speech-gesture combinations than older children. The gesture-speech integration system functions similarly in children with SLI and TD, but children with SLI rely more on gesture to help formulate, conceptualize or express the messages they want to convey. This provides motivation for future research examining whether interventions

  13. Severe Speech Sound Disorders: An Integrated Multimodal Intervention

    Science.gov (United States)

    King, Amie M.; Hengst, Julie A.; DeThorne, Laura S.

    2013-01-01

    Purpose: This study introduces an integrated multimodal intervention (IMI) and examines its effectiveness for the treatment of persistent and severe speech sound disorders (SSD) in young children. The IMI is an activity-based intervention that focuses simultaneously on increasing the "quantity" of a child's meaningful productions of target words…

  14. Integrating speech in time depends on temporal expectancies and attention.

    Science.gov (United States)

    Scharinger, Mathias; Steinberg, Johanna; Tavano, Alessandro

    2017-08-01

    Sensory information that unfolds in time, such as in speech perception, relies on efficient chunking mechanisms in order to yield optimally-sized units for further processing. Whether or not two successive acoustic events receive a one-unit or a two-unit interpretation seems to depend on the fit between their temporal extent and a stipulated temporal window of integration. However, there is ongoing debate on how flexible this temporal window of integration should be, especially for the processing of speech sounds. Furthermore, there is no direct evidence of whether attention may modulate the temporal constraints on the integration window. For this reason, we here examine how different word durations, which lead to different temporal separations of sound onsets, interact with attention. In an Electroencephalography (EEG) study, participants actively and passively listened to words where word-final consonants were occasionally omitted. Words had either a natural duration or were artificially prolonged in order to increase the separation of speech sound onsets. Omission responses to incomplete speech input, originating in left temporal cortex, decreased when the critical speech sound was separated from previous sounds by more than 250 msec, i.e., when the separation was larger than the stipulated temporal window of integration (125-150 msec). Attention, on the other hand, only increased omission responses for stimuli with natural durations. We complemented the event-related potential (ERP) analyses by a frequency-domain analysis at the stimulus presentation rate. Notably, the power at the stimulation frequency showed the same duration and attention effects as the omission responses. We interpret these findings on the background of existing research on temporal integration windows and further suggest that our findings may be accounted for within the framework of predictive coding. Copyright © 2017 Elsevier Ltd. All rights reserved.

  15. Cue integration vs. exemplar-based reasoning in multi-attribute decisions from memory

    Directory of Open Access Journals (Sweden)

    Arndt Broeder

    2010-08-01

    Full Text Available Inferences about target variables can be achieved by deliberate integration of probabilistic cues or by retrieving similar cue-patterns (exemplars) from memory. In tasks with cue information presented in on-screen displays, rule-based strategies tend to dominate unless the abstraction of cue-target relations is unfeasible. This dominance has also been demonstrated --- surprisingly --- in experiments that demanded the retrieval of cue values from memory (M. Persson and J. Rieskamp, 2009). In three modified replications involving a fictitious disease, binary cue values were represented either by alternative symptoms (e.g., fever vs. hypothermia) or by symptom presence vs. absence (e.g., fever vs. no fever). The former representation might hinder cue abstraction. The cues were predictive of the severity of the disease, and participants had to infer in each trial which of two patients was sicker. Both experiments replicated the rule-dominance with present-absent cues but yielded higher percentages of exemplar-based strategies with alternative cues. The experiments demonstrate that a change in cue representation may induce a dramatic shift from rule-based to exemplar-based reasoning in formally identical tasks.
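
    The two strategy classes contrasted above can be caricatured in a few lines: a weighted-additive rule that integrates cue values directly, and an exemplar model that predicts severity from similarity to stored cases. Cue weights, exemplars, and the similarity function below are invented toy values, not the experimental materials.

    ```python
    # Rule-based cue integration vs. exemplar-based inference for a paired comparison.
    import numpy as np

    weights = np.array([0.4, 0.3, 0.2, 0.1])            # assumed cue validities

    def rule_based(patient_a, patient_b):
        # Weighted-additive integration of binary cue values
        return "A" if weights @ patient_a > weights @ patient_b else "B"

    def exemplar_based(patient_a, patient_b, exemplars, severities, s=2.0):
        def predict(p):
            sim = np.exp(-s * np.abs(exemplars - p).sum(axis=1))  # similarity to stored cases
            return (sim * severities).sum() / sim.sum()            # similarity-weighted severity
        return "A" if predict(patient_a) > predict(patient_b) else "B"

    exemplars = np.array([[1, 1, 0, 0], [0, 1, 1, 1], [1, 0, 0, 1], [0, 0, 1, 0]])
    severities = np.array([3.0, 2.0, 2.5, 1.0])
    a, b = np.array([1, 0, 1, 0]), np.array([0, 1, 0, 1])
    print(rule_based(a, b), exemplar_based(a, b, exemplars, severities))
    ```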

  16. Audiovisual integration of speech falters under high attention demands.

    Science.gov (United States)

    Alsius, Agnès; Navarra, Jordi; Campbell, Ruth; Soto-Faraco, Salvador

    2005-05-10

    One of the most commonly cited examples of human multisensory integration occurs during exposure to natural speech, when the vocal and the visual aspects of the signal are integrated in a unitary percept. Audiovisual association of facial gestures and vocal sounds has been demonstrated in nonhuman primates and in prelinguistic children, arguing for a general basis for this capacity. One critical question, however, concerns the role of attention in such multisensory integration. Although both behavioral and neurophysiological studies have converged on a preattentive conceptualization of audiovisual speech integration, this mechanism has rarely been measured under conditions of high attentional load, when the observers' attention resources are depleted. We tested the extent to which audiovisual integration was modulated by the amount of available attentional resources by measuring the observers' susceptibility to the classic McGurk illusion in a dual-task paradigm. The proportion of visually influenced responses was severely, and selectively, reduced if participants were concurrently performing an unrelated visual or auditory task. In contrast with the assumption that crossmodal speech integration is automatic, our results suggest that these multisensory binding processes are subject to attentional demands.

  17. Interactional convergence in conversational storytelling: when reported speech is a cue of alignment and/or affiliation

    Directory of Open Access Journals (Sweden)

    Mathilde eGuardiola

    2013-10-01

    Full Text Available This paper investigates how and when interactional convergence is established by participants in conversation. We analyze sequences of storytelling using an original method which combines Conversation Analysis and a corpus-based approach. In storytelling, the participant in the position of listener is expected to produce either generic or specific responses adapted to the storyteller's narrative. The listener's behavior produced within the current activity is a cue of his or her interactional alignment. We show here that the listener can produce a specific type of (aligned) response which we term a reported speech utterance in echo. The participant who is not telling the story is nonetheless able to animate the characters, while reversing the usual asymmetric roles of storyteller and listener. The use of this device is a way for the listener to display a stance toward the events told by the storyteller. If the listener's stance is congruent with that of the storyteller, this reveals a high degree of affiliation between the participants. We present seventeen extracts from a collection of 94 instances of echo reported speech which we examined using the concepts of alignment and affiliation in order to show how different kinds of convergent sequences are constructed. We demonstrate that this phenomenon is mainly used by the listener in order to align and affiliate with the storyteller by means of reformulative or overbidding Echo Reported Speech. We also show that in affiliative sequences, reported speech can be used by the listener in a humorous way in order to temporarily disalign. This disalignment constitutes a potential starting point for an oblique sequence, which, if accepted and continued by the storyteller, gives rise to a highly convergent sequence.

  18. Subcortical encoding of speech cues in children with attention deficit hyperactivity disorder.

    Science.gov (United States)

    Jafari, Zahra; Malayeri, Saeed; Rostami, Reza

    2015-02-01

    There is little information about processing of nonspeech and speech stimuli at the subcortical level in individuals with attention deficit hyperactivity disorder (ADHD). The auditory brainstem response (ABR) provides information about the function of the auditory brainstem pathways. We aim to investigate subcortical function in the neural encoding of click and speech stimuli in children with ADHD. The subjects include 50 children with ADHD and 34 typically developing (TD) children between the ages of 8 and 12 years. Click ABR (cABR) and speech ABR (sABR) with a 40 ms synthetic /da/ syllable stimulus were recorded. Latencies of cABR waves III and V and the duration of V-Vn (P⩽0.027), and latencies of sABR waves A, D, E, F and O and the duration of V-A (P⩽0.034), were significantly longer in children with ADHD than in TD children. There were no apparent differences in the components of the sustained frequency-following response (FFR). We conclude that children with ADHD have deficits in temporal neural encoding of both nonspeech and speech stimuli. There is a common dysfunction in the processing of click and speech stimuli at the brainstem level in children with suspected ADHD. Copyright © 2015. Published by Elsevier Ireland Ltd.

  19. Working Memory and Speech Recognition in Noise under Ecologically Relevant Listening Conditions: Effects of Visual Cues and Noise Type among Adults with Hearing Loss

    Science.gov (United States)

    Miller, Christi W.; Stewart, Erin K.; Wu, Yu-Hsiang; Bishop, Christopher; Bentler, Ruth A.; Tremblay, Kelly

    2017-01-01

    Purpose: This study evaluated the relationship between working memory (WM) and speech recognition in noise with different noise types as well as in the presence of visual cues. Method: Seventy-six adults with bilateral, mild to moderately severe sensorineural hearing loss (mean age: 69 years) participated. Using a cross-sectional design, 2…

  20. Cue Integration in Dynamic Decision Making (Integration des indices dans la Prise de Decision Dynamique)

    Science.gov (United States)

    2010-07-01

    ...(e.g., pulling the goalie in hockey). Outcomes are thought of as changes from a reference point; the reference point is itself changed by how the... From our understanding of the term "pattern recognition", a number of general themes, challenges, and training-related issues can be pulled from the literature. These are discussed below. General Themes: Cue integration involves the use...

  1. Gesture and Speech Integration: An Exploratory Study of a Man with Aphasia

    Science.gov (United States)

    Cocks, Naomi; Sautin, Laetitia; Kita, Sotaro; Morgan, Gary; Zlotowitz, Sally

    2009-01-01

    Background: In order to comprehend fully a speaker's intention in everyday communication, information is integrated from multiple sources, including gesture and speech. There are no published studies that have explored the impact of aphasia on iconic co-speech gesture and speech integration. Aims: To explore the impact of aphasia on co-speech…

  2. Interactional convergence in conversational storytelling: when reported speech is a cue of alignment and/or affiliation.

    Science.gov (United States)

    Guardiola, Mathilde; Bertrand, Roxane

    2013-01-01

    This paper investigates how and when interactional convergence is established by participants in conversation. We analyze sequences of storytelling using an original method that combines Conversation Analysis and a corpus-based approach. In storytelling, the participant in the position of "listener" is expected to produce either generic or specific responses adapted to the storyteller's narrative. The listener's behavior produced within the current activity is a cue of his/her interactional alignment. We show here that the listener can produce a specific type of (aligned) response, which we term a reported speech utterance in echo. The participant who is not telling the story is nonetheless able to animate the characters, while reversing the usual asymmetric roles of storyteller and listener. The use of this device is a way for the listener to display his/her stance toward the events told by the storyteller. If the listener's stance is congruent with that of the storyteller, this reveals a high degree of affiliation between the participants. We present seventeen excerpts from a collection of 94 instances of Echo Reported Speech (ERS) which we examined using the concepts of alignment and affiliation in order to show how different kinds of convergent sequences are constructed. We demonstrate that this phenomenon is mainly used by the listener to align and affiliate with the storyteller by means of reformulative, enumerative, or overbidding ERS. We also show that in affiliative sequences, reported speech can be used by the listener in a humorous way in order to temporarily disalign. This disalignment constitutes a potential starting point for an oblique sequence, which, if accepted and continued by the storyteller, gives rise to a highly convergent sequence.

  3. Integration of asynchronous knowledge sources in a novel speech recognition framework

    OpenAIRE

    Van hamme, Hugo

    2008-01-01

    Van hamme H., "Integration of asynchronous knowledge sources in a novel speech recognition framework", Proceedings ITRW on speech analysis and processing for knowledge discovery, 4 pp., June 2008, Aalborg, Denmark.

  4. Iconic Gestures for Robot Avatars, Recognition and Integration with Speech

    Science.gov (United States)

    Bremner, Paul; Leonards, Ute

    2016-01-01

    Co-verbal gestures are an important part of human communication, improving its efficiency and efficacy for information conveyance. One possible means by which such multi-modal communication might be realized remotely is through the use of a tele-operated humanoid robot avatar. Such avatars have been previously shown to enhance social presence and operator salience. We present a motion tracking based tele-operation system for the NAO robot platform that allows direct transmission of speech and gestures produced by the operator. To assess the capabilities of this system for transmitting multi-modal communication, we have conducted a user study that investigated if robot-produced iconic gestures are comprehensible, and are integrated with speech. Robot performed gesture outcomes were compared directly to those for gestures produced by a human actor, using a within participant experimental design. We show that iconic gestures produced by a tele-operated robot are understood by participants when presented alone, almost as well as when produced by a human. More importantly, we show that gestures are integrated with speech when presented as part of a multi-modal communication equally well for human and robot performances. PMID:26925010

  5. Iconic Gestures for Robot Avatars, Recognition and Integration with Speech

    Directory of Open Access Journals (Sweden)

    Paul Adam Bremner

    2016-02-01

    Full Text Available Co-verbal gestures are an important part of human communication, improving its efficiency and efficacy for information conveyance. One possible means by which such multi-modal communication might be realised remotely is through the use of a tele-operated humanoid robot avatar. Such avatars have been previously shown to enhance social presence and operator salience. We present a motion tracking based tele-operation system for the NAO robot platform that allows direct transmission of speech and gestures produced by the operator. To assess the capabilities of this system for transmitting multi-modal communication, we have conducted a user study that investigated if robot-produced iconic gestures are comprehensible, and are integrated with speech. Robot performed gesture outcomes were compared directly to those for gestures produced by a human actor, using a within participant experimental design. We show that iconic gestures produced by a tele-operated robot are understood by participants when presented alone, almost as well as when produced by a human. More importantly, we show that gestures are integrated with speech when presented as part of a multi-modal communication equally well for human and robot performances.

  6. Audiovisual Integration of Speech in a Patient with Broca’s Aphasia

    DEFF Research Database (Denmark)

    Andersen, Tobias; Starrfelt, Randi

    2015-01-01

    Lesions to Broca's area cause aphasia characterized by a severe impairment of the ability to speak, with comparatively intact speech perception. However, some studies have found effects on speech perception under adverse listening conditions, indicating that Broca's area is also involved in speech perception. While these studies have focused on auditory speech perception, other studies have shown that Broca's area is activated by visual speech perception. Furthermore, one preliminary report found that a patient with Broca's aphasia did not experience the McGurk illusion, suggesting that an intact Broca's area is necessary for audiovisual integration of speech. Here we describe a patient with Broca's aphasia who experienced the McGurk illusion. This indicates that an intact Broca's area is not necessary for audiovisual integration of speech. The McGurk illusions this patient experienced were atypical, which could be due to Broca's area having a more subtle role in audiovisual integration of speech.

  7. "Frequent frames" in German child-directed speech: a limited cue to grammatical categories.

    Science.gov (United States)

    Stumper, Barbara; Bannard, Colin; Lieven, Elena; Tomasello, Michael

    2011-08-01

    Mintz (2003) found that in English child-directed speech, frequently occurring frames formed by linking the preceding (A) and succeeding (B) word (A_x_B) could accurately predict the syntactic category of the intervening word (x). This has been successfully extended to French (Chemla, Mintz, Bernal, & Christophe, 2009). In this paper, we show that, as for Dutch (Erkelens, 2009), frequent frames in German do not enable such accurate lexical categorization. This can be explained by the characteristics of German including a less restricted word order compared to English or French and the frequent use of some forms as both determiner and pronoun in colloquial German. Finally, we explore the relationship between the accuracy of frames and their potential utility and find that even some of those frames showing high token-based accuracy are of limited value because they are in fact set phrases with little or no variability in the slot position. Copyright © 2011 Cognitive Science Society, Inc.
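
    The frame-based procedure this record evaluates is easy to reproduce in outline: collect every A_x_B frame from a tokenized corpus, keep the most frequent frames, and score how often the words filling a frame share a part of speech. The sketch below runs that outline on a few invented English utterances and a hypothetical tag lexicon; it only illustrates the token-based accuracy measure, not the German corpus analysis reported above.

```python
from collections import defaultdict

# Toy child-directed utterances, already tokenized (invented data).
utterances = [
    "you want to eat it".split(),
    "you have to eat it".split(),
    "you want to drink it".split(),
    "do you want to go".split(),
]

# Hypothetical part-of-speech lexicon used only to score accuracy.
pos = {"eat": "V", "drink": "V", "go": "V", "want": "V", "have": "V",
       "you": "PRO", "to": "INF", "it": "PRO", "do": "AUX"}

# 1. Collect every A_x_B frame and the words that fill its slot.
frame_fillers = defaultdict(list)
for utt in utterances:
    for a, x, b in zip(utt, utt[1:], utt[2:]):
        frame_fillers[(a, b)].append(x)

# 2. Keep the most frequent frames (here: the top 2 by token count).
frequent = sorted(frame_fillers, key=lambda f: len(frame_fillers[f]), reverse=True)[:2]

# 3. Token-based accuracy: fraction of filler pairs that share a category.
for frame in frequent:
    tags = [pos[w] for w in frame_fillers[frame]]
    pairs = [(i, j) for i in range(len(tags)) for j in range(i + 1, len(tags))]
    same = sum(tags[i] == tags[j] for i, j in pairs)
    acc = same / len(pairs) if pairs else float("nan")
    print(f"frame {frame[0]}_x_{frame[1]}: fillers={frame_fillers[frame]}, accuracy={acc:.2f}")
```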

  8. Detection vs. selection: integration of genetic, epigenetic and environmental cues in fluctuating environments.

    Science.gov (United States)

    McNamara, John M; Dall, Sasha R X; Hammerstein, Peter; Leimar, Olof

    2016-10-01

    There are many inputs during development that influence an organism's fit to current or upcoming environments. These include genetic effects, transgenerational epigenetic influences, environmental cues and developmental noise, which are rarely investigated in the same formal framework. We study an analytically tractable evolutionary model, in which cues are integrated to determine mature phenotypes in fluctuating environments. Environmental cues received during development and by the mother as an adult act as detection-based (individually observed) cues. The mother's phenotype and a quantitative genetic effect act as selection-based cues (they correlate with environmental states after selection). We specify when such cues are complementary and tend to be used together, and when using the most informative cue will predominate. Thus, we extend recent analyses of the evolutionary implications of subsets of these effects by providing a general diagnosis of the conditions under which detection and selection-based influences on development are likely to evolve and coexist. © 2016 John Wiley & Sons Ltd/CNRS.

  9. Integration of multiple intraguild predator cues for oviposition decisions by a predatory mite

    Science.gov (United States)

    Walzer, Andreas; Schausberger, Peter

    2012-01-01

    In mutual intraguild predation (IGP), the role of individual guild members is strongly context dependent and, during ontogeny, can shift from an intraguild (IG) prey to a food competitor or to an IG predator. Consequently, recognition of an offspring's predator is more complex for IG than classic prey females. Thus, IG prey females should be able to modulate their oviposition decisions by integrating multiple IG predator cues and by experience. Using a guild of plant-inhabiting predatory mites sharing the spider mite Tetranychus urticae as prey and passing through ontogenetic role shifts in mutual IGP, we assessed the effects of single and combined direct cues of the IG predator Amblyseius andersoni (eggs and traces left by a female on the substrate) on prey patch selection and oviposition behaviour of naïve and IG predator-experienced IG prey females of Phytoseiulus persimilis. The IG prey females preferentially resided in patches without predator cues when the alternative patch contained traces of predator females or the cue combination. Preferential egg placement in patches without predator cues was only apparent in the choice situation with the cue combination. Experience increased the responsiveness of females exposed to the IG predator cue combination, indicated by immediate selection of the prey patch without predator cues and almost perfect oviposition avoidance in patches with the cue combination. We argue that the evolution of the ability of IG prey females to evaluate offspring's IGP risk accurately is driven by the irreversibility of oviposition and the functionally complex relationships between predator guild members. PMID:23264692

  10. Audiovisual Integration of Speech in a Patient with Broca’s Aphasia

    DEFF Research Database (Denmark)

    Andersen, Tobias; Starrfelt, Randi

    2015-01-01

    One preliminary report found that a patient with Broca's aphasia did not experience the McGurk illusion, suggesting that an intact Broca's area is necessary for audiovisual integration of speech. Here we describe a patient with Broca's aphasia who experienced the McGurk illusion. This indicates that an intact Broca's area is not necessary for audiovisual integration of speech. The McGurk illusions this patient experienced were atypical, which could be due to Broca's area having a more subtle role in audiovisual integration of speech. The McGurk illusions of a control subject with Wernicke's aphasia were, however, also atypical. This indicates that the atypical McGurk illusions were due to deficits in speech processing that are not specific to Broca's aphasia.

  11. An ALE meta-analysis on the audiovisual integration of speech signals.

    Science.gov (United States)

    Erickson, Laura C; Heeg, Elizabeth; Rauschecker, Josef P; Turkeltaub, Peter E

    2014-11-01

    The brain improves speech processing through the integration of audiovisual (AV) signals. Situations involving AV speech integration may be crudely dichotomized into those where auditory and visual inputs contain (1) equivalent, complementary signals (validating AV speech) or (2) inconsistent, different signals (conflicting AV speech). This simple framework may allow the systematic examination of broad commonalities and differences between AV neural processes engaged by various experimental paradigms frequently used to study AV speech integration. We conducted an activation likelihood estimation metaanalysis of 22 functional imaging studies comprising 33 experiments, 311 subjects, and 347 foci examining "conflicting" versus "validating" AV speech. Experimental paradigms included content congruency, timing synchrony, and perceptual measures, such as the McGurk effect or synchrony judgments, across AV speech stimulus types (sublexical to sentence). Colocalization of conflicting AV speech experiments revealed consistency across at least two contrast types (e.g., synchrony and congruency) in a network of dorsal stream regions in the frontal, parietal, and temporal lobes. There was consistency across all contrast types (synchrony, congruency, and percept) in the bilateral posterior superior/middle temporal cortex. Although fewer studies were available, validating AV speech experiments were localized to other regions, such as ventral stream visual areas in the occipital and inferior temporal cortex. These results suggest that while equivalent, complementary AV speech signals may evoke activity in regions related to the corroboration of sensory input, conflicting AV speech signals recruit widespread dorsal stream areas likely involved in the resolution of conflicting sensory signals. Copyright © 2014 Wiley Periodicals, Inc.

  12. Multisensory integration: the case of a time window of gesture-speech integration.

    Science.gov (United States)

    Obermeier, Christian; Gunter, Thomas C

    2015-02-01

    This experiment investigates the integration of gesture and speech from a multisensory perspective. In a disambiguation paradigm, participants were presented with short videos of an actress uttering sentences like "She was impressed by the BALL, because the GAME/DANCE...." The ambiguous noun (BALL) was accompanied by an iconic gesture fragment containing information to disambiguate the noun toward its dominant or subordinate meaning. We used four different temporal alignments between noun and gesture fragment: the identification point (IP) of the noun was either prior to (+120 msec), synchronous with (0 msec), or lagging behind the end of the gesture fragment (-200 and -600 msec). ERPs triggered to the IP of the noun showed significant differences for the integration of dominant and subordinate gesture fragments in the -200, 0, and +120 msec conditions. The outcome of this integration was revealed at the target words. These data suggest a time window for direct semantic gesture-speech integration ranging from at least -200 up to +120 msec. Although the -600 msec condition did not show any signs of direct integration at the homonym, significant disambiguation was found at the target word. An explorative analysis suggested that gesture information was directly integrated at the verb, indicating that there are multiple positions in a sentence where direct gesture-speech integration takes place. Ultimately, this would implicate that in natural communication, where a gesture lasts for some time, several aspects of that gesture will have their specific and possibly distinct impact on different positions in an utterance.

  13. Speech recognition by means of a three-integrated-circuit set

    Energy Technology Data Exchange (ETDEWEB)

    Zoicas, A.

    1983-11-03

    The author uses pattern recognition methods for detecting word boundaries, and monitors incoming speech at 12 millisecond intervals. Frequency is divided into eight bands and analysis is achieved in an analogue interface integrated circuit, a pipeline digital processor and a control integrated circuit. Applications are suggested, including speech input to personal computers. 3 references.
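
    The signal analysis this record describes, short-time energies in eight frequency bands computed every 12 milliseconds, can be sketched in software even though the original design was a three-chip hardware implementation. The sampling rate, equal-width band edges, and FFT analysis below are assumptions made for illustration, not details of the 1983 circuit.

```python
import numpy as np

def eight_band_energies(signal, fs=8000, frame_ms=12, n_bands=8):
    """Short-time log energies in n_bands equal-width bands, one frame per frame_ms."""
    frame_len = int(fs * frame_ms / 1000)          # 96 samples at 8 kHz
    n_frames = len(signal) // frame_len
    edges = np.linspace(0, fs / 2, n_bands + 1)    # assumed equal-width bands
    feats = np.zeros((n_frames, n_bands))
    for t in range(n_frames):
        frame = signal[t * frame_len:(t + 1) * frame_len] * np.hamming(frame_len)
        spec = np.abs(np.fft.rfft(frame)) ** 2
        freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
        for b in range(n_bands):
            mask = (freqs >= edges[b]) & (freqs < edges[b + 1])
            feats[t, b] = np.log(spec[mask].sum() + 1e-12)
    return feats

# Example: one second of synthetic input (white noise) at 8 kHz.
x = np.random.randn(8000)
print(eight_band_energies(x).shape)   # (83, 8): one 8-band vector every 12 ms
```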

  14. Late development of cue integration is linked to sensory fusion in cortex.

    Science.gov (United States)

    Dekker, Tessa M; Ban, Hiroshi; van der Velde, Bauke; Sereno, Martin I; Welchman, Andrew E; Nardini, Marko

    2015-11-02

    Adults optimize perceptual judgements by integrating different types of sensory information [1, 2]. This engages specialized neural circuits that fuse signals from the same [3-5] or different [6] modalities. Whereas young children can use sensory cues independently, adult-like precision gains from cue combination only emerge around ages 10 to 11 years [7-9]. Why does it take so long to make best use of sensory information? Existing data cannot distinguish whether this (1) reflects surprisingly late changes in sensory processing (sensory integration mechanisms in the brain are still developing) or (2) depends on post-perceptual changes (integration in sensory cortex is adult-like, but higher-level decision processes do not access the information) [10]. We tested visual depth cue integration in the developing brain to distinguish these possibilities. We presented children aged 6-12 years with displays depicting depth from binocular disparity and relative motion and made measurements using psychophysics, retinotopic mapping, and pattern classification fMRI. Older children (>10.5 years) showed clear evidence for sensory fusion in V3B, a visual area thought to integrate depth cues in the adult brain [3-5]. By contrast, younger children (<10.5 years) showed no such evidence of sensory fusion, consistent with the interpretation that cortical cue-integration mechanisms are still developing. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.

  15. Integration of visual and inertial cues in perceived heading of self-motion

    NARCIS (Netherlands)

    Winkel, K.N. de; Weesie, H.M.; Werkhoven, P.J.; Groen, E.L.

    2010-01-01

    In the present study, we investigated whether the perception of heading of linear self-motion can be explained by Maximum Likelihood Integration (MLI) of visual and non-visual sensory cues. MLI predicts smaller variance for multisensory judgments compared to unisensory judgments. Nine participants
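
    Maximum Likelihood Integration makes a concrete quantitative prediction: each cue is weighted by its reliability (reciprocal variance), and the variance of the combined estimate is always smaller than that of either cue alone. A minimal sketch of that prediction, using invented visual and inertial noise levels rather than the study's data, is shown below.

```python
import numpy as np

def mli_combine(est_visual, var_visual, est_inertial, var_inertial):
    """Reliability-weighted (maximum likelihood) combination of two heading estimates."""
    w_vis = (1 / var_visual) / (1 / var_visual + 1 / var_inertial)
    w_ine = 1 - w_vis
    est = w_vis * est_visual + w_ine * est_inertial
    var = 1 / (1 / var_visual + 1 / var_inertial)   # always <= min(var_visual, var_inertial)
    return est, var

# Hypothetical example: visual heading 2 deg (sd 4 deg), inertial heading 8 deg (sd 6 deg).
est, var = mli_combine(2.0, 4.0**2, 8.0, 6.0**2)
print(f"combined heading = {est:.1f} deg, combined sd = {np.sqrt(var):.2f} deg")
```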

  16. The Selective Cue Integration Framework: A Theory of Postidentification Witness Confidence Assessment

    Science.gov (United States)

    Charman, Steve D.; Carlucci, Marianna; Vallano, Jon; Gregory, Amy Hyman

    2010-01-01

    The current manuscript proposes a theory of how witnesses assess their confidence following a lineup identification, called the selective cue integration framework (SCIF). Drawing from past research on the postidentification feedback effect, the SCIF details a three-stage process of confidence assessment that is based largely on a…

  17. Audiovisual Speech Perception in Infancy: The Influence of Vowel Identity and Infants' Productive Abilities on Sensitivity to (Mis)Matches between Auditory and Visual Speech Cues

    Science.gov (United States)

    Altvater-Mackensen, Nicole; Mani, Nivedita; Grossmann, Tobias

    2016-01-01

    Recent studies suggest that infants' audiovisual speech perception is influenced by articulatory experience (Mugitani et al., 2008; Yeung & Werker, 2013). The current study extends these findings by testing if infants' emerging ability to produce native sounds in babbling impacts their audiovisual speech perception. We tested 44 6-month-olds…

  18. Effects of Audio-Visual Integration on the Detection of Masked Speech and Non-Speech Sounds

    Science.gov (United States)

    Eramudugolla, Ranmalee; Henderson, Rachel; Mattingley, Jason B.

    2011-01-01

    Integration of simultaneous auditory and visual information about an event can enhance our ability to detect that event. This is particularly evident in the perception of speech, where the articulatory gestures of the speaker's lips and face can significantly improve the listener's detection and identification of the message, especially when that…

  19. Effects of a brisk walk on blood pressure responses to the Stroop, a speech task and a smoking cue among temporarily abstinent smokers.

    Science.gov (United States)

    Taylor, Adrian; Katomeri, Magdalena

    2006-01-01

    A review and meta-analysis by Hamer et al. (2006) showed that a single session of exercise can attenuate post-exercise blood pressure (BP) responses to stress, but no studies examined the effects among smokers or with brisk walking. Healthy volunteers (n=60), averaging 28 years of age and smoking 15 cigarettes daily, abstained from smoking for 2 h before being randomly assigned to a 15-min brisk semi-self-paced walk or passive control condition. Subject characteristics, typical smoking cue-elicited cravings and BP were assessed at baseline. After each condition, BP was assessed before and after three psycho-social stressors were carried out: (1) computerised Stroop word-colour interference task, (2) speech task and (3) only handling a lit cigarette. A two-way mixed ANCOVA (controlling for baseline) revealed a significant overall interaction effect for time by condition for both systolic blood pressure (SBP) and diastolic blood pressure (DBP). Univariate ANCOVAs (to compare between-groups post-stressor BP, controlling for pre-stressor BP) revealed that exercise attenuated systolic BP and diastolic BP responses to the Stroop and speech tasks and SBP to the lit cigarette equivalent to an attenuated SBP and DBP of up to 3.8 mmHg. Post-exercise attenuation effects were moderated by resting blood pressure and self-reported smoking cue-elicited craving. Effects were strongest among those with higher blood pressure and smokers who reported typically stronger cravings when faced with smoking cues. Blood pressure responses to the lit cigarette were not associated with responses to the Stroop and speech task. A self-paced 15-min walk can reduce smokers' SBP and DBP responses to stress, of a magnitude similar on average to non-smokers.

  20. Integration of multiple cues allows threat-sensitive anti-intraguild predator responses in predatory mites

    Science.gov (United States)

    Walzer, Andreas; Schausberger, Peter

    2013-01-01

    Intraguild (IG) prey is commonly confronted with multiple IG predator species. However, the IG predation (IGP) risk for prey is not only dependent on the predator species, but also on inherent (intraspecific) characteristics of a given IG predator such as its life-stage, sex or gravidity and the associated prey needs. Thus, IG prey should have evolved the ability to integrate multiple IG predator cues, which should allow both inter- and intraspecific threat-sensitive anti-predator responses. Using a guild of plant-inhabiting predatory mites sharing spider mites as prey, we evaluated the effects of single and combined cues (eggs and/or chemical traces left by a predator female on the substrate) of the low risk IG predator Neoseiulus californicus and the high risk IG predator Amblyseius andersoni on time, distance and path shape parameters of the larval IG prey Phytoseiulus persimilis. IG prey discriminated between traces of the low and high risk IG predator, with and without additional presence of their eggs, indicating interspecific threat-sensitivity. The behavioural changes were manifest in distance moved, activity and path shape of IG prey. The cue combination of traces and eggs of the IG predators conveyed other information than each cue alone, allowing intraspecific threat-sensitive responses by IG prey apparent in changed velocities and distances moved. We argue that graded responses to single and combined IG predator cues are adaptive due to minimization of acceptance errors in IG prey decision making. PMID:23750040

  1. Audiovisual Integration of Speech in a Patient with Broca’s Aphasia

    Directory of Open Access Journals (Sweden)

    Tobias Søren Andersen

    2015-04-01

    Full Text Available Lesions to Broca’s area cause aphasia characterised by a severe impairment of the ability to speak, with comparatively intact speech perception. However, some studies have found effects on speech perception under adverse listening conditions, indicating that Broca’s area is also involved in speech perception. While these studies have focused on auditory speech perception other studies have shown that Broca’s area is activated by visual speech perception. Furthermore, one preliminary report found that a patient with Broca’s aphasia did not experience the McGurk illusion suggesting that an intact Broca’s area is necessary for audiovisual integration of speech. Here we describe a patient with Broca’s aphasia who experienced the McGurk illusion. This indicates that an intact Broca’s area is not necessary for audiovisual integration of speech. The McGurk illusions this patient experienced were atypical, which could be due to Broca’s area having a more subtle role in audiovisual integration of speech. The McGurk illusions of a control subject with Wernicke’s aphasia were, however, also atypical. This indicates that the atypical McGurk illusions were due to deficits in speech processing that are not specific to Broca’s aphasia.

  2. Effects of congruent and incongruent visual cues on speech perception and brain activity in cochlear implant users.

    Science.gov (United States)

    Song, Jae-Jin; Lee, Hyo-Jeong; Kang, Hyejin; Lee, Dong Soo; Chang, Sun O; Oh, Seung Ha

    2015-03-01

    While deafness-induced plasticity has been investigated in the visual and auditory domains, not much is known about language processing in audiovisual multimodal environments for patients with restored hearing via cochlear implant (CI) devices. Here, we examined the effect of agreeing or conflicting visual inputs on auditory processing in deaf patients equipped with degraded artificial hearing. Ten post-lingually deafened CI users with good performance, along with matched control subjects, underwent H2(15)O positron emission tomography scans while carrying out a behavioral task requiring the extraction of speech information from unimodal auditory stimuli, bimodal audiovisual congruent stimuli, and incongruent stimuli. Regardless of congruency, the control subjects demonstrated activation of the auditory and visual sensory cortices, as well as the superior temporal sulcus, the classical multisensory integration area, indicating a bottom-up multisensory processing strategy. Compared to CI users, the control subjects exhibited activation of the right ventral premotor-supramarginal pathway. In contrast, CI users activated primarily the visual cortices more in the congruent audiovisual condition than in the null condition. In addition, compared to controls, CI users displayed an activation focus in the right amygdala for congruent audiovisual stimuli. The most notable difference between the two groups was an activation focus in the left inferior frontal gyrus in CI users confronted with incongruent audiovisual stimuli, suggesting top-down cognitive modulation for audiovisual conflict. Correlation analysis revealed that good speech performance was positively correlated with right amygdala activity for the congruent condition, but negatively correlated with bilateral visual cortices regardless of congruency. Taken together these results suggest that for multimodal inputs, cochlear implant users are more vision-reliant when processing congruent stimuli and are disturbed by incongruent audiovisual input, which appears to call on top-down cognitive modulation.

  3. Integrating speech technology to meet crew station design requirements

    Science.gov (United States)

    Simpson, Carol A.; Ruth, John C.; Moore, Carolyn A.

    The last two years have seen improvements in speech generation and speech recognition technology that make speech I/O for crew station controls and displays viable for operational systems. These improvements include increased robustness of algorithm performance in high levels of background noise, increased vocabulary size, improved performance in the connected speech mode, and less speaker dependence. This improved capability makes possible far more sophisticated user interface design than was possible with earlier technology. Engineering, linguistic, and human factors design issues are discussed in the context of current voice I/O technology performance.

  4. Effectiveness of an Integrated Phonological Awareness Approach for Children with Childhood Apraxia of Speech (CAS)

    Science.gov (United States)

    McNeill, Brigid C.; Gillon, Gail T.; Dodd, Barbara

    2009-01-01

    This study investigated the effectiveness of an integrated phonological awareness approach for children with childhood apraxia of speech (CAS). Change in speech, phonological awareness, letter knowledge, word decoding, and spelling skills were examined. A controlled multiple single-subject design was employed. Twelve children aged 4-7 years with…

  5. Prediction and constraint in audiovisual speech perception.

    Science.gov (United States)

    Peelle, Jonathan E; Sommers, Mitchell S

    2015-07-01

    During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing the precision of prediction. Electrophysiological studies demonstrate that oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to acoustic information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration

  6. Prediction and constraint in audiovisual speech perception

    Science.gov (United States)

    Peelle, Jonathan E.; Sommers, Mitchell S.

    2015-01-01

    During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing precision of prediction. Electrophysiological studies demonstrate oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to auditory information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration, supported

  7. Speech Acquisition and Automatic Speech Recognition for Integrated Spacesuit Audio Systems

    Science.gov (United States)

    Huang, Yiteng; Chen, Jingdong; Chen, Shaoyan

    2010-01-01

    A voice-command human-machine interface system has been developed for spacesuit extravehicular activity (EVA) missions. A multichannel acoustic signal processing method has been created for distant speech acquisition in noisy and reverberant environments. This technology reduces noise by exploiting differences in the statistical nature of signal (i.e., speech) and noise that exists in the spatial and temporal domains. As a result, the automatic speech recognition (ASR) accuracy can be improved to the level at which crewmembers would find the speech interface useful. The developed speech human/machine interface will enable both crewmember usability and operational efficiency. It offers a fast rate of data/text entry, a small overall size, and low weight. In addition, this design will free the hands and eyes of a suited crewmember. The system components and steps include beamforming/multi-channel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, model adaptation, ASR HMM (Hidden Markov Model) training, and ASR decoding. A state-of-the-art phoneme recognizer can obtain an accuracy rate of 65 percent when the training and testing data are free of noise. When it is used in spacesuits, the rate drops to about 33 percent. With the developed microphone array speech-processing technologies, the performance is improved and the phoneme recognition accuracy rate rises to 44 percent. The recognizer can be further improved by combining the microphone array and HMM model adaptation techniques and using speech samples collected from inside spacesuits. In addition, arithmetic complexity models for the major HMM-based ASR components were developed. They can help real-time ASR system designers select appropriate tasks when faced with constraints on computational resources.
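
    The first stage of the pipeline listed above is multichannel beamforming. A delay-and-sum beamformer is the simplest member of that family and is sketched below; the array geometry, sampling rate, and integer-sample delays are illustrative assumptions, not the flight design.

```python
import numpy as np

def steering_delays(mic_positions, look_direction, fs, c=343.0):
    """Integer per-mic delays (in samples) that align a plane wave from look_direction."""
    look = np.asarray(look_direction, dtype=float)
    look /= np.linalg.norm(look)
    advance = mic_positions @ look / c      # mics closer to the talker hear the wave earlier
    return np.round((advance - advance.min()) * fs).astype(int)

def delay_and_sum(mic_signals, sample_delays):
    """Delay each channel so the look direction adds coherently, then average."""
    aligned = [np.roll(sig, d) for sig, d in zip(mic_signals, sample_delays)]
    return np.mean(aligned, axis=0)

# Toy check: two mics ~10.7 cm apart on the x-axis, talker along +x, fs = 16 kHz.
fs = 16000
src = np.sin(2 * np.pi * 440 * np.arange(1600) / fs)
mic2 = np.roll(src, -5)                               # second mic hears the wave 5 samples earlier
pos = np.array([[0.0, 0.0, 0.0], [0.107, 0.0, 0.0]])
delays = steering_delays(pos, [1.0, 0.0, 0.0], fs)    # -> [0, 5]
enhanced = delay_and_sum(np.vstack([src, mic2]), delays)
print(np.allclose(enhanced, src))                     # True: channels add coherently
```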

  8. A Longitudinal Assessment of Early Childhood Education with Integrated Speech Therapy for Children with Significant Language Impairment in Germany

    Science.gov (United States)

    Ullrich, Dieter; Ullrich, Katja; Marten, Magret

    2014-01-01

    Background: In Lower Saxony, Germany, pre-school children with language- and speech-deficits have the opportunity to access kindergartens with integrated language-/speech therapy prior to attending primary school, both regular or with integrated speech therapy. It is unknown whether these early childhood education treatments are helpful and…

  9. Real-Time Lane Detection on Suburban Streets Using Visual Cue Integration

    Directory of Open Access Journals (Sweden)

    Shehan Fernando

    2014-04-01

    Full Text Available The detection of lane boundaries on suburban streets using images obtained from video constitutes a challenging task. This is mainly due to the difficulties associated with estimating the complex geometric structure of lane boundaries, the quality of lane markings as a result of wear, occlusions by traffic, and shadows caused by road-side trees and structures. Most of the existing techniques for lane boundary detection employ a single visual cue and will only work under certain conditions and where there are clear lane markings. Also, better results are achieved when there are no other on-road objects present. This paper extends our previous work and discusses a novel lane boundary detection algorithm specifically addressing the abovementioned issues through the integration of two visual cues. The first visual cue is based on stripe-like features found on lane lines extracted using a two-dimensional symmetric Gabor filter. The second visual cue is based on a texture characteristic determined using the entropy measure of the predefined neighbourhood around a lane boundary line. The visual cues are then integrated using a rule-based classifier which incorporates a modified sequential covering algorithm to improve robustness. To separate lane boundary lines from other similar features, a road mask is generated using road chromaticity values estimated from CIE L*a*b* colour transformation. Extraneous points around lane boundary lines are then removed by an outlier removal procedure based on studentized residuals. The lane boundary lines are then modelled with Bezier spline curves. To validate the algorithm, extensive experimental evaluation was carried out on suburban streets and the results are presented.
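
    The two visual cues in this record, a stripe-like response from a two-dimensional Gabor filter and a local entropy texture measure, are straightforward to compute per pixel; the rule-based integration, road mask, and Bezier-spline fitting are the paper's more involved contributions. The sketch below computes only the two cues (with OpenCV and scikit-image) and combines them with a naive threshold rule; all parameter values are placeholders rather than those of the published algorithm.

```python
import cv2
import numpy as np
from skimage.filters.rank import entropy
from skimage.morphology import disk

def lane_cues(gray_uint8):
    """Per-pixel lane cues: Gabor 'stripe' response and local entropy texture."""
    # Cue 1: symmetric Gabor filter tuned to roughly vertical stripe-like markings
    # (kernel size, sigma, orientation and wavelength are placeholder values).
    gabor = cv2.getGaborKernel(ksize=(21, 21), sigma=4.0, theta=np.pi / 2,
                               lambd=10.0, gamma=0.5, psi=0)
    stripe = cv2.filter2D(gray_uint8.astype(np.float32) / 255.0, -1, gabor)

    # Cue 2: local entropy in a small neighbourhood; painted markings tend to be
    # smoother (lower entropy) than the surrounding asphalt texture.
    ent = entropy(gray_uint8, disk(5))

    # Naive rule-based integration (illustration only): a strong stripe response
    # AND low texture entropy votes for "lane boundary".
    lane_mask = (stripe > stripe.mean() + 2 * stripe.std()) & (ent < ent.mean())
    return stripe, ent, lane_mask

# Usage with any road image loaded as greyscale:
# img = cv2.imread("road.png", cv2.IMREAD_GRAYSCALE)
# stripe, ent, mask = lane_cues(img)
```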

  10. Floral pathway integrator gene expression mediates gradual transmission of environmental and endogenous cues to flowering time.

    Science.gov (United States)

    van Dijk, Aalt D J; Molenaar, Jaap

    2017-01-01

    The appropriate timing of flowering is crucial for the reproductive success of plants. Hence, intricate genetic networks integrate various environmental and endogenous cues such as temperature or hormonal status. These signals integrate into a network of floral pathway integrator genes. At a quantitative level, it is currently unclear how the impact of genetic variation in signaling pathways on flowering time is mediated by floral pathway integrator genes. Here, using datasets available from literature, we connect Arabidopsis thaliana flowering time in genetic backgrounds varying in upstream signalling components with the expression levels of floral pathway integrator genes in these genetic backgrounds. Our modelling results indicate that flowering time depends in an approximately linear way on expression levels of floral pathway integrator genes. This gradual, proportional response of flowering time to upstream changes enables a gradual adaptation to changing environmental factors such as temperature and light.

  11. Floral pathway integrator gene expression mediates gradual transmission of environmental and endogenous cues to flowering time

    Directory of Open Access Journals (Sweden)

    Aalt D.J. van Dijk

    2017-04-01

    Full Text Available The appropriate timing of flowering is crucial for the reproductive success of plants. Hence, intricate genetic networks integrate various environmental and endogenous cues such as temperature or hormonal status. These signals integrate into a network of floral pathway integrator genes. At a quantitative level, it is currently unclear how the impact of genetic variation in signaling pathways on flowering time is mediated by floral pathway integrator genes. Here, using datasets available from literature, we connect Arabidopsis thaliana flowering time in genetic backgrounds varying in upstream signalling components with the expression levels of floral pathway integrator genes in these genetic backgrounds. Our modelling results indicate that flowering time depends in an approximately linear way on expression levels of floral pathway integrator genes. This gradual, proportional response of flowering time to upstream changes enables a gradual adaptation to changing environmental factors such as temperature and light.
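
    The central modelling claim, that flowering time responds in a roughly linear, proportional way to integrator gene expression, amounts to fitting a linear model of flowering time against expression level across genetic backgrounds. The sketch below shows such a fit on invented numbers, purely to illustrate the form of the relationship.

```python
import numpy as np

# Hypothetical data: relative integrator-gene expression vs. days to flowering
# across several genetic backgrounds (values invented for illustration).
expression = np.array([0.2, 0.5, 0.8, 1.0, 1.4, 1.9])
days_to_flowering = np.array([41.0, 35.5, 31.0, 28.5, 23.0, 17.5])

# Least-squares fit of a linear (proportional) response.
slope, intercept = np.polyfit(expression, days_to_flowering, 1)
pred = slope * expression + intercept
ss_res = np.sum((days_to_flowering - pred) ** 2)
ss_tot = np.sum((days_to_flowering - days_to_flowering.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print(f"flowering time ≈ {intercept:.1f} {slope:+.1f} * expression (R² = {r2:.3f})")
```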

  12. Evaluating the effects of delivering integrated kinesthetic and tactile cues to individuals with unilateral hemiparetic stroke during overground walking.

    Science.gov (United States)

    Afzal, Muhammad Raheel; Pyo, Sanghun; Oh, Min-Kyun; Park, Young Sook; Yoon, Jungwon

    2018-04-16

    Integration of kinesthetic and tactile cues for application to post-stroke gait rehabilitation is a novel concept which needs to be explored. The combined provision of haptic cues may result in collective improvement of gait parameters such as symmetry, balance and muscle activation patterns. Our proposed integrated cue system can offer a cost-effective and voluntary gait training experience for rehabilitation of subjects with unilateral hemiparetic stroke. Ten post-stroke ambulatory subjects participated in a 10 m walking trial while utilizing the haptic cues (either alone or in an integrated application), at their preferred and increased gait speeds. In the system a haptic cane device (HCD) provided kinesthetic perception and a vibrotactile feedback device (VFD) provided a tactile cue on the paretic leg for gait modification. Balance, gait symmetry and muscle activity were analyzed to identify the benefits of utilizing the proposed system. When using kinesthetic cues, either alone or integrated with a tactile cue, an increase in the percentage of non-paretic peak activity in the paretic muscles (vastus medialis obliquus) was observed at the preferred gait speed. Subjects also showed improved temporal stance symmetry when utilizing the kinesthetic cue at their preferred gait speed (p < 0.001, partial η² = 0.702). When combining haptic cues, the subjects walked at their preferred gait speed with increased temporal stance symmetry and paretic muscle activity affecting their balance. Similar improvements were observed at higher gait speeds. The efficacy of the proposed system is influenced by gait speed. Improvements were observed at a 20% increased gait speed, whereas a plateau effect was observed at a 40% increased gait speed. These results imply that integration of haptic cues may benefit post-stroke gait rehabilitation by inducing simultaneous improvements in gait symmetry and muscle activity.

  13. How is the McGurk effect modulated by Cued Speech in deaf and hearing adults?

    OpenAIRE

    Bayard, Clémence; Colin, Cécile; Leybaert, Jacqueline

    2014-01-01

    Speech perception for both hearing and deaf people involves an integrative process between auditory and lip-reading information. In order to disambiguate information from lips, manual cues from Cued Speech may be added. Cued Speech (CS) is a system of manual aids developed to help deaf people to clearly and completely understand speech visually (Cornett, 1967). Within this system, both labial and manual information, as lone input sources, remain ambiguous. Perceivers, therefore, have to combine...

  14. Integrating cues of social interest and voice pitch in men's preferences for women's voices.

    Science.gov (United States)

    Jones, Benedict C; Feinberg, David R; Debruine, Lisa M; Little, Anthony C; Vukovic, Jovana

    2008-04-23

    Most previous studies of vocal attractiveness have focused on preferences for physical characteristics of voices such as pitch. Here we examine the content of vocalizations in interaction with such physical traits, finding that vocal cues of social interest modulate the strength of men's preferences for raised pitch in women's voices. Men showed stronger preferences for raised pitch when judging the voices of women who appeared interested in the listener than when judging the voices of women who appeared relatively disinterested in the listener. These findings show that voice preferences are not determined solely by physical properties of voices and that men integrate information about voice pitch and the degree of social interest expressed by women when forming voice preferences. Women's preferences for raised pitch in women's voices were not modulated by cues of social interest, suggesting that the integration of cues of social interest and voice pitch when men judge the attractiveness of women's voices may reflect adaptations that promote efficient allocation of men's mating effort.

  15. Common variation in the autism risk gene CNTNAP2, brain structural connectivity and multisensory speech integration.

    Science.gov (United States)

    Ross, Lars A; Del Bene, Victor A; Molholm, Sophie; Jae Woo, Young; Andrade, Gizely N; Abrahams, Brett S; Foxe, John J

    2017-11-01

    Three lines of evidence motivated this study. 1) CNTNAP2 variation is associated with autism risk and speech-language development. 2) CNTNAP2 variations are associated with differences in white matter (WM) tracts comprising the speech-language circuitry. 3) Children with autism show impairment in multisensory speech perception. Here, we asked whether an autism risk-associated CNTNAP2 single nucleotide polymorphism in neurotypical adults was associated with multisensory speech perception performance, and whether such a genotype-phenotype association was mediated through white matter tract integrity in speech-language circuitry. Risk genotype at rs7794745 was associated with decreased benefit from visual speech and lower fractional anisotropy (FA) in several WM tracts (right precentral gyrus, left anterior corona radiata, right retrolenticular internal capsule). These structural connectivity differences were found to mediate the effect of genotype on audiovisual speech perception, shedding light on possible pathogenic pathways in autism and biological sources of inter-individual variation in audiovisual speech processing in neurotypicals. Copyright © 2017 Elsevier Inc. All rights reserved.

  16. Validating a Method to Assess Lipreading, Audiovisual Gain, and Integration During Speech Reception With Cochlear-Implanted and Normal-Hearing Subjects Using a Talking Head.

    Science.gov (United States)

    Schreitmüller, Stefan; Frenken, Miriam; Bentz, Lüder; Ortmann, Magdalene; Walger, Martin; Meister, Hartmut

    benefitted by AV over unimodal speech as indexed by calculations of the measures visual enhancement and auditory enhancement (each p < 0.001). Both groups efficiently integrated complementary auditory and visual speech features as indexed by calculations of the measure integration enhancement (each p < 0.005). Given the good agreement between results from literature and the outcome of supplementing an existing validated auditory test with synthetic visual cues, the introduced method can be considered an interesting candidate for clinical and scientific applications to assess measures important for AV SR in a standardized manner. This could be beneficial for optimizing the diagnosis and treatment of individual listening and communication disorders, such as cochlear implantation.
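
    Visual enhancement, auditory enhancement, and integration enhancement are typically computed from proportion-correct scores in the auditory-only, visual-only, and audiovisual conditions. The functions below use commonly cited normalized-gain definitions (with probability summation as a simple stand-in for the predicted audiovisual score); they illustrate the kind of measures referred to in this record and are not necessarily the exact formulations used by the authors.

```python
def visual_enhancement(p_a, p_av):
    """Gain from adding vision, normalized by the headroom above auditory-only."""
    return (p_av - p_a) / (1.0 - p_a)

def auditory_enhancement(p_v, p_av):
    """Gain from adding audition, normalized by the headroom above visual-only."""
    return (p_av - p_v) / (1.0 - p_v)

def integration_enhancement(p_a, p_v, p_av):
    """Observed AV score minus an independent-combination (probability summation) prediction."""
    p_pred = p_a + p_v - p_a * p_v
    return p_av - p_pred

# Hypothetical scores: 45% correct auditory-only, 20% visual-only, 75% audiovisual.
p_a, p_v, p_av = 0.45, 0.20, 0.75
print(f"VE = {visual_enhancement(p_a, p_av):.2f}")              # 0.55
print(f"AE = {auditory_enhancement(p_v, p_av):.2f}")            # 0.69
print(f"IE = {integration_enhancement(p_a, p_v, p_av):+.2f}")   # +0.19
```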

  17. Functional neuroanatomy of gesture-speech integration in children varies with individual differences in gesture processing.

    Science.gov (United States)

    Demir-Lira, Özlem Ece; Asaridou, Salomi S; Raja Beharelle, Anjali; Holt, Anna E; Goldin-Meadow, Susan; Small, Steven L

    2018-03-08

    Gesture is an integral part of children's communicative repertoire. However, little is known about the neurobiology of speech and gesture integration in the developing brain. We investigated how 8- to 10-year-old children processed gesture that was essential to understanding a set of narratives. We asked whether the functional neuroanatomy of gesture-speech integration varies as a function of (1) the content of speech, and/or (2) individual differences in how gesture is processed. When gestures provided missing information not present in the speech (i.e., disambiguating gesture; e.g., "pet" + flapping palms = bird), the presence of gesture led to increased activity in inferior frontal gyri, the right middle temporal gyrus, and the left superior temporal gyrus, compared to when gesture provided redundant information (i.e., reinforcing gesture; e.g., "bird" + flapping palms = bird). This pattern of activation was found only in children who were able to successfully integrate gesture and speech behaviorally, as indicated by their performance on post-test story comprehension questions. Children who did not glean meaning from gesture did not show differential activation across the two conditions. Our results suggest that the brain activation pattern for gesture-speech integration in children overlaps with-but is broader than-the pattern in adults performing the same task. Overall, our results provide a possible neurobiological mechanism that could underlie children's increasing ability to integrate gesture and speech over childhood, and account for individual differences in that integration. © 2018 John Wiley & Sons Ltd.

  18. Man-system interface based on automatic speech recognition: integration to a virtual control desk

    Energy Technology Data Exchange (ETDEWEB)

    Jorge, Carlos Alexandre F.; Mol, Antonio Carlos A.; Pereira, Claudio M.N.A.; Aghina, Mauricio Alves C., E-mail: calexandre@ien.gov.b, E-mail: mol@ien.gov.b, E-mail: cmnap@ien.gov.b, E-mail: mag@ien.gov.b [Instituto de Engenharia Nuclear (IEN/CNEN-RJ), Rio de Janeiro, RJ (Brazil); Nomiya, Diogo V., E-mail: diogonomiya@gmail.co [Universidade Federal do Rio de Janeiro (UFRJ), RJ (Brazil)

    2009-07-01

    This work reports the implementation of a man-system interface based on automatic speech recognition, and its integration to a virtual nuclear power plant control desk. The latter aims to reproduce a real control desk using virtual reality technology, for operator training and ergonomic evaluation purposes. An automatic speech recognition system was developed to serve as a new interface with users, substituting the computer keyboard and mouse. Users can operate this virtual control desk in front of a computer monitor or a projection screen through spoken commands. The automatic speech recognition interface developed is based on a well-known signal processing technique named cepstral analysis, and on artificial neural networks. The speech recognition interface is described, along with its integration with the virtual control desk, and results are presented. (author)

  19. Man-system interface based on automatic speech recognition: integration to a virtual control desk

    International Nuclear Information System (INIS)

    Jorge, Carlos Alexandre F.; Mol, Antonio Carlos A.; Pereira, Claudio M.N.A.; Aghina, Mauricio Alves C.; Nomiya, Diogo V.

    2009-01-01

    This work reports the implementation of a man-system interface based on automatic speech recognition, and its integration to a virtual nuclear power plant control desk. The latter aims to reproduce a real control desk using virtual reality technology, for operator training and ergonomic evaluation purposes. An automatic speech recognition system was developed to serve as a new interface with users, substituting the computer keyboard and mouse. Users can operate this virtual control desk in front of a computer monitor or a projection screen through spoken commands. The automatic speech recognition interface developed is based on a well-known signal processing technique named cepstral analysis, and on artificial neural networks. The speech recognition interface is described, along with its integration with the virtual control desk, and results are presented. (author)
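
    The interface described in the two records above pairs cepstral features with an artificial neural network classifier for spoken commands. A minimal sketch of that kind of pipeline, using MFCCs from librosa and a small feed-forward classifier from scikit-learn, is given below; the feature dimensions, network size, and command set are assumptions, not the authors' configuration.

```python
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier

COMMANDS = ["increase", "decrease", "open", "close"]   # hypothetical command set

def command_features(wav_path, sr=16000, n_mfcc=13):
    """Fixed-length cepstral feature vector: mean and std of MFCCs over the utterance."""
    y, _ = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Training and recognition (paths and labels are placeholders for a recorded command corpus):
# X = np.stack([command_features(p) for p in training_paths])
# y = np.array(training_labels)              # indices into COMMANDS
# clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000).fit(X, y)
# print(COMMANDS[clf.predict(command_features("test.wav")[None, :])[0]])
```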

  20. Interaction and Representational Integration: Evidence from Speech Errors

    Science.gov (United States)

    Goldrick, Matthew; Baker, H. Ross; Murphy, Amanda; Baese-Berk, Melissa

    2011-01-01

    We examine the mechanisms that support interaction between lexical, phonological and phonetic processes during language production. Studies of the phonetics of speech errors have provided evidence that partially activated lexical and phonological representations influence phonetic processing. We examine how these interactive effects are modulated…

  1. Audiovisual Integration in Children Listening to Spectrally Degraded Speech

    Science.gov (United States)

    Maidment, David W.; Kang, Hi Jee; Stewart, Hannah J.; Amitay, Sygal

    2015-01-01

    Purpose: The study explored whether visual information improves speech identification in typically developing children with normal hearing when the auditory signal is spectrally degraded. Method: Children (n = 69) and adults (n = 15) were presented with noise-vocoded sentences from the Children's Co-ordinate Response Measure (Rosen, 2011) in…

  2. Electrophysiological evidence for a self-processing advantage during audiovisual speech integration.

    Science.gov (United States)

    Treille, Avril; Vilain, Coriandre; Kandel, Sonia; Sato, Marc

    2017-09-01

    Previous electrophysiological studies have provided strong evidence for early multisensory integrative mechanisms during audiovisual speech perception. From these studies, one unanswered issue is whether hearing our own voice and seeing our own articulatory gestures facilitate speech perception, possibly through a better processing and integration of sensory inputs with our own sensory-motor knowledge. The present EEG study examined the impact of self-knowledge during the perception of auditory (A), visual (V) and audiovisual (AV) speech stimuli that were previously recorded from the participant or from a speaker he/she had never met. Audiovisual interactions were estimated by comparing N1 and P2 auditory evoked potentials during the bimodal condition (AV) with the sum of those observed in the unimodal conditions (A + V). In line with previous EEG studies, our results revealed an amplitude decrease of P2 auditory evoked potentials in AV compared to A + V conditions. Crucially, a temporal facilitation of N1 responses was observed during the visual perception of self speech movements compared to those of another speaker. This facilitation was negatively correlated with the saliency of visual stimuli. These results provide evidence for a temporal facilitation of the integration of auditory and visual speech signals when the visual situation involves our own speech gestures.

  3. Interaction and representational integration: Evidence from speech errors

    OpenAIRE

    Goldrick, Matthew; Baker, H. Ross; Murphy, Amanda; Baese-Berk, Melissa

    2011-01-01

    We examine the mechanisms that support interaction between lexical, phonological and phonetic processes during language production. Studies of the phonetics of speech errors have provided evidence that partially activated lexical and phonological representations influence phonetic processing. We examine how these interactive effects are modulated by lexical frequency. Previous research has demonstrated that during lexical access, the processing of high frequency words is facilitated; in contr...

  4. Crossmodal deficit in dyslexic children: practice affects the neural timing of letter-speech sound integration

    Directory of Open Access Journals (Sweden)

    Gojko Žarić

    2015-06-01

    Full Text Available A failure to build solid letter-speech sound associations may contribute to reading impairments in developmental dyslexia. Whether this reduced neural integration of letters and speech sounds changes over time within individual children and how this relates to behavioral gains in reading skills remains unknown. In this research, we examined changes in event-related potential (ERP) measures of letter-speech sound integration over a 6-month period during which 9-year-old dyslexic readers (n=17) followed a training in letter-speech sound coupling next to their regular reading curriculum. We presented the Dutch spoken vowels /a/ and /o/ as standard and deviant stimuli in one auditory and two audiovisual oddball conditions. In one audiovisual condition (AV0), the letter 'a' was presented simultaneously with the vowels, while in the other (AV200) it preceded vowel onset by 200 ms. Prior to the training (T1), dyslexic readers showed the expected pattern of typical auditory mismatch responses, together with the absence of letter-speech sound effects in a late negativity (LN) window. After the training (T2), our results showed earlier (and enhanced) crossmodal effects in the LN window. Most interestingly, earlier LN latency at T2 was significantly related to higher behavioral accuracy in letter-speech sound coupling. On a more general level, the timing of the earlier mismatch negativity (MMN) in the simultaneous condition (AV0) measured at T1 was significantly related to reading fluency at both T1 and T2 as well as with reading gains. Our findings suggest that the reduced neural integration of letters and speech sounds in dyslexic children may show moderate improvement with reading instruction and training and that behavioral improvements relate especially to individual differences in the timing of this neural integration.

  5. Neural dynamics of audiovisual speech integration under variable listening conditions: an individual participant analysis.

    Science.gov (United States)

    Altieri, Nicholas; Wenger, Michael J

    2013-01-01

    Speech perception engages both auditory and visual modalities. Limitations of traditional accuracy-only approaches in the investigation of audiovisual speech perception have motivated the use of new methodologies. In an audiovisual speech identification task, we utilized capacity (Townsend and Nozawa, 1995), a dynamic measure of efficiency, to quantify audiovisual integration. Capacity was used to compare RT distributions from audiovisual trials to RT distributions from auditory-only and visual-only trials across three listening conditions: clear auditory signal, S/N ratio of -12 dB, and S/N ratio of -18 dB. The purpose was to obtain EEG recordings in conjunction with capacity to investigate how a late ERP co-varies with integration efficiency. Results showed efficient audiovisual integration for low auditory S/N ratios, but inefficient audiovisual integration when the auditory signal was clear. The ERP analyses showed evidence for greater audiovisual amplitude compared to the unisensory signals for lower auditory S/N ratios (higher capacity/efficiency) compared to the high S/N ratio (low capacity/inefficient integration). The data are consistent with an interactive framework of integration, where auditory recognition is influenced by speech-reading as a function of signal clarity.
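
    The capacity coefficient used in this record compares the cumulative hazard of audiovisual response times with the sum of the unisensory cumulative hazards; values above 1 indicate efficient (super-capacity) integration and values below 1 indicate inefficient integration. The sketch below estimates it from raw response times using a plain empirical-CDF estimator and simulated data, which may differ from the smoothing and trial structure used in the study.

```python
import numpy as np

def cumulative_hazard(rts, t_grid):
    """H(t) = -log(1 - F(t)) from an empirical CDF (capped to avoid log(0))."""
    rts = np.sort(np.asarray(rts, dtype=float))
    F = np.searchsorted(rts, t_grid, side="right") / len(rts)
    F = np.clip(F, 0.0, 1.0 - 1e-6)
    return -np.log(1.0 - F)

def capacity_coefficient(rt_av, rt_a, rt_v, t_grid):
    """C(t) = H_AV(t) / (H_A(t) + H_V(t)); C > 1 indicates efficient integration."""
    denom = cumulative_hazard(rt_a, t_grid) + cumulative_hazard(rt_v, t_grid)
    with np.errstate(divide="ignore", invalid="ignore"):
        return cumulative_hazard(rt_av, t_grid) / denom

# Simulated example (ms): audiovisual responses faster than either unisensory condition.
rng = np.random.default_rng(0)
rt_a = rng.normal(650, 80, 200)
rt_v = rng.normal(700, 90, 200)
rt_av = rng.normal(560, 70, 200)
t = np.arange(500, 900, 25)
print(np.round(capacity_coefficient(rt_av, rt_a, rt_v, t), 2))
```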

  6. Transcranial Magnetic Stimulation over Left Inferior Frontal and Posterior Temporal Cortex Disrupts Gesture-Speech Integration.

    Science.gov (United States)

    Zhao, Wanying; Riggs, Kevin; Schindler, Igor; Holle, Henning

    2018-02-21

    Language and action naturally occur together in the form of cospeech gestures, and there is now convincing evidence that listeners display a strong tendency to integrate semantic information from both domains during comprehension. A contentious question, however, has been which brain areas are causally involved in this integration process. In previous neuroimaging studies, left inferior frontal gyrus (IFG) and posterior middle temporal gyrus (pMTG) have emerged as candidate areas; however, it is currently not clear whether these areas are causally or merely epiphenomenally involved in gesture-speech integration. In the present series of experiments, we directly tested for a potential critical role of IFG and pMTG by observing the effect of disrupting activity in these areas using transcranial magnetic stimulation in a mixed gender sample of healthy human volunteers. The outcome measure was performance on a Stroop-like gesture task (Kelly et al., 2010a), which provides a behavioral index of gesture-speech integration. Our results provide clear evidence that disrupting activity in IFG and pMTG selectively impairs gesture-speech integration, suggesting that both areas are causally involved in the process. These findings are consistent with the idea that these areas play a joint role in gesture-speech integration, with IFG regulating strategic semantic access via top-down signals acting upon temporal storage areas. SIGNIFICANCE STATEMENT Previous neuroimaging studies suggest an involvement of inferior frontal gyrus and posterior middle temporal gyrus in gesture-speech integration, but findings have been mixed and due to methodological constraints did not allow inferences of causality. By adopting a virtual lesion approach involving transcranial magnetic stimulation, the present study provides clear evidence that both areas are causally involved in combining semantic information arising from gesture and speech. These findings support the view that, rather than being

  7. Integrating Information from Speech and Physiological Signals to Achieve Emotional Sensitivity

    DEFF Research Database (Denmark)

    Kim, Jonghwa; André, Elisabeth; Rehm, Matthias

    2005-01-01

    Recently, there has been a significant amount of work on the recognition of emotions from speech and biosignals. Most approaches to emotion recognition so far concentrate on a single modality and do not take advantage of the fact that an integrated multimodal analysis may help to resolve...

  8. Speech Enhancement of Mobile Devices Based on the Integration of a Dual Microphone Array and a Background Noise Elimination Algorithm.

    Science.gov (United States)

    Chen, Yung-Yue

    2018-05-08

    Mobile devices are often used in our daily lives for the purposes of speech and communication. The speech quality of mobile devices is always degraded due to the environmental noises surrounding mobile device users. Regrettably, an effective background noise reduction solution cannot easily be developed for this speech enhancement problem. For these reasons, a methodology is systematically proposed to eliminate the effects of background noises for the speech communication of mobile devices. This methodology integrates a dual microphone array with a background noise elimination algorithm. The proposed background noise elimination algorithm includes a whitening process, a speech modelling method and an H2 estimator. Due to the adoption of the dual microphone array, a low-cost design can be obtained for the speech enhancement of mobile devices. Practical tests have proven that this proposed method is immune to random background noises, and noiseless speech can be obtained after executing this denoising process.

  9. Speech Enhancement of Mobile Devices Based on the Integration of a Dual Microphone Array and a Background Noise Elimination Algorithm

    Directory of Open Access Journals (Sweden)

    Yung-Yue Chen

    2018-05-01

    Full Text Available Mobile devices are often used in our daily lives for speech communication. The speech quality of mobile devices is degraded by the environmental noise surrounding mobile device users. Unfortunately, an effective background noise reduction solution cannot easily be developed for this speech enhancement problem. For these reasons, a methodology is systematically proposed to eliminate the effects of background noises for the speech communication of mobile devices. This methodology integrates a dual microphone array with a background noise elimination algorithm. The proposed background noise elimination algorithm includes a whitening process, a speech modelling method and an H2 estimator. Due to the adoption of the dual microphone array, a low-cost design can be obtained for the speech enhancement of mobile devices. Practical tests have proven that the proposed method is immune to random background noises, and noiseless speech can be obtained after executing this denoising process.

  10. The early maximum likelihood estimation model of audiovisual integration in speech perception

    DEFF Research Database (Denmark)

    Andersen, Tobias

    2015-01-01

    Speech perception is facilitated by seeing the articulatory mouth movements of the talker. This is due to perceptual audiovisual integration, which also causes the McGurk−MacDonald illusion, and for which a comprehensive computational account is still lacking. Decades of research have largely...... integration to speech perception along with three model variations. In early MLE, integration is based on a continuous internal representation before categorization, which can make the model more parsimonious by imposing constraints that reflect experimental designs. The study also shows that cross-validation can evaluate models of audiovisual integration based on typical data sets taking both goodness-of-fit and model flexibility into account. All models were tested on a published data set previously used for testing the FLMP. Cross-validation favored the early MLE while more conventional error measures...
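
    At the core of any MLE account of audiovisual integration is inverse-variance weighting of the two cues. The sketch below shows only that textbook step, not the early-MLE model itself (which operates on a pre-categorical representation and was evaluated with cross-validation in the record above); the example values are invented.

```python
import numpy as np

def mle_fuse(x_aud, var_aud, x_vis, var_vis):
    """Reliability-weighted (maximum-likelihood) fusion of an auditory and a
    visual estimate on a shared internal continuum: each cue is weighted by
    its inverse variance, and the fused variance is smaller than either input."""
    w_aud = (1.0 / var_aud) / (1.0 / var_aud + 1.0 / var_vis)
    w_vis = 1.0 - w_aud
    fused = w_aud * x_aud + w_vis * x_vis
    fused_var = 1.0 / (1.0 / var_aud + 1.0 / var_vis)
    return fused, fused_var

# Example: a visual cue that is twice as reliable pulls the estimate toward itself.
print(mle_fuse(x_aud=0.0, var_aud=2.0, x_vis=1.0, var_vis=1.0))
```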

  11. Bayesian integration of position and orientation cues in perception of biological and non-biological dynamic forms

    Directory of Open Access Journals (Sweden)

    Steven Matthew Thurman

    2014-02-01

    Full Text Available Visual form analysis is fundamental to shape perception and likely plays a central role in perception of more complex dynamic shapes, such as moving objects or biological motion. Two primary form-based cues serve to represent the overall shape of an object: the spatial position and the orientation of locations along the boundary of the object. However, it is unclear how the visual system integrates these two sources of information in dynamic form analysis, and in particular how the brain resolves ambiguities due to sensory uncertainty and/or cue conflict. In the current study, we created animations of sparsely-sampled dynamic objects (human walkers or rotating squares) comprised of oriented Gabor patches in which orientation could either coincide or conflict with information provided by position cues. When the cues were incongruent, we found a characteristic trade-off between position and orientation information whereby position cues increasingly dominated perception as the relative uncertainty of orientation increased and vice versa. Furthermore, we found no evidence for differences in the visual processing of biological and non-biological objects, casting doubt on the claim that biological motion may be specialized in the human brain, at least in specific terms of form analysis. To explain these behavioral results quantitatively, we adopt a probabilistic template-matching model that uses Bayesian inference within local modules to estimate object shape separately from either spatial position or orientation signals. The outputs of the two modules are integrated with weights that reflect individual estimates of subjective cue reliability, and integrated over time to produce a decision about the perceived dynamics of the input data. Results of this model provided a close fit to the behavioral data, suggesting a mechanism in the human visual system that approximates rational Bayesian inference to integrate position and orientation signals in dynamic

  12. Brain responses and looking behaviour during audiovisual speech integration in infants predict auditory speech comprehension in the second year of life.

    Directory of Open Access Journals (Sweden)

    Elena V Kushnerenko

    2013-07-01

    Full Text Available The use of visual cues during the processing of audiovisual speech is known to be less efficient in children and adults with language difficulties, and such difficulties are known to be more prevalent in children from low-income populations. In the present study, we followed an economically diverse group of thirty-seven infants longitudinally from 6-9 months to 14-16 months of age. We used eye-tracking to examine whether individual differences in visual attention during audiovisual processing of speech in 6- to 9-month-old infants, particularly when processing congruent and incongruent auditory and visual speech cues, might be indicative of their later language development. Twenty-two of these 6- to 9-month-old infants also participated in an event-related potential (ERP) audiovisual task within the same experimental session. Language development was then followed up at the age of 14-16 months, using two measures of language development, the Preschool Language Scale (PLS) and the Oxford Communicative Development Inventory (CDI). The results show that those infants who were less efficient in auditory speech processing at the age of 6-9 months had lower receptive language scores at 14-16 months. A correlational analysis revealed that the pattern of face scanning and ERP responses to audio-visually incongruent stimuli at 6-9 months were both significantly associated with language development at 14-16 months. These findings add to the understanding of individual differences in neural signatures of audiovisual processing and associated looking behaviour in infants.

  13. Causal inference of asynchronous audiovisual speech

    Directory of Open Access Journals (Sweden)

    John F Magnotti

    2013-11-01

    Full Text Available During speech perception, humans integrate auditory information from the voice with visual information from the face. This multisensory integration increases perceptual precision, but only if the two cues come from the same talker; this requirement has been largely ignored by current models of speech perception. We describe a generative model of multisensory speech perception that includes this critical step of determining the likelihood that the voice and face information have a common cause. A key feature of the model is that it is based on a principled analysis of how an observer should solve this causal inference problem using the asynchrony between two cues and the reliability of the cues. This allows the model to make predictions about the behavior of subjects performing a synchrony judgment task, predictive power that does not exist in other approaches, such as post hoc fitting of Gaussian curves to behavioral data. We tested the model predictions against the performance of 37 subjects performing a synchrony judgment task viewing audiovisual speech under a variety of manipulations, including varying asynchronies, intelligibility, and visual cue reliability. The causal inference model outperformed the Gaussian model across two experiments, providing a better fit to the behavioral data with fewer parameters. Because the causal inference model is derived from a principled understanding of the task, model parameters are directly interpretable in terms of stimulus and subject properties.
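
    The key step in such a causal-inference model is computing, from the observed asynchrony and the cues' reliability, the posterior probability that voice and face share a common cause. The toy sketch below illustrates that step with a Gaussian likelihood for a common cause against a broad uniform likelihood for separate causes; all parameter values (80 ms noise, 0.5 prior, 1 s range) are illustrative rather than the paper's fitted values.

```python
import numpy as np

def p_common_cause(asynchrony_ms, sigma_ms=80.0, prior_common=0.5, uniform_range_ms=1000.0):
    """Toy causal-inference step: compare the likelihood of the observed
    audiovisual asynchrony under a common cause (Gaussian around zero lag)
    against independent causes (broad uniform), and return the posterior
    probability of a common cause. All parameter values are illustrative."""
    like_common = np.exp(-0.5 * (asynchrony_ms / sigma_ms) ** 2) / (sigma_ms * np.sqrt(2 * np.pi))
    like_separate = 1.0 / uniform_range_ms
    post = (like_common * prior_common) / (
        like_common * prior_common + like_separate * (1.0 - prior_common))
    return post

# The posterior probability of a common cause drops as the lag grows.
for lag in (0, 100, 300):
    print(lag, round(p_common_cause(lag), 3))
```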

  14. What's in the input? Frequent frames in child-directed speech offer distributional cues to grammatical categories in Spanish and English.

    Science.gov (United States)

    Weisleder, Adriana; Waxman, Sandra R

    2010-11-01

    Recent analyses have revealed that child-directed speech contains distributional regularities that could, in principle, support young children's discovery of distinct grammatical categories (noun, verb, adjective). In particular, a distributional unit known as the frequent frame appears to be especially informative (Mintz, 2003). However, analyses have focused almost exclusively on the distributional information available in English. Because languages differ considerably in how the grammatical forms are marked within utterances, the scarcity of cross-linguistic evidence represents an unfortunate gap. We therefore advance the developmental evidence by analyzing the distributional information available in frequent frames across two languages (Spanish and English), across sentence positions (phrase medial and phrase final), and across grammatical forms (noun, verb, adjective). We selected six parent-child corpora from the CHILDES database (three English; three Spanish), and analyzed the input when children were aged 2;6 or younger. In each language, frequent frames did indeed offer systematic cues to grammatical category assignment. We also identify differences in the accuracy of these frames across languages, sentence positions, and grammatical classes.
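
    A frequent frame is simply the pair of words immediately preceding and following a target position, so the analysis reduces to counting A_x_B co-occurrences in a tokenized corpus and inspecting which words fill each frame. The sketch below is a minimal version of that counting step; the toy corpus is invented and top_n = 45 mirrors the commonly used frequent-frame cutoff, not a value taken from this record.

```python
from collections import Counter, defaultdict

def frequent_frames(utterances, top_n=45):
    """Count A_x_B 'frames' (the two words surrounding a target position) in
    tokenized utterances and list the words each frequent frame captures."""
    frame_counts = Counter()
    frame_fillers = defaultdict(Counter)
    for utt in utterances:
        for i in range(1, len(utt) - 1):
            frame = (utt[i - 1], utt[i + 1])
            frame_counts[frame] += 1
            frame_fillers[frame][utt[i]] += 1
    return [(frame, frame_fillers[frame]) for frame, _ in frame_counts.most_common(top_n)]

# Invented toy corpus of child-directed utterances.
corpus = [["you", "want", "to", "eat", "it"],
          ["you", "have", "to", "go", "now"],
          ["do", "you", "want", "the", "ball"]]
for frame, fillers in frequent_frames(corpus, top_n=3):
    print(frame, dict(fillers))
```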

  15. Integrating Automatic Speech Recognition and Machine Translation for Better Translation Outputs

    DEFF Research Database (Denmark)

    Liyanapathirana, Jeevanthi

    translations, combining machine translation with computer-assisted translation has drawn attention in current research. This combines two prospects: the opportunity of ensuring high-quality translation along with a significant performance gain. Automatic Speech Recognition (ASR) is another important area......, which provides important functionality in language processing and natural language understanding tasks. In this work we integrate automatic speech recognition and machine translation in parallel. We aim to avoid manual typing of possible translations as dictating the translation would take less time...... to the n-best list rescoring, we also use word graphs with the expectation of arriving at a tighter integration of ASR and MT models. Integration methods include constraining ASR models using language and translation models of MT, and vice versa. We currently develop and experiment with different methods...
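
    One of the integration routes mentioned above, n-best list rescoring, can be summarized as a log-linear combination of the ASR score for each hypothesis with a translation-model score for the same string. The sketch below shows that combination with invented hypotheses and weights; it is not the actual scoring used in the work described above.

```python
def rescore_nbest(asr_hypotheses, mt_score, weight_asr=0.7):
    """Log-linear rescoring sketch: combine each ASR hypothesis score with a
    translation-model score for the same string and return the best candidate.
    `mt_score` is any callable mapping a hypothesis string to a log-probability;
    names and the interpolation weight are illustrative."""
    def combined(hyp_and_score):
        hyp, asr_logp = hyp_and_score
        return weight_asr * asr_logp + (1.0 - weight_asr) * mt_score(hyp)
    return max(asr_hypotheses, key=combined)[0]

# Toy example: the MT model can overturn the ASR's top-scoring hypothesis.
nbest = [("je veux partir", -1.0), ("je veut partir", -0.8)]
toy_mt = lambda hyp: 0.0 if "veux" in hyp else -5.0
print(rescore_nbest(nbest, toy_mt))  # "je veux partir"
```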

  16. An analysis of machine translation and speech synthesis in speech-to-speech translation system

    OpenAIRE

    Hashimoto, K.; Yamagishi, J.; Byrne, W.; King, S.; Tokuda, K.

    2011-01-01

    This paper provides an analysis of the impacts of machine translation and speech synthesis on speech-to-speech translation systems. The speech-to-speech translation system consists of three components: speech recognition, machine translation and speech synthesis. Many techniques for integration of speech recognition and machine translation have been proposed. However, speech synthesis has not yet been considered. Therefore, in this paper, we focus on machine translation and speech synthesis, ...

  17. What Iconic Gesture Fragments Reveal about Gesture-Speech Integration: When Synchrony Is Lost, Memory Can Help

    Science.gov (United States)

    Obermeier, Christian; Holle, Henning; Gunter, Thomas C.

    2011-01-01

    The present series of experiments explores several issues related to gesture-speech integration and synchrony during sentence processing. To be able to more precisely manipulate gesture-speech synchrony, we used gesture fragments instead of complete gestures, thereby avoiding the usual long temporal overlap of gestures with their coexpressive…

  18. Sensory integration dysfunction affects efficacy of speech therapy on children with functional articulation disorders

    Directory of Open Access Journals (Sweden)

    Tung LC

    2013-01-01

    Full Text Available Li-Chen Tung, Chin-Kai Lin, Ching-Lin Hsieh, Ching-Chi Chen, Chin-Tsan Huang, Chun-Hou Wang (Department of Physical Medicine and Rehabilitation, Chi Mei Medical Center, Tainan; Program of Early Intervention, Department of Early Childhood Education, National Taichung University of Education, Taichung; School of Occupational Therapy, College of Medicine, National Taiwan University, Taipei; Department of Physical Medicine and Rehabilitation, National Taiwan University Hospital, Taipei; School of Physical Therapy, College of Medical Science and Technology, Chung Shan Medical University, Taichung; Physical Therapy Room, Chung Shan Medical University Hospital, Taichung, Taiwan; the first two authors contributed equally). Background: Articulation disorders in young children are due to defects occurring at a certain stage in sensory and motor development. Some children with functional articulation disorders may also have sensory integration dysfunction (SID). We hypothesized that speech therapy would be less efficacious in children with SID than in those without SID. Hence, the purpose of this study was to compare the efficacy of speech therapy in two groups of children with functional articulation disorders: those without and those with SID. Method: A total of 30 young children with functional articulation disorders were divided into two groups, the no-SID group (15 children) and the SID group (15 children). The number of pronunciation mistakes was evaluated before and after speech therapy. Results: There were no statistically significant differences in age, sex, sibling order, education of parents, and pretest number of mistakes in pronunciation between the two groups (P > 0.05). The mean and standard deviation in the pre- and posttest number of mistakes in pronunciation were 10.5 ± 3.2 and 3.3 ± 3.3 in the no-SID group, and 10.1 ± 2.9 and 6.9 ± 3.5 in the SID group, respectively. Results showed great changes after speech therapy treatment (F

  19. Perceptual stimulus-A Bayesian-based integration of multi-visual-cue approach and its application

    Institute of Scientific and Technical Information of China (English)

    XUE JianRu; ZHENG NanNing; ZHONG XiaoPin; PING LinJiang

    2008-01-01

    With the view that a visual cue can be taken as a kind of stimulus, the study of the mechanism in the visual perception process by using visual cues in their probabilistic representation eventually leads to a class of statistical integration of multiple visual cues (IMVC) methods which have been applied widely in perceptual grouping, video analysis, and other basic problems in computer vision. In this paper, a survey on the basic ideas and recent advances of IMVC methods is presented, with a focus on the models and algorithms of IMVC for video analysis within the framework of Bayesian estimation. Furthermore, two typical problems in video analysis, robust visual tracking and the "switching problem" in multi-target tracking (MTT), are taken as test beds to verify a series of Bayesian-based IMVC methods proposed by the authors. Finally, the relations between statistical IMVC and the visual perception process, as well as potential future research work for IMVC, are discussed.

  20. Integration of Distinct Objects in Visual Working Memory Depends on Strong Objecthood Cues Even for Different-Dimension Conjunctions.

    Science.gov (United States)

    Balaban, Halely; Luria, Roy

    2016-05-01

    What makes an integrated object in visual working memory (WM)? Past evidence suggested that WM holds all features of multidimensional objects together, but struggles to integrate color-color conjunctions. This difficulty was previously attributed to a challenge in same-dimension integration, but here we argue that it arises from the integration of 2 distinct objects. To test this, we examined the integration of distinct different-dimension features (a colored square and a tilted bar). We monitored the contralateral delay activity, an event-related potential component sensitive to the number of objects in WM. The results indicated that color and orientation belonging to distinct objects in a shared location were not integrated in WM (Experiment 1), even following a common fate Gestalt cue (Experiment 2). These conjunctions were better integrated in a less demanding task (Experiment 3), and in the original WM task, but with a less individuating version of the original stimuli (Experiment 4). Our results identify the critical factor in WM integration at same- versus separate-objects, rather than at same- versus different-dimensions. Compared with the perfect integration of an object's features, the integration of several objects is demanding, and depends on an interaction between the grouping cues and task demands, among other factors. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  1. Comparing the influence of spectro-temporal integration in computational speech segregation

    DEFF Research Database (Denmark)

    Bentsen, Thomas; May, Tobias; Kressner, Abigail Anne

    2016-01-01

    The goal of computational speech segregation systems is to automatically segregate a target speaker from interfering maskers. Typically, these systems include a feature extraction stage in the front-end and a classification stage in the back-end. A spectro-temporal integration strategy can be applied in either the front-end, using the so-called delta features, or in the back-end, using a second classifier that exploits the posterior probability of speech from the first classifier across a spectro-temporal window. This study systematically analyzes the influence of such stages on segregation...... metric that comprehensively predicts computational segregation performance and correlates well with intelligibility. The outcome of this study could help to identify the most effective spectro-temporal integration strategy for computational segregation systems....
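
    The front-end variant mentioned above, delta features, amounts to appending the local temporal slope of each feature channel to the static features. The sketch below computes standard regression-based deltas over a (frames x channels) matrix; the window width and the gammatone-ratemap input are assumptions for illustration, not the study's configuration.

```python
import numpy as np

def delta_features(feats, width=2):
    """Front-end spectro-temporal integration sketch: append regression-based
    delta features (local slope over +/- `width` frames) to a
    (frames x channels) feature matrix, one way of adding temporal context."""
    num = np.zeros_like(feats, dtype=float)
    denom = 2 * sum(k * k for k in range(1, width + 1))
    padded = np.pad(feats, ((width, width), (0, 0)), mode="edge")
    for k in range(1, width + 1):
        num += k * (padded[width + k:len(feats) + width + k]
                    - padded[width - k:len(feats) + width - k])
    return np.hstack([feats, num / denom])

ratemap = np.random.rand(100, 32)          # e.g. 100 frames x 32 gammatone channels
print(delta_features(ratemap).shape)       # (100, 64): static + delta features
```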

  2. Holistic integration of gaze cues in visual face and body perception: Evidence from the composite design.

    Science.gov (United States)

    Vrancken, Leia; Germeys, Filip; Verfaillie, Karl

    2017-01-01

    A considerable amount of research on identity recognition and emotion identification with the composite design points to the holistic processing of these aspects in faces and bodies. In this paradigm, the interference from a nonattended face half on the perception of the attended half is taken as evidence for holistic processing (i.e., a composite effect). Far less research, however, has been dedicated to the concept of gaze. Nonetheless, gaze perception is a substantial component of face and body perception, and holds critical information for everyday communicative interactions. Furthermore, the ability of human observers to detect direct versus averted eye gaze is effortless, perhaps similar to identity perception and emotion recognition. However, the hypothesis of holistic perception of eye gaze has never been tested directly. Research on gaze perception with the composite design could facilitate further systematic comparison with other aspects of face and body perception that have been investigated using the composite design (i.e., identity and emotion). In the present research, a composite design was administered to assess holistic processing of gaze cues in faces (Experiment 1) and bodies (Experiment 2). Results confirmed that eye and head orientation (Experiment 1A) and head and body orientation (Experiment 2A) are integrated in a holistic manner. However, the composite effect was not completely disrupted by inversion (Experiments 1B and 2B), a finding that will be discussed together with implications for future research.

  3. Eliciting extra prominence in read-speech tasks: The effects of different text-highlighting methods on acoustic cues to perceived prominence

    DEFF Research Database (Denmark)

    Berger, Stephanie; Niebuhr, Oliver; Fischer, Kerstin

    2018-01-01

    The research initiative Innovating Speech EliCitation Techniques (INSPECT) aims to describe and quantify how recording methods, situations and materials influence speech production in lab-speech experiments. On this basis, INSPECT aims to develop methods that reliably stimulate specific patterns and styles of speech, like expressive or conversational speech or different types of emphatic accents. The present study investigates whether and how different text highlighting methods (yellow background, bold, capital letters, italics, and underlining) make speakers reinforce the level of perceived prominence...

  4. Integration of contextual cues into memory depends on "prefrontal" N-methyl-D-aspartate receptors.

    Science.gov (United States)

    Starosta, Sarah; Bartetzko, Isabelle; Stüttgen, Maik C; Güntürkün, Onur

    2017-10-01

    Every learning event is embedded in a context, but not always does the context become an integral part of the memory; however, for extinction learning it usually does, resulting in context-specific conditioned responding. The neuronal mechanisms underlying contextual control have been mainly investigated for Pavlovian fear extinction with a focus on hippocampal structures. However, the initial acquisition of novel responses can be subject to contextual control as well, although the neuronal mechanisms are mostly unknown. Here, we tested the hypothesis that contextual control of acquisition depends on glutamatergic transmission underlying executive functions in forebrain areas, e.g. by shifting attention to critical cues. Thus, we antagonized N-methyl-D-aspartate (NMDA) receptors with 2-amino-5-phosphonovaleric acid (AP5) in the pigeon nidopallium caudolaterale, the functional analogue of mammalian prefrontal cortex, during the concomitant acquisition and extinction of conditioned responding to two different stimuli. This paradigm has previously been shown to lead to contextual control over extinguished as well as non-extinguished responding. NMDA receptor blockade resulted in an impairment of extinction learning, but left the acquisition of responses to a novel stimulus unaffected. Critically, when responses were tested in a different context in the retrieval phase, we observed that NMDA receptor blockade led to the abolishment of contextual control over acquisition performance. This result is predicted by a model describing response inclination as the product of associative strength and contextual gain. In this model, learning under AP5 leads to a change in the contextual gain on the learned association, possibly via the modulation of attentional mechanisms. Copyright © 2017 Elsevier Inc. All rights reserved.

  5. Audiovisual speech integration in the superior temporal region is dysfunctional in dyslexia.

    Science.gov (United States)

    Ye, Zheng; Rüsseler, Jascha; Gerth, Ivonne; Münte, Thomas F

    2017-07-25

    Dyslexia is an impairment of reading and spelling that affects both children and adults even after many years of schooling. Dyslexic readers have deficits in the integration of auditory and visual inputs but the neural mechanisms of the deficits are still unclear. This fMRI study examined the neural processing of auditorily presented German numbers 0-9 and videos of lip movements of a German native speaker voicing numbers 0-9 in unimodal (auditory or visual) and bimodal (always congruent) conditions in dyslexic readers and their matched fluent readers. We confirmed results of previous studies that the superior temporal gyrus/sulcus plays a critical role in audiovisual speech integration: fluent readers showed greater superior temporal activations for combined audiovisual stimuli than auditory-/visual-only stimuli. Importantly, such an enhancement effect was absent in dyslexic readers. Moreover, the auditory network (bilateral superior temporal regions plus medial PFC) was dynamically modulated during audiovisual integration in fluent, but not in dyslexic readers. These results suggest that superior temporal dysfunction may underlie poor audiovisual speech integration in readers with dyslexia. Copyright © 2017 IBRO. Published by Elsevier Ltd. All rights reserved.

  6. The effect of a concurrent working memory task and temporal offsets on the integration of auditory and visual speech information.

    Science.gov (United States)

    Buchan, Julie N; Munhall, Kevin G

    2012-01-01

    Audiovisual speech perception is an everyday occurrence of multisensory integration. Conflicting visual speech information can influence the perception of acoustic speech (namely the McGurk effect), and auditory and visual speech are integrated over a rather wide range of temporal offsets. This research examined whether the addition of a concurrent cognitive load task would affect the audiovisual integration in a McGurk speech task and whether the cognitive load task would cause more interference at increasing offsets. The amount of integration was measured by the proportion of responses in incongruent trials that did not correspond to the audio (McGurk response). An eye-tracker was also used to examine whether the amount of temporal offset and the presence of a concurrent cognitive load task would influence gaze behavior. Results from this experiment show a very modest but statistically significant decrease in the number of McGurk responses when subjects also perform a cognitive load task, and this effect is relatively constant across the various temporal offsets. Participants' gaze behavior was also influenced by the addition of a cognitive load task. Gaze was less centralized on the face, less time was spent looking at the mouth and more time was spent looking at the eyes, when a concurrent cognitive load task was added to the speech task.

  7. Large Scale Functional Brain Networks Underlying Temporal Integration of Audio-Visual Speech Perception: An EEG Study.

    Science.gov (United States)

    Kumar, G Vinodh; Halder, Tamesh; Jaiswal, Amit K; Mukherjee, Abhishek; Roy, Dipanjan; Banerjee, Arpan

    2016-01-01

    Observable lip movements of the speaker influence perception of auditory speech. A classical example of this influence is reported by listeners who perceive an illusory (cross-modal) speech sound (McGurk-effect) when presented with incongruent audio-visual (AV) speech stimuli. Recent neuroimaging studies of AV speech perception accentuate the role of frontal, parietal, and the integrative brain sites in the vicinity of the superior temporal sulcus (STS) for multisensory speech perception. However, whether and how the network across the whole brain participates during multisensory perception processing remains an open question. We posit that a large-scale functional connectivity among the neural population situated in distributed brain sites may provide valuable insights into the processing and fusing of AV speech. Varying the psychophysical parameters in tandem with electroencephalogram (EEG) recordings, we exploited the trial-by-trial perceptual variability of incongruent audio-visual (AV) speech stimuli to identify the characteristics of the large-scale cortical network that facilitates multisensory perception during synchronous and asynchronous AV speech. We evaluated the spectral landscape of EEG signals during multisensory speech perception at varying AV lags. Functional connectivity dynamics for all sensor pairs were computed using the time-frequency global coherence, the vector sum of pairwise coherence changes over time. During synchronous AV speech, we observed enhanced global gamma-band coherence and decreased alpha and beta-band coherence underlying cross-modal (illusory) perception compared to unisensory perception around a temporal window of 300-600 ms following onset of stimuli. During asynchronous speech stimuli, a global broadband coherence was observed during cross-modal perception at earlier times along with pre-stimulus decreases of lower frequency power, e.g., alpha rhythms for positive AV lags and theta rhythms for negative AV lags. Thus, our
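
    As a rough stand-in for the coherence analysis described above, the sketch below averages magnitude-squared coherence over all sensor pairs within a frequency band; it is not the paper's exact global-coherence measure (the vector sum of pairwise coherence changes over time), and the gamma-band limits and toy data are illustrative.

```python
import numpy as np
from itertools import combinations
from scipy.signal import coherence

def band_averaged_pairwise_coherence(eeg, fs, band=(30.0, 45.0)):
    """Average magnitude-squared coherence over all sensor pairs within a
    frequency band; `eeg` is (sensors x samples). A simplified stand-in for
    a global coherence measure, with illustrative band limits."""
    pair_values = []
    for i, j in combinations(range(eeg.shape[0]), 2):
        f, cxy = coherence(eeg[i], eeg[j], fs=fs, nperseg=min(256, eeg.shape[1]))
        in_band = (f >= band[0]) & (f <= band[1])
        pair_values.append(cxy[in_band].mean())
    return float(np.mean(pair_values))

eeg = np.random.randn(8, 2000)   # toy data: 8 sensors, 2 s at 1000 Hz
print(band_averaged_pairwise_coherence(eeg, fs=1000.0))
```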

  8. Integration of polarization and chromatic cues in the insect sky compass.

    Science.gov (United States)

    el Jundi, Basil; Pfeiffer, Keram; Heinze, Stanley; Homberg, Uwe

    2014-06-01

    Animals relying on a celestial compass for spatial orientation may use the position of the sun, the chromatic or intensity gradient of the sky, the polarization pattern of the sky, or a combination of these cues as compass signals. Behavioral experiments in bees and ants, indeed, showed that direct sunlight and sky polarization play a role in sky compass orientation, but the relative importance of these cues is species-specific. Intracellular recordings from polarization-sensitive interneurons in the desert locust and monarch butterfly suggest that inputs from different eye regions, including polarized-light input through the dorsal rim area of the eye and chromatic/intensity gradient input from the main eye, are combined at the level of the medulla to create a robust compass signal. Conflicting input from the polarization and chromatic/intensity channel, resulting from eccentric receptive fields, is eliminated at the level of the anterior optic tubercle and central complex through internal compensation for changing solar elevations, which requires input from a circadian clock. Across several species, the central complex likely serves as an internal sky compass, combining E-vector information with other celestial cues. Descending neurons, likewise, respond both to zenithal polarization and to unpolarized cues in an azimuth-dependent way.

  9. Speech understanding in noise with integrated in-ear and muff-style hearing protection systems

    Directory of Open Access Journals (Sweden)

    Sharon M Abel

    2011-01-01

    Full Text Available Integrated hearing protection systems are designed to enhance free field and radio communications during military operations while protecting against the damaging effects of high-level noise exposure. A study was conducted to compare the effect of increasing the radio volume on the intelligibility of speech over the radios of two candidate systems, in-ear and muff-style, in 85-dBA speech babble noise presented free field. Twenty normal-hearing, English-fluent subjects, half male and half female, were tested in same gender pairs. Alternating as talker and listener, their task was to discriminate consonant-vowel-consonant syllables that contrasted either the initial or final consonant. Percent correct consonant discrimination increased with increases in the radio volume. At the highest volume, subjects achieved 79% with the in-ear device but only 69% with the muff-style device, averaged across the gender of listener/talker pairs and consonant position. Although there was no main effect of gender, female listener/talkers showed a 10% advantage for the final consonant and male listener/talkers showed a 1% advantage for the initial consonant. These results indicate that normal hearing users can achieve reasonably high radio communication scores with integrated in-ear hearing protection in moderately high-level noise that provides both energetic and informational masking. The adequacy of the range of available radio volumes for users with hearing loss has yet to be determined.

  10. Heads First: Visual Aftereffects Reveal Hierarchical Integration of Cues to Social Attention.

    Directory of Open Access Journals (Sweden)

    Sarah Cooney

    Full Text Available Determining where another person is attending is an important skill for social interaction that relies on various visual cues, including the turning direction of the head and body. This study reports a novel high-level visual aftereffect that addresses the important question of how these sources of information are combined in gauging social attention. We show that adapting to images of heads turned 25° to the right or left produces a perceptual bias in judging the turning direction of subsequently presented bodies. In contrast, little to no change in the judgment of head orientation occurs after adapting to extremely oriented bodies. The unidirectional nature of the aftereffect suggests that cues from the human body signaling social attention are combined in a hierarchical fashion and is consistent with evidence from single-cell recording studies in nonhuman primates showing that information about head orientation can override information about body posture when both are visible.

  11. Integrating cues of social interest and voice pitch in men's preferences for women's voices

    OpenAIRE

    Jones, Benedict C; Feinberg, David R; DeBruine, Lisa M; Little, Anthony C; Vukovic, Jovana

    2008-01-01

    Most previous studies of vocal attractiveness have focused on preferences for physical characteristics of voices such as pitch. Here we examine the content of vocalizations in interaction with such physical traits, finding that vocal cues of social interest modulate the strength of men's preferences for raised pitch in women's voices. Men showed stronger preferences for raised pitch when judging the voices of women who appeared interested in the listener than when judging the voices of women ...

  12. Spectro-temporal cues enhance modulation sensitivity in cochlear implant users.

    Science.gov (United States)

    Zheng, Yi; Escabí, Monty; Litovsky, Ruth Y

    2017-08-01

    Although speech understanding is highly variable amongst cochlear implant (CI) subjects, the remarkably high speech recognition performance of many CI users is unexpected and not well understood. Numerous factors, including neural health and degradation of the spectral information in the speech signal of CIs, likely contribute to speech understanding. We studied the ability to use spectro-temporal modulations, which may be critical for speech understanding and discrimination, and hypothesized that CI users adopt a different perceptual strategy than normal-hearing (NH) individuals, whereby they rely more heavily on joint spectro-temporal cues to enhance detection of auditory cues. Modulation detection sensitivity was studied in CI users and NH subjects using broadband "ripple" stimuli that were modulated spectrally, temporally, or jointly, i.e., spectro-temporally. The spectro-temporal modulation transfer functions of CI users and NH subjects were decomposed into spectral and temporal dimensions and compared to those subjects' spectral-only and temporal-only modulation transfer functions. In CI users, the joint spectro-temporal sensitivity was better than that predicted by spectral-only and temporal-only sensitivity, indicating a heightened spectro-temporal sensitivity. Such an enhancement through the combined integration of spectral and temporal cues was not observed in NH subjects. The unique use of spectro-temporal cues by CI patients can yield benefits for use of cues that are important for speech understanding. This finding has implications for developing sound processing strategies that may rely on joint spectro-temporal modulations to improve speech comprehension of CI users, and the findings of this study may be valuable for developing clinical assessment tools to optimize CI processor performance. Copyright © 2017 Elsevier B.V. All rights reserved.
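
    A ripple stimulus of the kind described above can be approximated as a bank of log-spaced tones whose envelopes are sinusoidally modulated along both the spectral axis (cycles per octave) and the temporal axis (Hz). The sketch below generates such a stimulus; all parameter values are illustrative and not taken from the study.

```python
import numpy as np

def ripple_stimulus(dur=1.0, fs=44100, f_lo=250.0, f_hi=8000.0, n_tones=80,
                    spectral_density=1.0, temporal_rate=4.0, depth=0.9):
    """Generate a broadband 'ripple' from many log-spaced tones whose amplitudes
    are modulated jointly along the spectral axis (cycles/octave) and the
    temporal axis (Hz); parameter values are illustrative, not the study's."""
    t = np.arange(int(dur * fs)) / fs
    octaves = np.linspace(0.0, np.log2(f_hi / f_lo), n_tones)   # tone position in octaves
    freqs = f_lo * (2.0 ** octaves)
    phases = 2 * np.pi * np.random.rand(n_tones)                # random starting phases
    stim = np.zeros_like(t)
    for f, x, ph in zip(freqs, octaves, phases):
        env = 1.0 + depth * np.sin(2 * np.pi * (temporal_rate * t + spectral_density * x))
        stim += env * np.sin(2 * np.pi * f * t + ph)
    return stim / np.max(np.abs(stim))

audio = ripple_stimulus()   # 1-s spectro-temporally modulated ripple, normalized to +/-1
```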

  13. Spectro-temporal cues enhance modulation sensitivity in cochlear implant users

    Science.gov (United States)

    Zheng, Yi; Escabí, Monty; Litovsky, Ruth Y.

    2018-01-01

    Although speech understanding is highly variable amongst cochlear implant (CI) subjects, the remarkably high speech recognition performance of many CI users is unexpected and not well understood. Numerous factors, including neural health and degradation of the spectral information in the speech signal of CIs, likely contribute to speech understanding. We studied the ability to use spectro-temporal modulations, which may be critical for speech understanding and discrimination, and hypothesized that CI users adopt a different perceptual strategy than normal-hearing (NH) individuals, whereby they rely more heavily on joint spectro-temporal cues to enhance detection of auditory cues. Modulation detection sensitivity was studied in CI users and NH subjects using broadband “ripple” stimuli that were modulated spectrally, temporally, or jointly, i.e., spectro-temporally. The spectro-temporal modulation transfer functions of CI users and NH subjects were decomposed into spectral and temporal dimensions and compared to those subjects’ spectral-only and temporal-only modulation transfer functions. In CI users, the joint spectro-temporal sensitivity was better than that predicted by spectral-only and temporal-only sensitivity, indicating a heightened spectro-temporal sensitivity. Such an enhancement through the combined integration of spectral and temporal cues was not observed in NH subjects. The unique use of spectro-temporal cues by CI patients can yield benefits for use of cues that are important for speech understanding. This finding has implications for developing sound processing strategies that may rely on joint spectro-temporal modulations to improve speech comprehension of CI users, and the findings of this study may be valuable for developing clinical assessment tools to optimize CI processor performance. PMID:28601530

  14. Is Birdsong More Like Speech or Music?

    Science.gov (United States)

    Shannon, Robert V

    2016-04-01

    Music and speech share many acoustic cues but not all are equally important. For example, harmonic pitch is essential for music but not for speech. When birds communicate, is their song more like speech or music? A new study contrasting pitch and spectral patterns shows that birds perceive their song more like humans perceive speech. Copyright © 2016 Elsevier Ltd. All rights reserved.

  15. Integrating Music Therapy Services and Speech-Language Therapy Services for Children with Severe Communication Impairments: A Co-Treatment Model

    Science.gov (United States)

    Geist, Kamile; McCarthy, John; Rodgers-Smith, Amy; Porter, Jessica

    2008-01-01

    Documentation of how music therapy can be integrated with speech-language therapy services for children with communication delay is scarce in the literature. In this article, a collaborative model with procedures, experiences, and communication outcomes of integrating music therapy with the existing speech-language services is given. Using…

  16. Some Behavioral and Neurobiological Constraints on Theories of Audiovisual Speech Integration: A Review and Suggestions for New Directions

    Science.gov (United States)

    Altieri, Nicholas; Pisoni, David B.; Townsend, James T.

    2012-01-01

    Summerfield (1987) proposed several accounts of audiovisual speech perception, a field of research that has burgeoned in recent years. The proposed accounts included the integration of discrete phonetic features, vectors describing the values of independent acoustical and optical parameters, the filter function of the vocal tract, and articulatory dynamics of the vocal tract. The latter two accounts assume that the representations of audiovisual speech perception are based on abstract gestures, while the former two assume that the representations consist of symbolic or featural information obtained from visual and auditory modalities. Recent converging evidence from several different disciplines reveals that the general framework of Summerfield’s feature-based theories should be expanded. An updated framework building upon the feature-based theories is presented. We propose a processing model arguing that auditory and visual brain circuits provide facilitatory information when the inputs are correctly timed, and that auditory and visual speech representations do not necessarily undergo translation into a common code during information processing. Future research on multisensory processing in speech perception should investigate the connections between auditory and visual brain regions, and utilize dynamic modeling tools to further understand the timing and information processing mechanisms involved in audiovisual speech integration. PMID:21968081

  17. Cuttlefish dynamic camouflage: responses to substrate choice and integration of multiple visual cues.

    Science.gov (United States)

    Allen, Justine J; Mäthger, Lydia M; Barbosa, Alexandra; Buresch, Kendra C; Sogin, Emilia; Schwartz, Jillian; Chubb, Charles; Hanlon, Roger T

    2010-04-07

    Prey camouflage is an evolutionary response to predation pressure. Cephalopods have extensive camouflage capabilities and studying them can offer insight into effective camouflage design. Here, we examine whether cuttlefish, Sepia officinalis, show substrate or camouflage pattern preferences. In the first two experiments, cuttlefish were presented with a choice between different artificial substrates or between different natural substrates. First, the ability of cuttlefish to show substrate preference on artificial and natural substrates was established. Next, cuttlefish were offered substrates known to evoke the three main camouflage body pattern types these animals show: Uniform and Mottle (which function by background matching), and Disruptive. In a third experiment, cuttlefish were presented with conflicting visual cues on their left and right sides to assess their camouflage response. Given a choice between substrates they might encounter in nature, we found no strong substrate preference except when cuttlefish could bury themselves. Additionally, cuttlefish responded to conflicting visual cues with mixed body patterns in both the substrate preference and split substrate experiments. These results suggest that differences in energy costs for different camouflage body patterns may be minor and that pattern mixing and symmetry may play important roles in camouflage.

  18. Integration of visual and non-visual self-motion cues during voluntary head movements in the human brain.

    Science.gov (United States)

    Schindler, Andreas; Bartels, Andreas

    2018-05-15

    Our phenomenological experience of the stable world is maintained by continuous integration of visual self-motion with extra-retinal signals. However, due to conventional constraints of fMRI acquisition in humans, neural responses to visuo-vestibular integration have only been studied using artificial stimuli, in the absence of voluntary head-motion. Here we circumvented these limitations and let participants move their heads during scanning. The slow dynamics of the BOLD signal allowed us to acquire neural signals related to head motion after the observer's head was stabilized by inflatable aircushions. Visual stimuli were presented on head-fixed display goggles and updated in real time as a function of head-motion that was tracked using an external camera. Two conditions simulated forward translation of the participant. During physical head rotation, the congruent condition simulated a stable world, whereas the incongruent condition added arbitrary lateral motion. Importantly, both conditions were precisely matched in visual properties and head-rotation. By comparing congruent with incongruent conditions we found evidence consistent with the multi-modal integration of visual cues with head motion into a coherent "stable world" percept in the parietal operculum and in an anterior part of parieto-insular cortex (aPIC). In the visual motion network, human regions MST, a dorsal part of VIP, the cingulate sulcus visual area (CSv) and a region in precuneus (Pc) showed differential responses to the same contrast. The results demonstrate for the first time neural multimodal interactions between precisely matched congruent versus incongruent visual and non-visual cues during physical head-movement in the human brain. The methodological approach opens the path to a new class of fMRI studies with unprecedented temporal and spatial control over visuo-vestibular stimulation. Copyright © 2018 Elsevier Inc. All rights reserved.

  19. Robust speech dereverberation with a neural network-based post-filter that exploits multi-conditional training of binaural cues

    DEFF Research Database (Denmark)

    May, Tobias

    2018-01-01

    -frequency (T-F) units. A multi-conditional training (MCT) procedure was used to simulate the uncertainties of short-term binaural cues in response to room reverberation by mixing the direct part of head related impulse responses (HRIRs) with diffuse noise. Despite being trained with only anechoic HRIRs...

  20. Using ILD or ITD Cues for Sound Source Localization and Speech Understanding in a Complex Listening Environment by Listeners with Bilateral and with Hearing-Preservation Cochlear Implants

    Science.gov (United States)

    Loiselle, Louise H.; Dorman, Michael F.; Yost, William A.; Cook, Sarah J.; Gifford, Rene H.

    2016-01-01

    Purpose: To assess the role of interaural time differences and interaural level differences in (a) sound-source localization, and (b) speech understanding in a cocktail party listening environment for listeners with bilateral cochlear implants (CIs) and for listeners with hearing-preservation CIs. Methods: Eleven bilateral listeners with MED-EL…

  1. Temporal dynamics of sensorimotor integration in speech perception and production: Independent component analysis of EEG data

    Directory of Open Access Journals (Sweden)

    David Jenson

    2014-07-01

    Full Text Available Activity in premotor and sensorimotor cortices is found in speech production and some perception tasks. Yet, how sensorimotor integration supports these functions is unclear due to a lack of data examining the timing of activity from these regions. Beta (~20 Hz) and alpha (~10 Hz) spectral power within the EEG µ rhythm are considered indices of motor and somatosensory activity, respectively. In the current study, perception conditions required discrimination (same/different) of syllable pairs (/ba/ and /da/) in quiet and noisy conditions. Production conditions required covert and overt syllable productions and overt word production. Independent component analysis was performed on EEG data obtained during these conditions to (1) identify clusters of µ components common to all conditions and (2) examine real-time event-related spectral perturbations (ERSP) within alpha and beta bands. Seventeen and 15 of the 20 participants produced left and right µ-components, respectively, localized to precentral gyri. Discrimination conditions were characterized by significant (pFDR < .05) early alpha event-related synchronization (ERS) prior to and during stimulus presentation and later alpha event-related desynchronization (ERD) following stimulus offset. Beta ERD began early and gained strength across time. Differences were found between quiet and noisy discrimination conditions. Both overt syllable and word productions yielded similar alpha/beta ERD that began prior to production and was strongest during muscle activity. Findings during covert production were weaker than during overt production. One explanation for these findings is that µ-beta ERD indexes early predictive coding (e.g., internal modeling) and/or overt and covert attentional/motor processes. µ-alpha ERS may index inhibitory input to the premotor cortex from sensory regions prior to and during discrimination, while µ-alpha ERD may index re-afferent sensory feedback during speech rehearsal and production.
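
    An ERSP is, in essence, time-frequency power averaged over trials and expressed relative to a pre-stimulus baseline, so that negative values index ERD and positive values index ERS. The sketch below computes a simplified single-frequency version with a Morlet wavelet; the sampling rate, beta-band frequency, and baseline window are illustrative, not the study's settings.

```python
import numpy as np

def ersp(epochs, fs, freq, baseline=(0, 100), cycles=7):
    """Simplified ERSP at a single frequency: convolve each epoch with a Morlet
    wavelet, average power across epochs, and express it in dB relative to a
    pre-stimulus baseline window (sample indices); parameters are illustrative."""
    n = int(cycles * fs / freq)
    t = (np.arange(n) - n / 2) / fs
    sigma_t = cycles / (2 * np.pi * freq)
    wavelet = np.exp(2j * np.pi * freq * t) * np.exp(-(t ** 2) / (2 * sigma_t ** 2))
    power = np.array([np.abs(np.convolve(ep, wavelet, mode="same")) ** 2 for ep in epochs])
    mean_power = power.mean(axis=0)
    base = mean_power[baseline[0]:baseline[1]].mean()
    return 10 * np.log10(mean_power / base)   # negative values = ERD, positive = ERS

epochs = np.random.randn(40, 1000)            # toy data: 40 trials, 1000 samples each
curve = ersp(epochs, fs=500.0, freq=20.0)     # beta-band ERSP in dB vs. baseline
```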

  2. Experienced speech-language pathologists' responses to ethical dilemmas: an integrated approach to ethical reasoning.

    Science.gov (United States)

    Kenny, Belinda; Lincoln, Michelle; Balandin, Susan

    2010-05-01

    To investigate the approaches of experienced speech-language pathologists (SLPs) to ethical reasoning and the processes they use to resolve ethical dilemmas. Ten experienced SLPs participated in in-depth interviews. A narrative approach was used to guide participants' descriptions of how they resolved ethical dilemmas. Individual narrative transcriptions were analyzed by using the participant's words to develop an ethical story that described and interpreted their responses to dilemmas. Key concepts from individual stories were then coded into group themes to reflect participants' reasoning processes. Five major themes reflected participants' approaches to ethical reasoning: (a) focusing on the well-being of the client, (b) fulfilling professional roles and responsibilities, (c) attending to professional relationships, (d) managing resources, and (e) integrating personal and professional values. SLPs demonstrated a range of ethical reasoning processes: applying bioethical principles, casuistry, and narrative reasoning when managing ethical dilemmas in the workplace. The results indicate that experienced SLPs adopted an integrated approach to ethical reasoning. They supported clients' rights to make health care choices. Bioethical principles, casuistry, and narrative reasoning provided useful frameworks for facilitating health professionals' application of codes of ethics to complex professional practice issues.

  3. Sugar Antennae for Guidance Signals: Syndecans and Glypicans Integrate Directional Cues for Navigating Neurons

    Directory of Open Access Journals (Sweden)

    Christa Rhiner

    2006-01-01

    Full Text Available Attractive and repulsive signals guide migrating nerve cells in all directions when the nervous system starts to form. The neurons extend thin processes, axons, that connect over wide distances with other brain cells to form a complicated neuronal network. One of the most fascinating questions in neuroscience is how the correct wiring of billions of nerve cells in our brain is controlled. Several protein families are known to serve as guidance cues for navigating neurons and axons. Nevertheless, the combinatorial potential of these proteins seems to be insufficient to sculpt the entire neuronal network and the appropriate formation of connections. Recently, heparan sulfate proteoglycans (HSPGs), which are present on the cell surface of neurons and in the extracellular matrix through which neurons and axons migrate, have been found to play a role in regulating cell migration and axon guidance. Intriguingly, the large number of distinct modifications that can be put onto the sugar side chains of these PGs would in principle allow for an enormous diversity of HSPGs, which could help in regulating the vast number of guidance choices taken by individual neurons. In this review, we will focus on the role of the cell surface HSPGs syndecan and glypican and specific HS modifications in promoting neuronal migration, axon guidance, and synapse formation.

  4. Guidelines for the integration of audio cues into computer user interfaces

    Energy Technology Data Exchange (ETDEWEB)

    Sumikawa, D.A.

    1985-06-01

    Throughout the history of computers, vision has been the main channel through which information is conveyed to the computer user. As the complexities of man-machine interactions increase, more and more information must be transferred from the computer to the user and then successfully interpreted by the user. A logical next step in the evolution of the computer-user interface is the incorporation of sound, thereby using the sense of "hearing" in the computer experience. This allows our visual and auditory capabilities to work naturally together in unison, leading to more effective and efficient interpretation of all information received by the user from the computer. This thesis presents an initial set of guidelines to assist interface developers in designing an effective sight and sound user interface. This study is a synthesis of various aspects of sound, human communication, computer-user interfaces, and psychoacoustics. We introduce the notion of an earcon. Earcons are audio cues used in the computer-user interface to provide information and feedback to the user about some computer object, operation, or interaction. A possible construction technique for earcons, the use of earcons in the interface, how earcons are learned and remembered, and the effects of earcons on their users are investigated. This study takes the point of view that earcons are a language and human/computer communication issue and are therefore analyzed according to the three dimensions of linguistics: syntactics, semantics, and pragmatics.

  5. The level of audiovisual print-speech integration deficits in dyslexia.

    Science.gov (United States)

    Kronschnabel, Jens; Brem, Silvia; Maurer, Urs; Brandeis, Daniel

    2014-09-01

    The classical phonological deficit account of dyslexia is increasingly linked to impairments in grapho-phonological conversion, and to dysfunctions in superior temporal regions associated with audiovisual integration. The present study investigates mechanisms of audiovisual integration in typical and impaired readers at the critical developmental stage of adolescence. Congruent and incongruent audiovisual as well as unimodal (visual only and auditory only) material was presented. Audiovisual presentations were single letters and three-letter (consonant-vowel-consonant) stimuli accompanied by matching or mismatching speech sounds. Three-letter stimuli exhibited fast phonetic transitions as in real-life language processing and reading. Congruency effects, i.e. different brain responses to congruent and incongruent stimuli were taken as an indicator of audiovisual integration at a phonetic level (grapho-phonological conversion). Comparisons of unimodal and audiovisual stimuli revealed basic, more sensory aspects of audiovisual integration. By means of these two criteria of audiovisual integration, the generalizability of audiovisual deficits in dyslexia was tested. Moreover, it was expected that the more naturalistic three-letter stimuli are superior to single letters in revealing group differences. Electrophysiological and hemodynamic (EEG and fMRI) data were acquired simultaneously in a simple target detection task. Applying the same statistical models to event-related EEG potentials and fMRI responses allowed comparing the effects detected by the two techniques at a descriptive level. Group differences in congruency effects (congruent against incongruent) were observed in regions involved in grapho-phonological processing, including the left inferior frontal and angular gyri and the inferotemporal cortex. Importantly, such differences also emerged in superior temporal key regions. Three-letter stimuli revealed stronger group differences than single letters. No

  6. A randomized controlled trial on the beneficial effects of training letter-speech sound integration on reading fluency in children with dyslexia

    NARCIS (Netherlands)

    Fraga González, G.; Žarić, G.; Tijms, J.; Bonte, M.; Blomert, L.; van der Molen, M.W.

    2015-01-01

    A recent account of dyslexia assumes that a failure to develop automated letter-speech sound integration might be responsible for the observed lack of reading fluency. This study uses a pre-test-training-post-test design to evaluate the effects of a training program based on letter-speech sound

  7. The Effective Use of Symbols in Teaching Word Recognition to Children with Severe Learning Difficulties: A Comparison of Word Alone, Integrated Picture Cueing and the Handle Technique.

    Science.gov (United States)

    Sheehy, Kieron

    2002-01-01

    A comparison is made between a new technique (the Handle Technique), Integrated Picture Cueing, and a Word Alone Method. Results show that using a new combination of teaching strategies enabled logographic symbols to be used effectively in teaching word recognition to 12 children with severe learning difficulties. (Contains references.) (Author/CR)

  8. Visual cues and listening effort: individual variability.

    Science.gov (United States)

    Picou, Erin M; Ricketts, Todd A; Hornsby, Benjamin W Y

    2011-10-01

    To investigate the effect of visual cues on listening effort as well as whether predictive variables such as working memory capacity (WMC) and lipreading ability affect the magnitude of listening effort. Twenty participants with normal hearing were tested using a paired-associates recall task in 2 conditions (quiet and noise) and 2 presentation modalities (audio only [AO] and auditory-visual [AV]). Signal-to-noise ratios were adjusted to provide matched speech recognition across audio-only and AV noise conditions. Also measured were subjective perceptions of listening effort and 2 predictive variables: (a) lipreading ability and (b) WMC. Objective and subjective results indicated that listening effort increased in the presence of noise, but on average the addition of visual cues did not significantly affect the magnitude of listening effort. Although there was substantial individual variability, on average participants who were better lipreaders or had larger WMCs demonstrated reduced listening effort in noise in AV conditions. Overall, the results support the hypothesis that integrating auditory and visual cues requires cognitive resources in some participants. The data indicate that low lipreading ability or low WMC is associated with relatively effortful integration of auditory and visual information in noise.

  9. The development of co-speech gesture and its semantic integration with speech in 6- to 12-year-old children with autism spectrum disorders.

    Science.gov (United States)

    So, Wing-Chee; Wong, Miranda Kit-Yi; Lui, Ming; Yip, Virginia

    2015-11-01

    Previous work leaves open the question of whether children with autism spectrum disorders aged 6-12 years have delay in producing gestures compared to their typically developing peers. This study examined gestural production among school-aged children in a naturalistic context and how their gestures are semantically related to the accompanying speech. Delay in gestural production was found in children with autism spectrum disorders through their middle to late childhood. Compared to their typically developing counterparts, children with autism spectrum disorders gestured less often and used fewer types of gestures, in particular markers, which carry culture-specific meaning. Typically developing children's gestural production was related to language and cognitive skills, but among children with autism spectrum disorders, gestural production was more strongly related to the severity of socio-communicative impairment. Gesture impairment also included the failure to integrate speech with gesture: in particular, supplementary gestures are absent in children with autism spectrum disorders. The findings extend our understanding of gestural production in school-aged children with autism spectrum disorders during spontaneous interaction. The results can help guide new therapies for gestural production for children with autism spectrum disorders in middle and late childhood. © The Author(s) 2014.

  10. Integration of visual and inertial cues in the perception of angular self-motion

    NARCIS (Netherlands)

    Winkel, K.N. de; Soyka, F.; Barnett-Cowan, M.; Bülthoff, H.H.; Groen, E.L.; Werkhoven, P.J.

    2013-01-01

    The brain is able to determine angular self-motion from visual, vestibular, and kinesthetic information. There is compelling evidence that both humans and non-human primates integrate visual and inertial (i.e., vestibular and kinesthetic) information in a statistically optimal fashion when

  11. Integration of reward signalling and appetite regulating peptide systems in the control of food-cue responses.

    Science.gov (United States)

    Reichelt, A C; Westbrook, R F; Morris, M J

    2015-11-01

    Understanding the neurobiological substrates that encode learning about food-associated cues and how those signals are modulated is of great clinical importance especially in light of the worldwide obesity problem. Inappropriate or maladaptive responses to food-associated cues can promote over-consumption, leading to excessive energy intake and weight gain. Chronic exposure to foods rich in fat and sugar alters the reinforcing value of foods and weakens inhibitory neural control, triggering learned, but maladaptive, associations between environmental cues and food rewards. Thus, responses to food-associated cues can promote cravings and food-seeking by activating mesocorticolimbic dopamine neurocircuitry, and exert physiological effects including salivation. These responses may be analogous to the cravings experienced by abstaining drug addicts that can trigger relapse into drug self-administration. Preventing cue-triggered eating may therefore reduce the over-consumption seen in obesity and binge-eating disorder. In this review we discuss recent research examining how cues associated with palatable foods can promote reward-based feeding behaviours and the potential involvement of appetite-regulating peptides including leptin, ghrelin, orexin and melanin concentrating hormone. These peptide signals interface with mesolimbic dopaminergic regions including the ventral tegmental area to modulate reactivity to cues associated with palatable foods. Thus, a novel target for anti-obesity therapeutics is to reduce non-homeostatic, reward driven eating behaviour, which can be triggered by environmental cues associated with highly palatable, fat and sugar rich foods. © 2015 The British Pharmacological Society.

  12. On the functional integration between postural and supra-postural tasks on the basis of contextual cues and task constraint.

    Science.gov (United States)

    de Lima, Andrea Cristina; de Azevedo Neto, Raymundo Machado; Teixeira, Luis Augusto

    2010-10-01

    In order to evaluate the effects of uncertainty about the direction of mechanical perturbation and supra-postural task constraint on postural control, young adults had their upright stance perturbed while holding a tray in a horizontal position. Stance was perturbed by moving a supporting platform forward or backward, contrasting situations of certainty versus uncertainty about the direction of displacement. Increased constraint on postural stability was imposed by a supra-postural task of equilibrating a cylinder on the tray. Performance was assessed through EMG of anterior leg muscles, angular displacement of the main joints involved in the postural reactions and displacement of the tray. Results showed that both certainty about the direction of perturbation and increased supra-postural task constraint led to decreased angular displacement of the knee and the hip. Furthermore, the combination of certainty and high supra-postural task constraint produced shorter latency of muscular activation. Such postural responses were paralleled by decreased displacement of the tray. These results suggest a functional integration between the tasks, with central set priming reactive postural responses from contextual cues and increased stability demand. Copyright © 2010 Elsevier B.V. All rights reserved.

  13. Two- and 3-year-olds integrate linguistic and pedagogical cues in guiding inductive generalization and exploration.

    Science.gov (United States)

    Butler, Lucas P; Tomasello, Michael

    2016-05-01

    Young children can in principle make generic inferences (e.g., "doffels are magnetic") on the basis of their own individual experience. Recent evidence, however, shows that by 4 years of age children make strong generic inferences on the basis of a single pedagogical demonstration with an individual (e.g., an adult demonstrates for the child that a single "doffel" is magnetic). In the current experiments, we extended this to look at younger children, investigating how the mechanisms underlying this phenomenon are integrated with other aspects of inductive inference during early development. We found that both 2- and 3-year-olds used pedagogical cues to guide such generic inferences, but only so long as the "doffel" was linguistically labeled. In a follow-up study, 3-year-olds, but not 2-year-olds, continued to make this generic inference even if the word "doffel" was uttered incidentally and non-referentially in a context preceding the pedagogical demonstration, thereby simply marking the opportunity to learn about a culturally important category. By 3 years of age, then, young children show a remarkable ability to flexibly combine different sources of culturally relevant information (e.g., linguistic labeling, pedagogy) to make the kinds of generic inferences so central in human cultural learning. Copyright © 2015 Elsevier Inc. All rights reserved.

  14. Defective chromatic and achromatic visual pathways in developmental dyslexia: Cues for an integrated intervention programme.

    Science.gov (United States)

    Bonfiglio, Luca; Bocci, Tommaso; Minichilli, Fabrizio; Crecchi, Alessandra; Barloscio, Davide; Spina, Donata Maria; Rossi, Bruno; Sartucci, Ferdinando

    2017-01-01

    In addition to confirming the involvement of the magnocellular system in developmental dyslexia (DD), the aim was primarily to search for a possible involvement of the parvocellular system and, furthermore, to complete the assessment of the visual chromatic axis by also analysing the koniocellular system. Visual evoked potentials (VEPs) in response to achromatic stimuli with low luminance contrast and low spatial frequency, and isoluminant red/green and blue/yellow stimuli with high spatial frequency were recorded in 10 dyslexic children and 10 age- and sex-matched, healthy subjects. Dyslexic children showed delayed VEPs to both achromatic stimuli (magnocellular-dorsal stream) and isoluminant red/green and blue/yellow stimuli (parvocellular-ventral and koniocellular streams). To our knowledge, this is the first time that a dysfunction of colour vision has been brought to light in an objective way (i.e., by means of electrophysiological methods) in children with DD. These results prompt speculation about a possible approach for promoting learning to read and/or improving the existing reading skills of children with or at risk of DD. The working hypothesis would be to combine two integrated interventions in a single programme aimed at fostering the function of both the magnocellular and the parvocellular streams.

  15. A Bistable Circuit Involving SCARECROW-RETINOBLASTOMA Integrates Cues to Inform Asymmetric Stem Cell Division

    Science.gov (United States)

    Cruz-Ramírez, Alfredo; Díaz-Triviño, Sara; Blilou, Ikram; Grieneisen, Verônica A.; Sozzani, Rosangela; Zamioudis, Christos; Miskolczi, Pál; Nieuwland, Jeroen; Benjamins, René; Dhonukshe, Pankaj; Caballero-Pérez, Juan; Horvath, Beatrix; Long, Yuchen; Mähönen, Ari Pekka; Zhang, Hongtao; Xu, Jian; Murray, James A.H.; Benfey, Philip N.; Bako, Laszlo; Marée, Athanasius F.M.; Scheres, Ben

    2012-01-01

    In plants, where cells cannot migrate, asymmetric cell divisions (ACDs) must be confined to the appropriate spatial context. We investigate tissue-generating asymmetric divisions in a stem cell daughter within the Arabidopsis root. Spatial restriction of these divisions requires physical binding of the stem cell regulator SCARECROW (SCR) by the RETINOBLASTOMA-RELATED (RBR) protein. In the stem cell niche, SCR activity is counteracted by phosphorylation of RBR through a cyclinD6;1-CDK complex. This cyclin is itself under transcriptional control of SCR and its partner SHORT ROOT (SHR), creating a robust bistable circuit with either high or low SHR-SCR complex activity. Auxin biases this circuit by promoting CYCD6;1 transcription. Mathematical modeling shows that ACDs are only switched on after integration of radial and longitudinal information, determined by SHR and auxin distribution, respectively. Coupling of cell-cycle progression to protein degradation resets the circuit, resulting in a “flip flop” that constrains asymmetric cell division to the stem cell region. PMID:22921914

  16. Integration of AI-2 Based Cell-Cell Signaling with Metabolic Cues in Escherichia coli.

    Directory of Open Access Journals (Sweden)

    Arindam Mitra

    Full Text Available The quorum sensing molecule Autoinducer-2 (AI-2) is generated as a byproduct of activated methyl cycle by the action of LuxS in Escherichia coli. AI-2 is synthesized, released and later internalized in a cell-density dependent manner. Here, by mutational analysis of the genes, uvrY and csrA, we describe a regulatory circuit of accumulation and uptake of AI-2. We constructed a single-copy chromosomal luxS-lacZ fusion in a luxS + merodiploid strain and evaluated its relative expression in uvrY and csrA mutants. At the entry of stationary phase, the expression of the fusion and AI-2 accumulation was positively regulated by uvrY and negatively regulated by csrA respectively. A deletion of csrA altered message stability of the luxS transcript and CsrA protein exhibited weak binding to 5' luxS regulatory region. DNA protein interaction and chromatin immunoprecipitation analysis confirmed direct interaction of UvrY with the luxS promoter. Additionally, reduced expression of the fusion in hfq deletion mutant suggested involvement of small RNA interactions in luxS regulation. In contrast, the expression of lsrA operon involved in AI-2 uptake, is negatively regulated by uvrY and positively by csrA in a cell-density dependent manner. The dual role of csrA in AI-2 synthesis and uptake suggested a regulatory crosstalk of cell signaling with carbon regulation in Escherichia coli. We found that the cAMP-CRP mediated catabolite repression of luxS expression was uvrY dependent. This study suggests that luxS expression is complex and regulated at the level of transcription and translation. The multifactorial regulation supports the notion that cell-cell communication requires interaction and integration of multiple metabolic signals.

  17. ON INTEGRATED COURSE “SOCIAL AND SPEECH COMMUNICATIONS” FOR STUDENTS OF ART HIGHER EDUCATIONAL ESTABLISHMENT

    Directory of Open Access Journals (Sweden)

    Elena Nicolaevna Klemenova

    2013-11-01

    Full Text Available The article describes the experience of teaching the course “Social and Speech Communication”. As a result of the training, students are expected to master a range of means for effective communication, grounded in linguistic communication and its bearer, the language personality; to gain knowledge about the complex processes of information exchange; to discover the psychological peculiarities of verbal and non-verbal communication; and to learn how to communicate in order to solve professional and personal problems. Fluent command of all kinds of speech activity, correct and intelligent communication in various spheres and settings, and the linguistic analysis of speech events, including from the point of view of their aesthetic value, together represent the unity of the systemic and individual approaches to humanities training for future architects, designers and managers. DOI: http://dx.doi.org/10.12731/2218-7405-2013-7-43

  18. Effects of Early Bilingual Experience with a Tone and a Non-Tone Language on Speech-Music Integration.

    Directory of Open Access Journals (Sweden)

    Salomi S Asaridou

    Full Text Available We investigated music and language processing in a group of early bilinguals who spoke a tone language and a non-tone language (Cantonese and Dutch). We assessed online speech-music processing interactions, that is, interactions that occur when speech and music are processed simultaneously in songs, with a speeded classification task. In this task, participants judged sung pseudowords either musically (based on the direction of the musical interval) or phonologically (based on the identity of the sung vowel). We also assessed longer-term effects of linguistic experience on musical ability, that is, the influence of extensive prior experience with language when processing music. These effects were assessed with a task in which participants had to learn to identify musical intervals and with four pitch-perception tasks. Our hypothesis was that due to their experience in two different languages using lexical versus intonational tone, the early Cantonese-Dutch bilinguals would outperform the Dutch control participants. In online processing, the Cantonese-Dutch bilinguals processed speech and music more holistically than controls. This effect seems to be driven by experience with a tone language, in which integration of segmental and pitch information is fundamental. Regarding longer-term effects of linguistic experience, we found no evidence for a bilingual advantage in either the music-interval learning task or the pitch-perception tasks. Together, these results suggest that being a Cantonese-Dutch bilingual does not have any measurable longer-term effects on pitch and music processing, but does have consequences for how speech and music are processed jointly.

  19. A Causal Inference Model Explains Perception of the McGurk Effect and Other Incongruent Audiovisual Speech.

    Directory of Open Access Journals (Sweden)

    John F Magnotti

    2017-02-01

    Full Text Available Audiovisual speech integration combines information from auditory speech (talker's voice) and visual speech (talker's mouth movements) to improve perceptual accuracy. However, if the auditory and visual speech emanate from different talkers, integration decreases accuracy. Therefore, a key step in audiovisual speech perception is deciding whether auditory and visual speech have the same source, a process known as causal inference. A well-known illusion, the McGurk Effect, consists of incongruent audiovisual syllables, such as auditory "ba" + visual "ga" (AbaVga), that are integrated to produce a fused percept ("da"). This illusion raises two fundamental questions: first, given the incongruence between the auditory and visual syllables in the McGurk stimulus, why are they integrated; and second, why does the McGurk effect not occur for other, very similar syllables (e.g., AgaVba). We describe a simplified model of causal inference in multisensory speech perception (CIMS) that predicts the perception of arbitrary combinations of auditory and visual speech. We applied this model to behavioral data collected from 60 subjects perceiving both McGurk and non-McGurk incongruent speech stimuli. The CIMS model successfully predicted both the audiovisual integration observed for McGurk stimuli and the lack of integration observed for non-McGurk stimuli. An identical model without causal inference failed to accurately predict perception for either form of incongruent speech. The CIMS model uses causal inference to provide a computational framework for studying how the brain performs one of its most important tasks, integrating auditory and visual speech cues to allow us to communicate with others.
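
    The CIMS model itself is specified in the cited article; purely as an illustrative sketch of the underlying causal-inference computation (averaging a common-cause and a separate-causes hypothesis by their posterior probabilities), the calculation can be written in a few lines of Python. The one-dimensional cue representation and all parameter values below are assumptions for illustration only, not values from the study.

        import numpy as np

        def causal_inference(x_a, x_v, sig_a=1.0, sig_v=1.0, sig_p=5.0, p_c=0.5):
            """Toy 1-D causal-inference combiner for an auditory cue x_a and a
            visual cue x_v (arbitrary feature units). Returns the posterior
            probability of a common cause and a model-averaged auditory estimate."""
            va, vv, vp = sig_a ** 2, sig_v ** 2, sig_p ** 2

            # Likelihood of the cue pair if both arise from one source (C = 1),
            # marginalising over the source with a zero-mean Gaussian prior.
            d1 = va * vv + va * vp + vv * vp
            like_c1 = np.exp(-0.5 * ((x_a - x_v) ** 2 * vp + x_a ** 2 * vv
                                     + x_v ** 2 * va) / d1) / (2 * np.pi * np.sqrt(d1))

            # Likelihood if the cues arise from two independent sources (C = 2).
            like_c2 = (np.exp(-0.5 * x_a ** 2 / (va + vp)) / np.sqrt(2 * np.pi * (va + vp))
                       * np.exp(-0.5 * x_v ** 2 / (vv + vp)) / np.sqrt(2 * np.pi * (vv + vp)))

            # Posterior probability that auditory and visual speech share a source.
            post_c1 = like_c1 * p_c / (like_c1 * p_c + like_c2 * (1 - p_c))

            # Reliability-weighted fused estimate vs. auditory-only estimate.
            s_fused = (x_a / va + x_v / vv) / (1 / va + 1 / vv + 1 / vp)
            s_audio = (x_a / va) / (1 / va + 1 / vp)

            # Model averaging: integrate when a common cause is likely,
            # segregate otherwise.
            return post_c1, post_c1 * s_fused + (1 - post_c1) * s_audio

        print(causal_inference(1.0, 2.0))   # small conflict: mostly integrated
        print(causal_inference(1.0, 8.0))   # large conflict: mostly segregated

    With category likelihoods in place of the continuous source variable, the same machinery is what allows a model of this kind to predict which incongruent syllable combinations fuse and which do not.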

  20. Integrating Speech-Language Pathology Services in Palliative End-of-Life Care

    Science.gov (United States)

    Pollens, Robin D.

    2012-01-01

    Clinical speech-language pathologists (SLPs) may receive referrals to consult with teams serving patients who have a severe and/or terminal disease. Palliative care focuses on the prevention or relief of suffering to maximize quality of life for these patients and their families. This article describes how the role of the SLP in palliative care…

  1. Experienced Speech-Language Pathologists' Responses to Ethical Dilemmas: An Integrated Approach to Ethical Reasoning

    Science.gov (United States)

    Kenny, Belinda; Lincoln, Michelle; Balandin, Susan

    2010-01-01

    Purpose: To investigate the approaches of experienced speech-language pathologists (SLPs) to ethical reasoning and the processes they use to resolve ethical dilemmas. Method: Ten experienced SLPs participated in in-depth interviews. A narrative approach was used to guide participants' descriptions of how they resolved ethical dilemmas. Individual…

  2. Benefits to Speech Perception in Noise From the Binaural Integration of Electric and Acoustic Signals in Simulated Unilateral Deafness.

    Science.gov (United States)

    Ma, Ning; Morris, Saffron; Kitterick, Pádraig Thomas

    2016-01-01

    This study used vocoder simulations with normal-hearing (NH) listeners to (1) measure their ability to integrate speech information from an NH ear and a simulated cochlear implant (CI), and (2) investigate whether binaural integration is disrupted by a mismatch in the delivery of spectral information between the ears arising from a misalignment in the mapping of frequency to place. Eight NH volunteers participated in the study and listened to sentences embedded in background noise via headphones. Stimuli presented to the left ear were unprocessed. Stimuli presented to the right ear (referred to as the CI-simulation ear) were processed using an eight-channel noise vocoder with one of the three processing strategies. An Ideal strategy simulated a frequency-to-place map across all channels that matched the delivery of spectral information between the ears. A Realistic strategy created a misalignment in the mapping of frequency to place in the CI-simulation ear where the size of the mismatch between the ears varied across channels. Finally, a Shifted strategy imposed a similar degree of misalignment in all channels, resulting in consistent mismatch between the ears across frequency. The ability to report key words in sentences was assessed under monaural and binaural listening conditions and at signal to noise ratios (SNRs) established by estimating speech-reception thresholds in each ear alone. The SNRs ensured that the monaural performance of the left ear never exceeded that of the CI-simulation ear. The advantages of binaural integration were calculated by comparing binaural performance with monaural performance using the CI-simulation ear alone. Thus, these advantages reflected the additional use of the experimentally constrained left ear and were not attributable to better-ear listening. Binaural performance was as accurate as, or more accurate than, monaural performance with the CI-simulation ear alone. When both ears supported a similar level of monaural
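
    The vocoder processing is only summarised in this record; as a minimal sketch of what a generic eight-channel noise vocoder does (bandpass analysis, envelope extraction, resynthesis with envelope-modulated noise carriers), one might write the following with NumPy/SciPy. The filter orders, envelope cutoff and logarithmic band spacing are illustrative assumptions rather than the study's parameters, and the frequency-to-place misalignments of the Realistic and Shifted strategies are not implemented.

        import numpy as np
        from scipy.signal import butter, hilbert, sosfiltfilt

        def noise_vocode(x, fs, n_channels=8, f_lo=100.0, f_hi=7000.0):
            """Minimal noise vocoder: split the signal into log-spaced bands,
            extract each band's amplitude envelope, and use it to modulate
            band-limited noise. Parameter choices are illustrative only."""
            edges = np.logspace(np.log10(f_lo), np.log10(f_hi), n_channels + 1)
            rng = np.random.default_rng(0)
            out = np.zeros(len(x))
            for lo, hi in zip(edges[:-1], edges[1:]):
                band_sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
                band = sosfiltfilt(band_sos, x)
                env = np.abs(hilbert(band))                     # amplitude envelope
                env_sos = butter(2, 50.0, btype="low", fs=fs, output="sos")
                env = sosfiltfilt(env_sos, env)                 # smooth the envelope
                carrier = sosfiltfilt(band_sos, rng.standard_normal(len(x)))
                out += env * carrier                            # envelope-modulated noise
            return out / (np.max(np.abs(out)) + 1e-12)

        # Example: vocode one second of a synthetic harmonic complex.
        fs = 16000
        t = np.arange(fs) / fs
        speech_like = np.sin(2 * np.pi * 150 * t) + 0.5 * np.sin(2 * np.pi * 450 * t)
        vocoded = noise_vocode(speech_like, fs)

    Presenting such a vocoded signal to one ear and unprocessed speech to the other approximates the simulated-CI listening configuration described above.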

  3. Encoding Specificity and Nonverbal Cue Context: An Expansion of Episodic Memory Research.

    Science.gov (United States)

    Woodall, W. Gill; Folger, Joseph P.

    1981-01-01

    Reports two studies demonstrating the ability of nonverbal contextual cues to act as retrieval mechanisms for co-occurring language. Suggests that visual contextual cues, such as speech primacy and motor primacy gestures, can access linguistic target information. Motor primacy cues are shown to act as stronger retrieval cues. (JMF)

  4. Decoding Speech With Integrated Hybrid Signals Recorded From the Human Ventral Motor Cortex

    Directory of Open Access Journals (Sweden)

    Kenji Ibayashi

    2018-04-01

    Full Text Available Restoration of speech communication for locked-in patients by means of brain computer interfaces (BCIs) is currently an important area of active research. Among the neural signals obtained from intracranial recordings, single/multi-unit activity (SUA/MUA), local field potential (LFP), and electrocorticography (ECoG) are good candidates for an input signal for BCIs. However, the question of which signal or which combination of the three signal modalities is best suited for decoding speech production remains unverified. In order to record SUA, LFP, and ECoG simultaneously from a highly localized area of human ventral sensorimotor cortex (vSMC), we fabricated an electrode measuring 7 by 13 mm that contained sparsely arranged microneedle and conventional macro contacts. We determined which signal modality is the most capable of decoding speech production, and tested whether the combination of these signals could improve the decoding accuracy of spoken phonemes. Feature vectors were constructed from spike frequency obtained from SUAs and event-related spectral perturbation derived from ECoG and LFP signals, then input to the decoder. The results showed that the decoding accuracy for five spoken vowels was highest when features from multiple signals were combined and optimized for each subject, and reached 59% when averaged across all six subjects. This result suggests that multi-scale signals convey complementary information for speech articulation. The current study demonstrated that simultaneous recording of multi-scale neuronal activities could raise decoding accuracy even though the recording area is limited to a small portion of cortex, which is advantageous for future implementation of speech-assisting BCIs.
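
    The decoder itself is only described at a high level in this record; a minimal sketch of the general idea, concatenating feature vectors from several signal modalities and cross-validating a vowel classifier on the combined set, might look like the following with scikit-learn. The feature counts, classifier choice and synthetic data are assumptions for illustration and are not taken from the study.

        import numpy as np
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
        from sklearn.model_selection import cross_val_score
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler

        rng = np.random.default_rng(0)
        n_trials, n_vowels = 300, 5
        labels = rng.integers(0, n_vowels, n_trials)

        # Synthetic stand-ins for per-trial features: spike rates (SUA) and
        # spectral-perturbation features (LFP, ECoG); dimensions are arbitrary.
        sua = rng.normal(size=(n_trials, 10)) + 0.30 * labels[:, None]
        lfp = rng.normal(size=(n_trials, 24)) + 0.10 * labels[:, None]
        ecog = rng.normal(size=(n_trials, 24)) + 0.10 * labels[:, None]

        clf = make_pipeline(StandardScaler(), LinearDiscriminantAnalysis())
        for name, feats in [("SUA only ", sua),
                            ("ECoG only", ecog),
                            ("combined ", np.hstack([sua, lfp, ecog]))]:
            acc = cross_val_score(clf, feats, labels, cv=5).mean()
            print(name, "decoding accuracy:", round(acc, 2))

    On real data the synthetic arrays would be replaced by per-trial spike counts and time-frequency power features, but the comparison of single-modality against concatenated feature sets follows the same pattern.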

  5. Visual-Haptic Integration: Cue Weights are Varied Appropriately, to Account for Changes in Haptic Reliability Introduced by Using a Tool

    Directory of Open Access Journals (Sweden)

    Chie Takahashi

    2011-10-01

    Full Text Available Tools such as pliers systematically change the relationship between an object's size and the hand opening required to grasp it. Previous work suggests the brain takes this into account, integrating visual and haptic size information that refers to the same object, independent of the similarity of the ‘raw’ visual and haptic signals (Takahashi et al., VSS 2009). Variations in tool geometry also affect the reliability (precision) of haptic size estimates, however, because they alter the change in hand opening caused by a given change in object size. Here, we examine whether the brain appropriately adjusts the weights given to visual and haptic size signals when tool geometry changes. We first estimated each cue's reliability by measuring size-discrimination thresholds in vision-alone and haptics-alone conditions. We varied haptic reliability using tools with different object-size:hand-opening ratios (1:1, 0.7:1, and 1.4:1). We then measured the weights given to vision and haptics with each tool, using a cue-conflict paradigm. The weight given to haptics varied with tool type in a manner that was well predicted by the single-cue reliabilities (MLE model; Ernst and Banks, 2002). This suggests that the process of visual-haptic integration appropriately accounts for variations in haptic reliability introduced by different tool geometries.
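
    The MLE rule referred to above has a simple closed form: each cue is weighted by its relative reliability (inverse variance), and the combined estimate is never less precise than the better single cue. A small numerical sketch follows; the standard deviations and size estimates below are invented for illustration and are not the thresholds measured in the study.

        def mle_combine(est_v, est_h, sigma_v, sigma_h):
            """Reliability-weighted (MLE) combination of a visual and a haptic
            size estimate; reliability is the inverse variance of each cue."""
            r_v, r_h = 1.0 / sigma_v ** 2, 1.0 / sigma_h ** 2
            w_v = r_v / (r_v + r_h)                      # weight given to vision
            combined = w_v * est_v + (1.0 - w_v) * est_h
            sigma_c = (1.0 / (r_v + r_h)) ** 0.5         # combined-estimate SD
            return combined, w_v, sigma_c

        # Tools with different object-size:hand-opening ratios change haptic
        # precision; the predicted haptic weight falls as haptic noise grows.
        for sigma_h in (2.0, 4.0, 8.0):                  # illustrative haptic SDs (mm)
            size, w_v, sd = mle_combine(est_v=50.0, est_h=54.0,
                                        sigma_v=3.0, sigma_h=sigma_h)
            print(f"haptic SD {sigma_h} mm -> visual weight {w_v:.2f}, "
                  f"combined {size:.1f} mm (SD {sd:.2f} mm)")

    Comparing the weights measured under cue conflict with these reliability-based predictions is what tests the MLE account.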

  6. Top-Down Modulation of Auditory-Motor Integration during Speech Production: The Role of Working Memory.

    Science.gov (United States)

    Guo, Zhiqiang; Wu, Xiuqin; Li, Weifeng; Jones, Jeffery A; Yan, Nan; Sheft, Stanley; Liu, Peng; Liu, Hanjun

    2017-10-25

    Although working memory (WM) is considered as an emergent property of the speech perception and production systems, the role of WM in sensorimotor integration during speech processing is largely unknown. We conducted two event-related potential experiments with female and male young adults to investigate the contribution of WM to the neurobehavioural processing of altered auditory feedback during vocal production. A delayed match-to-sample task that required participants to indicate whether the pitch feedback perturbations they heard during vocalizations in test and sample sequences matched, elicited significantly larger vocal compensations, larger N1 responses in the left middle and superior temporal gyrus, and smaller P2 responses in the left middle and superior temporal gyrus, inferior parietal lobule, somatosensory cortex, right inferior frontal gyrus, and insula compared with a control task that did not require memory retention of the sequence of pitch perturbations. On the other hand, participants who underwent extensive auditory WM training produced suppressed vocal compensations that were correlated with improved auditory WM capacity, and enhanced P2 responses in the left middle frontal gyrus, inferior parietal lobule, right inferior frontal gyrus, and insula that were predicted by pretraining auditory WM capacity. These findings indicate that WM can enhance the perception of voice auditory feedback errors while inhibiting compensatory vocal behavior to prevent voice control from being excessively influenced by auditory feedback. This study provides the first evidence that auditory-motor integration for voice control can be modulated by top-down influences arising from WM, rather than modulated exclusively by bottom-up and automatic processes. SIGNIFICANCE STATEMENT One outstanding question that remains unsolved in speech motor control is how the mismatch between predicted and actual voice auditory feedback is detected and corrected. The present study

  7. Speech Perception as a Multimodal Phenomenon

    OpenAIRE

    Rosenblum, Lawrence D.

    2008-01-01

    Speech perception is inherently multimodal. Visual speech (lip-reading) information is used by all perceivers and readily integrates with auditory speech. Imaging research suggests that the brain treats auditory and visual speech similarly. These findings have led some researchers to consider that speech perception works by extracting amodal information that takes the same form across modalities. From this perspective, speech integration is a property of the input information itself. Amodal s...

  8. Reduced neural integration of letters and speech sounds in dyslexic children scales with individual differences in reading fluency.

    Directory of Open Access Journals (Sweden)

    Gojko Žarić

    Full Text Available The acquisition of letter-speech sound associations is one of the basic requirements for fluent reading acquisition and its failure may contribute to reading difficulties in developmental dyslexia. Here we investigated event-related potential (ERP) measures of letter-speech sound integration in 9-year-old typical and dyslexic readers and specifically tested their relation to individual differences in reading fluency. We employed an audiovisual oddball paradigm in typical readers (n = 20), dysfluent (n = 18), and severely dysfluent (n = 18) dyslexic children. In one auditory and two audiovisual conditions the Dutch spoken vowels /a/ and /o/ were presented as standard and deviant stimuli. In audiovisual blocks, the letter 'a' was presented either simultaneously (AV0), or 200 ms before (AV200), vowel sound onset. Across the three groups of children, vowel deviancy in auditory blocks elicited comparable mismatch negativity (MMN) and late negativity (LN) responses. In typical readers, both audiovisual conditions (AV0 and AV200) led to enhanced MMN and LN amplitudes. In both dyslexic groups, the audiovisual LN effects were mildly reduced. Most interestingly, individual differences in reading fluency were correlated with MMN latency in the AV0 condition. A further analysis revealed that this effect was driven by a short-lived MMN effect encompassing only the N1 window in severely dysfluent dyslexics versus a longer MMN effect encompassing both the N1 and P2 windows in the other two groups. Our results confirm and extend previous findings in dyslexic children by demonstrating a deficient pattern of letter-speech sound integration depending on the level of reading dysfluency. These findings underscore the importance of considering individual differences across the entire spectrum of reading skills in addition to group differences between typical and dyslexic readers.

  9. Immediate integration of prosodic information from speech and visual information from pictures in the absence of focused attention: a mismatch negativity study.

    Science.gov (United States)

    Li, X; Yang, Y; Ren, G

    2009-06-16

    Language is often perceived together with visual information. Recent experimental evidence indicates that, during spoken language comprehension, the brain can immediately integrate visual information with semantic or syntactic information from speech. Here we used the mismatch negativity to further investigate whether or not prosodic information from speech can be immediately integrated into a visual scene context, and especially the time course and automaticity of this integration process. Sixteen Chinese native speakers participated in the study. The materials included Chinese spoken sentences and picture pairs. In the audiovisual situation, relative to the concomitant pictures, the spoken sentence was appropriately accented in the standard stimuli, but inappropriately accented in the two kinds of deviant stimuli. In the purely auditory situation, the speech sentences were presented without pictures. It was found that the deviants evoked mismatch responses in both audiovisual and purely auditory situations; the mismatch negativity in the purely auditory situation peaked at the same time as, but was weaker than, that evoked by the same deviant speech sounds in the audiovisual situation. This pattern of results suggests immediate integration of prosodic information from speech and visual information from pictures in the absence of focused attention.

  10. Integration of literacy into speech-language therapy: a descriptive analysis of treatment practices.

    Science.gov (United States)

    Tambyraja, Sherine R; Schmitt, Mary Beth; Justice, Laura M; Logan, Jessica A R; Schwarz, Sadie

    2014-01-01

    The purpose of the present study was: (a) to examine the extent to which speech-language therapy provided to children with language disorders in the schools targets code-based literacy skills (e.g., alphabet knowledge and phonological awareness) during business-as-usual treatment sessions, and (b) to determine whether literacy-focused therapy time was associated with factors specific to children and/or speech-language pathologists (SLPs). Participants were 151 kindergarten and first-grade children and 40 SLPs. Video-recorded therapy sessions were coded to determine the amount of time that addressed literacy. Assessments of children's literacy skills were administered as well as questionnaires regarding characteristics of SLPs (e.g., service delivery, professional development). Results showed that time spent addressing code-related literacy across therapy sessions was variable. Significant predictors included SLP years of experience, therapy location, and therapy session duration, such that children receiving services from SLPs with more years of experience, and/or who utilized the classroom for therapy, received more literacy-focused time. Additionally, children in longer therapy sessions received more therapy time on literacy skills. There is considerable variability in the extent to which children received literacy-focused time in therapy; however, SLP-level factors predict time spent in literacy more than child-level factors. Further research is needed to understand the nature of literacy-focused therapy in the public schools. Readers will be able to: (a) define code-based literacy skills, (b) discuss the role that speech-language pathologists have in fostering children's literacy development, and (c) identify key factors that may currently influence the inclusion of literacy targets in school-based speech-language therapy. Copyright © 2014 Elsevier Inc. All rights reserved.

  11. The integration of prosodic speech in high functioning autism: a preliminary FMRI study.

    Directory of Open Access Journals (Sweden)

    Isabelle Hesling

    2010-07-01

    Full Text Available Autism is a neurodevelopmental disorder characterized by a specific triad of symptoms such as abnormalities in social interaction, abnormalities in communication and restricted activities and interests. While verbal autistic subjects may present a correct mastery of the formal aspects of speech, they have difficulties in prosody (music of speech), leading to communication disorders. Few behavioural studies have revealed a prosodic impairment in children with autism, and among the few fMRI studies aiming at assessing the neural network involved in language, none has specifically studied prosodic speech. The aim of the present study was to characterize specific prosodic components such as linguistic prosody (intonation, rhythm and emphasis) and emotional prosody and to correlate them with the neural network underlying them. We used a behavioural test (Profiling Elements of the Prosodic System, PEPS) and fMRI to characterize prosodic deficits and investigate the neural network underlying prosodic processing. Results revealed the existence of a link between perceptive and productive prosodic deficits for some prosodic components (rhythm, emphasis and affect) in HFA and also revealed that the neural network involved in prosodic speech perception exhibits abnormal activation in the left SMG as compared to controls (activation positively correlated with intonation and emphasis) and an absence of deactivation patterns in regions involved in the default mode. These prosodic impairments could not only result from activation pattern abnormalities but also from an inability to adequately use the strategy of the default network inhibition, both mechanisms that have to be considered for decreasing task performance in High Functioning Autism.

  12. Advanced MicroObserver UGS integration with and cueing of the BattleHawk squad level loitering munition and UAV

    Science.gov (United States)

    Steadman, Bob; Finklea, John; Kershaw, James; Loughman, Cathy; Shaffner, Patti; Frost, Dean; Deller, Sean

    2014-06-01

    Textron's Advanced MicroObserver(R) is a next generation remote unattended ground sensor system (UGS) for border security, infrastructure protection, and small combat unit security. The original MicroObserver(R) is a sophisticated seismic sensor system with multi-node fusion that supports target tracking. This system has been deployed in combat theaters. The system's seismic sensor nodes are uniquely able to be completely buried (including antennas) for optimal covertness. The advanced version adds a wireless day/night Electro-Optic Infrared (EOIR) system, cued by seismic tracking, with sophisticated target discrimination and automatic frame capture features. Also new is a field deployable Gateway configurable with a variety of radio systems and flexible networking, an important upgrade that enabled the research described herein. BattleHawk(TM) is a small tube-launched Unmanned Air Vehicle (UAV) with a warhead. Using transmitted video from its EOIR subsystem, an operator can search for and acquire a target day or night, select a target for attack, and execute a terminal dive to destroy the target. It is designed as a lightweight squad level asset carried by an individual infantryman. Although BattleHawk has the best loiter time in its class, that time is still relatively short compared to large UAVs. It is also a one-shot asset in its munition configuration. Therefore Textron Defense Systems conducted internally funded research to determine if there was military utility in having the highly persistent MicroObserver(R) system cue BattleHawk's launch and vector it to beyond-visual-range targets for engagement. This paper describes that research: the system configuration implemented and the results of field testing performed on a government range early in 2013. On the integrated system that was implemented, MicroObserver(R) seismic detections activated that system's camera, which then automatically captured images of the target. The geo-referenced and time-tagged Micro

  13. Audio-Visual Speech Perception: A Developmental ERP Investigation

    Science.gov (United States)

    Knowland, Victoria C. P.; Mercure, Evelyne; Karmiloff-Smith, Annette; Dick, Fred; Thomas, Michael S. C.

    2014-01-01

    Being able to see a talking face confers a considerable advantage for speech perception in adulthood. However, behavioural data currently suggest that children fail to make full use of these available visual speech cues until age 8 or 9. This is particularly surprising given the potential utility of multiple informational cues during language…

  14. Word segmentation with universal prosodic cues.

    Science.gov (United States)

    Endress, Ansgar D; Hauser, Marc D

    2010-09-01

    When listening to speech from one's native language, words seem to be well separated from one another, like beads on a string. When listening to a foreign language, in contrast, words seem almost impossible to extract, as if there was only one bead on the same string. This contrast reveals that there are language-specific cues to segmentation. The puzzle, however, is that infants must be endowed with a language-independent mechanism for segmentation, as they ultimately solve the segmentation problem for any native language. Here, we approach the acquisition problem by asking whether there are language-independent cues to segmentation that might be available to even adult learners who have already acquired a native language. We show that adult learners recognize words in connected speech when only prosodic cues to word-boundaries are given from languages unfamiliar to the participants. In both artificial and natural speech, adult English speakers, with no prior exposure to the test languages, readily recognized words in natural languages with critically different prosodic patterns, including French, Turkish and Hungarian. We suggest that, even though languages differ in their sound structures, they carry universal prosodic characteristics. Further, these language-invariant prosodic cues provide a universally accessible mechanism for finding words in connected speech. These cues may enable infants to start acquiring words in any language even before they are fine-tuned to the sound structure of their native language. Copyright © 2010. Published by Elsevier Inc.

  15. The influence of masker type on early reflection processing and speech intelligibility (L)

    DEFF Research Database (Denmark)

    Arweiler, Iris; Buchholz, Jörg M.; Dau, Torsten

    2013-01-01

    Arweiler and Buchholz [J. Acoust. Soc. Am. 130, 996-1005 (2011)] showed that, while the energy of early reflections (ERs) in a room improves speech intelligibility, the benefit is smaller than that provided by the energy of the direct sound (DS). In terms of integration of ERs and DS, binaural listening did not provide a benefit from ERs apart from a binaural energy summation, such that monaural auditory processing could account for the data. However, a diffuse speech shaped noise (SSN) was used in the speech intelligibility experiments, which does not provide distinct binaural cues to the auditory system. In the present study, the monaural and binaural benefit from ERs for speech intelligibility was investigated using three directional maskers presented from 90° azimuth: a SSN, a multi-talker babble, and a reversed two-talker masker. For normal-hearing as well as hearing-impaired listeners...

  16. Contributions of speech-language therapy to the integration of individuals with Down syndrome in the workplace.

    Science.gov (United States)

    Barbosa, Talita Maria Monteiro Farias; Lima, Ivonaldo Leidson Barbosa; Alves, Giorvan Ânderson Dos Santos; Delgado, Isabelle Cahino

    2018-03-01

    To analyze the contributions of speech-language therapy in the integration of young individuals with Down syndrome (DS) into the workplace, with reference to their professionalization. A questionnaire was distributed to eight undergraduate students (tutors) who participated in a project with individuals with DS, five mothers of individuals with DS, and five employees from the institution in which the present study was conducted. The questionnaire assessed the communication, memory, behavior, social interaction, autonomy and independence of the participants with DS, called "trainees". The trainees were employed in one of five routine work sectors at the university that conducted the present study. The data collected in this descriptive and cross-sectional study were analyzed quantitatively and qualitatively. The Research Ethics Committee of the affiliated institute approved the project. Mothers and tutors rated the trainees' language skills as "good". However, their ratings differed from those of the participating employees. After the trainees with DS were placed in a work environment, significant changes were observed in their communication and autonomy. There was no improvement in the trainees' independence, but after training noticeable changes were observed in their social behavior and autonomy. Speech-language therapy during vocational training led to positive changes in the social behavior of individuals with DS, as evidenced by an increase in their autonomy and communication.

  17. Auditory-visual speech integration by prelinguistic infants: perception of an emergent consonant in the McGurk effect.

    Science.gov (United States)

    Burnham, Denis; Dodd, Barbara

    2004-12-01

    The McGurk effect, in which auditory [ba] dubbed onto [ga] lip movements is perceived as "da" or "tha," was employed in a real-time task to investigate auditory-visual speech perception in prelingual infants. Experiments 1A and 1B established the validity of real-time dubbing for producing the effect. In Experiment 2, 4 1/2-month-olds were tested in a habituation-test paradigm, in which an auditory-visual stimulus was presented contingent upon visual fixation of a live face. The experimental group was habituated to a McGurk stimulus (auditory [ba] visual [ga]), and the control group to matching auditory-visual [ba]. Each group was then presented with three auditory-only test trials, [ba], [da], and [ða] (as in then). Visual-fixation durations in test trials showed that the experimental group treated the emergent percept in the McGurk effect, [da] or [ða], as familiar (even though they had not heard these sounds previously) and [ba] as novel. For control group infants [da] and [ða] were no more familiar than [ba]. These results are consistent with infants' perception of the McGurk effect, and support the conclusion that prelinguistic infants integrate auditory and visual speech information. Copyright 2004 Wiley Periodicals, Inc.

  18. Automatic discrimination between laughter and speech

    NARCIS (Netherlands)

    Truong, K.; Leeuwen, D. van

    2007-01-01

    Emotions can be recognized by audible paralinguistic cues in speech. By detecting these paralinguistic cues that can consist of laughter, a trembling voice, coughs, changes in the intonation contour etc., information about the speaker’s state and emotion can be revealed. This paper describes the

  19. Severe Multisensory Speech Integration Deficits in High-Functioning School-Aged Children with Autism Spectrum Disorder (ASD) and Their Resolution During Early Adolescence

    Science.gov (United States)

    Foxe, John J.; Molholm, Sophie; Del Bene, Victor A.; Frey, Hans-Peter; Russo, Natalie N.; Blanco, Daniella; Saint-Amour, Dave; Ross, Lars A.

    2015-01-01

    Under noisy listening conditions, visualizing a speaker's articulations substantially improves speech intelligibility. This multisensory speech integration ability is crucial to effective communication, and the appropriate development of this capacity greatly impacts a child's ability to successfully navigate educational and social settings. Research shows that multisensory integration abilities continue developing late into childhood. The primary aim here was to track the development of these abilities in children with autism, since multisensory deficits are increasingly recognized as a component of the autism spectrum disorder (ASD) phenotype. The abilities of high-functioning ASD children (n = 84) to integrate seen and heard speech were assessed cross-sectionally, while environmental noise levels were systematically manipulated, comparing them with age-matched neurotypical children (n = 142). Severe integration deficits were uncovered in ASD, which were increasingly pronounced as background noise increased. These deficits were evident in school-aged ASD children (5–12 year olds), but were fully ameliorated in ASD children entering adolescence (13–15 year olds). The severity of multisensory deficits uncovered has important implications for educators and clinicians working in ASD. We consider the observation that the multisensory speech system recovers substantially in adolescence as an indication that it is likely amenable to intervention during earlier childhood, with potentially profound implications for the development of social communication abilities in ASD children. PMID:23985136

  20. Speech Problems

    Science.gov (United States)

    KidsHealth / For Teens / Speech Problems: ... a person's ability to speak clearly. Some Common Speech and Language Disorders: Stuttering is a problem that ...

  1. Turn-taking cue delays in human-robot communication

    NARCIS (Netherlands)

    Cuijpers, R. H.; Van Den Goor, V. J.P.

    2017-01-01

    Fluent communication between a human and a robot relies on the use of effective turn-taking cues. In human speech, staying silent after a sequence of utterances is usually accompanied by an explicit turn-yielding cue to signal the end of a turn. Here we study the effect of the timing of four

  2. Deficits in audiovisual speech perception in normal aging emerge at the level of whole-word recognition.

    Science.gov (United States)

    Stevenson, Ryan A; Nelms, Caitlin E; Baum, Sarah H; Zurkovsky, Lilia; Barense, Morgan D; Newhouse, Paul A; Wallace, Mark T

    2015-01-01

    Over the next 2 decades, a dramatic shift in the demographics of society will take place, with a rapid growth in the population of older adults. One of the most common complaints with healthy aging is a decreased ability to successfully perceive speech, particularly in noisy environments. In such noisy environments, the presence of visual speech cues (i.e., lip movements) provide striking benefits for speech perception and comprehension, but previous research suggests that older adults gain less from such audiovisual integration than their younger peers. To determine at what processing level these behavioral differences arise in healthy-aging populations, we administered a speech-in-noise task to younger and older adults. We compared the perceptual benefits of having speech information available in both the auditory and visual modalities and examined both phoneme and whole-word recognition across varying levels of signal-to-noise ratio. For whole-word recognition, older adults relative to younger adults showed greater multisensory gains at intermediate SNRs but reduced benefit at low SNRs. By contrast, at the phoneme level both younger and older adults showed approximately equivalent increases in multisensory gain as signal-to-noise ratio decreased. Collectively, the results provide important insights into both the similarities and differences in how older and younger adults integrate auditory and visual speech cues in noisy environments and help explain some of the conflicting findings in previous studies of multisensory speech perception in healthy aging. These novel findings suggest that audiovisual processing is intact at more elementary levels of speech perception in healthy-aging populations and that deficits begin to emerge only at the more complex word-recognition level of speech signals. Copyright © 2015 Elsevier Inc. All rights reserved.

  3. Temporal visual cues aid speech recognition

    DEFF Research Database (Denmark)

    Zhou, Xiang; Ross, Lars; Lehn-Schiøler, Tue

    2006-01-01

    BACKGROUND: It is well known that under noisy conditions, viewing a speaker's articulatory movement aids the recognition of spoken words. Conventionally it is thought that the visual input disambiguates otherwise confusing auditory input. HYPOTHESIS: In contrast we hypothesize that it is the temporal synchronicity of the visual input that aids parsing of the auditory stream. More specifically, we expected that purely temporal information, which does not convey information such as place of articulation, may facilitate word recognition. METHODS: To test this prediction we used temporal features of audio to generate an artificial talking-face video and measured word recognition performance on simple monosyllabic words. RESULTS: When presenting words together with the artificial video we find that word recognition is improved over purely auditory presentation. The effect is significant (p...

  4. Impact of a PACS/RIS-integrated speech recognition system on radiology reporting time and report availability

    International Nuclear Information System (INIS)

    Trumm, C.G.; Glaser, C.; Paasche, V.; Kuettner, B.; Francke, M.; Nissen-Meyer, S.; Reiser, M.; Crispin, A.; Popp, P.

    2006-01-01

    Purpose: Quantification of the impact of a PACS/RIS-integrated speech recognition system (SRS) on the time expenditure for radiology reporting and on hospital-wide report availability (RA) in a university institution. Material and Methods: In a prospective pilot study, the following parameters were assessed for 669 radiographic examinations (CR): 1. the time requirement per report dictation (TED: dictation time in seconds divided by the number of images per examination and by the number of words per report) with either a combination of PACS and tape-based dictation (TD: analog dictation device/minicassette/transcription) or the PACS/RIS-integrated speech recognition system (RR: remote recognition/transcription; OR: online recognition/self-correction by the radiologist), and 2. the Report Turnaround Time (RTT) as the time interval from the entry of the first image into the PACS to the available RIS/HIS report. Two equal time periods were chosen retrospectively from the RIS database: 11/2002-2/2003 (only TD) and 11/2003-2/2004 (only RR or OR with the SRS). The mid-term (≥24 h, in 24 h intervals) and short-term (<24 h, in 1 h intervals) RA after examination completion was calculated for all modalities and for CR, CT, MR and XA/DS separately. The relative increase in the mid-term RA (RIMRA: relative to the total number of examinations in each time period) and the increase in the short-term RA (ISRA: ratio of available reports during the 1st to the 24th hour) were calculated. Results: Prospectively, there was a significant difference between TD/RR/OR (n=151/257/261) regarding mean TED (0.44/0.54/0.62 s per word and image) and mean RTT (10.47/6.65/1.27 h), respectively. Retrospectively, 37 898/39 680 reports were evaluated from the RIS database for the time periods 11/2002-2/2003 and 11/2003-2/2004. For CR/CT there was a shift of the short-term RA to the first 6 hours after examination completion (mean cumulative RA 20% higher) with a more than three-fold increase in the total number of available

  5. Perception and the temporal properties of speech

    Science.gov (United States)

    Gordon, Peter C.

    1991-11-01

    Four experiments addressing the role of attention in phonetic perception are reported. The first experiment shows that the relative importance of two cues to the voicing distinction changes when subjects must perform an arithmetic distractor task at the same time as identifying a speech stimulus. The voice onset time cue loses phonetic significance when subjects are distracted, while the F0 onset frequency cue does not. The second experiment shows a similar pattern for two cues to the distinction between the vowels /i/ (as in 'beat') and /I/ (as in 'bit'). Together these experiments indicate that careful attention to speech perception is necessary for strong acoustic cues to achieve their full phonetic impact, while weaker acoustic cues achieve their full phonetic impact without close attention. Experiment 3 shows that this pattern is obtained when the distractor task places little demand on verbal short term memory. Experiment 4 provides a large data set for testing formal models of the role of attention in speech perception. Attention is shown to influence the signal to noise ratio in phonetic encoding. This principle is instantiated in a network model in which the role of attention is to reduce noise in the phonetic encoding of acoustic cues. Implications of this work for understanding speech perception and general theories of the role of attention in perception are discussed.

  6. Online integration of information from speech and gesture: Insights from event related potentials

    NARCIS (Netherlands)

    Özyürek, A.; Willems, R.M.; Kita, S.; Hagoort, P.

    2007-01-01

    During language comprehension, listeners use the global semantic representation from previous sentence or discourse context to immediately integrate the meaning of each upcoming word into the unfolding message-level representation. Here we investigate whether communicative gestures that often

  7. More equal than others: Equity norms as an integration of cognitive heuristics and contextual cues in bargaining games.

    Science.gov (United States)

    Civai, Claudia; Rumiati, Raffaella Ida; Rustichini, Aldo

    2013-09-01

    Behavior in one-shot bargaining games, like the Ultimatum Game (UG), has been interpreted as an expression of social preferences, such as inequity aversion and negative reciprocity; however, the traditional UG design limits the range of possible psychological interpretations of the results. Here, we employed three different designs for ultimatum games, finding support for a more comprehensive theory: behavior is driven by cognitive factors implementing rules such as equal splitting, supporting the idea that equity works as a cognitive heuristic, applicable when the environment provides no reason to behave otherwise. Subjects instead deviate from this rule when the environment changes, for instance when personal interest is at stake. Results show that behavior varies systematically with contextual cues, balancing self-interest with the automatic application of the equity heuristic. Thus, the context suggests the rule to be applied in a specific situation. Copyright © 2013 Elsevier B.V. All rights reserved.

  8. Exploring the role of brain oscillations in speech perception in noise: Intelligibility of isochronously retimed speech

    Directory of Open Access Journals (Sweden)

    Vincent Aubanel

    2016-08-01

    Full Text Available A growing body of evidence shows that brain oscillations track speech. This mechanism is thought to maximise processing efficiency by allocating resources to important speech information, effectively parsing speech into units of appropriate granularity for further decoding. However, some aspects of this mechanism remain unclear. First, while periodicity is an intrinsic property of this physiological mechanism, speech is only quasi-periodic, so it is not clear whether periodicity would present an advantage in processing. Second, it is still a matter of debate which aspect of speech triggers or maintains cortical entrainment, from bottom-up cues such as fluctuations of the amplitude envelope of speech to higher level linguistic cues such as syntactic structure. We present data from a behavioural experiment assessing the effect of isochronous retiming of speech on speech perception in noise. Two types of anchor points were defined for retiming speech, namely syllable onsets and amplitude envelope peaks. For each anchor point type, retiming was implemented at two hierarchical levels, a slow time scale around 2.5 Hz and a fast time scale around 4 Hz. Results show that while any temporal distortion resulted in reduced speech intelligibility, isochronous speech anchored to P-centers (approximated by stressed syllable vowel onsets) was significantly more intelligible than a matched anisochronous retiming, suggesting a facilitative role of periodicity defined on linguistically motivated units in processing speech in noise.
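    As an aside on what "isochronous retiming" involves computationally, here is a minimal sketch of mapping irregular anchor points onto a periodic grid; the warping of the waveform itself (e.g. with PSOLA- or WSOLA-style time-scale modification) is not shown, and the anchor times and rate below are hypothetical:

        import numpy as np

        def isochronous_targets(anchor_times, rate_hz=None):
            # Map irregular anchor times (e.g. syllable onsets, in seconds) onto a
            # perfectly periodic grid with the same start time; by default the grid
            # keeps the original mean period, otherwise it uses the requested rate.
            anchor_times = np.asarray(anchor_times, dtype=float)
            period = np.diff(anchor_times).mean() if rate_hz is None else 1.0 / rate_hz
            return anchor_times[0] + period * np.arange(len(anchor_times))

        onsets = [0.12, 0.48, 0.95, 1.30, 1.82]    # hypothetical syllable onsets (s)
        print(isochronous_targets(onsets))          # target times a retiming would warp the onsets to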

  9. Speech Compression

    Directory of Open Access Journals (Sweden)

    Jerry D. Gibson

    2016-06-01

    Full Text Available Speech compression is a key technology underlying digital cellular communications, VoIP, voicemail, and voice response systems. We trace the evolution of speech coding based on the linear prediction model, highlight the key milestones in speech coding, and outline the structures of the most important speech coding standards. Current challenges, future research directions, fundamental limits on performance, and the critical open problem of speech coding for emergency first responders are all discussed.
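    As a concrete illustration of the linear prediction model that underlies the coders surveyed here, a minimal sketch that estimates LPC coefficients for one speech frame with the autocorrelation method and Levinson-Durbin recursion; this is a textbook-style example, not code from any particular standard, and the synthetic frame is a stand-in for real speech:

        import numpy as np

        def lpc(frame, order):
            # Autocorrelation-method LPC via the Levinson-Durbin recursion.
            frame = frame * np.hamming(len(frame))                  # analysis window
            r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
            a = np.zeros(order + 1)
            a[0] = 1.0
            e = r[0]                                                # prediction error energy
            for i in range(1, order + 1):
                k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / e     # reflection coefficient
                a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1][:i]      # update predictor polynomial
                e *= 1.0 - k * k
            return a, e

        # Hypothetical voiced-like frame: 30 ms at 8 kHz, a 125 Hz tone plus a little noise.
        fs = 8000
        t = np.arange(int(0.03 * fs)) / fs
        rng = np.random.default_rng(0)
        frame = np.sin(2 * np.pi * 125 * t) + 0.05 * rng.normal(size=t.size)
        coeffs, err = lpc(frame, order=10)
        print(coeffs)   # A(z) coefficients with a[0] == 1; err is the residual energy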

  10. Tackling the complexity in speech

    DEFF Research Database (Denmark)

    section includes four carefully selected chapters. They deal with facets of speech production, speech acoustics, and/or speech perception or recognition, place them in an integrated phonetic-phonological perspective, and relate them in more or less explicit ways to aspects of speech technology. Therefore ... we hope that this volume can help speech scientists with traditional training in phonetics and phonology to keep up with the latest developments in speech technology. In the opposite direction, speech researchers starting from a technological perspective will hopefully get inspired by reading about ... the questions, phenomena, and communicative functions that are currently addressed in phonetics and phonology. Either way, the future of speech research lies in international, interdisciplinary collaborations, and our volume is meant to reflect and facilitate such collaborations...

  11. Causal inference and temporal predictions in audiovisual perception of speech and music.

    Science.gov (United States)

    Noppeney, Uta; Lee, Hwee Ling

    2018-03-31

    To form a coherent percept of the environment, the brain must integrate sensory signals emanating from a common source but segregate those from different sources. Temporal regularities are prominent cues for multisensory integration, particularly for speech and music perception. In line with models of predictive coding, we suggest that the brain adapts an internal model to the statistical regularities in its environment. This internal model enables cross-sensory and sensorimotor temporal predictions as a mechanism to arbitrate between integration and segregation of signals from different senses. © 2018 New York Academy of Sciences.

  12. Applications in accessibility of text-to-speech synthesis for South African languages: Initial system integration and user engagement

    CSIR Research Space (South Africa)

    Schlünz, Georg I

    2017-09-01

    Full Text Available with little or no functional speech to speak out loud. Screen readers and accessible e-books allow a print-disabled (visually-impaired, partially-sighted or dyslexic) individual to read text material by listening to audio versions. Text-to-speech synthesis...

  13. NPR-9, a Galanin-Like G-Protein Coupled Receptor, and GLR-1 Regulate Interneuronal Circuitry Underlying Multisensory Integration of Environmental Cues in Caenorhabditis elegans.

    Directory of Open Access Journals (Sweden)

    Jason C Campbell

    2016-05-01

    Full Text Available C. elegans inhabit environments that require detection of diverse stimuli to modulate locomotion in order to avoid unfavourable conditions. In a mammalian context, a failure to appropriately integrate environmental signals can lead to Parkinson's, Alzheimer's, and epilepsy. Provided that the circuitry underlying mammalian sensory integration can be prohibitively complex, we analyzed nematode behavioral responses in differing environmental contexts to evaluate the regulation of context dependent circuit reconfiguration and sensorimotor control. Our work has added to the complexity of a known parallel circuit, mediated by interneurons AVA and AIB, that integrates sensory cues and is responsible for the initiation of backwards locomotion. Our analysis of the galanin-like G-protein coupled receptor NPR-9 in C. elegans revealed that upregulation of galanin signaling impedes the integration of sensory evoked neuronal signals. Although the expression pattern of npr-9 is limited to AIB, upregulation of the receptor appears to impede AIB and AVA circuits to broadly prevent backwards locomotion, i.e. reversals, suggesting that these two pathways functionally interact. Galanin signaling similarly plays a broadly inhibitory role in mammalian models. Moreover, our identification of a mutant, which rarely initiates backwards movement, allowed us to interrogate locomotory mechanisms underlying chemotaxis. In support of the pirouette model of chemotaxis, organisms that did not exhibit reversal behavior were unable to navigate towards an attractant peak. We also assessed ionotropic glutamate receptor GLR-1 cell-specifically within AIB and determined that GLR-1 fine-tunes AIB activity to modify locomotion following reversal events. Our research highlights that signal integration underlying the initiation and fine-tuning of backwards locomotion is AIB and NPR-9 dependent, and has demonstrated the suitability of C. elegans for analysis of multisensory integration

  14. Treating speech subsystems in childhood apraxia of speech with tactual input: the PROMPT approach.

    Science.gov (United States)

    Dale, Philip S; Hayden, Deborah A

    2013-11-01

    Prompts for Restructuring Oral Muscular Phonetic Targets (PROMPT; Hayden, 2004; Hayden, Eigen, Walker, & Olsen, 2010), a treatment approach for the improvement of speech sound disorders in children, uses tactile-kinesthetic-proprioceptive (TKP) cues to support and shape movements of the oral articulators. No research to date has systematically examined the efficacy of PROMPT for children with childhood apraxia of speech (CAS). Four children (ages 3;6 [years;months] to 4;8), all meeting the American Speech-Language-Hearing Association (2007) criteria for CAS, were treated using PROMPT. All children received 8 weeks of 2 × per week treatment, including at least 4 weeks of full PROMPT treatment that included TKP cues. During the first 4 weeks, 2 of the 4 children received treatment that included all PROMPT components except TKP cues. This design permitted both between-subjects and within-subjects comparisons to evaluate the effect of TKP cues. Gains in treatment were measured by standardized tests and by criterion-referenced measures based on the production of untreated probe words, reflecting change in speech movements and auditory perceptual accuracy. All 4 children made significant gains during treatment, but measures of motor speech control and untreated word probes provided evidence for more gain when TKP cues were included. PROMPT as a whole appears to be effective for treating children with CAS, and the inclusion of TKP cues appears to facilitate greater effect.

  15. Tuning Neural Phase Entrainment to Speech.

    Science.gov (United States)

    Falk, Simone; Lanzilotti, Cosima; Schön, Daniele

    2017-08-01

    Musical rhythm positively impacts on subsequent speech processing. However, the neural mechanisms underlying this phenomenon are so far unclear. We investigated whether carryover effects from a preceding musical cue to a speech stimulus result from a continuation of neural phase entrainment to periodicities that are present in both music and speech. Participants listened and memorized French metrical sentences that contained (quasi-)periodic recurrences of accents and syllables. Speech stimuli were preceded by a rhythmically regular or irregular musical cue. Our results show that the presence of a regular cue modulates neural response as estimated by EEG power spectral density, intertrial coherence, and source analyses at critical frequencies during speech processing compared with the irregular condition. Importantly, intertrial coherences for regular cues were indicative of the participants' success in memorizing the subsequent speech stimuli. These findings underscore the highly adaptive nature of neural phase entrainment across fundamentally different auditory stimuli. They also support current models of neural phase entrainment as a tool of predictive timing and attentional selection across cognitive domains.
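    For readers unfamiliar with intertrial coherence, one of the EEG measures used above, it has a compact definition: the length of the mean resultant vector of the single-trial phases at a given frequency. A minimal sketch on simulated data; the frequency, sampling rate, noise level, and trial count are arbitrary:

        import numpy as np

        def intertrial_coherence(trials, fs, freq):
            # ITC at one frequency: |mean over trials of unit-length phase vectors|.
            n = trials.shape[1]
            spectra = np.fft.rfft(trials, axis=1)           # per-trial spectra
            bin_idx = int(round(freq * n / fs))             # FFT bin closest to freq
            phases = np.angle(spectra[:, bin_idx])
            return np.abs(np.mean(np.exp(1j * phases)))

        # Simulated trials: a 4 Hz component with consistent phase across trials, plus noise.
        fs, dur, n_trials = 250, 2.0, 60
        t = np.arange(int(fs * dur)) / fs
        rng = np.random.default_rng(1)
        locked = np.sin(2 * np.pi * 4 * t) + rng.normal(0, 2.0, (n_trials, t.size))
        noise_only = rng.normal(0, 1.0, (n_trials, t.size))
        print(intertrial_coherence(locked, fs, freq=4))      # high: phases aligned across trials
        print(intertrial_coherence(noise_only, fs, freq=4))  # near 0: random phases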

  16. Assessing the contribution of binaural cues for apparent source width perception via a functional model

    DEFF Research Database (Denmark)

    Käsbach, Johannes; Hahmann, Manuel; May, Tobias

    2016-01-01

    In echoic conditions, sound sources are not perceived as point sources but appear to be expanded. The expansion in the horizontal dimension is referred to as apparent source width (ASW). To elicit this perception, the auditory system has access to fluctuations of binaural cues, the interaural time...... a statistical representation of ITDs and ILDs based on percentiles integrated over time and frequency. The model’s performance was evaluated against psychoacoustic data obtained with noise, speech and music signals in loudspeakerbased experiments. A robust model prediction of ASW was achieved using a cross...
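    As a rough illustration of the percentile-based statistic described in this abstract, here is a minimal sketch that turns precomputed per-frame ITD and ILD estimates into a single width predictor; the input arrays, percentiles, normalisation constants, and equal cue weighting are all hypothetical, and the actual model's auditory front end and calibration are not reproduced:

        import numpy as np

        def asw_predictor(itd_frames_s, ild_frames_db, lo=10, hi=90):
            # Interpercentile range of binaural-cue fluctuations per frequency channel
            # (rows), pooled over time frames (columns), then averaged over channels.
            itd_spread = np.percentile(itd_frames_s, hi, axis=1) - np.percentile(itd_frames_s, lo, axis=1)
            ild_spread = np.percentile(ild_frames_db, hi, axis=1) - np.percentile(ild_frames_db, lo, axis=1)
            # Arbitrary equal weighting of the two cues after crude normalisation (1 ms and 10 dB units).
            return 0.5 * itd_spread.mean() / 1e-3 + 0.5 * ild_spread.mean() / 10.0

        # Hypothetical cue estimates: 20 frequency channels x 200 time frames.
        rng = np.random.default_rng(0)
        narrow = asw_predictor(rng.normal(0, 50e-6, (20, 200)), rng.normal(0, 0.5, (20, 200)))
        wide = asw_predictor(rng.normal(0, 400e-6, (20, 200)), rng.normal(0, 3.0, (20, 200)))
        print(narrow, wide)   # larger cue fluctuations -> larger predicted apparent source width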

  17. Speech Matters

    DEFF Research Database (Denmark)

    Hasse Jørgensen, Stina

    2011-01-01

    About Speech Matters - Katarina Gregos, the Greek curator's exhibition at the Danish Pavilion, the Venice Biennale 2011.

  18. Audiovisual Asynchrony Detection in Human Speech

    Science.gov (United States)

    Maier, Joost X.; Di Luca, Massimiliano; Noppeney, Uta

    2011-01-01

    Combining information from the visual and auditory senses can greatly enhance intelligibility of natural speech. Integration of audiovisual speech signals is robust even when temporal offsets are present between the component signals. In the present study, we characterized the temporal integration window for speech and nonspeech stimuli with…

  19. Speech-to-Speech Relay Service

    Science.gov (United States)

    Speech-to-Speech (STS) is one form of Telecommunications Relay Service (TRS). TRS is a service that allows persons with hearing and speech disabilities ...

  20. Apraxia of Speech

    Science.gov (United States)

    What is apraxia of speech? Apraxia of speech (AOS)—also known as acquired ...

  1. Audio-visual speech perception in infants and toddlers with Down syndrome, fragile X syndrome, and Williams syndrome.

    Science.gov (United States)

    D'Souza, Dean; D'Souza, Hana; Johnson, Mark H; Karmiloff-Smith, Annette

    2016-08-01

    Typically-developing (TD) infants can construct unified cross-modal percepts, such as a speaking face, by integrating auditory-visual (AV) information. This skill is a key building block upon which higher-level skills, such as word learning, are built. Because word learning is seriously delayed in most children with neurodevelopmental disorders, we assessed the hypothesis that this delay partly results from a deficit in integrating AV speech cues. AV speech integration has rarely been investigated in neurodevelopmental disorders, and never previously in infants. We probed for the McGurk effect, which occurs when the auditory component of one sound (/ba/) is paired with the visual component of another sound (/ga/), leading to the perception of an illusory third sound (/da/ or /tha/). We measured AV integration in 95 infants/toddlers with Down, fragile X, or Williams syndrome, whom we matched on Chronological and Mental Age to 25 TD infants. We also assessed a more basic AV perceptual ability: sensitivity to matching vs. mismatching AV speech stimuli. Infants with Williams syndrome failed to demonstrate a McGurk effect, indicating poor AV speech integration. Moreover, while the TD children discriminated between matching and mismatching AV stimuli, none of the other groups did, hinting at a basic deficit or delay in AV speech processing, which is likely to constrain subsequent language development. Copyright © 2016 Elsevier Inc. All rights reserved.

  2. A Randomized Controlled Trial on The Beneficial Effects of Training Letter-Speech Sound Integration on Reading Fluency in Children with Dyslexia.

    Directory of Open Access Journals (Sweden)

    Gorka Fraga González

    Full Text Available A recent account of dyslexia assumes that a failure to develop automated letter-speech sound integration might be responsible for the observed lack of reading fluency. This study uses a pre-test-training-post-test design to evaluate the effects of a training program based on letter-speech sound associations with a special focus on gains in reading fluency. A sample of 44 children with dyslexia and 23 typical readers, aged 8 to 9, was recruited. Children with dyslexia were randomly allocated to either the training program group (n = 23) or a waiting-list control group (n = 21). The training intensively focused on letter-speech sound mapping and consisted of 34 individual sessions of 45 minutes over a five-month period. The children with dyslexia showed substantial reading gains for the main word reading and spelling measures after training, improving at a faster rate than typical readers and waiting-list controls. The results are interpreted within the conceptual framework assuming a multisensory integration deficit as the most proximal cause of dysfluent reading in dyslexia. ISRCTN register: ISRCTN12783279.

  3. Introductory speeches

    International Nuclear Information System (INIS)

    2001-01-01

    This CD is a multimedia presentation of the programme of safety upgrading of the Bohunice V1 NPP. This chapter consists of an introductory commentary and 4 introductory speeches (video records): (1) Introductory speech of Vincent Pillar, Board chairman and director general of Slovak electric, Plc. (SE); (2) Introductory speech of Stefan Schmidt, director of SE - Bohunice Nuclear power plants; (3) Introductory speech of Jan Korec, Board chairman and director general of VUJE Trnava, Inc. - Engineering, Design and Research Organisation, Trnava; (4) Introductory speech of Dietrich Kuschel, Senior vice-president of FRAMATOME ANP Project and Engineering

  4. A causal test of the motor theory of speech perception: a case of impaired speech production and spared speech perception.

    Science.gov (United States)

    Stasenko, Alena; Bonn, Cory; Teghipco, Alex; Garcea, Frank E; Sweet, Catherine; Dombovy, Mary; McDonough, Joyce; Mahon, Bradford Z

    2015-01-01

    The debate about the causal role of the motor system in speech perception has been reignited by demonstrations that motor processes are engaged during the processing of speech sounds. Here, we evaluate which aspects of auditory speech processing are affected, and which are not, in a stroke patient with dysfunction of the speech motor system. We found that the patient showed a normal phonemic categorical boundary when discriminating two non-words that differ by a minimal pair (e.g., ADA-AGA). However, using the same stimuli, the patient was unable to identify or label the non-word stimuli (using a button-press response). A control task showed that he could identify speech sounds by speaker gender, ruling out a general labelling impairment. These data suggest that while the motor system is not causally involved in perception of the speech signal, it may be used when other cues (e.g., meaning, context) are not available.

  5. Familiar units prevail over statistical cues in word segmentation.

    Science.gov (United States)

    Poulin-Charronnat, Bénédicte; Perruchet, Pierre; Tillmann, Barbara; Peereman, Ronald

    2017-09-01

    In language acquisition research, the prevailing position is that listeners exploit statistical cues, in particular transitional probabilities between syllables, to discover words of a language. However, other cues are also involved in word discovery. Assessing the weight learners give to these different cues leads to a better understanding of the processes underlying speech segmentation. The present study evaluated whether adult learners preferentially used known units or statistical cues for segmenting continuous speech. Before the exposure phase, participants were familiarized with part-words of a three-word artificial language. This design allowed the dissociation of the influence of statistical cues and familiar units, with statistical cues favoring word segmentation and familiar units favoring (nonoptimal) part-word segmentation. In Experiment 1, performance in a two-alternative forced choice (2AFC) task between words and part-words revealed part-word segmentation (even though part-words were less cohesive in terms of transitional probabilities and less frequent than words). By contrast, an unfamiliarized group exhibited word segmentation, as usually observed in standard conditions. Experiment 2 used a syllable-detection task to remove the likely contamination of performance by memory and strategy effects in the 2AFC task. Overall, the results suggest that familiar units overrode statistical cues, ultimately questioning the need for computation mechanisms of transitional probabilities (TPs) in natural language speech segmentation.
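    Because the "statistical cues" at issue here are forward transitional probabilities between syllables, a minimal sketch of how they are computed over a continuous syllable stream may help; the three-word toy lexicon below is a hypothetical stand-in for the artificial language used in the study:

        import random
        from collections import Counter

        def transitional_probabilities(syllables):
            # Forward TP(y | x) = count(x immediately followed by y) / count(x)
            pair_counts = Counter(zip(syllables, syllables[1:]))
            first_counts = Counter(syllables[:-1])
            return {(x, y): c / first_counts[x] for (x, y), c in pair_counts.items()}

        # Hypothetical artificial language: three trisyllabic words concatenated into a stream.
        words = [["tu", "pi", "ro"], ["go", "la", "bu"], ["da", "ko", "ti"]]
        random.seed(0)
        stream = [syll for _ in range(200) for syll in random.choice(words)]
        tps = transitional_probabilities(stream)
        print(tps[("tu", "pi")])   # within-word TP: 1.0
        print(tps[("ro", "go")])   # across-word TP: roughly 1/3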

  6. Do gender differences in audio-visual benefit and visual influence in audio-visual speech perception emerge with age?

    Directory of Open Access Journals (Sweden)

    Magnus eAlm

    2015-07-01

    Full Text Available Gender and age have been found to affect adults' audio-visual (AV) speech perception. However, research on adult aging focuses on adults over 60 years, who have an increasing likelihood of cognitive and sensory decline, which may confound positive effects of age-related AV experience and its interaction with gender. Observed age and gender differences in AV speech perception may also depend on measurement sensitivity and AV task difficulty. Consequently, both AV benefit and visual influence were used to measure visual contribution for gender-balanced groups of young (20-30 years) and middle-aged adults (50-60 years), with task difficulty varied using AV syllables from different talkers in alternative auditory backgrounds. Females had better speech-reading performance than males. Whereas no gender differences in AV benefit or visual influence were observed for young adults, visually influenced responses were significantly greater for middle-aged females than middle-aged males. That speech-reading performance did not influence AV benefit may be explained by visual speech extraction and AV integration constituting independent abilities. Contrastingly, the gender difference in visually influenced responses in middle adulthood may reflect an experience-related shift in females' general AV perceptual strategy. Although young females' speech-reading proficiency may not readily contribute to greater visual influence, between young and middle adulthood recurrent confirmation of the contribution of visual cues induced by speech-reading proficiency may gradually shift females' AV perceptual strategy towards more visually dominated responses.

  7. Emotionally conditioning the target-speech voice enhances recognition of the target speech under "cocktail-party" listening conditions.

    Science.gov (United States)

    Lu, Lingxi; Bao, Xiaohan; Chen, Jing; Qu, Tianshu; Wu, Xihong; Li, Liang

    2018-05-01

    Under a noisy "cocktail-party" listening condition with multiple people talking, listeners can use various perceptual/cognitive unmasking cues to improve recognition of the target speech against informational speech-on-speech masking. One potential unmasking cue is the emotion expressed in a speech voice, by means of certain acoustical features. However, it was unclear whether emotionally conditioning a target-speech voice that has none of the typical acoustical features of emotions (i.e., an emotionally neutral voice) can be used by listeners for enhancing target-speech recognition under speech-on-speech masking conditions. In this study we examined the recognition of target speech against a two-talker speech masker both before and after the emotionally neutral target voice was paired with a loud female screaming sound that has a marked negative emotional valence. The results showed that recognition of the target speech (especially the first keyword in a target sentence) was significantly improved by emotionally conditioning the target speaker's voice. Moreover, the emotional unmasking effect was independent of the unmasking effect of the perceived spatial separation between the target speech and the masker. Also, (skin conductance) electrodermal responses became stronger after emotional learning when the target speech and masker were perceptually co-located, suggesting an increase of listening efforts when the target speech was informationally masked. These results indicate that emotionally conditioning the target speaker's voice does not change the acoustical parameters of the target-speech stimuli, but the emotionally conditioned vocal features can be used as cues for unmasking target speech.

  8. Testing the influence of external and internal cues on smoking motivation using a community sample.

    Science.gov (United States)

    Litvin, Erika B; Brandon, Thomas H

    2010-02-01

    Exposing smokers to either external cues (e.g., pictures of cigarettes) or internal cues (e.g., negative affect induction) can induce urge to smoke and other behavioral and physiological responses. However, little is known about whether the two types of cues interact when presented in close proximity, as is likely the case in the real world. Additionally, potential moderators of cue reactivity have rarely been examined. Finally, few cue-reactivity studies have used representative samples of smokers. In a randomized 2 x 2 crossed factorial between-subjects design, the current study tested the effects of a negative affect cue intended to produce anxiety (speech preparation task) and an external smoking cue on urge and behavioral reactivity in a community sample of adult smokers (N = 175), and whether trait impulsivity moderated the effects. Both types of cues produced main effects on urges to smoke, despite the speech task failing to increase anxiety significantly. The speech task increased smoking urge related to anticipation of negative affect relief, whereas the external smoking cues increased urges related to anticipation of pleasure; however, the cues did not interact. Impulsivity measures predicted urge and other smoking-related variables, but did not moderate cue-reactivity. Results suggest independent rather than synergistic effects of these contributors to smoking motivation. (PsycINFO Database Record (c) 2010 APA, all rights reserved).

  9. SPEECH ACT ANALYSIS OF IGBO UTTERANCES IN FUNERAL ...

    African Journals Online (AJOL)

    Dean SPGS NAU

    In other words, a speech act is a ... relationship with that one single person and to share those memories ... identifies four conditions or rules for the effective performance of a ... In other words, the rules establish a system for the ... shaped by the interplay of particular speech acts and non-verbal cues.

  10. Predicting Intelligibility Gains in Dysarthria through Automated Speech Feature Analysis

    Science.gov (United States)

    Fletcher, Annalise R.; Wisler, Alan A.; McAuliffe, Megan J.; Lansford, Kaitlin L.; Liss, Julie M.

    2017-01-01

    Purpose: Behavioral speech modifications have variable effects on the intelligibility of speakers with dysarthria. In the companion article, a significant relationship was found between measures of speakers' baseline speech and their intelligibility gains following cues to speak louder and reduce rate (Fletcher, McAuliffe, Lansford, Sinex, &…

  11. Auditory, Visual and Audiovisual Speech Processing Streams in Superior Temporal Sulcus.

    Science.gov (United States)

    Venezia, Jonathan H; Vaden, Kenneth I; Rong, Feng; Maddox, Dale; Saberi, Kourosh; Hickok, Gregory

    2017-01-01

    The human superior temporal sulcus (STS) is responsive to visual and auditory information, including sounds and facial cues during speech recognition. We investigated the functional organization of STS with respect to modality-specific and multimodal speech representations. Twenty younger adult participants were instructed to perform an oddball detection task and were presented with auditory, visual, and audiovisual speech stimuli, as well as auditory and visual nonspeech control stimuli in a block fMRI design. Consistent with a hypothesized anterior-posterior processing gradient in STS, auditory, visual and audiovisual stimuli produced the largest BOLD effects in anterior, posterior and middle STS (mSTS), respectively, based on whole-brain, linear mixed effects and principal component analyses. Notably, the mSTS exhibited preferential responses to multisensory stimulation, as well as speech compared to nonspeech. Within the mid-posterior and mSTS regions, response preferences changed gradually from visual, to multisensory, to auditory moving posterior to anterior. Post hoc analysis of visual regions in the posterior STS revealed that a single subregion bordering the mSTS was insensitive to differences in low-level motion kinematics yet distinguished between visual speech and nonspeech based on multi-voxel activation patterns. These results suggest that auditory and visual speech representations are elaborated gradually within anterior and posterior processing streams, respectively, and may be integrated within the mSTS, which is sensitive to more abstract speech information within and across presentation modalities. The spatial organization of STS is consistent with processing streams that are hypothesized to synthesize perceptual speech representations from sensory signals that provide convergent information from visual and auditory modalities.

  12. Recognizing intentions in infant-directed speech: evidence for universals.

    Science.gov (United States)

    Bryant, Gregory A; Barrett, H Clark

    2007-08-01

    In all languages studied to date, distinct prosodic contours characterize different intention categories of infant-directed (ID) speech. This vocal behavior likely exists universally as a species-typical trait, but little research has examined whether listeners can accurately recognize intentions in ID speech using only vocal cues, without access to semantic information. We recorded native-English-speaking mothers producing four intention categories of utterances (prohibition, approval, comfort, and attention) as both ID and adult-directed (AD) speech, and we then presented the utterances to Shuar adults (South American hunter-horticulturalists). Shuar subjects were able to reliably distinguish ID from AD speech and were able to reliably recognize the intention categories in both types of speech, although performance was significantly better with ID speech. This is the first demonstration that adult listeners in an indigenous, nonindustrialized, and nonliterate culture can accurately infer intentions from both ID speech and AD speech in a language they do not speak.

  13. Speech coding

    Energy Technology Data Exchange (ETDEWEB)

    Ravishankar, C., Hughes Network Systems, Germantown, MD

    1998-05-08

    Speech is the predominant means of communication between human beings and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage of the speech signal getting corrupted by noise, cross-talk and distortion. Long haul transmissions which use repeaters to compensate for the loss in signal strength on transmission links also increase the associated noise and distortion. On the other hand, digital transmission is relatively immune to noise, cross-talk and distortion, primarily because of the capability to faithfully regenerate the digital signal at each repeater purely based on a binary decision. Hence the end-to-end performance of the digital link essentially becomes independent of the length and operating frequency bands of the link. Hence, from a transmission point of view, digital transmission has been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service provision point of view as well. Modern requirements have introduced the need for robust, flexible and secure services that can carry a multitude of signal types (such as voice, data and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term speech coding often refers to techniques that represent or code speech signals either directly as a waveform or as a set of parameters obtained by analyzing the speech signal. In either case, the codes are transmitted to the distant end where speech is reconstructed or synthesized using the received set of codes. A more generic term that is applicable to these techniques and that is often used interchangeably with speech coding is the term voice coding. This term is more generic in the sense that the

  14. Mild developmental foreign accent syndrome and psychiatric comorbidity: Altered white matter integrity in speech and emotion regulation networks

    Directory of Open Access Journals (Sweden)

    Marcelo L Berthier

    2016-08-01

    Full Text Available Foreign accent syndrome (FAS) is a speech disorder that is defined by the emergence of a peculiar manner of articulation and intonation which is perceived as foreign. In most cases of acquired FAS (AFAS) the new accent is secondary to small focal lesions involving components of the bilaterally distributed neural network for speech production. In the past few years FAS has also been described in different psychiatric conditions (conversion disorder, bipolar disorder, schizophrenia) as well as in developmental disorders (specific language impairment, apraxia of speech). In the present study, two adult males, one with atypical phonetic production and the other one with cluttering, reported having developmental FAS (DFAS) since their adolescence. Perceptual analysis by naïve judges could not confirm the presence of foreign accent, possibly due to the mildness of the speech disorder. However, detailed linguistic analysis provided evidence of prosodic and segmental errors previously reported in AFAS cases. Cognitive testing showed reduced communication in activities of daily living and mild deficits related to psychiatric disorders. Psychiatric evaluation revealed long-lasting internalizing disorders (neuroticism, anxiety, obsessive-compulsive disorder, social phobia, depression, alexithymia, hopelessness, and apathy) in both subjects. Diffusion tensor imaging (DTI) data from each subject with DFAS were compared with data from a group of 21 age- and gender-matched healthy control subjects. Diffusion parameters (MD, AD, and RD) in predefined regions of interest showed changes of white matter microstructure in regions previously related with AFAS and psychiatric disorders. In conclusion, the present findings militate against the possibility that these two subjects have FAS of psychogenic origin. Rather, our findings provide evidence that mild DFAS occurring in the context of subtle, yet persistent, developmental speech disorders may be associated with

  15. Prosodic cues to word order: what level of representation?

    Directory of Open Access Journals (Sweden)

    Carline eBernard

    2012-10-01

    Full Text Available Within language, systematic correlations exist between syntactic structure and prosody. Prosodic prominence, for instance, falls on the complement and not the head of syntactic phrases, and its realization depends on the phrasal position of the prominent element. Thus, in Japanese, a functor-final language, prominence is phrase-initial and realized as increased pitch (^Tōkyō ni ‘Tokyo to’), whereas in French, English or Italian, functor-initial languages, it manifests itself as phrase-final lengthening (to Rome). Prosody is readily available in the linguistic signal even to the youngest infants. It has, therefore, been proposed that young learners might be able to exploit its correlations with syntax to bootstrap language structure. In this study, we tested this hypothesis, investigating how 8-month-old monolingual French infants processed an artificial grammar manipulating the relative position of prosodic prominence and word frequency. In Condition 1, we created a speech stream in which the two cues, prosody and frequency, were aligned, frequent words being prosodically non-prominent and infrequent ones being prominent, as is the case in natural language (functors are prosodically minimal compared to content words). In Condition 2, the two cues were misaligned, with frequent words carrying prosodic prominence, unlike in natural language. After familiarization with the aligned or the misaligned stream in a headturn preference procedure, we tested infants' preference for test items having a frequent word initial or a frequent word final word order. We found that infants familiarized with the aligned stream showed the expected preference for the frequent word initial test items, mimicking the functor-initial word order of French. Infants in the misaligned condition showed no preference. These results suggest that infants are able to use word frequency and prosody as early cues to word order and they integrate them into a coherent

  16. Neural pathways for visual speech perception

    Directory of Open Access Journals (Sweden)

    Lynne E Bernstein

    2014-12-01

    Full Text Available This paper examines the questions, what levels of speech can be perceived visually, and how is visual speech represented by the brain? Review of the literature leads to the conclusions that every level of psycholinguistic speech structure (i.e., phonetic features, phonemes, syllables, words, and prosody) can be perceived visually, although individuals differ in their abilities to do so; and that there are visual modality-specific representations of speech qua speech in higher-level vision brain areas. That is, the visual system represents the modal patterns of visual speech. The suggestion that the auditory speech pathway receives and represents visual speech is examined in light of neuroimaging evidence on the auditory speech pathways. We outline the generally agreed-upon organization of the visual ventral and dorsal pathways and examine several types of visual processing that might be related to speech through those pathways, specifically, face and body, orthography, and sign language processing. In this context, we examine the visual speech processing literature, which reveals widespread, diverse patterns of activity in posterior temporal cortices in response to visual speech stimuli. We outline a model of the visual and auditory speech pathways and make several suggestions: (1) The visual perception of speech relies on visual pathway representations of speech qua speech. (2) A proposed site of these representations, the temporal visual speech area (TVSA), has been demonstrated in posterior temporal cortex, ventral and posterior to the multisensory posterior superior temporal sulcus (pSTS). (3) Given that visual speech has dynamic and configural features, its representations in feedforward visual pathways are expected to integrate these features, possibly in TVSA.

  17. The effectiveness of Speech-Music Therapy for Aphasia (SMTA) in five speakers with Apraxia of Speech and aphasia

    NARCIS (Netherlands)

    Hurkmans, Joost; Jonkers, Roel; de Bruijn, Madeleen; Boonstra, Anne M.; Hartman, Paul P.; Arendzen, Hans; Reinders - Messelink, Heelen

    2015-01-01

    Background: Several studies using musical elements in the treatment of neurological language and speech disorders have reported improvement of speech production. One such programme, Speech-Music Therapy for Aphasia (SMTA), integrates speech therapy and music therapy (MT) to treat the individual with

  18. Learning Grammatical Categories from Distributional Cues: Flexible Frames for Language Acquisition

    Science.gov (United States)

    St. Clair, Michelle C.; Monaghan, Padraic; Christiansen, Morten H.

    2010-01-01

    Numerous distributional cues in the child's environment may potentially assist in language learning, but what cues are useful to the child and when are these cues utilised? We propose that the most useful source of distributional cue is a flexible frame surrounding the word, where the language learner integrates information from the preceding and…

  19. Neural Entrainment to Speech Modulates Speech Intelligibility

    NARCIS (Netherlands)

    Riecke, Lars; Formisano, Elia; Sorger, Bettina; Baskent, Deniz; Gaudrain, Etienne

    2018-01-01

    Speech is crucial for communication in everyday life. Speech-brain entrainment, the alignment of neural activity to the slow temporal fluctuations (envelope) of acoustic speech input, is a ubiquitous element of current theories of speech processing. Associations between speech-brain entrainment and

  20. Use of explicit memory cues following parietal lobe lesions.

    Science.gov (United States)

    Dobbins, Ian G; Jaeger, Antonio; Studer, Bettina; Simons, Jon S

    2012-11-01

    The putative role of the lateral parietal lobe in episodic memory has recently become a topic of considerable debate, owing primarily to its consistent activation for studied materials during functional magnetic resonance imaging studies of recognition. Here we examined the performance of patients with parietal lobe lesions using an explicit memory cueing task in which probabilistic cues ("Likely Old" or "Likely New"; 75% validity) preceded the majority of verbal recognition memory probes. Without cues, patients and control participants did not differ in accuracy. However, group differences emerged during the "Likely New" cue condition with controls responding more accurately than parietal patients when these cues were valid (preceding new materials) and trending towards less accuracy when these cues were invalid (preceding old materials). Both effects suggest insufficient integration of external cues into memory judgments on the part of the parietal patients whose cued performance largely resembled performance in the complete absence of cues. Comparison of the parietal patients to a patient group with frontal lobe lesions suggested the pattern was specific to parietal and adjacent area lesions. Overall, the data indicate that parietal lobe patients fail to appropriately incorporate external cues of novelty into recognition attributions. This finding supports a role for the lateral parietal lobe in the adaptive biasing of memory judgments through the integration of external cues and internal memory evidence. We outline the importance of such adaptive biasing through consideration of basic signal detection predictions regarding maximum possible accuracy with and without informative environmental cues. Copyright © 2012 Elsevier Ltd. All rights reserved.
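    The closing point about signal detection can be made concrete with a small calculation: for an equal-variance Gaussian observer with fixed sensitivity, the maximum attainable accuracy rises when a 75%-valid cue is used to shift the decision criterion optimally. A sketch of that calculation; the d' value is arbitrary, and this illustrates the general prediction rather than any analysis from the study:

        from math import log
        from scipy.stats import norm

        def max_accuracy(d_prime, p_old):
            # Best accuracy for an equal-variance Gaussian observer who places the
            # decision criterion optimally given the prior probability of an "old" item.
            c = d_prime / 2 + log((1 - p_old) / p_old) / d_prime   # likelihood-ratio criterion
            hit_rate = 1 - norm.cdf(c - d_prime)                   # P(respond "old" | old)
            correct_rejection = norm.cdf(c)                        # P(respond "new" | new)
            return p_old * hit_rate + (1 - p_old) * correct_rejection

        d = 1.5
        print(max_accuracy(d, 0.50))   # no informative cue: 50/50 prior, about 0.77 correct
        print(max_accuracy(d, 0.75))   # 75%-valid cue incorporated: about 0.83 correct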

  1. Speak, Move, Play and Learn with Children on the Autism Spectrum: Activities to Boost Communication Skills, Sensory Integration and Coordination Using Simple Ideas from Speech and Language Pathology and Occupational Therapy

    Science.gov (United States)

    Brady, Lois Jean; Gonzalez, America X.; Zawadzki, Maciej; Presley, Corinda

    2012-01-01

    This practical resource is brimming with ideas and guidance for using simple ideas from speech and language pathology and occupational therapy to boost communication, sensory integration, and coordination skills in children on the autism spectrum. Suitable for use in the classroom, at home, and in community settings, it is packed with…

  2. Speech Research

    Science.gov (United States)

    Several articles addressing topics in speech research are presented. The topics include: exploring the functional significance of physiological tremor: a biospectroscopic approach; differences between experienced and inexperienced listeners to deaf speech; a language-oriented view of reading and its disabilities; phonetic factors in letter detection; categorical perception; short-term recall by deaf signers of American Sign Language; a common basis for auditory sensory storage in perception and immediate memory; phonological awareness and verbal short-term memory; initiation versus execution time during manual and oral counting by stutterers; trading relations in the perception of speech by five-year-old children; the role of the strap muscles in pitch lowering; phonetic validation of distinctive features; consonants and syllable boundaries; and vowel information in postvocalic frictions.

  3. Hate speech

    Directory of Open Access Journals (Sweden)

    Anne Birgitta Nilsen

    2014-12-01

    Full Text Available The manifesto of the Norwegian terrorist Anders Behring Breivik is based on the “Eurabia” conspiracy theory. This theory is a key starting point for hate speech amongst many right-wing extremists in Europe, but also has ramifications beyond these environments. In brief, proponents of the Eurabia theory claim that Muslims are occupying Europe and destroying Western culture, with the assistance of the EU and European governments. By contrast, members of Al-Qaeda and other extreme Islamists promote the conspiracy theory “the Crusade” in their hate speech directed against the West. Proponents of the latter theory argue that the West is leading a crusade to eradicate Islam and Muslims, a crusade that is similarly facilitated by their governments. This article presents analyses of texts written by right-wing extremists and Muslim extremists in an effort to shed light on how hate speech promulgates conspiracy theories in order to spread hatred and intolerance.The aim of the article is to contribute to a more thorough understanding of hate speech’s nature by applying rhetorical analysis. Rhetorical analysis is chosen because it offers a means of understanding the persuasive power of speech. It is thus a suitable tool to describe how hate speech works to convince and persuade. The concepts from rhetorical theory used in this article are ethos, logos and pathos. The concept of ethos is used to pinpoint factors that contributed to Osama bin Laden's impact, namely factors that lent credibility to his promotion of the conspiracy theory of the Crusade. In particular, Bin Laden projected common sense, good morals and good will towards his audience. He seemed to have coherent and relevant arguments; he appeared to possess moral credibility; and his use of language demonstrated that he wanted the best for his audience.The concept of pathos is used to define hate speech, since hate speech targets its audience's emotions. In hate speech it is the

  4. Speech enhancement

    CERN Document Server

    Benesty, Jacob; Chen, Jingdong

    2006-01-01

    We live in a noisy world! In all applications (telecommunications, hands-free communications, recording, human-machine interfaces, etc.) that require at least one microphone, the signal of interest is usually contaminated by noise and reverberation. As a result, the microphone signal has to be "cleaned" with digital signal processing tools before it is played out, transmitted, or stored. This book is about speech enhancement. Different well-known and state-of-the-art methods for noise reduction, with one or multiple microphones, are discussed. By speech enhancement, we mean not only noise red

  5. Specialization in audiovisual speech perception: a replication study

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Andersen, Tobias

    Speech perception is audiovisual as evidenced by bimodal integration in the McGurk effect. This integration effect may be specific to speech or be applied to all stimuli in general. To investigate this, Tuomainen et al. (2005) used sine-wave speech, which naïve observers may perceive as non-speech, but hear as speech once informed of the linguistic origin of the signal. Combinations of sine-wave speech and incongruent video of the talker elicited a McGurk effect only for informed observers. This indicates that the audiovisual integration effect is specific to speech perception. However, observers ... that observers did look near the mouth. We conclude that eye-movements did not influence the results of Tuomainen et al. and that their results thus can be taken as evidence of a speech specific mode of audiovisual integration underlying the McGurk illusion.

  6. Development in children's interpretation of pitch cues to emotions.

    Science.gov (United States)

    Quam, Carolyn; Swingley, Daniel

    2012-01-01

    Young infants respond to positive and negative speech prosody (A. Fernald, 1993), yet 4-year-olds rely on lexical information when it conflicts with paralinguistic cues to approval or disapproval (M. Friend, 2003). This article explores this surprising phenomenon, testing one hundred eighteen 2- to 5-year-olds' use of isolated pitch cues to emotions in interactive tasks. Only 4- to 5-year-olds consistently interpreted exaggerated, stereotypically happy or sad pitch contours as evidence that a puppet had succeeded or failed to find his toy (Experiment 1) or was happy or sad (Experiments 2, 3). Two- and 3-year-olds exploited facial and body-language cues in the same task. The authors discuss the implications of this late-developing use of pitch cues to emotions, relating them to other functions of pitch. © 2011 The Authors. Child Development © 2011 Society for Research in Child Development, Inc.

  7. Degradation of labial information modifies audiovisual speech perception in cochlear-implanted children.

    Science.gov (United States)

    Huyse, Aurélie; Berthommier, Frédéric; Leybaert, Jacqueline

    2013-01-01

    The aim of the present study was to examine audiovisual speech integration in cochlear-implanted children and in normally hearing children exposed to degraded auditory stimuli. Previous studies have shown that speech perception in cochlear-implanted users is biased toward the visual modality when audition and vision provide conflicting information. Our main question was whether an experimentally designed degradation of the visual speech cue would increase the importance of audition in the response pattern. The impact of auditory proficiency was also investigated. A group of 31 children with cochlear implants and a group of 31 normally hearing children matched for chronological age were recruited. All children with cochlear implants had profound congenital deafness and had used their implants for at least 2 years. Participants had to perform an /aCa/ consonant-identification task in which stimuli were presented randomly in three conditions: auditory only, visual only, and audiovisual (congruent and incongruent McGurk stimuli). In half of the experiment, the visual speech cue was normal; in the other half (visual reduction) a degraded visual signal was presented, aimed at preventing lipreading of good quality. The normally hearing children received a spectrally reduced speech signal (simulating the input delivered by the cochlear implant). First, performance in visual-only and in congruent audiovisual modalities were decreased, showing that the visual reduction technique used here was efficient at degrading lipreading. Second, in the incongruent audiovisual trials, visual reduction led to a major increase in the number of auditory based responses in both groups. Differences between proficient and nonproficient children were found in both groups, with nonproficient children's responses being more visual and less auditory than those of proficient children. Further analysis revealed that differences between visually clear and visually reduced conditions and between

  8. Speech Intelligibility

    Science.gov (United States)

    Brand, Thomas

    Speech intelligibility (SI) is important for different fields of research, engineering and diagnostics in order to quantify very different phenomena like the quality of recordings, communication and playback devices, the reverberation of auditoria, characteristics of hearing impairment, benefit using hearing aids or combinations of these things.

  9. Voice-associated static face image releases speech from informational masking.

    Science.gov (United States)

    Gao, Yayue; Cao, Shuyang; Qu, Tianshu; Wu, Xihong; Li, Haifeng; Zhang, Jinsheng; Li, Liang

    2014-06-01

    In noisy, multi-people talking environments such as a cocktail party, listeners can use various perceptual and/or cognitive cues to improve recognition of target speech against masking, particularly informational masking. Previous studies have shown that temporally pre-presented voice cues (voice primes) improve recognition of target speech against speech masking but not noise masking. This study investigated whether static face image primes that have become target-voice associated (i.e., facial images linked through associative learning with voices reciting the target speech) can be used by listeners to unmask speech. The results showed that in 32 normal-hearing younger adults, temporally pre-presenting a voice-priming sentence with the same voice reciting the target sentence significantly improved the recognition of target speech that was masked by irrelevant two-talker speech. When a person's face photograph image became associated with the voice reciting the target speech by learning, temporally pre-presenting the target-voice-associated face image significantly improved recognition of target speech against speech masking, particularly for the last two keywords in the target sentence. Moreover, speech-recognition performance under the voice-priming condition was significantly correlated to that under the face-priming condition. The results suggest that learned facial information on talker identity plays an important role in identifying the target-talker's voice and facilitating selective attention to the target-speech stream against the masking-speech stream. © 2014 The Institute of Psychology, Chinese Academy of Sciences and Wiley Publishing Asia Pty Ltd.

  10. 78 FR 49693 - Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services...

    Science.gov (United States)

    2013-08-15

    ...-Speech Services for Individuals with Hearing and Speech Disabilities, Report and Order (Order), document...] Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services; Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and Speech Disabilities...

  11. Sound frequency affects speech emotion perception: results from congenital amusia.

    Science.gov (United States)

    Lolli, Sydney L; Lewenstein, Ari D; Basurto, Julian; Winnik, Sean; Loui, Psyche

    2015-01-01

    Congenital amusics, or "tone-deaf" individuals, show difficulty in perceiving and producing small pitch differences. While amusia has marked effects on music perception, its impact on speech perception is less clear. Here we test the hypothesis that individual differences in pitch perception affect judgment of emotion in speech, by applying low-pass filters to spoken statements of emotional speech. A norming study was first conducted on Mechanical Turk to ensure that the intended emotions from the Macquarie Battery for Evaluation of Prosody were reliably identifiable by US English speakers. The most reliably identified emotional speech samples were used in Experiment 1, in which subjects performed a psychophysical pitch discrimination task, and an emotion identification task under low-pass and unfiltered speech conditions. Results showed a significant correlation between pitch-discrimination threshold and emotion identification accuracy for low-pass filtered speech, with amusics (defined here as those with a pitch discrimination threshold >16 Hz) performing worse than controls. This relationship with pitch discrimination was not seen in unfiltered speech conditions. Given the dissociation between low-pass filtered and unfiltered speech conditions, we inferred that amusics may be compensating for poorer pitch perception by using speech cues that are filtered out in this manipulation. To assess this potential compensation, Experiment 2 was conducted using high-pass filtered speech samples intended to isolate non-pitch cues. No significant correlation was found between pitch discrimination and emotion identification accuracy for high-pass filtered speech. Results from these experiments suggest an influence of low frequency information in identifying emotional content of speech.
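    A minimal sketch of the kind of filtering manipulation described above, using zero-phase Butterworth filters on a mono speech recording; the cutoff frequency, filter order, and file name are placeholders rather than the study's actual parameters:

        from scipy.io import wavfile
        from scipy.signal import butter, sosfiltfilt

        def filter_speech(x, fs, cutoff_hz, kind="lowpass", order=4):
            # Zero-phase Butterworth filtering of a 1-D (mono) speech signal.
            sos = butter(order, cutoff_hz, btype=kind, fs=fs, output="sos")
            return sosfiltfilt(sos, x)

        # Hypothetical usage: low-pass keeps mostly pitch/prosodic information,
        # high-pass keeps mostly the non-pitch cues above the same cutoff.
        fs, speech = wavfile.read("statement.wav")      # placeholder file name
        speech = speech.astype(float)
        lowpassed = filter_speech(speech, fs, cutoff_hz=500, kind="lowpass")
        highpassed = filter_speech(speech, fs, cutoff_hz=500, kind="highpass")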

  12. Perception of co-speech gestures in aphasic patients: a visual exploration study during the observation of dyadic conversations.

    Science.gov (United States)

    Preisig, Basil C; Eggenberger, Noëmi; Zito, Giuseppe; Vanbellingen, Tim; Schumacher, Rahel; Hopfner, Simone; Nyffeler, Thomas; Gutbrod, Klemens; Annoni, Jean-Marie; Bohlhalter, Stephan; Müri, René M

    2015-03-01

    Co-speech gestures are part of nonverbal communication during conversations. They either support the verbal message or provide the interlocutor with additional information. Furthermore, they prompt as nonverbal cues the cooperative process of turn taking. In the present study, we investigated the influence of co-speech gestures on the perception of dyadic dialogue in aphasic patients. In particular, we analysed the impact of co-speech gestures on gaze direction (towards speaker or listener) and fixation of body parts. We hypothesized that aphasic patients, who are restricted in verbal comprehension, adapt their visual exploration strategies. Sixteen aphasic patients and 23 healthy control subjects participated in the study. Visual exploration behaviour was measured by means of a contact-free infrared eye-tracker while subjects were watching videos depicting spontaneous dialogues between two individuals. Cumulative fixation duration and mean fixation duration were calculated for the factors co-speech gesture (present and absent), gaze direction (to the speaker or to the listener), and region of interest (ROI), including hands, face, and body. Both aphasic patients and healthy controls mainly fixated the speaker's face. We found a significant co-speech gesture × ROI interaction, indicating that the presence of a co-speech gesture encouraged subjects to look at the speaker. Further, there was a significant gaze direction × ROI × group interaction revealing that aphasic patients showed reduced cumulative fixation duration on the speaker's face compared to healthy controls. Co-speech gestures guide the observer's attention towards the speaker, the source of semantic input. It is discussed whether an underlying semantic processing deficit or a deficit to integrate audio-visual information may cause aphasic patients to explore less the speaker's face. Copyright © 2014 Elsevier Ltd. All rights reserved.

  13. Non-fluent speech following stroke is caused by impaired efference copy.

    Science.gov (United States)

    Feenaughty, Lynda; Basilakos, Alexandra; Bonilha, Leonardo; den Ouden, Dirk-Bart; Rorden, Chris; Stark, Brielle; Fridriksson, Julius

    2017-09-01

    Efference copy is a cognitive mechanism argued to be critical for initiating and monitoring speech; however, the extent to which breakdown of efference copy mechanisms impacts speech production is unclear. This study examined the best mechanistic predictors of non-fluent speech among 88 stroke survivors. Objective speech fluency measures were subjected to a principal component analysis (PCA). The primary PCA factor was then entered into a multiple stepwise linear regression analysis as the dependent variable, with a set of independent mechanistic variables. Participants' ability to mimic audio-visual speech ("speech entrainment response") was the best independent predictor of non-fluent speech. We suggest that this "speech entrainment" factor reflects integrity of internal monitoring (i.e., efference copy) of speech production, which affects speech initiation and maintenance. Results support models of normal speech production and suggest that therapy focused on speech initiation and maintenance may improve speech fluency for individuals with chronic non-fluent aphasia post stroke.
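
    The analysis described here reduces several objective fluency measures to a principal component and then regresses that component on candidate mechanistic predictors. The sketch below mirrors that shape with ordinary least squares in place of the stepwise procedure; all file and variable names are illustrative assumptions.

      # Minimal sketch: PCA over fluency measures, then regression of the first
      # component on mechanistic predictors (variable names are assumed).
      import pandas as pd
      import statsmodels.formula.api as smf
      from sklearn.decomposition import PCA
      from sklearn.preprocessing import StandardScaler

      df = pd.read_csv("stroke_speech_measures.csv")
      fluency_cols = ["speech_rate", "pause_frequency", "utterance_length"]
      z = StandardScaler().fit_transform(df[fluency_cols])
      df["fluency_pc1"] = PCA(n_components=1).fit_transform(z)[:, 0]

      model = smf.ols("fluency_pc1 ~ entrainment + naming + comprehension", data=df).fit()
      print(model.summary())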

  14. Zebra finches are sensitive to prosodic features of human speech.

    Science.gov (United States)

    Spierings, Michelle J; ten Cate, Carel

    2014-07-22

    Variation in pitch, amplitude and rhythm adds crucial paralinguistic information to human speech. Such prosodic cues can reveal information about the meaning or emphasis of a sentence or the emotional state of the speaker. To examine the hypothesis that sensitivity to prosodic cues is language independent and not human specific, we tested prosody perception in a controlled experiment with zebra finches. Using a go/no-go procedure, subjects were trained to discriminate between speech syllables arranged in XYXY patterns with prosodic stress on the first syllable and XXYY patterns with prosodic stress on the final syllable. To systematically determine the salience of the various prosodic cues (pitch, duration and amplitude) to the zebra finches, they were subjected to five tests with different combinations of these cues. The zebra finches generalized the prosodic pattern to sequences that consisted of new syllables and used prosodic features over structural ones to discriminate between stimuli. This strong sensitivity to the prosodic pattern was maintained when only a single prosodic cue was available. The change in pitch was treated as more salient than changes in the other prosodic features. These results show that zebra finches are sensitive to the same prosodic cues known to affect human speech perception. © 2014 The Author(s) Published by the Royal Society. All rights reserved.

  15. Visual Temporal Acuity Is Related to Auditory Speech Perception Abilities in Cochlear Implant Users.

    Science.gov (United States)

    Jahn, Kelly N; Stevenson, Ryan A; Wallace, Mark T

    Despite significant improvements in speech perception abilities following cochlear implantation, many prelingually deafened cochlear implant (CI) recipients continue to rely heavily on visual information to develop speech and language. Increased reliance on visual cues for understanding spoken language could lead to the development of unique audiovisual integration and visual-only processing abilities in these individuals. Brain imaging studies have demonstrated that good CI performers, as indexed by auditory-only speech perception abilities, have different patterns of visual cortex activation in response to visual and auditory stimuli as compared with poor CI performers. However, no studies have examined whether speech perception performance is related to any type of visual processing abilities following cochlear implantation. The purpose of the present study was to provide a preliminary examination of the relationship between clinical, auditory-only speech perception tests, and visual temporal acuity in prelingually deafened adult CI users. It was hypothesized that prelingually deafened CI users, who exhibit better (i.e., more acute) visual temporal processing abilities would demonstrate better auditory-only speech perception performance than those with poorer visual temporal acuity. Ten prelingually deafened adult CI users were recruited for this study. Participants completed a visual temporal order judgment task to quantify visual temporal acuity. To assess auditory-only speech perception abilities, participants completed the consonant-nucleus-consonant word recognition test and the AzBio sentence recognition test. Results were analyzed using two-tailed partial Pearson correlations, Spearman's rho correlations, and independent samples t tests. Visual temporal acuity was significantly correlated with auditory-only word and sentence recognition abilities. In addition, proficient CI users, as assessed via auditory-only speech perception performance, demonstrated
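
    A partial Pearson correlation of the kind reported here can be computed by correlating the residuals of each variable after regressing out the control variable. The sketch below shows that logic; the choice of age as the covariate and the variable names are illustrative assumptions.

      # Minimal sketch: partial correlation between two measures, controlling for a third.
      import numpy as np

      def partial_corr(x, y, z):
          """Correlation between x and y after removing the linear effect of z."""
          x, y, z = map(np.asarray, (x, y, z))
          Z = np.column_stack([np.ones_like(z), z])
          rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
          ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
          return np.corrcoef(rx, ry)[0, 1]

      # r = partial_corr(visual_toj_threshold, cnc_word_score, age)   # names assumed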

  16. The Interaction between Prosody and Meaning in Second Language Speech Production

    Science.gov (United States)

    Jackson, Carrie N.; O'Brien, Mary Grantham

    2011-01-01

    Research has shown that English and German native speakers use prosodic cues during speech production to convey the intended meaning of an utterance. However, little is known about whether American L2 learners of German also use such cues during L2 production. The present study shows that intermediate-level L2 learners of German (English L1) use…

  17. Commencement Speech as a Hybrid Polydiscursive Practice

    Directory of Open Access Journals (Sweden)

    Светлана Викторовна Иванова

    2017-12-01

    Full Text Available Discourse and media communication researchers note that popular discursive and communicative practices tend towards hybridization and convergence. Discourse, understood as language in use, is flexible. Consequently, one and the same text can represent several types of discourse. A vivid example of this tendency is the American commencement speech (also called a commencement address or graduation speech). A commencement speech is a speech addressed to university graduates and, in keeping with the modern trend, delivered by outstanding media personalities (politicians, athletes, actors, etc.). The objective of this study is to define how polydiscursive practices are realized within commencement speech. The research involves discursive, contextual, stylistic and definitive analyses. Methodologically the study is based on discourse analysis theory; in particular, the notion of a discursive practice as a verbalized social practice makes up the conceptual basis of the research. This research draws upon a hundred commencement speeches delivered by prominent representatives of American society from the 1980s to the present. In brief, commencement speech belongs to the institutional discourse that public speech embodies. Its institutional parameters are well represented in speeches delivered by people in power, such as American presidents and university presidents. Nevertheless, as the results of the research indicate, the institutional character of commencement speech is not its only feature. Conceptual information analysis makes it possible to relate commencement speech to didactic discourse, as it is aimed at teaching university graduates how to deal with the challenges life is rich in. Discursive practices of personal discourse are also actively integrated into commencement speech discourse. More than that, existential discursive practices also find their way into the discourse under study. Commencement

  18. Speech disorders - children

    Science.gov (United States)

    ... disorder; Voice disorders; Vocal disorders; Disfluency; Communication disorder - speech disorder; Speech disorder - stuttering ... evaluation tools that can help identify and diagnose speech disorders: Denver Articulation Screening Examination Goldman-Fristoe Test of ...

  19. Replacing Maladaptive Speech with Verbal Labeling Responses: An Analysis of Generalized Responding.

    Science.gov (United States)

    Foxx, R. M.; And Others

    1988-01-01

    Three mentally handicapped students (aged 13, 36, and 40) with maladaptive speech received training to answer questions with verbal labels. The results of their cues-pause-point training showed that the students replaced their maladaptive speech with correct labels (answers) to questions in the training setting and three generalization settings.…

  20. The Perception of "Sine-Wave Speech" by Adults with Developmental Dyslexia.

    Science.gov (United States)

    Rosner, Burton S.; Talcott, Joel B.; Witton, Caroline; Hogg, James D.; Richardson, Alexandra J.; Hansen, Peter C.; Stein, John F.

    2003-01-01

    "Sine-wave speech" sentences contain only four frequency-modulated sine waves, lacking many acoustic cues present in natural speech. Adults with (n=19) and without (n=14) dyslexia were asked to reproduce orally sine-wave utterances in successive trials. Results suggest comprehension of sine-wave sentences is impaired in some adults with…

  1. Speech Processing.

    Science.gov (United States)

    1983-05-01

    The VDE system developed had the capability of recognizing up to 248 separate words in syntactic structures. The two systems described are isolated... Contents include: ...and Speaker Recognition, by M.J. Hunt; Assessment of Speech Systems, by R.K. Moore; A Survey of Current Equipment and Research, by J.S. Bridle; ...Technology in Navy Training Systems, by R. Breaux, M. Blind and R. Lynchard; and General Review of Military Applications of Voice Processing, by Dr. Bruno.

  2. Speech Recognition

    Directory of Open Access Journals (Sweden)

    Adrian Morariu

    2009-01-01

    Full Text Available This paper presents a method of speech recognition by pattern recognition techniques. Learning consists in determining the unique characteristics of a word (cepstral coefficients) by eliminating those characteristics that are different from one word to another. For learning and recognition, the system will build a dictionary of words by determining the characteristics of each word to be used in the recognition. Determining the characteristics of an audio signal consists in the following steps: noise removal, sampling it, applying Hamming window, switching to frequency domain through Fourier transform, calculating the magnitude spectrum, filtering data, determining cepstral coefficients.
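
    The step sequence listed in this abstract (windowing, Fourier transform, magnitude spectrum, cepstral coefficients) can be sketched as follows. This version computes plain real-cepstrum coefficients per frame, omits the noise-removal and filtering stages, and uses assumed frame and hop sizes.

      # Minimal sketch: frame the signal, apply a Hamming window, take the FFT
      # magnitude spectrum, and derive cepstral coefficients via log + DCT.
      # Frame length, hop size, and coefficient count are assumptions.
      import numpy as np
      from scipy.fft import rfft, dct

      def cepstral_features(x, frame_len=400, hop=160, n_ceps=13):
          window = np.hamming(frame_len)
          frames = []
          for start in range(0, len(x) - frame_len + 1, hop):
              frame = x[start:start + frame_len] * window
              log_mag = np.log(np.abs(rfft(frame)) + 1e-10)    # log magnitude spectrum
              frames.append(dct(log_mag, type=2, norm="ortho")[:n_ceps])
          return np.array(frames)                              # shape: (n_frames, n_ceps)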

  3. Acoustic Constraints and Musical Consequences: Exploring Composers' Use of Cues for Musical Emotion.

    Science.gov (United States)

    Schutz, Michael

    2017-01-01

    Emotional communication in music is based in part on the use of pitch and timing, two cues effective in emotional speech. Corpus analyses of natural speech illustrate that happy utterances tend to be higher and faster than sad. Although manipulations altering melodies show that passages changed to be higher and faster sound happier, corpus analyses of unaltered music paralleling those of natural speech have proven challenging. This partly reflects the importance of modality (i.e., major/minor), a powerful musical cue whose use is decidedly imbalanced in Western music. This imbalance poses challenges for creating musical corpora analogous to existing speech corpora for purposes of analyzing emotion. However, a novel examination of music by Bach and Chopin balanced in modality illustrates that, consistent with predictions from speech, their major key (nominally "happy") pieces are approximately a major second higher and 29% faster than their minor key pieces (Poon and Schutz, 2015). Although this provides useful evidence for parallels in use of emotional cues between these domains, it raises questions about how composers "trade off" cue differentiation in music, suggesting interesting new potential research directions. This Focused Review places those results in a broader context, highlighting their connections with previous work on the natural use of cues for musical emotion. Together, these observational findings based on unaltered music-widely recognized for its artistic significance-complement previous experimental work systematically manipulating specific parameters. In doing so, they also provide a useful musical counterpart to fruitful studies of the acoustic cues for emotion found in natural speech.
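
    The corpus comparison summarized here (major-key pieces roughly a major second higher and 29% faster than minor-key pieces) reduces to a grouped comparison of pitch height and tempo by mode. The sketch below illustrates that computation under assumed file and column names.

      # Minimal sketch: compare mean pitch height and tempo across modes
      # in a modality-balanced corpus (file and column names are assumed).
      import pandas as pd

      pieces = pd.read_csv("bach_chopin_corpus.csv")   # columns: mode, mean_pitch_semitones, tempo_bpm
      by_mode = pieces.groupby("mode")[["mean_pitch_semitones", "tempo_bpm"]].mean()
      pitch_diff = by_mode.loc["major", "mean_pitch_semitones"] - by_mode.loc["minor", "mean_pitch_semitones"]
      tempo_gain = by_mode.loc["major", "tempo_bpm"] / by_mode.loc["minor", "tempo_bpm"] - 1
      print(f"Major-key pieces: {pitch_diff:.1f} semitones higher, {tempo_gain:.0%} faster")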

  4. Acoustic Constraints and Musical Consequences: Exploring Composers' Use of Cues for Musical Emotion

    Directory of Open Access Journals (Sweden)

    Michael Schutz

    2017-11-01

    Full Text Available Emotional communication in music is based in part on the use of pitch and timing, two cues effective in emotional speech. Corpus analyses of natural speech illustrate that happy utterances tend to be higher and faster than sad. Although manipulations altering melodies show that passages changed to be higher and faster sound happier, corpus analyses of unaltered music paralleling those of natural speech have proven challenging. This partly reflects the importance of modality (i.e., major/minor), a powerful musical cue whose use is decidedly imbalanced in Western music. This imbalance poses challenges for creating musical corpora analogous to existing speech corpora for purposes of analyzing emotion. However, a novel examination of music by Bach and Chopin balanced in modality illustrates that, consistent with predictions from speech, their major key (nominally “happy”) pieces are approximately a major second higher and 29% faster than their minor key pieces (Poon and Schutz, 2015). Although this provides useful evidence for parallels in use of emotional cues between these domains, it raises questions about how composers “trade off” cue differentiation in music, suggesting interesting new potential research directions. This Focused Review places those results in a broader context, highlighting their connections with previous work on the natural use of cues for musical emotion. Together, these observational findings based on unaltered music—widely recognized for its artistic significance—complement previous experimental work systematically manipulating specific parameters. In doing so, they also provide a useful musical counterpart to fruitful studies of the acoustic cues for emotion found in natural speech.

  5. Acoustic Constraints and Musical Consequences: Exploring Composers' Use of Cues for Musical Emotion

    Science.gov (United States)

    Schutz, Michael

    2017-01-01

    Emotional communication in music is based in part on the use of pitch and timing, two cues effective in emotional speech. Corpus analyses of natural speech illustrate that happy utterances tend to be higher and faster than sad. Although manipulations altering melodies show that passages changed to be higher and faster sound happier, corpus analyses of unaltered music paralleling those of natural speech have proven challenging. This partly reflects the importance of modality (i.e., major/minor), a powerful musical cue whose use is decidedly imbalanced in Western music. This imbalance poses challenges for creating musical corpora analogous to existing speech corpora for purposes of analyzing emotion. However, a novel examination of music by Bach and Chopin balanced in modality illustrates that, consistent with predictions from speech, their major key (nominally “happy”) pieces are approximately a major second higher and 29% faster than their minor key pieces (Poon and Schutz, 2015). Although this provides useful evidence for parallels in use of emotional cues between these domains, it raises questions about how composers “trade off” cue differentiation in music, suggesting interesting new potential research directions. This Focused Review places those results in a broader context, highlighting their connections with previous work on the natural use of cues for musical emotion. Together, these observational findings based on unaltered music—widely recognized for its artistic significance—complement previous experimental work systematically manipulating specific parameters. In doing so, they also provide a useful musical counterpart to fruitful studies of the acoustic cues for emotion found in natural speech. PMID:29249997

  6. Speech-specific audiovisual perception affects identification but not detection of speech

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Andersen, Tobias

    Speech perception is audiovisual, as evidenced by the McGurk effect in which watching incongruent articulatory mouth movements can change the phonetic auditory speech percept. This type of audiovisual integration may be specific to speech or be applied to all stimuli in general. To investigate...... of audiovisual integration specific to speech perception. However, the results of Tuomainen et al. might have been influenced by another effect. When observers were naïve, they had little motivation to look at the face. When informed, they knew that the face was relevant for the task and this could increase...... visual detection task. In our first experiment, observers presented with congruent and incongruent audiovisual sine-wave speech stimuli showed a McGurk effect only when informed of the speech nature of the stimulus. Performance on the secondary visual task was very good, thus supporting the finding...

  7. Improving Understanding of Emotional Speech Acoustic Content

    Science.gov (United States)

    Tinnemore, Anna

    Children with cochlear implants show deficits in identifying emotional intent of utterances without facial or body language cues. A known limitation to cochlear implants is the inability to accurately portray the fundamental frequency contour of speech which carries the majority of information needed to identify emotional intent. Without reliable access to the fundamental frequency, other methods of identifying vocal emotion, if identifiable, could be used to guide therapies for training children with cochlear implants to better identify vocal emotion. The current study analyzed recordings of adults speaking neutral sentences with a set array of emotions in a child-directed and adult-directed manner. The goal was to identify acoustic cues that contribute to emotion identification that may be enhanced in child-directed speech, but are also present in adult-directed speech. Results of this study showed that there were significant differences in the variation of the fundamental frequency, the variation of intensity, and the rate of speech among emotions and between intended audiences.
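
    The three cue families examined here (variation in fundamental frequency, variation in intensity, and rate of speech) can be estimated from a recording roughly as sketched below. The pitch range, the dB conversion, and the idea of supplying a word count for the rate estimate are illustrative assumptions, not the study's procedure.

      # Minimal sketch: F0 variation, intensity variation, and speaking rate
      # for one recording (parameter choices are assumptions).
      import numpy as np
      import librosa

      def emotion_cues(path, n_words):
          y, sr = librosa.load(path, sr=None)
          f0, voiced_flag, voiced_prob = librosa.pyin(y, fmin=75, fmax=500, sr=sr)
          f0_sd = np.nanstd(f0)                                  # F0 variation (Hz)
          rms_db = 20 * np.log10(librosa.feature.rms(y=y)[0] + 1e-10)
          intensity_sd = np.std(rms_db)                          # intensity variation (dB)
          rate = n_words / (len(y) / sr)                         # words per second
          return f0_sd, intensity_sd, rate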

  8. Visual speech information: a help or hindrance in perceptual processing of dysarthric speech.

    Science.gov (United States)

    Borrie, Stephanie A

    2015-03-01

    This study investigated the influence of visual speech information on perceptual processing of neurologically degraded speech. Fifty listeners identified spastic dysarthric speech under both audio (A) and audiovisual (AV) conditions. Condition comparisons revealed that the addition of visual speech information enhanced processing of the neurologically degraded input in terms of (a) acuity (percent phonemes correct) of vowels and consonants and (b) recognition (percent words correct) of predictive and nonpredictive phrases. Listeners exploited stress-based segmentation strategies more readily in AV conditions, suggesting that the perceptual benefit associated with adding visual speech information to the auditory signal-the AV advantage-has both segmental and suprasegmental origins. Results also revealed that the magnitude of the AV advantage can be predicted, to some degree, by the extent to which an individual utilizes syllabic stress cues to inform word recognition in AV conditions. Findings inform the development of a listener-specific model of speech perception that applies to processing of dysarthric speech in everyday communication contexts.

  9. Intelligibility for Binaural Speech with Discarded Low-SNR Speech Components.

    Science.gov (United States)

    Schoenmaker, Esther; van de Par, Steven

    2016-01-01

    Speech intelligibility in multitalker settings improves when the target speaker is spatially separated from the interfering speakers. A factor that may contribute to this improvement is the improved detectability of target-speech components due to binaural interaction in analogy to the Binaural Masking Level Difference (BMLD). This would allow listeners to hear target speech components within specific time-frequency intervals that have a negative SNR, similar to the improvement in the detectability of a tone in noise when these contain disparate interaural difference cues. To investigate whether these negative-SNR target-speech components indeed contribute to speech intelligibility, a stimulus manipulation was performed where all target components were removed when local SNRs were smaller than a certain criterion value. It can be expected that for sufficiently high criterion values target speech components will be removed that do contribute to speech intelligibility. For spatially separated speakers, assuming that a BMLD-like detection advantage contributes to intelligibility, degradation in intelligibility is expected already at criterion values below 0 dB SNR. However, for collocated speakers it is expected that higher criterion values can be applied without impairing speech intelligibility. Results show that degradation of intelligibility for separated speakers is only seen for criterion values of 0 dB and above, indicating a negligible contribution of a BMLD-like detection advantage in multitalker settings. These results show that the spatial benefit is related to a spatial separation of speech components at positive local SNRs rather than to a BMLD-like detection improvement for speech components at negative local SNRs.
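
    The stimulus manipulation described here removes target components in time-frequency bins whose local SNR falls below a criterion. A sketch of that manipulation, assuming separate target and interferer recordings and illustrative STFT settings, is given below.

      # Minimal sketch: zero target time-frequency bins below a local-SNR criterion,
      # then resynthesize the target (parameters are assumptions).
      import numpy as np
      from scipy.signal import stft, istft

      def remove_low_snr_target(target, interferers, sr, criterion_db=0.0, nperseg=512):
          _, _, T = stft(target, fs=sr, nperseg=nperseg)
          _, _, M = stft(interferers, fs=sr, nperseg=nperseg)
          local_snr_db = 20 * (np.log10(np.abs(T) + 1e-12) - np.log10(np.abs(M) + 1e-12))
          T_masked = np.where(local_snr_db >= criterion_db, T, 0.0)
          _, y = istft(T_masked, fs=sr, nperseg=nperseg)
          return y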

  10. Sequence analysis in multilevel models. A study on different sources of patient cues in medical consultations.

    Science.gov (United States)

    Del Piccolo, Lidia; Mazzi, Maria Angela; Dunn, Graham; Sandri, Marco; Zimmermann, Christa

    2007-12-01

    The aims of the study were to explore the importance of macro (patient, physician, consultation) and micro (doctor-patient speech sequences) variables in promoting patient cues (unsolicited new information or expressions of feelings), and to describe the methodological implications related to the study of speech sequences. Patient characteristics, a consultation index of partnership and doctor-patient speech sequences were recorded for 246 primary care consultations in six primary care surgeries in Verona, Italy. Homogeneity and stationarity conditions of speech sequences allowed the creation of a hierarchy of multilevel logit models including micro and macro level variables, with the presence/absence of cues as the dependent variable. We found that emotional distress of the patient increased cues and that cues appeared among other patient expressions and were preceded by physicians' facilitations and handling of emotion. Partnership, in terms of open-ended inquiry, active listening skills and handling of emotion by the physician and active participation by the patient throughout the consultation, reduced cue frequency.

  11. Newborn infants' sensitivity to perceptual cues to lexical and grammatical words.

    Science.gov (United States)

    Shi, R; Werker, J F; Morgan, J L

    1999-09-30

    In our study newborn infants were presented with lists of lexical and grammatical words prepared from natural maternal speech. The results show that newborns are able to categorically discriminate these sets of words based on a constellation of perceptual cues that distinguish them. This general ability to detect and categorically discriminate sets of words on the basis of multiple acoustic and phonological cues may provide a perceptual base that can help older infants bootstrap into the acquisition of grammatical categories and syntactic structure.

  12. A configural dominant account of contextual cueing: Configural cues are stronger than colour cues.

    Science.gov (United States)

    Kunar, Melina A; John, Rebecca; Sweetman, Hollie

    2014-01-01

    Previous work has shown that reaction times to find a target in displays that have been repeated are faster than those for displays that have never been seen before. This learning effect, termed "contextual cueing" (CC), has been shown using contexts such as the configuration of the distractors in the display and the background colour. However, it is not clear how these two contexts interact to facilitate search. We investigated this here by comparing the strengths of these two cues when they appeared together. In Experiment 1, participants searched for a target that was cued by both colour and distractor configural cues, compared with when the target was only predicted by configural information. The results showed that the addition of a colour cue did not increase contextual cueing. In Experiment 2, participants searched for a target that was cued by both colour and distractor configuration compared with when the target was only cued by colour. The results showed that adding a predictive configural cue led to a stronger CC benefit. Experiments 3 and 4 tested the disruptive effects of removing either a learned colour cue or a learned configural cue and whether there was cue competition when colour and configural cues were presented together. Removing the configural cue was more disruptive to CC than removing colour, and configural learning was shown to overshadow the learning of colour cues. The data support a configural dominant account of CC, where configural cues act as the stronger cue in comparison to colour when they are presented together.

  13. Spoken Word Recognition of Chinese Words in Continuous Speech

    Science.gov (United States)

    Yip, Michael C. W.

    2015-01-01

    The present study examined the role that the positional probability of syllables plays in the recognition of spoken words in continuous Cantonese speech. Because some sounds occur more frequently at the beginning or ending position of Cantonese syllables than others, this kind of probabilistic information about syllables may cue the locations…

  14. The Influence of Direct and Indirect Speech on Mental Representations

    NARCIS (Netherlands)

    A. Eerland (Anita); J.A.A. Engelen (Jan A.A.); R.A. Zwaan (Rolf)

    2013-01-01

    Language can be viewed as a set of cues that modulate the comprehender's thought processes. It is a very subtle instrument. For example, the literature suggests that people perceive direct speech (e.g., Joanne said: 'I went out for dinner last night') as more vivid and perceptually

  15. Multisensory Integration in Cochlear Implant Recipients.

    Science.gov (United States)

    Stevenson, Ryan A; Sheffield, Sterling W; Butera, Iliza M; Gifford, René H; Wallace, Mark T

    Speech perception is inherently a multisensory process involving integration of auditory and visual cues. Multisensory integration in cochlear implant (CI) recipients is a unique circumstance in that the integration occurs after auditory deprivation and the provision of hearing via the CI. Despite the clear importance of multisensory cues for perception, in general, and for speech intelligibility, specifically, the topic of multisensory perceptual benefits in CI users has only recently begun to emerge as an area of inquiry. We review the research that has been conducted on multisensory integration in CI users to date and suggest a number of areas needing further research. The overall pattern of results indicates that many CI recipients show at least some perceptual gain that can be attributable to multisensory integration. The extent of this gain, however, varies based on a number of factors, including age of implantation and specific task being assessed (e.g., stimulus detection, phoneme perception, word recognition). Although both children and adults with CIs obtain audiovisual benefits for phoneme, word, and sentence stimuli, neither group shows demonstrable gain for suprasegmental feature perception. Additionally, only early-implanted children and the highest performing adults obtain audiovisual integration benefits similar to individuals with normal hearing. Increasing age of implantation in children is associated with poorer gains resultant from audiovisual integration, suggesting a sensitive period in development for the brain networks that subserve these integrative functions, as well as length of auditory experience. This finding highlights the need for early detection of and intervention for hearing loss, not only in terms of auditory perception, but also in terms of the behavioral and perceptual benefits of audiovisual processing. Importantly, patterns of auditory, visual, and audiovisual responses suggest that underlying integrative processes may be

  16. Speech and Language Delay

    Science.gov (United States)

    ... OTC Relief for Diarrhea Home Diseases and Conditions Speech and Language Delay Condition Speech and Language Delay Share Print Table of Contents1. ... Treatment6. Everyday Life7. Questions8. Resources What is a speech and language delay? A speech and language delay ...

  17. Reacting to Neighborhood Cues?

    DEFF Research Database (Denmark)

    Danckert, Bolette; Dinesen, Peter Thisted; Sønderskov, Kim Mannemar

    2017-01-01

    is founded on politically sophisticated individuals having a greater comprehension of news and other mass-mediated sources, which makes them less likely to rely on neighborhood cues as sources of information relevant for political attitudes. Based on a unique panel data set with fine-grained information...

  18. Sound frequency affects speech emotion perception: Results from congenital amusia

    Directory of Open Access Journals (Sweden)

    Sydney Lolli

    2015-09-01

    Full Text Available Congenital amusics, or tone-deaf individuals, show difficulty in perceiving and producing small pitch differences. While amusia has marked effects on music perception, its impact on speech perception is less clear. Here we test the hypothesis that individual differences in pitch perception affect judgment of emotion in speech, by applying band-pass filters to spoken statements of emotional speech. A norming study was first conducted on Mechanical Turk to ensure that the intended emotions from the Macquarie Battery for Evaluation of Prosody (MBEP) were reliably identifiable by US English speakers. The most reliably identified emotional speech samples were used in Experiment 1, in which subjects performed a psychophysical pitch discrimination task, and an emotion identification task under band-pass and unfiltered speech conditions. Results showed a significant correlation between pitch discrimination threshold and emotion identification accuracy for band-pass filtered speech, with amusics (defined here as those with a pitch discrimination threshold > 16 Hz) performing worse than controls. This relationship with pitch discrimination was not seen in unfiltered speech conditions. Given the dissociation between band-pass filtered and unfiltered speech conditions, we inferred that amusics may be compensating for poorer pitch perception by using speech cues that are filtered out in this manipulation.

  19. Automatic Speech Recognition from Neural Signals: A Focused Review

    Directory of Open Access Journals (Sweden)

    Christian Herff

    2016-09-01

    Full Text Available Speech interfaces have become widely accepted and are nowadays integrated in various real-life applications and devices. They have become a part of our daily life. However, speech interfaces presume the ability to produce intelligible speech, which might be impossible due to loud environments, the risk of bothering bystanders, or the inability to produce speech (i.e., patients suffering from locked-in syndrome). For these reasons it would be highly desirable not to speak but to simply envision oneself saying words or sentences. Interfaces based on imagined speech would enable fast and natural communication without the need for audible speech and would give a voice to otherwise mute people. This focused review analyzes the potential of different brain imaging techniques to recognize speech from neural signals by applying Automatic Speech Recognition technology. We argue that modalities based on metabolic processes, such as functional Near Infrared Spectroscopy and functional Magnetic Resonance Imaging, are less suited for Automatic Speech Recognition from neural signals due to low temporal resolution but are very useful for the investigation of the underlying neural mechanisms involved in speech processes. In contrast, electrophysiologic activity is fast enough to capture speech processes and is therefore better suited for ASR. Our experimental results indicate the potential of these signals for speech recognition from neural data with a focus on invasively measured brain activity (electrocorticography). As a first example of Automatic Speech Recognition techniques used from neural signals, we discuss the Brain-to-text system.

  20. Dog-directed speech: why do we use it and do dogs pay attention to it?

    Science.gov (United States)

    Ben-Aderet, Tobey; Gallego-Abenza, Mario; Reby, David; Mathevon, Nicolas

    2017-01-11

    Pet-directed speech is strikingly similar to infant-directed speech, a peculiar speaking pattern with higher pitch and slower tempo known to engage infants' attention and promote language learning. Here, we report the first investigation of potential factors modulating the use of dog-directed speech, as well as its immediate impact on dogs' behaviour. We recorded adult participants speaking in front of pictures of puppies, adult and old dogs, and analysed the quality of their speech. We then performed playback experiments to assess dogs' reaction to dog-directed speech compared with normal speech. We found that human speakers used dog-directed speech with dogs of all ages and that the acoustic structure of dog-directed speech was mostly independent of dog age, except for sound pitch which was relatively higher when communicating with puppies. Playback demonstrated that, in the absence of other non-auditory cues, puppies were highly reactive to dog-directed speech, and that the pitch was a key factor modulating their behaviour, suggesting that this specific speech register has a functional value in young dogs. Conversely, older dogs did not react differentially to dog-directed speech compared with normal speech. The fact that speakers continue to use dog-directed speech with older dogs therefore suggests that this speech pattern may mainly be a spontaneous attempt to facilitate interactions with non-verbal listeners. © 2017 The Author(s).

  1. Music and speech prosody: a common rhythm.

    Science.gov (United States)

    Hausen, Maija; Torppa, Ritva; Salmela, Viljami R; Vainio, Martti; Särkämö, Teppo

    2013-01-01

    Disorders of music and speech perception, known as amusia and aphasia, have traditionally been regarded as dissociated deficits based on studies of brain damaged patients. This has been taken as evidence that music and speech are perceived by largely separate and independent networks in the brain. However, recent studies of congenital amusia have broadened this view by showing that the deficit is associated with problems in perceiving speech prosody, especially intonation and emotional prosody. In the present study the association between the perception of music and speech prosody was investigated with healthy Finnish adults (n = 61) using an on-line music perception test including the Scale subtest of Montreal Battery of Evaluation of Amusia (MBEA) and Off-Beat and Out-of-key tasks as well as a prosodic verbal task that measures the perception of word stress. Regression analyses showed that there was a clear association between prosody perception and music perception, especially in the domain of rhythm perception. This association was evident after controlling for music education, age, pitch perception, visuospatial perception, and working memory. Pitch perception was significantly associated with music perception but not with prosody perception. The association between music perception and visuospatial perception (measured using analogous tasks) was less clear. Overall, the pattern of results indicates that there is a robust link between music and speech perception and that this link can be mediated by rhythmic cues (time and stress).
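
    The regression analyses described here predict prosody (word-stress) perception from music perception while controlling for the listed background variables. The sketch below shows that model structure; the data file and column names are illustrative assumptions.

      # Minimal sketch: regression of prosody perception on music perception
      # with covariates (column names are assumed).
      import pandas as pd
      import statsmodels.formula.api as smf

      df = pd.read_csv("music_prosody_scores.csv")
      model = smf.ols(
          "word_stress_score ~ music_score + music_education + age "
          "+ pitch_threshold + visuospatial_score + working_memory",
          data=df,
      ).fit()
      print(model.summary())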

  2. Music and speech prosody: A common rhythm

    Directory of Open Access Journals (Sweden)

    Maija Hausen

    2013-09-01

    Full Text Available Disorders of music and speech perception, known as amusia and aphasia, have traditionally been regarded as dissociated deficits based on studies of brain damaged patients. This has been taken as evidence that music and speech are perceived by largely separate and independent networks in the brain. However, recent studies of congenital amusia have broadened this view by showing that the deficit is associated with problems in perceiving speech prosody, especially intonation and emotional prosody. In the present study the association between the perception of music and speech prosody was investigated with healthy Finnish adults (n = 61) using an on-line music perception test including the Scale subtest of Montreal Battery of Evaluation of Amusia (MBEA) and Off-Beat and Out-of-key tasks as well as a prosodic verbal task that measures the perception of word stress. Regression analyses showed that there was a clear association between prosody perception and music perception, especially in the domain of rhythm perception. This association was evident after controlling for music education, age, pitch perception, visuospatial perception and working memory. Pitch perception was significantly associated with music perception but not with prosody perception. The association between music perception and visuospatial perception (measured using analogous tasks) was less clear. Overall, the pattern of results indicates that there is a robust link between music and speech perception and that this link can be mediated by rhythmic cues (time and stress).

  3. Music and speech prosody: a common rhythm

    Science.gov (United States)

    Hausen, Maija; Torppa, Ritva; Salmela, Viljami R.; Vainio, Martti; Särkämö, Teppo

    2013-01-01

    Disorders of music and speech perception, known as amusia and aphasia, have traditionally been regarded as dissociated deficits based on studies of brain damaged patients. This has been taken as evidence that music and speech are perceived by largely separate and independent networks in the brain. However, recent studies of congenital amusia have broadened this view by showing that the deficit is associated with problems in perceiving speech prosody, especially intonation and emotional prosody. In the present study the association between the perception of music and speech prosody was investigated with healthy Finnish adults (n = 61) using an on-line music perception test including the Scale subtest of Montreal Battery of Evaluation of Amusia (MBEA) and Off-Beat and Out-of-key tasks as well as a prosodic verbal task that measures the perception of word stress. Regression analyses showed that there was a clear association between prosody perception and music perception, especially in the domain of rhythm perception. This association was evident after controlling for music education, age, pitch perception, visuospatial perception, and working memory. Pitch perception was significantly associated with music perception but not with prosody perception. The association between music perception and visuospatial perception (measured using analogous tasks) was less clear. Overall, the pattern of results indicates that there is a robust link between music and speech perception and that this link can be mediated by rhythmic cues (time and stress). PMID:24032022

  4. Markers of Deception in Italian Speech

    Directory of Open Access Journals (Sweden)

    Katelyn Spence

    2012-10-01

    Full Text Available Lying is a universal activity and the detection of lying a universal concern. Presently, there is great interest in determining objective measures of deception. The examination of speech, in particular, holds promise in this regard; yet, most of what we know about the relationship between speech and lying is based on the assessment of English-speaking participants. Few studies have examined indicators of deception in languages other than English. The world’s languages differ in significant ways, and cross-linguistic studies of deceptive communications are a research imperative. Here we review some of these differences amongst the world’s languages, and provide an overview of a number of recent studies demonstrating that cross-linguistic research is a worthwhile endeavour. In addition, we report the results of an empirical investigation of pitch, response latency, and speech rate as cues to deception in Italian speech. True and false opinions were elicited in an audio-taped interview. A within subjects analysis revealed no significant difference between the average pitch of the two conditions; however, speech rate was significantly slower, while response latency was longer, during deception compared with truth-telling. We explore the implications of these findings and propose directions for future research, with the aim of expanding the cross-linguistic branch of research on markers of deception.

  5. Mind your pricing cues.

    Science.gov (United States)

    Anderson, Eric; Simester, Duncan

    2003-09-01

    For most of the items they buy, consumers don't have an accurate sense of what the price should be. Ask them to guess how much a four-pack of 35-mm film costs, and you'll get a variety of wrong answers: Most people will underestimate; many will only shrug. Research shows that consumers' knowledge of the market is so far from perfect that it hardly deserves to be called knowledge at all. Yet people happily buy film and other products every day. Is this because they don't care what kind of deal they're getting? No. Remarkably, it's because they rely on retailers to tell them whether they're getting a good price. In subtle and not-so-subtle ways, retailers send signals to customers, telling them whether a given price is relatively high or low. In this article, the authors review several common pricing cues retailers use--"sale" signs, prices that end in 9, signpost items, and price-matching guarantees. They also offer some surprising facts about how--and how well--those cues work. For instance, the authors' tests with several mail-order catalogs reveal that including the word "sale" beside a price can increase demand by more than 50%. The practice of using a 9 at the end of a price to denote a bargain is so common, you'd think customers would be numb to it. Yet in a study the authors did involving a women's clothing catalog, they increased demand by a third just by changing the price of a dress from $34 to $39. Pricing cues are powerful tools for guiding customers' purchasing decisions, but they must be applied judiciously. Used inappropriately, the cues may breach customers' trust, reduce brand equity, and give rise to lawsuits.

  6. SynFace—Speech-Driven Facial Animation for Virtual Speech-Reading Support

    Directory of Open Access Journals (Sweden)

    Giampiero Salvi

    2009-01-01

    Full Text Available This paper describes SynFace, a supportive technology that aims at enhancing audio-based spoken communication in adverse acoustic conditions by providing the missing visual information in the form of an animated talking head. Firstly, we describe the system architecture, consisting of a 3D animated face model controlled from the speech input by a specifically optimised phonetic recogniser. Secondly, we report on speech intelligibility experiments with focus on multilinguality and robustness to audio quality. The system, already available for Swedish, English, and Flemish, was optimised for German and for Swedish wide-band speech quality available in TV, radio, and Internet communication. Lastly, the paper covers experiments with nonverbal motions driven from the speech signal. It is shown that turn-taking gestures can be used to affect the flow of human-human dialogues. We have focused specifically on two categories of cues that may be extracted from the acoustic signal: prominence/emphasis and interactional cues (turn-taking/back-channelling).

  7. Sound of mind : electrophysiological and behavioural evidence for the role of context, variation and informativity in human speech processing

    NARCIS (Netherlands)

    Nixon, Jessie Sophia

    2014-01-01

    Spoken communication involves transmission of a message which takes physical form in acoustic waves. Within any given language, acoustic cues pattern in language-specific ways along language-specific acoustic dimensions to create speech sound contrasts. These cues are utilized by listeners to

  8. Visual-Haptic Integration: Cue Weights are Varied Appropriately, to Account for Changes in Haptic Reliability Introduced by Using a Tool

    OpenAIRE

    Chie Takahashi; Simon J Watt

    2011-01-01

    Tools such as pliers systematically change the relationship between an object's size and the hand opening required to grasp it. Previous work suggests the brain takes this into account, integrating visual and haptic size information that refers to the same object, independent of the similarity of the ‘raw’ visual and haptic signals (Takahashi et al., VSS 2009). Variations in tool geometry also affect the reliability (precision) of haptic size estimates, however, because they alter the change ...
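
    The reliability-weighted combination rule commonly assumed in this literature weights each cue by its inverse variance, so a haptic estimate made less reliable by a tool receives less weight. The sketch below states that rule; it is the standard maximum-likelihood formulation, not code from the study, and the numbers are illustrative.

      # Minimal sketch: inverse-variance (maximum-likelihood) cue combination.
      def combine_cues(visual_est, visual_var, haptic_est, haptic_var):
          w_v = (1.0 / visual_var) / (1.0 / visual_var + 1.0 / haptic_var)
          w_h = 1.0 - w_v
          combined_est = w_v * visual_est + w_h * haptic_est
          combined_var = 1.0 / (1.0 / visual_var + 1.0 / haptic_var)   # never worse than either cue alone
          return combined_est, combined_var

      # Illustrative values: the tool makes haptics noisier, so vision dominates.
      print(combine_cues(visual_est=50.0, visual_var=4.0, haptic_est=46.0, haptic_var=16.0))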

  9. A dominance hierarchy of auditory spatial cues in barn owls.

    Directory of Open Access Journals (Sweden)

    Ilana B Witten

    2010-04-01

    Full Text Available Barn owls integrate spatial information across frequency channels to localize sounds in space. We presented barn owls with synchronous sounds that contained different bands of frequencies (3-5 kHz and 7-9 kHz) from different locations in space. When the owls were confronted with the conflicting localization cues from two synchronous sounds of equal level, their orienting responses were dominated by one of the sounds: they oriented toward the location of the low frequency sound when the sources were separated in azimuth; in contrast, they oriented toward the location of the high frequency sound when the sources were separated in elevation. We identified neural correlates of this behavioral effect in the optic tectum (OT; superior colliculus in mammals), which contains a map of auditory space and is involved in generating orienting movements to sounds. We found that low frequency cues dominate the representation of sound azimuth in the OT space map, whereas high frequency cues dominate the representation of sound elevation. We argue that the dominance hierarchy of localization cues reflects several factors: (1) the relative amplitude of the sound providing the cue, (2) the resolution with which the auditory system measures the value of a cue, and (3) the spatial ambiguity in interpreting the cue. These same factors may contribute to the relative weighting of sound localization cues in other species, including humans.

  10. Neuronal basis of speech comprehension.

    Science.gov (United States)

    Specht, Karsten

    2014-01-01

    Verbal communication does not rely only on the simple perception of auditory signals. It is rather a parallel and integrative processing of linguistic and non-linguistic information, involving temporal and frontal areas in particular. This review describes the inherent complexity of auditory speech comprehension from a functional-neuroanatomical perspective. The review is divided into two parts. In the first part, structural and functional asymmetry of language-relevant structures will be discussed. The second part of the review will discuss recent neuroimaging studies, which coherently demonstrate that speech comprehension processes rely on a hierarchical network involving the temporal, parietal, and frontal lobes. Further, the results support the dual-stream model for speech comprehension, with a dorsal stream for auditory-motor integration, and a ventral stream for extracting meaning but also the processing of sentences and narratives. Specific patterns of functional asymmetry between the left and right hemisphere can also be demonstrated. The review article concludes with a discussion on interactions between the dorsal and ventral streams, particularly the involvement of motor related areas in speech perception processes, and outlines some remaining unresolved issues. This article is part of a Special Issue entitled Human Auditory Neuroimaging. Copyright © 2013 Elsevier B.V. All rights reserved.

  11. Sensorimotor Representation of Speech Perception. Cross-Decoding of Place of Articulation Features during Selective Attention to Syllables in 7T fMRI

    NARCIS (Netherlands)

    Archila-Meléndez, Mario E.; Valente, Giancarlo; Correia, Joao M.; Rouhl, Rob P. W.; van Kranen-Mastenbroek, Vivianne H.; Jansma, Bernadette M.

    2018-01-01

    Sensorimotor integration, the translation between acoustic signals and motoric programs, may constitute a crucial mechanism for speech. During speech perception, the acoustic-motoric translations include the recruitment of cortical areas for the representation of speech articulatory features, such

  12. Speech Recognition with the Advanced Combination Encoder and Transient Emphasis Spectral Maxima Strategies in Nucleus 24 Recipients

    Science.gov (United States)

    Holden, Laura K.; Vandali, Andrew E.; Skinner, Margaret W.; Fourakis, Marios S.; Holden, Timothy A.

    2005-01-01

    One of the difficulties faced by cochlear implant (CI) recipients is perception of low-intensity speech cues. A. E. Vandali (2001) has developed the transient emphasis spectral maxima (TESM) strategy to amplify short-duration, low-level sounds. The aim of the present study was to determine whether speech scores would be significantly higher with…

  13. Weighting of Acoustic Cues to a Manner Distinction by Children with and without Hearing Loss

    Science.gov (United States)

    Nittrouer, Susan; Lowenstein, Joanna H.

    2015-01-01

    Purpose: Children must develop optimal perceptual weighting strategies for processing speech in their first language. Hearing loss can interfere with that development, especially if cochlear implants are required. The three goals of this study were to measure, for children with and without hearing loss: (a) cue weighting for a manner distinction,…

  14. Cross-Linguistic Differences in Prosodic Cues to Syntactic Disambiguation in German and English

    Science.gov (United States)

    O'Brien, Mary Grantham; Jackson, Carrie N.; Gardner, Christine E.

    2014-01-01

    This study examined whether late-learning English-German second language (L2) learners and late-learning German-English L2 learners use prosodic cues to disambiguate temporarily ambiguous first language and L2 sentences during speech production. Experiments 1a and 1b showed that English-German L2 learners and German-English L2 learners used a…

  15. Availability of binaural cues for pediatric bilateral cochlear implant recipients.

    Science.gov (United States)

    Sheffield, Sterling W; Haynes, David S; Wanna, George B; Labadie, Robert F; Gifford, René H

    2015-03-01

    Bilateral implant recipients theoretically have access to binaural cues. Research in postlingually deafened adults with cochlear implants (CIs) indicates minimal evidence for true binaural hearing. Congenitally deafened children who experience spatial hearing with bilateral CIs, however, might perceive binaural cues in the CI signal differently. There is limited research examining binaural hearing in children with CIs, and the few published studies are limited by the use of unrealistic speech stimuli and background noise. The purposes of this study were to (1) replicate our previous study of binaural hearing in postlingually deafened adults with AzBio sentences in prelingually deafened children with the pediatric version of the AzBio sentences, and (2) replicate previous studies of binaural hearing in children with CIs using more open-set sentences and more realistic background noise (i.e., multitalker babble). The study was a within-participant, repeated-measures design. The study sample consisted of 14 children with bilateral CIs with at least 25 mo of listening experience. Speech recognition was assessed using sentences presented in multitalker babble at a fixed signal-to-noise ratio. Test conditions included speech at 0° with noise presented at 0° (S0N0), on the side of the first CI (90° or 270°) (S0N1stCI), and on the side of the second CI (S0N2ndCI) as well as speech presented at 0° with noise presented semidiffusely from eight speakers at 45° intervals. Estimates of summation, head shadow, squelch, and spatial release from masking were calculated. Results of test conditions commonly reported in the literature (S0N0, S0N1stCI, S0N2ndCI) are consistent with results from previous research in adults and children with bilateral CIs, showing minimal summation and squelch but typical head shadow and spatial release from masking. However, bilateral benefit over the better CI with speech at 0° was much larger with semidiffuse noise. Congenitally deafened

  16. THE ONTOGENESIS OF SPEECH DEVELOPMENT

    Directory of Open Access Journals (Sweden)

    T. E. Braudo

    2017-01-01

    Full Text Available The purpose of this article is to acquaint specialists working with children having developmental disorders with age-related norms for speech development. Many well-known linguists and psychologists studied speech ontogenesis (logogenesis). Speech is a higher mental function, which integrates many functional systems. Speech development in infants during the first months after birth is ensured by the innate hearing and emerging ability to fix the gaze on the face of an adult. Innate emotional reactions are also being developed during this period, turning into nonverbal forms of communication. At about 6 months a baby starts to pronounce some syllables; at 7–9 months, repeats various sound combinations pronounced by adults. At 10–11 months a baby begins to react to words addressed to him/her. The first words usually appear at an age of 1 year; this is the start of the stage of active speech development. At this time it is acceptable if a child confuses or rearranges sounds, distorts or misses them. By the age of 1.5 years a child begins to understand abstract explanations of adults. Significant vocabulary enlargement occurs between 2 and 3 years; grammatical structures of the language are being formed during this period (a child starts to use phrases and sentences). Preschool age (3–7 y. o.) is characterized by incorrect, but steadily improving pronunciation of sounds and phonemic perception. The vocabulary increases; abstract speech and retelling are being formed. Children over 7 y. o. continue to improve grammar, writing and reading skills. The described stages may not have strict age boundaries, since they depend not only on the environment, but also on the child’s mental constitution, heredity and character.

  17. Cortical activity patterns predict robust speech discrimination ability in noise

    Science.gov (United States)

    Shetake, Jai A.; Wolf, Jordan T.; Cheung, Ryan J.; Engineer, Crystal T.; Ram, Satyananda K.; Kilgard, Michael P.

    2012-01-01

    The neural mechanisms that support speech discrimination in noisy conditions are poorly understood. In quiet conditions, spike timing information appears to be used in the discrimination of speech sounds. In this study, we evaluated the hypothesis that spike timing is also used to distinguish between speech sounds in noisy conditions that significantly degrade neural responses to speech sounds. We tested speech sound discrimination in rats and recorded primary auditory cortex (A1) responses to speech sounds in background noise of different intensities and spectral compositions. Our behavioral results indicate that rats, like humans, are able to accurately discriminate consonant sounds even in the presence of background noise that is as loud as the speech signal. Our neural recordings confirm that speech sounds evoke degraded but detectable responses in noise. Finally, we developed a novel neural classifier that mimics behavioral discrimination. The classifier discriminates between speech sounds by comparing the A1 spatiotemporal activity patterns evoked on single trials with the average spatiotemporal patterns evoked by known sounds. Unlike classifiers in most previous studies, this classifier is not provided with the stimulus onset time. Neural activity analyzed with the use of relative spike timing was well correlated with behavioral speech discrimination in quiet and in noise. Spike timing information integrated over longer intervals was required to accurately predict rat behavioral speech discrimination in noisy conditions. The similarity of neural and behavioral discrimination of speech in noise suggests that humans and rats may employ similar brain mechanisms to solve this problem. PMID:22098331
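
    The classifier described here compares each single-trial spatiotemporal response with the class-average ("template") responses for the known speech sounds and assigns the nearest one. The sketch below captures that logic; Euclidean distance is an illustrative choice rather than the study's exact similarity measure.

      # Minimal sketch: nearest-template classification of single-trial
      # spatiotemporal activity patterns (sites x time bins).
      import numpy as np

      def fit_templates(trials, labels):
          """trials: (n_trials, n_sites, n_bins); labels: (n_trials,)."""
          return {c: trials[labels == c].mean(axis=0) for c in np.unique(labels)}

      def classify_trial(trial, templates):
          dists = {c: np.linalg.norm(trial - template) for c, template in templates.items()}
          return min(dists, key=dists.get)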

  18. Speech and Communication Disorders

    Science.gov (United States)

    ... to being completely unable to speak or understand speech. Causes include Hearing disorders and deafness Voice problems, ... or those caused by cleft lip or palate Speech problems like stuttering Developmental disabilities Learning disorders Autism ...

  19. The role of visual spatial attention in audiovisual speech perception

    DEFF Research Database (Denmark)

    Andersen, Tobias; Tiippana, K.; Laarni, J.

    2009-01-01

    Auditory and visual information is integrated when perceiving speech, as evidenced by the McGurk effect in which viewing an incongruent talking face categorically alters auditory speech perception. Audiovisual integration in speech perception has long been considered automatic and pre-attentive ... from each of the faces and from the voice on the auditory speech percept. We found that directing visual spatial attention towards a face increased the influence of that face on auditory perception. However, the influence of the voice on auditory perception did not change, suggesting that audiovisual integration did not change. Visual spatial attention was also able to select between the faces when lip reading. This suggests that visual spatial attention acts at the level of visual speech perception prior to audiovisual integration and that the effect propagates through audiovisual integration ...

  20. Cue conflicts in context

    DEFF Research Database (Denmark)

    Boeg Thomsen, Ditte; Poulsen, Mads

    2015-01-01

    When learning their first language, children develop strategies for assigning semantic roles to sentence structures, depending on morphosyntactic cues such as case and word order. Traditionally, comprehension experiments have presented transitive clauses in isolation, and crosslinguistically ... preschoolers. However, object-first clauses may be context-sensitive structures, which are infelicitous in isolation. In a second act-out study we presented OVS clauses in supportive and unsupportive discourse contexts and in isolation and found that five-to-six-year-olds’ OVS comprehension was enhanced ...

  1. Free Speech Yearbook 1978.

    Science.gov (United States)

    Phifer, Gregg, Ed.

    The 17 articles in this collection deal with theoretical and practical freedom of speech issues. The topics include: freedom of speech in Marquette Park, Illinois; Nazis in Skokie, Illinois; freedom of expression in the Confederate States of America; Robert M. LaFollette's arguments for free speech and the rights of Congress; the United States…

  2. The Functional Connectome of Speech Control.

    Directory of Open Access Journals (Sweden)

    Stefan Fuertinger

    2015-07-01

    Full Text Available In the past few years, several studies have been directed to understanding the complexity of functional interactions between different brain regions during various human behaviors. Among these, neuroimaging research installed the notion that speech and language require an orchestration of brain regions for comprehension, planning, and integration of a heard sound with a spoken word. However, these studies have been largely limited to mapping the neural correlates of separate speech elements and examining distinct cortical or subcortical circuits involved in different aspects of speech control. As a result, the complexity of the brain network machinery controlling speech and language remained largely unknown. Using graph theoretical analysis of functional MRI (fMRI) data in healthy subjects, we quantified the large-scale speech network topology by constructing functional brain networks of increasing hierarchy from the resting state to motor output of meaningless syllables to complex production of real-life speech as well as compared to non-speech-related sequential finger tapping and pure tone discrimination networks. We identified a segregated network of highly connected local neural communities (hubs) in the primary sensorimotor and parietal regions, which formed a commonly shared core hub network across the examined conditions, with the left area 4p playing an important role in speech network organization. These sensorimotor core hubs exhibited features of flexible hubs based on their participation in several functional domains across different networks and ability to adaptively switch long-range functional connectivity depending on task content, resulting in a distinct community structure of each examined network. Specifically, compared to other tasks, speech production was characterized by the formation of six distinct neural communities with specialized recruitment of the prefrontal cortex, insula, putamen, and thalamus, which collectively
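
    As a loose illustration of the graph-theoretic approach mentioned here (building a functional network and ranking nodes by connectivity), the following Python/NetworkX sketch thresholds a toy correlation matrix and lists the most connected nodes. The threshold, toy data, and centrality measure are assumptions for illustration and do not reproduce the study's pipeline.

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(1)
# Toy "functional connectivity": absolute correlations between 30 regional time series
corr = np.abs(np.corrcoef(rng.normal(size=(30, 200))))
np.fill_diagonal(corr, 0.0)

# Keep only the strongest connections, then rank nodes by degree centrality as candidate hubs
G = nx.from_numpy_array(corr * (corr > 0.3))
degree = nx.degree_centrality(G)
hubs = sorted(degree, key=degree.get, reverse=True)[:5]
print("Candidate hub nodes:", hubs)
```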

  3. Inner Speech: Development, Cognitive Functions, Phenomenology, and Neurobiology

    Science.gov (United States)

    2015-01-01

    Inner speech—also known as covert speech or verbal thinking—has been implicated in theories of cognitive development, speech monitoring, executive function, and psychopathology. Despite a growing body of knowledge on its phenomenology, development, and function, approaches to the scientific study of inner speech have remained diffuse and largely unintegrated. This review examines prominent theoretical approaches to inner speech and methodological challenges in its study, before reviewing current evidence on inner speech in children and adults from both typical and atypical populations. We conclude by considering prospects for an integrated cognitive science of inner speech, and present a multicomponent model of the phenomenon informed by developmental, cognitive, and psycholinguistic considerations. Despite its variability among individuals and across the life span, inner speech appears to perform significant functions in human cognition, which in some cases reflect its developmental origins and its sharing of resources with other cognitive processes. PMID:26011789

  4. Atypical audio-visual speech perception and McGurk effects in children with specific language impairment.

    Science.gov (United States)

    Leybaert, Jacqueline; Macchi, Lucie; Huyse, Aurélie; Champoux, François; Bayard, Clémence; Colin, Cécile; Berthommier, Frédéric

    2014-01-01

    Audiovisual speech perception of children with specific language impairment (SLI) and children with typical language development (TLD) was compared in two experiments using /aCa/ syllables presented in the context of a masking release paradigm. Children had to repeat syllables presented in auditory alone, visual alone (speechreading), audiovisual congruent and incongruent (McGurk) conditions. Stimuli were masked by either stationary (ST) or amplitude modulated (AM) noise. Although children with SLI were less accurate in auditory and audiovisual speech perception, they showed an auditory masking release effect similar to that of children with TLD. Children with SLI also gave fewer correct responses in speechreading than children with TLD, indicating impairment in phonemic processing of visual speech information. In response to McGurk stimuli, children with TLD showed more fusions in AM noise than in ST noise, a consequence of the auditory masking release effect and of the influence of visual information. Children with SLI did not show this effect systematically, suggesting they were less influenced by visual speech. However, when the visual cues were easily identified, the profile of responses to McGurk stimuli was similar in both groups, suggesting that children with SLI do not suffer from an impairment of audiovisual integration. An analysis of percent of information transmitted revealed a deficit in the children with SLI, particularly for the place of articulation feature. Taken together, the data support the hypothesis of an intact peripheral processing of auditory speech information, coupled with a supramodal deficit of phonemic categorization in children with SLI. Clinical implications are discussed.

  5. Cueing listeners to attend to a target talker progressively improves word report as the duration of the cue-target interval lengthens to 2,000 ms.

    Science.gov (United States)

    Holmes, Emma; Kitterick, Padraig T; Summerfield, A Quentin

    2018-04-25

    Endogenous attention is typically studied by presenting instructive cues in advance of a target stimulus array. For endogenous visual attention, task performance improves as the duration of the cue-target interval increases up to 800 ms. Less is known about how endogenous auditory attention unfolds over time or the mechanisms by which an instructive cue presented in advance of an auditory array improves performance. The current experiment used five cue-target intervals (0, 250, 500, 1,000, and 2,000 ms) to compare four hypotheses for how preparatory attention develops over time in a multi-talker listening task. Young adults were cued to attend to a target talker who spoke in a mixture of three talkers. Visual cues indicated the target talker's spatial location or their gender. Participants directed attention to location and gender simultaneously ("objects") at all cue-target intervals. Participants were consistently faster and more accurate at reporting words spoken by the target talker when the cue-target interval was 2,000 ms than 0 ms. In addition, the latency of correct responses progressively shortened as the duration of the cue-target interval increased from 0 to 2,000 ms. These findings suggest that the mechanisms involved in preparatory auditory attention develop gradually over time, taking at least 2,000 ms to reach optimal configuration, yet providing cumulative improvements in speech intelligibility as the duration of the cue-target interval increases from 0 to 2,000 ms. These results demonstrate an improvement in performance for cue-target intervals longer than those that have been reported previously in the visual or auditory modalities.

  6. Cue reactivity towards shopping cues in female participants.

    Science.gov (United States)

    Starcke, Katrin; Schlereth, Berenike; Domass, Debora; Schöler, Tobias; Brand, Matthias

    2013-03-01

    Background and aims It is currently under debate whether pathological buying can be considered as a behavioural addiction. Addictions have often been investigated with cue-reactivity paradigms to assess subjective, physiological and neural craving reactions. The current study aims at testing whether cue reactivity towards shopping cues is related to pathological buying tendencies. Methods A sample of 66 non-clinical female participants rated shopping related pictures concerning valence, arousal, and subjective craving. In a subgroup of 26 participants, electrodermal reactions towards those pictures were additionally assessed. Furthermore, all participants were screened concerning pathological buying tendencies and baseline craving for shopping. Results Results indicate a relationship between the subjective ratings of the shopping cues and pathological buying tendencies, even if baseline craving for shopping was controlled for. Electrodermal reactions were partly related to the subjective ratings of the cues. Conclusions Cue reactivity may be a potential correlate of pathological buying tendencies. Thus, pathological buying may be accompanied by craving reactions towards shopping cues. Results support the assumption that pathological buying can be considered as a behavioural addiction. From a methodological point of view, results support the view that the cue-reactivity paradigm is suited for the investigation of craving reactions in pathological buying and future studies should implement this paradigm in clinical samples.

  7. Individual differences in speech-in-noise perception parallel neural speech processing and attention in preschoolers

    Science.gov (United States)

    Thompson, Elaine C.; Carr, Kali Woodruff; White-Schwoch, Travis; Otto-Meyer, Sebastian; Kraus, Nina

    2016-01-01

    From bustling classrooms to unruly lunchrooms, school settings are noisy. To learn effectively in the unwelcome company of numerous distractions, children must clearly perceive speech in noise. In older children and adults, speech-in-noise perception is supported by sensory and cognitive processes, but the correlates underlying this critical listening skill in young children (3–5 year olds) remain undetermined. Employing a longitudinal design (two evaluations separated by ~12 months), we followed a cohort of 59 preschoolers, ages 3.0–4.9, assessing word-in-noise perception, cognitive abilities (intelligence, short-term memory, attention), and neural responses to speech. Results reveal changes in word-in-noise perception parallel changes in processing of the fundamental frequency (F0), an acoustic cue known for playing a role central to speaker identification and auditory scene analysis. Four unique developmental trajectories (speech-in-noise perception groups) confirm this relationship, in that improvements and declines in word-in-noise perception couple with enhancements and diminishments of F0 encoding, respectively. Improvements in word-in-noise perception also pair with gains in attention. Word-in-noise perception does not relate to strength of neural harmonic representation or short-term memory. These findings reinforce previously-reported roles of F0 and attention in hearing speech in noise in older children and adults, and extend this relationship to preschool children. PMID:27864051

  8. Perceived gender in clear and conversational speech

    Science.gov (United States)

    Booz, Jaime A.

    Although many studies have examined acoustic and sociolinguistic differences between male and female speech, the relationship between talker speaking style and perceived gender has not yet been explored. The present study attempts to determine whether clear speech, a style adopted by talkers who perceive some barrier to effective communication, shifts perceptions of femininity for male and female talkers. Much of our understanding of gender perception in voice and speech is based on sustained vowels or single words, eliminating temporal, prosodic, and articulatory cues available in more naturalistic, connected speech. Thus, clear and conversational sentence stimuli, selected from the 41 talkers of the Ferguson Clear Speech Database (Ferguson, 2004) were presented to 17 normal-hearing listeners, aged 18 to 30. They rated the talkers' gender using a visual analog scale with "masculine" and "feminine" endpoints. This response method was chosen to account for within-category shifts of gender perception by allowing nonbinary responses. Mixed-effects regression analysis of listener responses revealed a small but significant effect of speaking style, and this effect was larger for male talkers than female talkers. Because of the high degree of talker variability observed for talker gender, acoustic analyses of these sentences were undertaken to determine the relationship between acoustic changes in clear and conversational speech and perceived femininity. Results of these analyses showed that mean fundamental frequency (fo) and fo standard deviation were significantly correlated with perceived gender for both male and female talkers, and vowel space was significantly correlated only for male talkers. Speaking rate and breathiness measures (CPPS) were not significantly related for either group. Outcomes of this study indicate that adopting a clear speaking style is correlated with increases in perceived femininity. Although the increase was small, some changes associated
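
    The acoustic-perceptual analysis described here relates talkers' mean fo to femininity ratings. A small illustration of that kind of correlation on made-up numbers follows; the values below are not the study's data.

```python
import numpy as np

rng = np.random.default_rng(0)
mean_f0 = np.concatenate([rng.normal(120, 15, 20), rng.normal(210, 20, 21)])  # toy mean fo (Hz) for 41 talkers
femininity = 0.4 * mean_f0 + rng.normal(0, 15, mean_f0.size)                  # toy visual-analog femininity ratings

r = np.corrcoef(mean_f0, femininity)[0, 1]
print(f"correlation between mean fo and perceived femininity: r = {r:.2f}")
```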

  9. A magnetorheological haptic cue accelerator for manual transmission vehicles

    International Nuclear Information System (INIS)

    Han, Young-Min; Noh, Kyung-Wook; Choi, Seung-Bok; Lee, Yang-Sub

    2010-01-01

    This paper proposes a new haptic cue function for manual transmission vehicles to achieve optimal gear shifting. This function is implemented on the accelerator pedal by utilizing a magnetorheological (MR) brake mechanism. By combining the haptic cue function with the accelerator pedal, the proposed haptic cue device can transmit the optimal moment of gear shifting for manual transmission to a driver without requiring the driver's visual attention. As a first step to achieve this goal, an MR fluid-based haptic device is devised to enable rotary motion of the accelerator pedal. Taking into account spatial limitations, the design parameters are optimally determined using finite element analysis to maximize the relative control torque. The proposed haptic cue device is then manufactured and its field-dependent torque and time response are experimentally evaluated. Then the manufactured MR haptic cue device is integrated with the accelerator pedal. A simple virtual vehicle emulating the operation of the engine of a passenger vehicle is constructed and put into communication with the haptic cue device. A feed-forward torque control algorithm for the haptic cue is formulated and control performances are experimentally evaluated and presented in the time domain.
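
    A feed-forward haptic-cue controller of the kind described might map engine speed relative to the optimal shift point onto a pedal resistance torque and then onto MR brake coil current. The sketch below is purely illustrative: the linear torque-current relation, the ramp shape, and all constants are hypothetical, not values from the paper.

```python
def haptic_cue_torque(rpm, shift_rpm, max_torque_nm=4.0, band_rpm=500.0):
    """Feed-forward cue: resistance torque ramps up as engine speed approaches the optimal shift point."""
    overshoot = rpm - (shift_rpm - band_rpm)
    return max(0.0, min(max_torque_nm, max_torque_nm * overshoot / band_rpm))

def coil_current(torque_nm, torque_per_amp=2.0):
    """Invert an assumed linear torque-current characteristic of the MR brake."""
    return torque_nm / torque_per_amp

for rpm in (2000, 2400, 2600, 2800):
    t = haptic_cue_torque(rpm, shift_rpm=2700)
    print(f"{rpm} rpm -> {t:.1f} N·m, {coil_current(t):.1f} A")
```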

  10. Speech in spinocerebellar ataxia.

    Science.gov (United States)

    Schalling, Ellika; Hartelius, Lena

    2013-12-01

    Spinocerebellar ataxias (SCAs) are a heterogeneous group of autosomal dominant cerebellar ataxias clinically characterized by progressive ataxia, dysarthria and a range of other concomitant neurological symptoms. Only a few studies include detailed characterization of speech symptoms in SCA. Speech symptoms in SCA resemble ataxic dysarthria but symptoms related to phonation may be more prominent. One study to date has shown an association between differences in speech and voice symptoms related to genotype. More studies of speech and voice phenotypes are motivated, to possibly aid in clinical diagnosis. In addition, instrumental speech analysis has been demonstrated to be a reliable measure that may be used to monitor disease progression or therapy outcomes in possible future pharmacological treatments. Intervention by speech and language pathologists should go beyond assessment. Clinical guidelines for management of speech, communication and swallowing need to be developed for individuals with progressive cerebellar ataxia. Copyright © 2013 Elsevier Inc. All rights reserved.

  11. Training to Improve Hearing Speech in Noise: Biological Mechanisms

    Science.gov (United States)

    Song, Judy H.; Skoe, Erika; Banai, Karen

    2012-01-01

    We investigated training-related improvements in listening in noise and the biological mechanisms mediating these improvements. Training-related malleability was examined using a program that incorporates cognitively based listening exercises to improve speech-in-noise perception. Before and after training, auditory brainstem responses to a speech syllable were recorded in quiet and multitalker noise from adults who ranged in their speech-in-noise perceptual ability. Controls did not undergo training but were tested at intervals equivalent to the trained subjects. Trained subjects exhibited significant improvements in speech-in-noise perception that were retained 6 months later. Subcortical responses in noise demonstrated training-related enhancements in the encoding of pitch-related cues (the fundamental frequency and the second harmonic), particularly for the time-varying portion of the syllable that is most vulnerable to perceptual disruption (the formant transition region). Subjects with the largest strength of pitch encoding at pretest showed the greatest perceptual improvement. Controls exhibited neither neurophysiological nor perceptual changes. We provide the first demonstration that short-term training can improve the neural representation of cues important for speech-in-noise perception. These results implicate and delineate biological mechanisms contributing to learning success, and they provide a conceptual advance to our understanding of the kind of training experiences that can influence sensory processing in adulthood. PMID:21799207

  12. Interdependent processing and encoding of speech and concurrent background noise.

    Science.gov (United States)

    Cooper, Angela; Brouwer, Susanne; Bradlow, Ann R

    2015-05-01

    Speech processing can often take place in adverse listening conditions that involve the mixing of speech and background noise. In this study, we investigated processing dependencies between background noise and indexical speech features, using a speeded classification paradigm (Garner, 1974; Exp. 1), and whether background noise is encoded and represented in memory for spoken words in a continuous recognition memory paradigm (Exp. 2). Whether or not the noise spectrally overlapped with the speech signal was also manipulated. The results of Experiment 1 indicated that background noise and indexical features of speech (gender, talker identity) cannot be completely segregated during processing, even when the two auditory streams are spectrally nonoverlapping. Perceptual interference was asymmetric, whereby irrelevant indexical feature variation in the speech signal slowed noise classification to a greater extent than irrelevant noise variation slowed speech classification. This asymmetry may stem from the fact that speech features have greater functional relevance to listeners, and are thus more difficult to selectively ignore than background noise. Experiment 2 revealed that a recognition cost for words embedded in different types of background noise on the first and second occurrences only emerged when the noise and the speech signal were spectrally overlapping. Together, these data suggest integral processing of speech and background noise, modulated by the level of processing and the spectral separation of the speech and noise.

  13. Digital speech processing using Matlab

    CERN Document Server

    Gopi, E S

    2014-01-01

    Digital Speech Processing Using Matlab deals with digital speech pattern recognition, speech production model, speech feature extraction, and speech compression. The book is written in a manner that is suitable for beginners pursuing basic research in digital speech processing. Matlab illustrations are provided for most topics to enable better understanding of concepts. This book also deals with the basic pattern recognition techniques (illustrated with speech signals using Matlab) such as PCA, LDA, ICA, SVM, HMM, GMM, BPN, and KSOM.
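
    As one example of the pattern-recognition techniques listed above, a GMM-based classifier can be sketched in a few lines. The book's own illustrations are in Matlab; the Python/scikit-learn version below, run on synthetic two-dimensional features, is only an analogue.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic 2-D "speech feature" vectors for two classes
class_a = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(200, 2))
class_b = rng.normal(loc=[3.0, 3.0], scale=1.0, size=(200, 2))

# Fit one Gaussian mixture model per class, then classify by maximum log-likelihood
gmm_a = GaussianMixture(n_components=2, random_state=0).fit(class_a)
gmm_b = GaussianMixture(n_components=2, random_state=0).fit(class_b)

test = np.array([[0.2, -0.1], [2.8, 3.3]])
pred = np.where(gmm_a.score_samples(test) > gmm_b.score_samples(test), "class A", "class B")
print(pred)
```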

  14. Human phoneme recognition depending on speech-intrinsic variability.

    Science.gov (United States)

    Meyer, Bernd T; Jürgens, Tim; Wesker, Thorsten; Brand, Thomas; Kollmeier, Birger

    2010-11-01

    The influence of different sources of speech-intrinsic variation (speaking rate, effort, style and dialect or accent) on human speech perception was investigated. In listening experiments with 16 listeners, confusions of consonant-vowel-consonant (CVC) and vowel-consonant-vowel (VCV) sounds in speech-weighted noise were analyzed. Experiments were based on the OLLO logatome speech database, which was designed for a man-machine comparison. It contains utterances spoken by 50 speakers from five dialect/accent regions and covers several intrinsic variations. By comparing results depending on intrinsic and extrinsic variations (i.e., different levels of masking noise), the degradation induced by variabilities can be expressed in terms of the SNR. The spectral level distance between the respective speech segment and the long-term spectrum of the masking noise was found to be a good predictor for recognition rates, while phoneme confusions were influenced by the distance to spectrally close phonemes. An analysis based on transmitted information of articulatory features showed that voicing and manner of articulation are comparatively robust cues in the presence of intrinsic variations, whereas the coding of place is more degraded. The database and detailed results have been made available for comparisons between human speech recognition (HSR) and automatic speech recognizers (ASR).
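
    The predictor highlighted here, the spectral level distance between a speech segment and the long-term spectrum of the masking noise, can be roughly approximated as below; the FFT-based spectrum estimate and the toy signals are assumptions, and the study's exact computation may differ.

```python
import numpy as np

def long_term_spectrum_db(x, n_fft=1024):
    """Windowed power spectrum in dB (a crude long-term spectrum estimate)."""
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x)), n_fft)) ** 2
    return 10 * np.log10(spec + 1e-12)

def spectral_level_distance(segment, noise):
    """Mean dB difference between the segment spectrum and the noise spectrum."""
    return float(np.mean(long_term_spectrum_db(segment) - long_term_spectrum_db(noise)))

fs = 16000
t = np.arange(1024) / fs
segment = 0.1 * np.sin(2 * np.pi * 500 * t)                # toy "speech segment"
noise = 0.05 * np.random.default_rng(0).normal(size=1024)  # toy masking noise
print(f"spectral level distance: {spectral_level_distance(segment, noise):.1f} dB")
```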

  15. Grasp cueing and joint attention.

    Science.gov (United States)

    Tschentscher, Nadja; Fischer, Martin H

    2008-10-01

    We studied how two different hand posture cues affect joint attention in normal observers. Visual targets appeared over lateralized objects, with different delays after centrally presented hand postures. Attention was cued by either hand direction or the congruency between hand aperture and object size. Participants pressed a button when they detected a target. Direction cues alone facilitated target detection following short delays but aperture cues alone were ineffective. In contrast, when hand postures combined direction and aperture cues, aperture congruency effects without directional congruency effects emerged and persisted, but only for power grips. These results suggest that parallel parameter specification makes joint attention mechanisms exquisitely sensitive to the timing and content of contextual cues.

  16. Compound cueing in free recall

    Science.gov (United States)

    Lohnas, Lynn J.; Kahana, Michael J.

    2013-01-01

    According to the retrieved context theory of episodic memory, the cue for recall of an item is a weighted sum of recently activated cognitive states, including previously recalled and studied items as well as their associations. We show that this theory predicts there should be compound cueing in free recall. Specifically, the temporal contiguity effect should be greater when the two most recently recalled items were studied in contiguous list positions. A meta-analysis of published free recall experiments demonstrates evidence for compound cueing in both conditional response probabilities and inter-response times. To help rule out a rehearsal-based account of these compound cueing effects, we conducted an experiment with immediate, delayed and continual-distractor free recall conditions. Consistent with retrieved context theory but not with a rehearsal-based account, compound cueing was present in all conditions, and was not significantly influenced by the presence of interitem distractors. PMID:23957364
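
    The retrieved-context idea summarized here, that the recall cue is a weighted sum of recently activated states, can be sketched as follows; the vectors, weights, and similarity measure are illustrative assumptions rather than the published model.

```python
import numpy as np

rng = np.random.default_rng(0)
item_vectors = {w: rng.normal(size=16) for w in ["dog", "cat", "tree", "lamp"]}

def compound_cue(recalled, weights):
    """Weighted sum of the context states associated with recently recalled items."""
    cue = sum(w * item_vectors[item] for item, w in zip(recalled, weights))
    return cue / np.linalg.norm(cue)

def recall_scores(cue, candidates):
    """Similarity of the compound cue to each not-yet-recalled item."""
    return {c: float(item_vectors[c] @ cue) for c in candidates}

cue = compound_cue(["cat", "dog"], weights=[0.7, 0.3])   # the most recent recall is weighted more
print(recall_scores(cue, ["tree", "lamp"]))
```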

  17. Gesture facilitates the syntactic analysis of speech

    Directory of Open Access Journals (Sweden)

    Henning Holle

    2012-03-01

    Full Text Available Recent research suggests that the brain routinely binds together information from gesture and speech. However, most of this research focused on the integration of representational gestures with the semantic content of speech. Much less is known about how other aspects of gesture, such as emphasis, influence the interpretation of the syntactic relations in a spoken message. Here, we investigated whether beat gestures alter which syntactic structure is assigned to ambiguous spoken German sentences. The P600 component of the Event Related Brain Potential indicated that the more complex syntactic structure is easier to process when the speaker emphasizes the subject of a sentence with a beat. Thus, a simple flick of the hand can change our interpretation of who has been doing what to whom in a spoken sentence. We conclude that gestures and speech are an integrated system. Unlike previous studies, which have shown that the brain effortlessly integrates semantic information from gesture and speech, our study is the first to demonstrate that this integration also occurs for syntactic information. Moreover, the effect appears to be gesture-specific and was not found for other stimuli that draw attention to certain parts of speech, including prosodic emphasis, or a moving visual stimulus with the same trajectory as the gesture. This suggests that only visual emphasis produced with a communicative intention in mind (that is, beat gestures) influences language comprehension, but not a simple visual movement lacking such an intention.

  18. Internet video telephony allows speech reading by deaf individuals and improves speech perception by cochlear implant users.

    Directory of Open Access Journals (Sweden)

    Georgios Mantokoudis

    Full Text Available OBJECTIVE: To analyze speech reading through Internet video calls by profoundly hearing-impaired individuals and cochlear implant (CI) users. METHODS: Speech reading skills of 14 deaf adults and 21 CI users were assessed using the Hochmair Schulz Moser (HSM) sentence test. We presented video simulations using different video resolutions (1280 × 720, 640 × 480, 320 × 240, 160 × 120 px), frame rates (30, 20, 10, 7, 5 frames per second (fps)), speech velocities (three different speakers), webcameras (Logitech Pro9000, C600 and C500) and image/sound delays (0-500 ms). All video simulations were presented with and without sound and in two screen sizes. Additionally, scores for live Skype™ video connection and live face-to-face communication were assessed. RESULTS: Higher frame rate (>7 fps), higher camera resolution (>640 × 480 px) and shorter picture/sound delay (<100 ms) were associated with increased speech perception scores. Scores were strongly dependent on the speaker but were not influenced by physical properties of the camera optics or the full screen mode. There is a significant median gain of +8.5%pts (p = 0.009) in speech perception for all 21 CI users if visual cues are additionally shown. CI users with poor open set speech perception scores (n = 11) showed the greatest benefit under combined audio-visual presentation (median speech perception +11.8%pts, p = 0.032). CONCLUSION: Webcameras have the potential to improve telecommunication of hearing-impaired individuals.

  19. Comparison of different speech tasks among adults who stutter and adults who do not stutter

    Directory of Open Access Journals (Sweden)

    Ana Paula Ritto

    2016-03-01

    Full Text Available OBJECTIVES: In this study, we compared the performance of both fluent speakers and people who stutter in three different speaking situations: monologue speech, oral reading and choral reading. This study follows the assumption that the neuromotor control of speech can be influenced by external auditory stimuli in both speakers who stutter and speakers who do not stutter. METHOD: Seventeen adults who stutter and seventeen adults who do not stutter were assessed in three speaking tasks: monologue, oral reading (solo reading aloud) and choral reading (reading in unison with the evaluator). Speech fluency and rate were measured for each task. RESULTS: The participants who stuttered had a lower frequency of stuttering during choral reading than during monologue and oral reading. CONCLUSIONS: According to the dual premotor system model, choral speech enhanced fluency by providing external cues for the timing of each syllable, compensating for deficient internal cues.

  20. Seeing the talker’s face supports executive processing of speech in steady state noise

    OpenAIRE

    Sushmit Mishra; Thomas Lunner; Stefan Stenfelt; Jerker Rönnberg; Mary Rudner

    2013-01-01

    Listening to speech in noise depletes cognitive resources, affecting speech processing. The present study investigated how remaining resources or cognitive spare capacity (CSC) can be deployed by young adults with normal hearing. We administered a test of CSC (CSCT, Mishra et al., 2013) along with a battery of established cognitive tests to 20 participants with normal hearing. In the CSCT, lists of two-digit numbers were presented with and without visual cues in quiet, as well as in steady-st...

  1. Using auditory-visual speech to probe the basis of noise-impaired consonant-vowel perception in dyslexia and auditory neuropathy

    Science.gov (United States)

    Ramirez, Joshua; Mann, Virginia

    2005-08-01

    Both dyslexics and auditory neuropathy (AN) subjects show inferior consonant-vowel (CV) perception in noise, relative to controls. To better understand these impairments, natural acoustic speech stimuli that were masked in speech-shaped noise at various intensities were presented to dyslexic, AN, and control subjects either in isolation or accompanied by visual articulatory cues. AN subjects were expected to benefit from the pairing of visual articulatory cues and auditory CV stimuli, provided that their speech perception impairment reflects a relatively peripheral auditory disorder. Assuming that dyslexia reflects a general impairment of speech processing rather than a disorder of audition, dyslexics were not expected to similarly benefit from an introduction of visual articulatory cues. The results revealed an increased effect of noise masking on the perception of isolated acoustic stimuli by both dyslexic and AN subjects. More importantly, dyslexics showed less effective use of visual articulatory cues in identifying masked speech stimuli and lower visual baseline performance relative to AN subjects and controls. Last, a significant positive correlation was found between reading ability and the ameliorating effect of visual articulatory cues on speech perception in noise. These results suggest that some reading impairments may stem from a central deficit of speech processing.

  2. Salience of Tactile Cues: An Examination of Tactor Actuator and Tactile Cue Characteristics

    Science.gov (United States)

    2015-08-01

    Similarly, tactile alerts can help manage and focus attention in a complex high-tempo multitasked environment. Figure 1, while simple, can serve to ... tactile cueing on concurrent performance of military and robotics tasks in a simulated multitasking environment.

  3. Global Repetition Influences Contextual Cueing

    Science.gov (United States)

    Zang, Xuelian; Zinchenko, Artyom; Jia, Lina; Li, Hong

    2018-01-01

    Our visual system has a striking ability to improve visual search based on the learning of repeated ambient regularities, an effect named contextual cueing. Whereas most previous studies investigated the contextual cueing effect with the same number of repeated and non-repeated search displays per block, the current study focused on whether a global repetition frequency, formed by different presentation ratios between repeated and non-repeated configurations, influences the contextual cueing effect. Specifically, the number of repeated and non-repeated displays presented in each block was manipulated: 12:12, 20:4, 4:20, and 4:4 in Experiments 1–4, respectively. The results revealed a significant contextual cueing effect when the global repetition frequency was high (≥1:1 ratio) in Experiments 1, 2, and 4, given that processing of repeated displays was expedited relative to non-repeated displays. Nevertheless, the contextual cueing effect was reduced to a non-significant level when the repetition frequency dropped to 4:20 in Experiment 3. These results suggest that the presentation frequency of repeated relative to non-repeated displays can influence the strength of contextual cueing. In other words, global repetition statistics could be a crucial factor mediating the contextual cueing effect. PMID:29636716

  4. Prediction and imitation in speech

    Directory of Open Access Journals (Sweden)

    Chiara Gambi

    2013-06-01

    Full Text Available It has been suggested that intra- and inter-speaker variability in speech are correlated. Interlocutors have been shown to converge on various phonetic dimensions. In addition, speakers imitate the phonetic properties of voices they are exposed to in shadowing, repetition, and even passive listening tasks. We review three theoretical accounts of speech imitation and convergence phenomena: (i) the Episodic Theory (ET) of speech perception and production (Goldinger, 1998); (ii) the Motor Theory (MT) of speech perception (Liberman and Whalen, 2000; Galantucci et al., 2006); (iii) Communication Accommodation Theory (CAT; Giles et al., 1991; Giles and Coupland, 1991). We argue that no account is able to explain all the available evidence. In particular, there is a need to integrate low-level, mechanistic accounts (like ET and MT) and higher-level accounts (like CAT). We propose that this is possible within the framework of an integrated theory of production and comprehension (Pickering & Garrod, in press). Similarly to both ET and MT, this theory assumes parity between production and perception. Uniquely, however, it posits that listeners simulate speakers’ utterances by computing forward-model predictions at many different levels, which are then compared to the incoming phonetic input. In our account phonetic imitation can be achieved via the same mechanism that is responsible for sensorimotor adaptation; i.e., the correction of prediction errors. In addition, the model assumes that the degree to which sensory prediction errors lead to motor adjustments is context-dependent. The notion of context subsumes both the preceding linguistic input and non-linguistic attributes of the situation (e.g., the speaker’s and listener’s social identities, their conversational roles, the listener’s intention to imitate).
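
    The forward-model account described in this record hinges on comparing predicted with incoming phonetic input and correcting prediction errors, with a context-dependent weighting. A toy sketch of that error-correction loop follows; the linear forward model, the gain parameter, and all numbers are assumptions made only for illustration.

```python
def adapt(motor_param, heard_target, forward_model, gain, steps=20):
    """Adjust a motor parameter by correcting sensory prediction errors.
    `gain` stands in for the context-dependent weighting of prediction errors."""
    for _ in range(steps):
        predicted = forward_model(motor_param)   # forward-model prediction of the sensory outcome
        error = heard_target - predicted         # prediction error against the incoming input
        motor_param += gain * error              # error-driven correction (sensorimotor adaptation)
    return motor_param

# Toy forward model: the produced acoustic value is roughly proportional to the articulatory parameter
forward_model = lambda p: 1.8 * p
print(adapt(motor_param=300.0, heard_target=650.0, forward_model=forward_model, gain=0.2))
```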

  5. Speech Alarms Pilot Study

    Science.gov (United States)

    Sandor, Aniko; Moses, Haifa

    2016-01-01

    Speech alarms have been used extensively in aviation and included in International Building Codes (IBC) and National Fire Protection Association's (NFPA) Life Safety Code. However, they have not been implemented on space vehicles. Previous studies conducted at NASA JSC showed that speech alarms lead to faster identification and higher accuracy. This research evaluated updated speech and tone alerts in a laboratory environment and in the Human Exploration Research Analog (HERA) in a realistic setup.

  6. Phonological Awareness Intervention for Children with Childhood Apraxia of Speech

    Science.gov (United States)

    Moriarty, Brigid C.; Gillon, Gail T.

    2006-01-01

    Aims: To investigate the effectiveness of an integrated phonological awareness intervention to improve the speech production, phonological awareness and printed word decoding skills for three children with childhood apraxia of speech (CAS) aged 7;3, 6;3 and 6;10. The three children presented with severely delayed phonological awareness skills…

  7. Individual differences in using geometric and featural cues to maintain spatial orientation: cue quantity and cue ambiguity are more important than cue type.

    Science.gov (United States)

    Kelly, Jonathan W; McNamara, Timothy P; Bodenheimer, Bobby; Carr, Thomas H; Rieser, John J

    2009-02-01

    Two experiments explored the role of environmental cues in maintaining spatial orientation (sense of self-location and direction) during locomotion. Of particular interest was the importance of geometric cues (provided by environmental surfaces) and featural cues (nongeometric properties provided by striped walls) in maintaining spatial orientation. Participants performed a spatial updating task within virtual environments containing geometric or featural cues that were ambiguous or unambiguous indicators of self-location and direction. Cue type (geometric or featural) did not affect performance, but the number and ambiguity of environmental cues did. Gender differences, interpreted as a proxy for individual differences in spatial ability and/or experience, highlight the interaction between cue quantity and ambiguity. When environmental cues were ambiguous, men stayed oriented with either one or two cues, whereas women stayed oriented only with two. When environmental cues were unambiguous, women stayed oriented with one cue.

  8. Ear, Hearing and Speech

    DEFF Research Database (Denmark)

    Poulsen, Torben

    2000-01-01

    An introduction is given to the anatomy and the function of the ear, basic psychoacoustic matters (hearing threshold, loudness, masking), the speech signal and speech intelligibility. The lecture note is written for the course: Fundamentals of Acoustics and Noise Control (51001).

  9. Principles of speech coding

    CERN Document Server

    Ogunfunmi, Tokunbo

    2010-01-01

    It is becoming increasingly apparent that all forms of communication-including voice-will be transmitted through packet-switched networks based on the Internet Protocol (IP). Therefore, the design of modern devices that rely on speech interfaces, such as cell phones and PDAs, requires a complete and up-to-date understanding of the basics of speech coding. Outlines key signal processing algorithms used to mitigate impairments to speech quality in VoIP networks. Offering a detailed yet easily accessible introduction to the field, Principles of Speech Coding provides an in-depth examination of the

  10. Speech disorder prevention

    Directory of Open Access Journals (Sweden)

    Miladis Fornaris-Méndez

    2017-04-01

    Full Text Available Language therapy has shifted from a medical focus to a preventive focus. However, difficulties are evident in developing this preventive work, because more space is devoted to the correction of language disorders. Because speech disorders are the dysfunction that appears most frequently, the preventive work carried out to avoid their appearance acquires special importance. Speech education from an early age in childhood makes it easier to prevent the appearance of speech disorders in children. The present work aims to offer different activities for the prevention of speech disorders.

  11. Sensory modality of smoking cues modulates neural cue reactivity.

    Science.gov (United States)

    Yalachkov, Yavor; Kaiser, Jochen; Görres, Andreas; Seehaus, Arne; Naumer, Marcus J

    2013-01-01

    Behavioral experiments have demonstrated that the sensory modality of presentation modulates drug cue reactivity. The present study on nicotine addiction tested whether neural responses to smoking cues are modulated by the sensory modality of stimulus presentation. We measured brain activation using functional magnetic resonance imaging (fMRI) in 15 smokers and 15 nonsmokers while they viewed images of smoking paraphernalia and control objects and while they touched the same objects without seeing them. Haptically presented, smoking-related stimuli induced more pronounced neural cue reactivity than visual cues in the left dorsal striatum in smokers compared to nonsmokers. The severity of nicotine dependence correlated positively with the preference for haptically explored smoking cues in the left inferior parietal lobule/somatosensory cortex, right fusiform gyrus/inferior temporal cortex/cerebellum, hippocampus/parahippocampal gyrus, posterior cingulate cortex, and supplementary motor area. These observations are in line with the hypothesized role of the dorsal striatum for the expression of drug habits and the well-established concept of drug-related automatized schemata, since haptic perception is more closely linked to the corresponding object-specific action pattern than visual perception. Moreover, our findings demonstrate that with the growing severity of nicotine dependence, brain regions involved in object perception, memory, self-processing, and motor control exhibit an increasing preference for haptic over visual smoking cues. This difference was not found for control stimuli. Considering the sensory modality of the presented cues could serve to develop more reliable fMRI-specific biomarkers, more ecologically valid experimental designs, and more effective cue-exposure therapies of addiction.

  12. LSVT LOUD and LSVT BIG: Behavioral Treatment Programs for Speech and Body Movement in Parkinson Disease

    Directory of Open Access Journals (Sweden)

    Cynthia Fox

    2012-01-01

    Full Text Available Recent advances in neuroscience have suggested that exercise-based behavioral treatments may improve function and possibly slow progression of motor symptoms in individuals with Parkinson disease (PD). The LSVT (Lee Silverman Voice Treatment) Programs for individuals with PD have been developed and researched over the past 20 years, beginning with a focus on the speech motor system (LSVT LOUD) and more recently have been extended to address limb motor systems (LSVT BIG). The unique aspects of the LSVT Programs include the combination of (a) an exclusive target on increasing amplitude (loudness in the speech motor system; bigger movements in the limb motor system), (b) a focus on sensory recalibration to help patients recognize that movements with increased amplitude are within normal limits, even if they feel “too loud” or “too big,” and (c) training self-cueing and attention to action to facilitate long-term maintenance of treatment outcomes. In addition, the intensive mode of delivery is consistent with principles that drive activity-dependent neuroplasticity and motor learning. The purpose of this paper is to provide an integrative discussion of the LSVT Programs including the rationale for their fundamentals, a summary of efficacy data, and a discussion of limitations and future directions for research.

  13. Current management for word finding difficulties by speech-language therapists in South African remedial schools.

    Science.gov (United States)

    de Rauville, Ingrid; Chetty, Sandhya; Pahl, Jenny

    2006-01-01

    Word finding difficulties frequently found in learners with language learning difficulties (Casby, 1992) are an integral part of Speech-Language Therapists' management role when working with learning disabled children. This study investigated current management for word finding difficulties by 70 Speech-Language Therapists in South African remedial schools. A descriptive survey design using a quantitative and qualitative approach was used. A questionnaire and follow-up focus group discussion were used to collect data. Results highlighted the use of the Renfrew Word Finding Scale (Renfrew, 1972, 1995) as the most frequently used formal assessment tool. Language sample analysis and discourse analysis were the most frequently used informal assessment procedures. Formal intervention programmes were generally not used. Phonetic, phonemic or phonological cueing were the most frequently used therapeutic strategies. The authors note strengths and raise concerns about current management for word finding difficulties in South African remedial schools, particularly in terms of bilingualism. Opportunities are highlighted regarding the development of assessment and intervention measures relevant to the diverse learning disabled population in South Africa.

  14. Cognitive-linguistic effort in multidisciplinary stroke rehabilitation: Decreasing vs. increasing cues for word retrieval.

    Science.gov (United States)

    Choe, Yu-Kyong; Foster, Tammie; Asselin, Abigail; LeVander, Meagan; Baird, Jennifer

    2017-04-01

    Approximately 24% of stroke survivors experience co-occurring aphasia and hemiparesis. These individuals typically attend back-to-back therapy sessions. However, sequentially scheduled therapy may trigger physical and mental fatigue and have an adverse impact on treatment outcomes. The current study tested a hypothesis that exerting less effort during a therapy session would reduce overall fatigue and enhance functional recovery. Two stroke survivors chronically challenged by non-fluent aphasia and right hemiparesis sequentially completed verbal naming and upper-limb tasks on their home computers. The level of cognitive-linguistic effort in speech/language practice was manipulated by presenting verbal naming tasks in two conditions: Decreasing cues (i.e., most-to-least support for word retrieval), and Increasing cues (i.e., least-to-most support). The participants completed the same upper-limb exercises throughout the study periods. Both individuals showed a statistically significant advantage of decreasing cues over increasing cues in word retrieval during the practice period, but not at the end of the practice period or thereafter. The participant with moderate aphasia and hemiparesis achieved clinically meaningful gains in upper-limb functions following the decreasing cues condition, but not after the increasing cues condition. Preliminary findings from the current study suggest a positive impact of decreasing cues in the context of multidisciplinary stroke rehabilitation.

  15. Global cue inconsistency diminishes learning of cue validity

    Directory of Open Access Journals (Sweden)

    Tony Wang

    2016-11-01

    Full Text Available We present a novel two-stage probabilistic learning task that examines the participants’ ability to learn and utilize valid cues across several levels of probabilistic feedback. In the first stage, participants sample from one of three cues that gives predictive information about the outcome of the second stage. Participants are rewarded for correct prediction of the outcome in stage two. Only one of the three cues gives valid predictive information and thus participants can maximise their reward by learning to sample from the valid cue. The validity of this predictive information, however, is reinforced across several levels of probabilistic feedback. A second manipulation involved changing the consistency of the predictive information in stage one and the outcome in stage two. The results show that participants, with higher probabilistic feedback, learned to utilise the valid cue. In inconsistent task conditions, however, participants were significantly less successful in utilising higher validity cues. We interpret this result as implying that learning in probabilistic categorization is based on developing a representation of the task that allows for goal-directed action.
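
    A learner in a task like this one has to estimate, from probabilistic feedback, how valid each cue is. A simple delta-rule sketch of that estimation is shown below; it is an illustration of the general idea, not the authors' model of their data.

```python
import random

def learn_cue_validity(true_validities, trials=2000, lr=0.05, seed=0):
    """Delta-rule estimate of how often each cue's prediction matches the stage-two outcome."""
    rng = random.Random(seed)
    estimates = [0.5 for _ in true_validities]
    for _ in range(trials):
        cue = rng.randrange(len(true_validities))       # sample one of the cues
        correct = rng.random() < true_validities[cue]   # probabilistic feedback
        estimates[cue] += lr * (float(correct) - estimates[cue])
    return estimates

# One valid cue (80% predictive) among two uninformative ones (50%)
print([round(v, 2) for v in learn_cue_validity([0.8, 0.5, 0.5])])
```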

  16. Frontal and temporal contributions to understanding the iconic co-speech gestures that accompany speech.

    Science.gov (United States)

    Dick, Anthony Steven; Mok, Eva H; Raja Beharelle, Anjali; Goldin-Meadow, Susan; Small, Steven L

    2014-03-01

    In everyday conversation, listeners often rely on a speaker's gestures to clarify any ambiguities in the verbal message. Using fMRI during naturalistic story comprehension, we examined which brain regions in the listener are sensitive to speakers' iconic gestures. We focused on iconic gestures that contribute information not found in the speaker's talk, compared with those that convey information redundant with the speaker's talk. We found that three regions-left inferior frontal gyrus triangular (IFGTr) and opercular (IFGOp) portions, and left posterior middle temporal gyrus (MTGp)--responded more strongly when gestures added information to nonspecific language, compared with when they conveyed the same information in more specific language; in other words, when gesture disambiguated speech as opposed to reinforced it. An increased BOLD response was not found in these regions when the nonspecific language was produced without gesture, suggesting that IFGTr, IFGOp, and MTGp are involved in integrating semantic information across gesture and speech. In addition, we found that activity in the posterior superior temporal sulcus (STSp), previously thought to be involved in gesture-speech integration, was not sensitive to the gesture-speech relation. Together, these findings clarify the neurobiology of gesture-speech integration and contribute to an emerging picture of how listeners glean meaning from gestures that accompany speech. Copyright © 2012 Wiley Periodicals, Inc.

  17. Relative cue encoding in the context of sophisticated models of categorization: Separating information from categorization.

    Science.gov (United States)

    Apfelbaum, Keith S; McMurray, Bob

    2015-08-01

    Traditional studies of human categorization often treat the processes of encoding features and cues as peripheral to the question of how stimuli are categorized. However, in domains where the features and cues are less transparent, how information is encoded prior to categorization may constrain our understanding of the architecture of categorization. This is particularly true in speech perception, where acoustic cues to phonological categories are ambiguous and influenced by multiple factors. Here, it is crucial to consider the joint contributions of the information in the input and the categorization architecture. We contrasted accounts that argue for raw acoustic information encoding with accounts that posit that cues are encoded relative to expectations, and investigated how two categorization architectures-exemplar models and back-propagation parallel distributed processing models-deal with each kind of information. Relative encoding, akin to predictive coding, is a form of noise reduction, so it can be expected to improve model accuracy; however, like predictive coding, the use of relative encoding in speech perception by humans is controversial, so results are compared to patterns of human performance, rather than on the basis of overall accuracy. We found that, for both classes of models, in the vast majority of parameter settings, relative cues greatly helped the models approximate human performance. This suggests that expectation-relative processing is a crucial precursor step in phoneme categorization, and that understanding the information content is essential to understanding categorization processes.
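
    The contrast drawn here between raw and expectation-relative cue encoding can be made concrete with a small sketch: a cue is re-coded as its deviation from what was expected for that talker or context. The talker-mean expectation and toy numbers below are simplifying assumptions, not the paper's full model.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy acoustic cue (e.g., a fricative spectral measure) for two talkers with different baselines
talker_means = {"talker_A": 4000.0, "talker_B": 5000.0}
raw_cues = {t: m + rng.normal(0, 150, size=50) for t, m in talker_means.items()}

def relative_encode(cue_values, expected):
    """Encode cues relative to what was expected for this talker/context."""
    return cue_values - expected

pooled_raw = np.concatenate(list(raw_cues.values()))
pooled_rel = np.concatenate([relative_encode(c, c.mean()) for c in raw_cues.values()])
print("sd of raw cues across talkers:  ", round(pooled_raw.std(), 1))
print("sd of expectation-relative cues:", round(pooled_rel.std(), 1))  # talker baseline removed
```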

  18. Gaze Cueing by Pareidolia Faces

    Directory of Open Access Journals (Sweden)

    Kohske Takahashi

    2013-12-01

    Full Text Available Visual images that are not faces are sometimes perceived as faces (the pareidolia phenomenon). While the pareidolia phenomenon provides people with a strong impression that a face is present, it is unclear how deeply pareidolia faces are processed as faces. In the present study, we examined whether a shift in spatial attention would be produced by gaze cueing of face-like objects. A robust cueing effect was observed when the face-like objects were perceived as faces. The magnitude of the cueing effect was comparable between the face-like objects and a cartoon face. However, the cueing effect was eliminated when the observer did not perceive the objects as faces. These results demonstrated that pareidolia faces do more than give the impression of the presence of faces; indeed, they trigger an additional face-specific attentional process.

  19. Gaze cueing by pareidolia faces.

    Science.gov (United States)

    Takahashi, Kohske; Watanabe, Katsumi

    2013-01-01

    Visual images that are not faces are sometimes perceived as faces (the pareidolia phenomenon). While the pareidolia phenomenon provides people with a strong impression that a face is present, it is unclear how deeply pareidolia faces are processed as faces. In the present study, we examined whether a shift in spatial attention would be produced by gaze cueing of face-like objects. A robust cueing effect was observed when the face-like objects were perceived as faces. The magnitude of the cueing effect was comparable between the face-like objects and a cartoon face. However, the cueing effect was eliminated when the observer did not perceive the objects as faces. These results demonstrated that pareidolia faces do more than give the impression of the presence of faces; indeed, they trigger an additional face-specific attentional process.

  20. SUSTAINABILITY IN THE BOWELS OF SPEECHES

    Directory of Open Access Journals (Sweden)

    Jadir Mauro Galvao

    2012-10-01

    Full Text Available The theme of sustainability has not yet become an integral part of the theoretical repertoire that informs our most everyday actions; it occasionally visits our thoughts and permeates many of our speeches. The big event of 2012, the Rio+20 meeting, gathered glances from all corners of the planet around this burning theme, yet we still move forward timidly. Although it is not very clear what the term sustainability encompasses, it does not sound entirely strange. We associate it with things like ecology, the planet, waste emitted by factory smokestacks, deforestation, recycling and global warming, but our goal in this article is less to clarify the term conceptually and more to observe how it appears in the speeches of that conference. When the competent authorities talk about sustainability, what do they refer to? We intend to investigate, in the lines and between the lines of these speeches, any assumptions associated with the term. Therefore we will analyze the speech of the People's Summit, the opening speech of President Dilma and the emblematic speech of the President of Uruguay, José Pepe Mujica.

  1. Evaluation of multimodal ground cues

    DEFF Research Database (Denmark)

    Nordahl, Rolf; Lecuyer, Anatole; Serafin, Stefania

    2012-01-01

    This chapter presents an array of results on the perception of ground surfaces via multiple sensory modalities, with special attention to non-visual perceptual cues, notably those arising from audition and haptics, as well as interactions between them. It also reviews approaches to combining synthetic multimodal cues, from vision, haptics, and audition, in order to realize virtual experiences of walking on simulated ground surfaces or other features…

  2. Visual form Cues, Biological Motions, Auditory Cues, and Even Olfactory Cues Interact to Affect Visual Sex Discriminations

    OpenAIRE

    Rick Van Der Zwan; Anna Brooks; Duncan Blair; Coralia Machatch; Graeme Hacker

    2011-01-01

    Johnson and Tassinary (2005) proposed that visually perceived sex is signalled by structural or form cues. They suggested also that biological motion cues signal sex, but do so indirectly. We previously have shown that auditory cues can mediate visual sex perceptions (van der Zwan et al., 2009). Here we demonstrate that structural cues to body shape are alone sufficient for visual sex discriminations but that biological motion cues alone are not. Interestingly, biological motions can resolve ...

  3. Speech recognition in natural background noise.

    Directory of Open Access Journals (Sweden)

    Julien Meyer

    Full Text Available In the real world, human speech recognition nearly always involves listening in background noise. The impact of such noise on speech signals and on intelligibility performance increases with the separation of the listener from the speaker. The present behavioral experiment provides an overview of the effects of such acoustic disturbances on speech perception in conditions approaching ecologically valid contexts. We analysed the intelligibility loss in spoken word lists with increasing listener-to-speaker distance in a typical low-level natural background noise. The noise was combined with the simple spherical amplitude attenuation due to distance, basically changing the signal-to-noise ratio (SNR). Therefore, our study draws attention to some of the most basic environmental constraints that have pervaded spoken communication throughout human history. We evaluated the ability of native French participants to recognize French monosyllabic words (spoken at 65.3 dB(A), reference at 1 meter) at distances between 11 and 33 meters, which corresponded to the SNRs most revealing of the progressive effect of the selected natural noise (-8.8 dB to -18.4 dB). Our results showed that in such conditions, identity of vowels is mostly preserved, with the striking peculiarity of the absence of confusion in vowels. The results also confirmed the functional role of consonants during lexical identification. The extensive analysis of recognition scores, confusion patterns and associated acoustic cues revealed that sonorant, sibilant and burst properties were the most important parameters influencing phoneme recognition. Altogether these analyses allowed us to extract a resistance scale from consonant recognition scores. We also identified specific perceptual consonant confusion groups depending on the place in the words (onset vs. coda). Finally our data suggested that listeners may access some acoustic cues of the CV transition, opening interesting perspectives for future studies.

  4. Speech recognition in natural background noise.

    Science.gov (United States)

    Meyer, Julien; Dentel, Laure; Meunier, Fanny

    2013-01-01

    In the real world, human speech recognition nearly always involves listening in background noise. The impact of such noise on speech signals and on intelligibility performance increases with the separation of the listener from the speaker. The present behavioral experiment provides an overview of the effects of such acoustic disturbances on speech perception in conditions approaching ecologically valid contexts. We analysed the intelligibility loss in spoken word lists with increasing listener-to-speaker distance in a typical low-level natural background noise. The noise was combined with the simple spherical amplitude attenuation due to distance, basically changing the signal-to-noise ratio (SNR). Therefore, our study draws attention to some of the most basic environmental constraints that have pervaded spoken communication throughout human history. We evaluated the ability of native French participants to recognize French monosyllabic words (spoken at 65.3 dB(A), reference at 1 meter) at distances between 11 and 33 meters, which corresponded to the SNRs most revealing of the progressive effect of the selected natural noise (-8.8 dB to -18.4 dB). Our results showed that in such conditions, identity of vowels is mostly preserved, with the striking peculiarity of the absence of confusion in vowels. The results also confirmed the functional role of consonants during lexical identification. The extensive analysis of recognition scores, confusion patterns and associated acoustic cues revealed that sonorant, sibilant and burst properties were the most important parameters influencing phoneme recognition. Altogether these analyses allowed us to extract a resistance scale from consonant recognition scores. We also identified specific perceptual consonant confusion groups depending on the place in the words (onset vs. coda). Finally our data suggested that listeners may access some acoustic cues of the CV transition, opening interesting perspectives for future studies.
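
    The distance-to-SNR arithmetic described above is easy to reproduce. The sketch below assumes simple spherical spreading (a 20*log10(distance) loss re 1 meter) and a noise floor of roughly 53 dB(A); that noise value is back-calculated from the reported SNR range and is not stated in the abstract.

    ```python
    # Sketch of the distance-to-SNR arithmetic described in the abstract: speech at
    # 65.3 dB(A) re 1 m, attenuated by spherical spreading, against a fixed noise floor.
    # The ~53 dB(A) noise level is inferred from the reported SNR range (-8.8 to -18.4 dB),
    # not stated in the abstract.
    import math

    SPEECH_LEVEL_AT_1M = 65.3   # dB(A), from the abstract
    NOISE_LEVEL = 53.3          # dB(A), assumed so that the reported SNRs are reproduced

    def snr_at_distance(d_m: float) -> float:
        """Speech level after spherical spreading minus the (distance-independent) noise level."""
        speech_level = SPEECH_LEVEL_AT_1M - 20.0 * math.log10(d_m)
        return speech_level - NOISE_LEVEL

    for d in (11, 16, 22, 33):
        print(f"{d:2d} m -> SNR {snr_at_distance(d):6.1f} dB")
    # 11 m gives about -8.8 dB and 33 m about -18.4 dB, matching the reported range.
    ```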

  5. Collective speech acts

    NARCIS (Netherlands)

    Meijers, A.W.M.; Tsohatzidis, S.L.

    2007-01-01

    From its early development in the 1960s, speech act theory always had an individualistic orientation. It focused exclusively on speech acts performed by individual agents. Paradigmatic examples are ‘I promise that p’, ‘I order that p’, and ‘I declare that p’. There is a single speaker and a single

  6. Private Speech in Ballet

    Science.gov (United States)

    Johnston, Dale

    2006-01-01

    Authoritarian teaching practices in ballet inhibit the use of private speech. This paper highlights the critical importance of private speech in the cognitive development of young ballet students, within what is largely a non-verbal art form. It draws upon research by Russian psychologist Lev Vygotsky and contemporary socioculturalists, to…

  7. Free Speech Yearbook 1980.

    Science.gov (United States)

    Kane, Peter E., Ed.

    The 11 articles in this collection deal with theoretical and practical freedom of speech issues. The topics covered are (1) the United States Supreme Court and communication theory; (2) truth, knowledge, and a democratic respect for diversity; (3) denial of freedom of speech in Jock Yablonski's campaign for the presidency of the United Mine…

  8. Illustrated Speech Anatomy.

    Science.gov (United States)

    Shearer, William M.

    Written for students in the fields of speech correction and audiology, the text deals with the following: structures involved in respiration; the skeleton and the processes of inhalation and exhalation; phonation and pitch, the larynx, and esophageal speech; muscles involved in articulation; muscles involved in resonance; and the anatomy of the…

  9. Free Speech. No. 38.

    Science.gov (United States)

    Kane, Peter E., Ed.

    This issue of "Free Speech" contains the following articles: "Daniel Schorr Relieved of Reporting Duties" by Laurence Stern, "The Sellout at CBS" by Michael Harrington, "Defending Dan Schorr" by Tom Wicker, "Speech to the Washington Press Club, February 25, 1976" by Daniel Schorr, "Funds…

  10. Spatial attention triggered by unimodal, crossmodal, and bimodal exogenous cues: a comparison of reflexive orienting mechanisms

    NARCIS (Netherlands)

    Santangelo, Valerio; van der Lubbe, Robert Henricus Johannes; Belardinelli, Marta Olivetti; Postma, Albert

    The aim of this study was to establish whether spatial attention triggered by bimodal exogenous cues acts differently as compared to unimodal and crossmodal exogenous cues due to crossmodal integration. In order to investigate this issue, we examined cuing effects in discrimination tasks and

  11. Do We Perceive Others Better than Ourselves? A Perceptual Benefit for Noise-Vocoded Speech Produced by an Average Speaker.

    Directory of Open Access Journals (Sweden)

    William L Schuerman

    Full Text Available In different tasks involving action perception, performance has been found to be facilitated when the presented stimuli were produced by the participants themselves rather than by another participant. These results suggest that the same mental representations are accessed during both production and perception. However, with regard to spoken word perception, evidence also suggests that listeners' representations for speech reflect the input from their surrounding linguistic community rather than their own idiosyncratic productions. Furthermore, speech perception is heavily influenced by indexical cues that may lead listeners to frame their interpretations of incoming speech signals with regard to speaker identity. In order to determine whether word recognition evinces similar self-advantages as found in action perception, it was necessary to eliminate indexical cues from the speech signal. We therefore asked participants to identify noise-vocoded versions of Dutch words that were based on either their own recordings or those of a statistically average speaker. The majority of participants were more accurate for the average speaker than for themselves, even after taking into account differences in intelligibility. These results suggest that the speech representations accessed during perception of noise-vocoded speech are more reflective of the input of the speech community, and hence that speech perception is not necessarily based on representations of one's own speech.

  12. The Auditory-Visual Speech Benefit on Working Memory in Older Adults with Hearing Impairment.

    Science.gov (United States)

    Frtusova, Jana B; Phillips, Natalie A

    2016-01-01

    This study examined the effect of auditory-visual (AV) speech stimuli on working memory in older adults with poorer-hearing (PH) in comparison to age- and education-matched older adults with better hearing (BH). Participants completed a working memory n-back task (0- to 2-back) in which sequences of digits were presented in visual-only (i.e., speech-reading), auditory-only (A-only), and AV conditions. Auditory event-related potentials (ERP) were collected to assess the relationship between perceptual and working memory processing. The behavioral results showed that both groups were faster in the AV condition in comparison to the unisensory conditions. The ERP data showed perceptual facilitation in the AV condition, in the form of reduced amplitudes and latencies of the auditory N1 and/or P1 components, in the PH group. Furthermore, a working memory ERP component, the P3, peaked earlier for both groups in the AV condition compared to the A-only condition. In general, the PH group showed a more robust AV benefit; however, the BH group showed a dose-response relationship between perceptual facilitation and working memory improvement, especially for facilitation of processing speed. Two measures, reaction time and P3 amplitude, suggested that the presence of visual speech cues may have helped the PH group to counteract the demanding auditory processing, to the level that no group differences were evident during the AV modality despite lower performance during the A-only condition. Overall, this study provides support for the theory of an integrated perceptual-cognitive system. The practical significance of these findings is also discussed.

  13. The Auditory-Visual Speech Benefit on Working Memory in Older Adults with Hearing Impairment

    Directory of Open Access Journals (Sweden)

    Jana B. Frtusova

    2016-04-01

    Full Text Available This study examined the effect of auditory-visual (AV) speech stimuli on working memory in hearing impaired participants (HIP) in comparison to age- and education-matched normal elderly controls (NEC). Participants completed a working memory n-back task (0- to 2-back) in which sequences of digits were presented in visual-only (i.e., speech-reading), auditory-only (A-only), and AV conditions. Auditory event-related potentials (ERP) were collected to assess the relationship between perceptual and working memory processing. The behavioural results showed that both groups were faster in the AV condition in comparison to the unisensory conditions. The ERP data showed perceptual facilitation in the AV condition, in the form of reduced amplitudes and latencies of the auditory N1 and/or P1 components, in the HIP group. Furthermore, a working memory ERP component, the P3, peaked earlier for both groups in the AV condition compared to the A-only condition. In general, the HIP group showed a more robust AV benefit; however, the NECs showed a dose-response relationship between perceptual facilitation and working memory improvement, especially for facilitation of processing speed. Two measures, reaction time and P3 amplitude, suggested that the presence of visual speech cues may have helped the HIP to counteract the demanding auditory processing, to the level that no group differences were evident during the AV modality despite lower performance during the A-only condition. Overall, this study provides support for the theory of an integrated perceptual-cognitive system. The practical significance of these findings is also discussed.

  14. Motion Cueing Algorithm Development: Human-Centered Linear and Nonlinear Approaches

    Science.gov (United States)

    Houck, Jacob A. (Technical Monitor); Telban, Robert J.; Cardullo, Frank M.

    2005-01-01

    While the performance of flight simulator motion system hardware has advanced substantially, the development of the motion cueing algorithm, the software that transforms simulated aircraft dynamics into realizable motion commands, has not kept pace. Prior research identified viable features from two algorithms: the nonlinear "adaptive algorithm", and the "optimal algorithm" that incorporates human vestibular models. A novel approach to motion cueing, the "nonlinear algorithm" is introduced that combines features from both approaches. This algorithm is formulated by optimal control, and incorporates a new integrated perception model that includes both visual and vestibular sensation and the interaction between the stimuli. Using a time-varying control law, the matrix Riccati equation is updated in real time by a neurocomputing approach. Preliminary pilot testing resulted in the optimal algorithm incorporating a new otolith model, producing improved motion cues. The nonlinear algorithm vertical mode produced a motion cue with a time-varying washout, sustaining small cues for longer durations and washing out large cues more quickly compared to the optimal algorithm. The inclusion of the integrated perception model improved the responses to longitudinal and lateral cues. False cues observed with the NASA adaptive algorithm were absent. The neurocomputing approach was crucial in that the number of presentations of an input vector could be reduced to meet the real time requirement without degrading the quality of the motion cues.

  15. Network speech systems technology program

    Science.gov (United States)

    Weinstein, C. J.

    1981-09-01

    This report documents work performed during FY 1981 on the DCA-sponsored Network Speech Systems Technology Program. The two areas of work reported are: (1) communication system studies in support of the evolving Defense Switched Network (DSN) and (2) design and implementation of satellite/terrestrial interfaces for the Experimental Integrated Switched Network (EISN). The system studies focus on the development and evaluation of economical and endurable network routing procedures. Satellite/terrestrial interface development includes circuit-switched and packet-switched connections to the experimental wideband satellite network. Efforts in planning and coordination of EISN experiments are reported in detail in a separate EISN Experiment Plan.

  16. A distributed approach to speech resource collection

    CSIR Research Space (South Africa)

    Molapo, R

    2013-12-01

    Full Text Available The authors describe the integration of several tools to enable the end-to-end development of an Automatic Speech Recognition system in a typical under-resourced language. The authors analyse the data acquired by each of the tools and develop an ASR...

  17. Audiovisual Discrimination between Laughter and Speech

    NARCIS (Netherlands)

    Petridis, Stavros; Pantic, Maja

    Past research on automatic laughter detection has focused mainly on audio-based detection. Here we present an audiovisual approach to distinguishing laughter from speech and we show that integrating the information from audio and video leads to an improved reliability of audiovisual approach in

  18. Ordinal models of audiovisual speech perception

    DEFF Research Database (Denmark)

    Andersen, Tobias

    2011-01-01

    Audiovisual information is integrated in speech perception. One manifestation of this is the McGurk illusion in which watching the articulating face alters the auditory phonetic percept. Understanding this phenomenon fully requires a computational model with predictive power. Here, we describe...

  19. GALLAUDET'S NEW HEARING AND SPEECH CENTER.

    Science.gov (United States)

    FRISINA, D. ROBERT

    THIS REPORT DESCRIBES THE DESIGN OF A NEW SPEECH AND HEARING CENTER AND ITS INTEGRATION INTO THE OVERALL ARCHITECTURAL SCHEME OF THE CAMPUS. THE CIRCULAR SHAPE WAS SELECTED TO COMPLEMENT THE SURROUNDING STRUCTURES AND COMPENSATE FOR DIFFERENCES IN SITE, WHILE PROVIDING THE ACOUSTICAL ADVANTAGES OF NON-PARALLEL WALLS, AND FACILITATING TRAFFIC FLOW.…

  20. Musician advantage for speech-on-speech perception

    NARCIS (Netherlands)

    Başkent, Deniz; Gaudrain, Etienne

    Evidence for transfer of musical training to better perception of speech in noise has been mixed. Unlike speech-in-noise, speech-on-speech perception utilizes many of the skills that musical training improves, such as better pitch perception and stream segregation, as well as use of higher-level

  1. Speech Production and Speech Discrimination by Hearing-Impaired Children.

    Science.gov (United States)

    Novelli-Olmstead, Tina; Ling, Daniel

    1984-01-01

    Seven hearing impaired children (five to seven years old) assigned to the Speakers group made highly significant gains in speech production and auditory discrimination of speech, while Listeners made only slight speech production gains and no gains in auditory discrimination. Combined speech and auditory training was more effective than auditory…

  2. SPEECH VISUALIZATION SYSTEM AS A BASIS FOR SPEECH TRAINING AND COMMUNICATION AIDS

    Directory of Open Access Journals (Sweden)

    Oliana KRSTEVA

    1997-09-01

    Full Text Available One receives much more information through the visual sense than through the tactile one. However, most visual aids for hearing-impaired persons are not wearable, because it is difficult to make them compact, and it is not desirable to constantly occupy the user's vision. Generally, it is difficult to obtain integrated patterns from a single mathematical transform of the signal, such as a Fourier transform. To obtain an integrated pattern, speech parameters should be carefully extracted by an analysis suited to each parameter, and a visual pattern that can be intuitively understood by anyone must be synthesized from them. Successful integration of speech parameters does not disturb the understanding of individual features, so the system can be used for speech training and communication.

  3. The Production of Emotional Prosody in Varying Degrees of Severity of Apraxia of Speech.

    Science.gov (United States)

    Van Putten, Steffany M.; Walker, Judy P.

    2003-01-01

    A study examined the abilities of three adults with varying degrees of apraxia of speech (AOS) to produce emotional prosody. Acoustic analyses of the subjects' productions revealed that unlike the control subject, the subjects with AOS did not produce differences in duration and amplitude cues to convey different emotions. (Contains references.)…

  4. Children's Responses to Computer-Synthesized Speech in Educational Media: Gender Consistency and Gender Similarity Effects

    Science.gov (United States)

    Lee, Kwan Min; Liao, Katharine; Ryu, Seoungho

    2007-01-01

    This study examines children's social responses to gender cues in synthesized speech in a computer-based instruction setting. Eighty 5th-grade elementary school children were randomly assigned to one of the conditions in a full-factorial 2 (participant gender) x 2 (voice gender) x 2 (content gender) experiment. Results show that children apply…

  5. Use of "um" in the Deceptive Speech of a Convicted Murderer

    Science.gov (United States)

    Villar, Gina; Arciuli, Joanne; Mallard, David

    2012-01-01

    Previous studies have demonstrated a link between language behaviors and deception; however, questions remain about the role of specific linguistic cues, especially in real-life high-stakes lies. This study investigated use of the so-called filler, "um," in externally verifiable truthful versus deceptive speech of a convicted murderer. The data…

  6. Habituation of adult sea lamprey repeatedly exposed to damage-released alarm and predator cues

    Science.gov (United States)

    Imre, Istvan; Di Rocco, Richard T.; Brown, Grant E.; Johnson, Nicholas

    2016-01-01

    Predation is an unforgiving selective pressure affecting the life history, morphology and behaviour of prey organisms. Selection should favour organisms that have the ability to correctly assess the information content of alarm cues. This study investigated whether adult sea lamprey Petromyzon marinus habituate to conspecific damage-released alarm cues (fresh and decayed sea lamprey extract), a heterospecific damage-released alarm cue (white sucker Catostomus commersonii extract), predator cues (Northern water snake Nerodia sipedon washing, human saliva and 2-phenylethylamine hydrochloride (PEA HCl)) and a conspecific damage-released alarm cue and predator cue combination (fresh sea lamprey extract and human saliva) after they were pre-exposed 4 times or 8 times, respectively, to a given stimulus the previous night. Consistent with our prediction, adult sea lamprey maintained an avoidance response to conspecific damage-released alarm cues (fresh and decayed sea lamprey extract), a predator cue presented at high relative concentration (PEA HCl) and a conspecific damage-released alarm cue and predator cue combination (fresh sea lamprey extract plus human saliva), irrespective of previous exposure level. As expected, adult sea lamprey habituated to a sympatric heterospecific damage-released alarm cue (white sucker extract) and a predator cue presented at lower relative concentration (human saliva). Adult sea lamprey did not show any avoidance of the Northern water snake washing and the Amazon sailfin catfish extract (heterospecific control). This study suggests that conspecific damage-released alarm cues and PEA HCl present the best options as natural repellents in an integrated management program aimed at controlling the abundance of sea lamprey in the Laurentian Great Lakes.

  7. Inner Speech's Relationship With Overt Speech in Poststroke Aphasia.

    Science.gov (United States)

    Stark, Brielle C; Geva, Sharon; Warburton, Elizabeth A

    2017-09-18

    Relatively preserved inner speech alongside poor overt speech has been documented in some persons with aphasia (PWA), but the relationship of overt speech with inner speech is still largely unclear, as few studies have directly investigated these factors. The present study investigates the relationship of relatively preserved inner speech in aphasia with selected measures of language and cognition. Thirty-eight persons with chronic aphasia (27 men, 11 women; average age 64.53 ± 13.29 years, time since stroke 8-111 months) were classified as having relatively preserved inner and overt speech (n = 21), relatively preserved inner speech with poor overt speech (n = 8), or not classified due to insufficient measurements of inner and/or overt speech (n = 9). Inner speech scores (by group) were correlated with selected measures of language and cognition from the Comprehensive Aphasia Test (Swinburn, Porter, & Howard, 2004). The group with poor overt speech showed a significant relationship of inner speech with overt naming (r = .95, p < .001), whereas relationships between inner speech and language and cognition factors were not significant for the group with relatively good overt speech. As in previous research, we show that relatively preserved inner speech is found alongside otherwise severe production deficits in PWA. PWA with poor overt speech may rely more on preserved inner speech for overt picture naming (perhaps due to shared resources with verbal working memory) and for written picture description (perhaps due to reliance on inner speech due to perceived task difficulty). Assessments of inner speech may be useful as a standard component of aphasia screening, and therapy focused on improving and using inner speech may prove clinically worthwhile. https://doi.org/10.23641/asha.5303542.

  8. Environmental Contamination of Normal Speech.

    Science.gov (United States)

    Harley, Trevor A.

    1990-01-01

    Environmentally contaminated speech errors (irrelevant words or phrases derived from the speaker's environment and erroneously incorporated into speech) are hypothesized to occur at a high level of speech processing, but with a relatively late insertion point. The data indicate that speech production processes are not independent of other…

  9. Speech misperception: speaking and seeing interfere differently with hearing.

    Directory of Open Access Journals (Sweden)

    Takemi Mochida

    Full Text Available Speech perception is thought to be linked to speech motor production. This linkage is considered to mediate multimodal aspects of speech perception, such as audio-visual and audio-tactile integration. However, direct coupling between articulatory movement and auditory perception has been little studied. The present study reveals a clear dissociation between the effects of a listener's own speech action and the effects of viewing another's speech movements on the perception of auditory phonemes. We assessed the intelligibility of the syllables [pa], [ta], and [ka] when listeners silently and simultaneously articulated syllables that were congruent/incongruent with the syllables they heard. The intelligibility was compared with a condition where the listeners simultaneously watched another's mouth producing congruent/incongruent syllables, but did not articulate. The intelligibility of [ta] and [ka] was degraded by articulating [ka] and [ta] respectively, which are associated with the same primary articulator (tongue) as the heard syllables. But it was not affected by articulating [pa], which is associated with a different primary articulator (lips) from the heard syllables. In contrast, the intelligibility of [ta] and [ka] was degraded by watching the production of [pa]. These results indicate that the articulatory-induced distortion of speech perception occurs in an articulator-specific manner while visually induced distortion does not. The articulator-specific nature of the auditory-motor interaction in speech perception suggests that speech motor processing directly contributes to our ability to hear speech.

  10. The effect of filtered speech feedback on the frequency of stuttering

    Science.gov (United States)

    Rami, Manish Krishnakant

    2000-10-01

    whispered speech conditions all decreased the frequency of stuttering while the approximate glottal source did not. It is suggested that articulatory events, chiefly the encoded speech output of vocal tract origin, afford effective cues and induce fluent speech in people who stutter.

  11. How much does language proficiency by non-native listeners influence speech audiometric tests in noise?

    Science.gov (United States)

    Warzybok, Anna; Brand, Thomas; Wagener, Kirsten C; Kollmeier, Birger

    2015-01-01

    The current study investigates the extent to which the linguistic complexity of three commonly employed speech recognition tests and second language proficiency influence speech recognition thresholds (SRTs) in noise in non-native listeners. SRTs were measured for non-natives and natives using three German speech recognition tests: the digit triplet test (DTT), the Oldenburg sentence test (OLSA), and the Göttingen sentence test (GÖSA). Sixty-four non-native and eight native listeners participated. Non-natives can show native-like SRTs in noise only for the linguistically easy speech material (DTT). Furthermore, the limitation of phonemic-acoustical cues in digit triplets affects speech recognition to the same extent in non-natives and natives. For more complex and less familiar speech materials, non-natives, ranging from basic to advanced proficiency in German, require on average a 3 dB better signal-to-noise ratio for the OLSA and 6 dB for the GÖSA to obtain 50% speech recognition compared to native listeners. In clinical audiology, SRT measurements with a closed-set speech test (i.e. DTT for screening or OLSA test for clinical purposes) should be used with non-native listeners rather than open-set speech tests (such as the GÖSA or HINT), especially if a closed-set version in the patient's own native language is available.
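
    For readers unfamiliar with how an SRT is obtained, the toy simulation below sketches a generic 1-up/1-down adaptive track converging on the SNR that yields 50% correct; the simulated listener, slope and step size are arbitrary assumptions, not the procedure of the DTT, OLSA or GÖSA specifically.

    ```python
    # Toy adaptive SRT tracking: lower the SNR after a correct response, raise it after
    # an error. The listener is a simulated logistic psychometric function; all values
    # (true SRT, slope, step size) are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    TRUE_SRT = -7.0        # dB SNR at which the simulated listener scores 50%
    SLOPE = 0.5            # logistic slope per dB (assumed)

    def listener_correct(snr_db: float) -> bool:
        p = 1.0 / (1.0 + np.exp(-SLOPE * (snr_db - TRUE_SRT)))
        return rng.random() < p

    snr, step, track = 0.0, 2.0, []
    for _ in range(40):
        track.append(snr)
        snr += -step if listener_correct(snr) else step   # 1-down/1-up targets 50% correct
    print("estimated SRT (dB):", round(float(np.mean(track[20:])), 1))
    ```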

  12. APPRECIATING SPEECH THROUGH GAMING

    Directory of Open Access Journals (Sweden)

    Mario T Carreon

    2014-06-01

    Full Text Available This paper discusses the Speech and Phoneme Recognition as an Educational Aid for the Deaf and Hearing Impaired (SPREAD) application and the ongoing research on its deployment as a tool for motivating deaf and hearing impaired students to learn and appreciate speech. This application uses the Sphinx-4 voice recognition system to analyze the vocalization of the student and provide prompt feedback on their pronunciation. The packaging of the application as an interactive game aims to provide additional motivation for the deaf and hearing impaired student through visual motivation for them to learn and appreciate speech.

  13. Global Freedom of Speech

    DEFF Research Database (Denmark)

    Binderup, Lars Grassme

    2007-01-01

    , as opposed to a legal norm, that curbs exercises of the right to free speech that offend the feelings or beliefs of members from other cultural groups. The paper rejects the suggestion that acceptance of such a norm is in line with liberal egalitarian thinking. Following a review of the classical liberal egalitarian reasons for free speech (reasons from overall welfare, from autonomy and from respect for the equality of citizens), it is argued that these reasons outweigh the proposed reasons for curbing culturally offensive speech. Currently controversial cases such as that of the Danish Cartoon Controversy…

  14. Sensorimotor Adaptation Following Exposure to Ambiguous Inertial Motion Cues

    Science.gov (United States)

    Wood, S. J.; Clement, G. R.; Rupert, A. H.; Reschke, M. F.; Harm, D. L.; Guedry, F. E.

    2007-01-01

    The central nervous system must resolve the ambiguity of inertial motion sensory cues in order to derive accurate spatial orientation awareness. Adaptive changes in how inertial cues from the otolith system are integrated with other sensory information lead to perceptual and postural disturbances upon return to Earth's gravity. The primary goals of this ground-based research investigation are to explore physiological mechanisms and operational implications of tilt-translation disturbances during and following re-entry, and to evaluate a tactile prosthesis as a countermeasure for improving control of whole-body orientation during tilt and translation motion.

  15. Fusion of audio and visual cues for laughter detection

    NARCIS (Netherlands)

    Petridis, Stavros; Pantic, Maja

    Past research on automatic laughter detection has focused mainly on audio-based detection. Here we present an audio- visual approach to distinguishing laughter from speech and we show that integrating the information from audio and video channels leads to improved performance over single-modal

  16. Gaze Cueing by Pareidolia Faces

    OpenAIRE

    Kohske Takahashi; Katsumi Watanabe

    2013-01-01

    Visual images that are not faces are sometimes perceived as faces (the pareidolia phenomenon). While the pareidolia phenomenon provides people with a strong impression that a face is present, it is unclear how deeply pareidolia faces are processed as faces. In the present study, we examined whether a shift in spatial attention would be produced by gaze cueing of face-like objects. A robust cueing effect was observed when the face-like objects were perceived as faces. The magnitude of the cuei...

  17. Charisma in business speeches

    DEFF Research Database (Denmark)

    Niebuhr, Oliver; Brem, Alexander; Novák-Tót, Eszter

    2016-01-01

    to business speeches. Consistent with the public opinion, our findings are indicative of Steve Jobs being a more charismatic speaker than Mark Zuckerberg. Beyond previous studies, our data suggest that rhythm and emphatic accentuation are also involved in conveying charisma. Furthermore, the differences between Steve Jobs and Mark Zuckerberg and the investor- and customer-related sections of their speeches support the modern understanding of charisma as a gradual, multiparametric, and context-sensitive concept…

  18. Speech spectrum envelope modeling

    Czech Academy of Sciences Publication Activity Database

    Vích, Robert; Vondra, Martin

    Vol. 4775, - (2007), s. 129-137 ISSN 0302-9743. [COST Action 2102 International Workshop. Vietri sul Mare, 29.03.2007-31.03.2007] R&D Projects: GA AV ČR(CZ) 1ET301710509 Institutional research plan: CEZ:AV0Z20670512 Keywords : speech * speech processing * cepstral analysis Subject RIV: JA - Electronics ; Optoelectronics, Electrical Engineering Impact factor: 0.302, year: 2005

  19. Compass cues used by a nocturnal bull ant, Myrmecia midas.

    Science.gov (United States)

    Freas, Cody A; Narendra, Ajay; Cheng, Ken

    2017-05-01

    Ants use both terrestrial landmarks and celestial cues to navigate to and from their nest location. These cues persist even as light levels drop during the twilight/night. Here, we determined the compass cues used by a nocturnal bull ant, Myrmecia midas, in which the majority of individuals begin foraging during the evening twilight period. Myrmecia midas foragers with vectors of ≤5 m when displaced to unfamiliar locations did not follow the home vector, but instead showed random heading directions. Foragers with larger home vectors (≥10 m) oriented towards the fictive nest, indicating a possible increase in cue strength with vector length. When the ants were displaced locally to create a conflict between the home direction indicated by the path integrator and terrestrial landmarks, foragers oriented using landmark information exclusively and ignored any accumulated home vector regardless of vector length. When the visual landmarks at the local displacement site were blocked, foragers were unable to orient to the nest direction and their heading directions were randomly distributed. Myrmecia midas ants typically nest at the base of the tree and some individuals forage on the same tree. Foragers collected on the nest tree during evening twilight were unable to orient towards the nest after small lateral displacements away from the nest. This suggests the possibility of high tree fidelity and an inability to extrapolate landmark compass cues from information collected on the tree and at the nest site to close displacement sites. © 2017. Published by The Company of Biologists Ltd.

  20. Zebra finches can use positional and transitional cues to distinguish vocal element strings.

    Science.gov (United States)

    Chen, Jiani; Ten Cate, Carel

    2015-08-01

    Learning sequences is of great importance to humans and non-human animals. Many motor and mental actions, such as singing in birds and speech processing in humans, rely on sequential learning. At least two mechanisms are considered to be involved in such learning. The chaining theory proposes that learning of sequences relies on memorizing the transitions between adjacent items, while the positional theory suggests that learners encode the items according to their ordinal position in the sequence. Positional learning is assumed to dominate sequential learning. However, human infants exposed to a string of speech sounds can learn transitional (chaining) cues. So far, it is not clear whether birds, an increasingly important model for examining vocal processing, can do this. In this study we use a Go-Nogo design to examine whether zebra finches can use transitional cues to distinguish artificially constructed strings of song elements. Zebra finches were trained with sequences differing in transitional and positional information and next tested with novel strings sharing positional and transitional similarities with the training strings. The results show that they can attend to both transitional and positional cues and that their sequential coding strategies can be biased toward transitional cues depending on the learning context. This article is part of a Special Issue entitled: In Honor of Jerry Hogan. Copyright © 2014 Elsevier B.V. All rights reserved.

  1. Memory for speech and speech for memory.

    Science.gov (United States)

    Locke, J L; Kutz, K J

    1975-03-01

    Thirty kindergarteners, 15 who substituted /w/ for /r/ and 15 with correct articulation, received two perception tests and a memory test that included /w/ and /r/ in minimally contrastive syllables. Although both groups had nearly perfect perception of the experimenter's productions of /w/ and /r/, misarticulating subjects perceived their own tape-recorded w/r productions as /w/. In the memory task these same misarticulating subjects committed significantly more /w/-/r/ confusions in unspoken recall. The discussion considers why people subvocally rehearse; a developmental period in which children do not rehearse; ways subvocalization may aid recall, including motor and acoustic encoding; an echoic store that provides additional recall support if subjects rehearse vocally; and perception of self- and other-produced phonemes by misarticulating children, including its relevance to a motor theory of perception. Evidence is presented that speech for memory can be sufficiently impaired to cause memory disorder. Conceptions that restrict speech disorder to an impairment of communication are challenged.

  2. Adapting Speech Recognition in Augmented Reality for Mobile Devices in Outdoor Environments

    OpenAIRE

    Pascoal, Rui; Ribeiro, Ricardo; Batista, Fernando; de Almeida, Ana

    2017-01-01

    This paper describes the process of integrating automatic speech recognition (ASR) into a mobile application and explores the benefits and challenges of integrating speech with augmented reality (AR) in outdoor environments. The augmented reality allows end-users to interact with the information displayed and perform tasks, while increasing the user’s perception about the real world by adding virtual information to it. Speech is the most natural way of communication: it allows hands-free inte...

  3. Predicting speech intelligibility in conditions with nonlinearly processed noisy speech

    DEFF Research Database (Denmark)

    Jørgensen, Søren; Dau, Torsten

    2013-01-01

    The speech-based envelope power spectrum model (sEPSM; [1]) was proposed in order to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII). The sEPSM applies the signal-to-noise ratio in the envelope domain (SNRenv), which was demonstrated to successfully predict speech intelligibility in conditions with nonlinearly processed noisy speech, such as processing with spectral subtraction. Moreover, a multiresolution version (mr-sEPSM) was demonstrated to account for speech intelligibility in various conditions with stationary and fluctuating…
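
    As a coarse, single-band illustration of the SNRenv idea (the published model uses auditory and modulation filterbanks, which are omitted here), the sketch below compares the envelope power of noisy speech with that of the noise alone.

    ```python
    # Toy single-band sketch of the envelope-domain SNR (SNRenv) concept: envelope power
    # of the noisy speech minus that of the noise alone, divided by the noise envelope
    # power. Only illustrative; the real model filters into auditory and modulation bands.
    import numpy as np
    from scipy.signal import hilbert

    def envelope_power(x: np.ndarray) -> float:
        """AC power of the normalized Hilbert envelope of a signal."""
        env = np.abs(hilbert(x))
        env = env / env.mean()          # normalize by the DC value
        return float(np.mean((env - env.mean()) ** 2))

    def snr_env_db(noisy_speech: np.ndarray, noise_alone: np.ndarray) -> float:
        p_mix = envelope_power(noisy_speech)
        p_noise = envelope_power(noise_alone)
        snr_env = max(p_mix - p_noise, 1e-3) / p_noise   # floor keeps the ratio positive
        return 10.0 * np.log10(snr_env)

    # Demo with a modulated tone standing in for speech and white noise as the masker.
    fs, dur = 16000, 1.0
    t = np.arange(int(fs * dur)) / fs
    speech_like = (1.0 + 0.8 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 500 * t)
    noise = np.random.default_rng(1).normal(0.0, 0.5, t.size)
    print(f"SNRenv ~ {snr_env_db(speech_like + noise, noise):.1f} dB")
    ```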

  4. Using Zebra-speech to study sequential and simultaneous speech segregation in a cochlear-implant simulation.

    Science.gov (United States)

    Gaudrain, Etienne; Carlyon, Robert P

    2013-01-01

    Previous studies have suggested that cochlear implant users may have particular difficulties exploiting opportunities to glimpse clear segments of a target speech signal in the presence of a fluctuating masker. Although it has been proposed that this difficulty is associated with a deficit in linking the glimpsed segments across time, the details of this mechanism are yet to be explained. The present study introduces a method called Zebra-speech developed to investigate the relative contribution of simultaneous and sequential segregation mechanisms in concurrent speech perception, using a noise-band vocoder to simulate cochlear implants. One experiment showed that the saliency of the difference between the target and the masker is a key factor for Zebra-speech perception, as it is for sequential segregation. Furthermore, forward masking played little or no role, confirming that intelligibility was not limited by energetic masking but by across-time linkage abilities. In another experiment, a binaural cue was used to distinguish the target and the masker. It showed that the relative contribution of simultaneous and sequential segregation depended on the spectral resolution, with listeners relying more on sequential segregation when the spectral resolution was reduced. The potential of Zebra-speech as a segregation enhancement strategy for cochlear implants is discussed.
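
    As a rough sketch of the interleaving idea behind such stimuli (not the authors' exact signal processing, which also involves noise-band vocoding), the snippet below alternates a target and a masker in complementary time stripes so the two never overlap; the stripe duration is an arbitrary assumption.

    ```python
    # Hypothetical "zebra"-style stimulus construction: target and masker occupy
    # alternating time stripes, removing simultaneous overlap so that only
    # across-time (sequential) segregation is required of the listener.
    import numpy as np

    def zebra_mix(target: np.ndarray, masker: np.ndarray, fs: int, stripe_ms: float = 180.0):
        n = min(len(target), len(masker))
        stripe = int(fs * stripe_ms / 1000)
        on_target = (np.arange(n) // stripe) % 2 == 0     # True during target stripes
        return np.where(on_target, target[:n], masker[:n])

    fs = 16000
    t = np.arange(fs) / fs
    target = np.sin(2 * np.pi * 200 * t)     # stand-ins for two talkers
    masker = np.sin(2 * np.pi * 330 * t)
    zebra = zebra_mix(target, masker, fs)
    print("zebra stimulus length:", zebra.size)
    ```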

  5. Analysis of engagement behavior in children during dyadic interactions using prosodic cues.

    Science.gov (United States)

    Gupta, Rahul; Bone, Daniel; Lee, Sungbok; Narayanan, Shrikanth

    2016-05-01

    Child engagement is defined as the interaction of a child with his/her environment in a contextually appropriate manner. Engagement behavior in children is linked to socio-emotional and cognitive state assessment with enhanced engagement identified with improved skills. A vast majority of studies however rely solely, and often implicitly, on subjective perceptual measures of engagement. Access to automatic quantification could assist researchers/clinicians to objectively interpret engagement with respect to a target behavior or condition, and furthermore inform mechanisms for improving engagement in various settings. In this paper, we present an engagement prediction system based exclusively on vocal cues observed during structured interaction between a child and a psychologist involving several tasks. Specifically, we derive prosodic cues that capture engagement levels across the various tasks. Our experiments suggest that a child's engagement is reflected not only in the vocalizations, but also in the speech of the interacting psychologist. Moreover, we show that prosodic cues are informative of the engagement phenomena not only as characterized over the entire task (i.e., global cues), but also in short term patterns (i.e., local cues). We perform a classification experiment assigning the engagement of a child into three discrete levels achieving an unweighted average recall of 55.8% (chance is 33.3%). While the systems using global cues and local level cues are each statistically significant in predicting engagement, we obtain the best results after fusing these two components. We perform further analysis of the cues at local and global levels to achieve insights linking specific prosodic patterns to the engagement phenomenon. We observe that while the performance of our model varies with task setting and interacting psychologist, there exist universal prosodic patterns reflective of engagement.
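
    A hypothetical sketch of this kind of engagement classification is given below: simulated global/local prosodic features feed a support vector classifier scored with unweighted average recall. Feature values, dimensionality and classifier settings are all assumptions, not the authors' system.

    ```python
    # Hypothetical engagement-level classification from prosodic cues: simulated
    # per-session features (e.g., pitch/energy statistics), 3 discrete levels,
    # scored with unweighted average recall (balanced accuracy). Not real data.
    import numpy as np
    from sklearn.model_selection import cross_val_predict
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC
    from sklearn.metrics import balanced_accuracy_score   # equals unweighted average recall

    rng = np.random.default_rng(0)
    n_sessions, n_features = 120, 8                 # e.g., mean/std of f0 and energy, global and local
    engagement = rng.integers(0, 3, n_sessions)     # 3 discrete engagement levels
    # Simulated prosodic features that shift weakly with engagement level.
    X = rng.normal(0.0, 1.0, (n_sessions, n_features)) + engagement[:, None] * 0.4

    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    pred = cross_val_predict(clf, X, engagement, cv=5)
    print("unweighted average recall:", round(balanced_accuracy_score(engagement, pred), 3))
    ```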

  6. Temporal predictive mechanisms modulate motor reaction time during initiation and inhibition of speech and hand movement.

    Science.gov (United States)

    Johari, Karim; Behroozmand, Roozbeh

    2017-08-01

    Skilled movement is mediated by motor commands executed with extremely fine temporal precision. The question of how the brain incorporates temporal information to perform motor actions has remained unanswered. This study investigated the effect of stimulus temporal predictability on response timing of speech and hand movement. Subjects performed a randomized vowel vocalization or button press task in two counterbalanced blocks in response to temporally-predictable and unpredictable visual cues. Results indicated that speech and hand reaction time was decreased for predictable compared with unpredictable stimuli. This finding suggests that a temporal predictive code is established to capture temporal dynamics of sensory cues in order to produce faster movements in responses to predictable stimuli. In addition, results revealed a main effect of modality, indicating faster hand movement compared with speech. We suggest that this effect is accounted for by the inherent complexity of speech production compared with hand movement. Lastly, we found that movement inhibition was faster than initiation for both hand and speech, suggesting that movement initiation requires a longer processing time to coordinate activities across multiple regions in the brain. These findings provide new insights into the mechanisms of temporal information processing during initiation and inhibition of speech and hand movement. Copyright © 2017 Elsevier B.V. All rights reserved.

  7. Estimating location without external cues.

    Directory of Open Access Journals (Sweden)

    Allen Cheung

    2014-10-01

    Full Text Available The ability to determine one's location is fundamental to spatial navigation. Here, it is shown that localization is theoretically possible without the use of external cues, and without knowledge of initial position or orientation. With only error-prone self-motion estimates as input, a fully disoriented agent can, in principle, determine its location in familiar spaces with 1-fold rotational symmetry. Surprisingly, localization does not require the sensing of any external cue, including the boundary. The combination of self-motion estimates and an internal map of the arena provide enough information for localization. This stands in conflict with the supposition that 2D arenas are analogous to open fields. Using a rodent error model, it is shown that the localization performance which can be achieved is enough to initiate and maintain stable firing patterns like those of grid cells, starting from full disorientation. Successful localization was achieved when the rotational asymmetry was due to the external boundary, an interior barrier or a void space within an arena. Optimal localization performance was found to depend on arena shape, arena size, local and global rotational asymmetry, and the structure of the path taken during localization. Since allothetic cues including visual and boundary contact cues were not present, localization necessarily relied on the fusion of idiothetic self-motion cues and memory of the boundary. Implications for spatial navigation mechanisms are discussed, including possible relationships with place field overdispersion and hippocampal reverse replay. Based on these results, experiments are suggested to identify if and where information fusion occurs in the mammalian spatial memory system.

  8. Blind speech separation system for humanoid robot with FastICA for audio filtering and separation

    Science.gov (United States)

    Budiharto, Widodo; Santoso Gunawan, Alexander Agung

    2016-07-01

    Nowadays, there are many developments in building intelligent humanoid robots, mainly in order to handle voice and image. In this research, we propose a blind speech separation system using FastICA for audio filtering and separation that can be used in education or entertainment. Our main problem is to separate multiple speech sources and to filter out irrelevant noise. After the speech separation step, the results will be integrated with our previous speech and face recognition system, which is based on the Bioloid GP robot and a Raspberry Pi 2 as controller. The experimental results show that the accuracy of our blind speech separation system is about 88% in command and query recognition cases.
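
    A minimal sketch of the separation stage is shown below, using scikit-learn's FastICA on two synthetic sources mixed into two channels; the sources, mixing matrix and the robot/microphone integration are stand-ins, not the authors' setup.

    ```python
    # Minimal FastICA blind-separation sketch: two synthetic "talkers" are mixed into
    # two observed channels and then unmixed (up to scale and ordering) with scikit-learn.
    import numpy as np
    from sklearn.decomposition import FastICA

    fs = 16000
    t = np.arange(2 * fs) / fs
    src1 = np.sign(np.sin(2 * np.pi * 3 * t))          # stand-ins for two talkers
    src2 = np.sin(2 * np.pi * 220 * t)
    S = np.c_[src1, src2]

    A = np.array([[1.0, 0.6], [0.4, 1.0]])             # unknown mixing (two microphones)
    X = S @ A.T                                        # observed mixtures

    ica = FastICA(n_components=2, random_state=0)
    S_est = ica.fit_transform(X)                       # recovered sources
    print("estimated mixing matrix:\n", ica.mixing_)
    ```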

  9. Music and Speech Perception in Children Using Sung Speech.

    Science.gov (United States)

    Nie, Yingjiu; Galvin, John J; Morikawa, Michael; André, Victoria; Wheeler, Harley; Fu, Qian-Jie

    2018-01-01

    This study examined music and speech perception in normal-hearing children with some or no musical training. Thirty children (mean age = 11.3 years), 15 with and 15 without formal music training participated in the study. Music perception was measured using a melodic contour identification (MCI) task; stimuli were a piano sample or sung speech with a fixed timbre (same word for each note) or a mixed timbre (different words for each note). Speech perception was measured in quiet and in steady noise using a matrix-styled sentence recognition task; stimuli were naturally intonated speech or sung speech with a fixed pitch (same note for each word) or a mixed pitch (different notes for each word). Significant musician advantages were observed for MCI and speech in noise but not for speech in quiet. MCI performance was significantly poorer with the mixed timbre stimuli. Speech performance in noise was significantly poorer with the fixed or mixed pitch stimuli than with spoken speech. Across all subjects, age at testing and MCI performance were significantly correlated with speech performance in noise. MCI and speech performance in quiet was significantly poorer for children than for adults from a related study using the same stimuli and tasks; speech performance in noise was significantly poorer for young than for older children. Long-term music training appeared to benefit melodic pitch perception and speech understanding in noise in these pediatric listeners.

  10. Evidence for cue-independent spatial representation in the human auditory cortex during active listening.

    Science.gov (United States)

    Higgins, Nathan C; McLaughlin, Susan A; Rinne, Teemu; Stecker, G Christopher

    2017-09-05

    Few auditory functions are as important or as universal as the capacity for auditory spatial awareness (e.g., sound localization). That ability relies on sensitivity to acoustical cues, particularly interaural time and level differences (ITD and ILD), that correlate with sound-source locations. Under nonspatial listening conditions, cortical sensitivity to ITD and ILD takes the form of broad contralaterally dominated response functions. It is unknown, however, whether that sensitivity reflects representations of the specific physical cues or a higher-order representation of auditory space (i.e., integrated cue processing), nor is it known whether responses to spatial cues are modulated by active spatial listening. To investigate, sensitivity to parametrically varied ITD or ILD cues was measured using fMRI during spatial and nonspatial listening tasks. Task type varied across blocks where targets were presented in one of three dimensions: auditory location, pitch, or visual brightness. Task effects were localized primarily to lateral posterior superior temporal gyrus (pSTG) and modulated binaural-cue response functions differently in the two hemispheres. Active spatial listening (location tasks) enhanced both contralateral and ipsilateral responses in the right hemisphere but maintained or enhanced contralateral dominance in the left hemisphere. Two observations suggest integrated processing of ITD and ILD. First, overlapping regions in medial pSTG exhibited significant sensitivity to both cues. Second, successful classification of multivoxel patterns was observed for both cue types and, critically, for cross-cue classification. Together, these results suggest a higher-order representation of auditory space in the human auditory cortex that at least partly integrates the specific underlying cues.

  11. Near-optimal integration of facial form and motion.

    Science.gov (United States)

    Dobs, Katharina; Ma, Wei Ji; Reddy, Leila

    2017-09-08

    Human perception consists of the continuous integration of sensory cues pertaining to the same object. While it has been fairly well shown that humans use an optimal strategy when integrating low-level cues proportional to their relative reliability, the integration processes underlying high-level perception are much less understood. Here we investigate cue integration in a complex high-level perceptual system, the human face processing system. We tested cue integration of facial form and motion in an identity categorization task and found that an optimal model could successfully predict subjects' identity choices. Our results suggest that optimal cue integration may be implemented across different levels of the visual processing hierarchy.
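
    The optimal strategy referred to above is commonly formalized as inverse-variance (reliability-weighted) cue combination; a minimal sketch with purely illustrative numbers follows.

    ```python
    # Minimal reliability-weighted (inverse-variance) cue combination, the standard
    # "optimal" rule such studies test against behavior. Numbers are illustrative only.
    def integrate_cues(est_form, var_form, est_motion, var_motion):
        """Combine two cue estimates by weighting each with its inverse variance."""
        w_form = (1.0 / var_form) / (1.0 / var_form + 1.0 / var_motion)
        w_motion = 1.0 - w_form
        combined = w_form * est_form + w_motion * est_motion
        combined_var = 1.0 / (1.0 / var_form + 1.0 / var_motion)
        return combined, combined_var

    # Example: a reliable form cue and a noisier motion cue, each giving an estimate on a
    # 0 (identity B) to 1 (identity A) scale; the integrated estimate leans toward the
    # more reliable cue, and its variance is lower than either cue alone.
    print(integrate_cues(est_form=0.8, var_form=0.04, est_motion=0.4, var_motion=0.16))
    ```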

  12. Practical speech user interface design

    CERN Document Server

    Lewis, James R

    2010-01-01

    Although speech is the most natural form of communication between humans, most people find using speech to communicate with machines anything but natural. Drawing from psychology, human-computer interaction, linguistics, and communication theory, Practical Speech User Interface Design provides a comprehensive yet concise survey of practical speech user interface (SUI) design. It offers practice-based and research-based guidance on how to design effective, efficient, and pleasant speech applications that people can really use. Focusing on the design of speech user interfaces for IVR application

  13. The Influence of Cue Reliability and Cue Representation on Spatial Reorientation in Young Children

    Science.gov (United States)

    Lyons, Ian M.; Huttenlocher, Janellen; Ratliff, Kristin R.

    2014-01-01

    Previous studies of children's reorientation have focused on cue representation (e.g., whether cues are geometric) as a predictor of performance but have not addressed cue reliability (the regularity of the relation between a given cue and an outcome) as a predictor of performance. Here we address both factors within the same series of…

  14. Cues for localization in the horizontal plane

    DEFF Research Database (Denmark)

    Jeppesen, Jakob; Møller, Henrik

    2005-01-01

    manipulated in HRTFs used for binaural synthesis of sound in the horizontal plane. The manipulation of cues resulted in HRTFs with cues ranging from correct combinations of spectral information and ITDs to combinations with severely conflicting cues. Both the ITD and the spectral information seem...

  15. The role of spectral and temporal cues in voice gender discrimination by normal-hearing listeners and cochlear implant users.

    Science.gov (United States)

    Fu, Qian-Jie; Chinchilla, Sherol; Galvin, John J

    2004-09-01

    The present study investigated the relative importance of temporal and spectral cues in voice gender discrimination and vowel recognition by normal-hearing subjects listening to an acoustic simulation of cochlear implant speech processing and by cochlear implant users. In the simulation, the number of speech processing channels ranged from 4 to 32, thereby varying the spectral resolution; the cutoff frequencies of the channels' envelope filters ranged from 20 to 320 Hz, thereby manipulating the available temporal cues. For normal-hearing subjects, results showed that both voice gender discrimination and vowel recognition scores improved as the number of spectral channels was increased. When only 4 spectral channels were available, voice gender discrimination significantly improved as the envelope filter cutoff frequency was increased from 20 to 320 Hz. For all spectral conditions, increasing the amount of temporal information had no significant effect on vowel recognition. Both voice gender discrimination and vowel recognition scores were highly variable among implant users. The performance of cochlear implant listeners was similar to that of normal-hearing subjects listening to comparable speech processing (4-8 spectral channels). The results suggest that both spectral and temporal cues contribute to voice gender discrimination and that temporal cues are especially important for cochlear implant users to identify the voice gender when there is reduced spectral resolution.
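
    A compact noise-band vocoder of the kind used in such simulations is sketched below: the signal is split into a few bands, each band's envelope is extracted with a low-pass filter (the 20-320 Hz cutoff manipulated in the study), and the envelopes modulate band-limited noise. Band edges, filter orders and the toy input are assumptions.

    ```python
    # Sketch of a noise-band vocoder used to simulate cochlear implant processing.
    # Band edges, filter orders and the toy input are assumptions, not the study's exact settings.
    import numpy as np
    from scipy.signal import butter, sosfiltfilt

    def noise_vocoder(x, fs, n_channels=4, env_cutoff_hz=160, f_lo=200.0, f_hi=7000.0):
        edges = np.geomspace(f_lo, f_hi, n_channels + 1)          # log-spaced band edges
        sos_env = butter(2, env_cutoff_hz, btype="lowpass", fs=fs, output="sos")
        rng = np.random.default_rng(0)
        out = np.zeros_like(x)
        for lo, hi in zip(edges[:-1], edges[1:]):
            sos_band = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
            band = sosfiltfilt(sos_band, x)
            env = sosfiltfilt(sos_env, np.abs(band))               # rectify + smooth = envelope
            carrier = sosfiltfilt(sos_band, rng.normal(0.0, 1.0, x.size))
            out += np.clip(env, 0.0, None) * carrier               # envelope modulates noise band
        return out

    # Toy input: a modulated tone standing in for speech.
    fs = 16000
    t = np.arange(fs) / fs
    x = (1.0 + 0.5 * np.sin(2 * np.pi * 3 * t)) * np.sin(2 * np.pi * 440 * t)
    y = noise_vocoder(x, fs, n_channels=8, env_cutoff_hz=50)
    print("vocoded RMS:", float(np.sqrt(np.mean(y ** 2))))
    ```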

  16. Detection of Clinical Depression in Adolescents’ Speech During Family Interactions

    Science.gov (United States)

    Low, Lu-Shih Alex; Maddage, Namunu C.; Lech, Margaret; Sheeber, Lisa B.; Allen, Nicholas B.

    2013-01-01

    The properties of acoustic speech have previously been investigated as possible cues for depression in adults. However, these studies were restricted to small populations of patients and the speech recordings were made during patients’ clinical interviews or fixed-text reading sessions. Symptoms of depression often first appear during adolescence at a time when the voice is changing, in both males and females, suggesting that specific studies of these phenomena in adolescent populations are warranted. This study investigated acoustic correlates of depression in a large sample of 139 adolescents (68 clinically depressed and 71 controls). Speech recordings were made during naturalistic interactions between adolescents and their parents. Prosodic, cepstral, spectral, and glottal features, as well as features derived from the Teager energy operator (TEO), were tested within a binary classification framework. Strong gender differences in classification accuracy were observed. The TEO-based features clearly outperformed all other features and feature combinations, providing classification accuracies ranging between 81% and 87% for males and between 72% and 79% for females. Close, but slightly less accurate, results were obtained by combining glottal features with prosodic and spectral features (67%–69% for males and 70%–75% for females). These findings indicate the importance of nonlinear mechanisms associated with the glottal flow formation as cues for clinical depression. PMID:21075715
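    The Teager energy operator (TEO) underlying the best-performing features is a simple nonlinear energy measure. A hedged sketch of the discrete operator follows; how the study turned TEO values into classification features is not described here, so everything beyond the operator itself (the tone example, sampling rate, edge handling) is an assumption.

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager energy operator: psi[x](n) = x(n)**2 - x(n-1)*x(n+1)."""
    x = np.asarray(x, dtype=float)
    psi = np.empty_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    psi[0], psi[-1] = psi[1], psi[-2]          # simple edge handling
    return psi

# Sanity check on a pure tone: for A*cos(w*n) the TEO is approximately A**2 * sin(w)**2.
fs, f0 = 16000, 200.0
n = np.arange(1024)
tone = np.cos(2 * np.pi * f0 / fs * n)
print(teager_energy(tone)[10:13])   # roughly constant, close to sin(2*pi*200/16000)**2
```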

  17. Processing changes when listening to foreign-accented speech

    Directory of Open Access Journals (Sweden)

    Carlos eRomero-Rivas

    2015-03-01

    Full Text Available This study investigates the mechanisms responsible for fast changes in processing foreign-accented speech. Event Related brain Potentials (ERPs) were obtained while native speakers of Spanish listened to native and foreign-accented speakers of Spanish. We observed a less positive P200 component for foreign-accented speech relative to native speech comprehension. This suggests that the extraction of spectral information and other important acoustic features was hampered during foreign-accented speech comprehension. However, the amplitude of the N400 component for foreign-accented speech comprehension decreased across the experiment, suggesting the use of a higher level, lexical mechanism. Furthermore, during native speech comprehension, semantic violations in the critical words elicited an N400 effect followed by a late positivity. During foreign-accented speech comprehension, semantic violations only elicited an N400 effect. Overall, our results suggest that, despite a lack of improvement in phonetic discrimination, native listeners experience changes at lexical-semantic levels of processing after brief exposure to foreign-accented speech. Moreover, these results suggest that lexical access, semantic integration and linguistic re-analysis processes are permeable to external factors, such as the accent of the speaker.

  18. Internet Video Telephony Allows Speech Reading by Deaf Individuals and Improves Speech Perception by Cochlear Implant Users

    Science.gov (United States)

    Mantokoudis, Georgios; Dähler, Claudia; Dubach, Patrick; Kompis, Martin; Caversaccio, Marco D.; Senn, Pascal

    2013-01-01

    Objective To analyze speech reading through Internet video calls by profoundly hearing-impaired individuals and cochlear implant (CI) users. Methods Speech reading skills of 14 deaf adults and 21 CI users were assessed using the Hochmair Schulz Moser (HSM) sentence test. We presented video simulations using different video resolutions (1280×720, 640×480, 320×240, 160×120 px), frame rates (30, 20, 10, 7, 5 frames per second (fps)), speech velocities (three different speakers), webcameras (Logitech Pro9000, C600 and C500) and image/sound delays (0–500 ms). All video simulations were presented with and without sound and in two screen sizes. Additionally, scores for live Skype™ video connection and live face-to-face communication were assessed. Results Higher frame rate (>7 fps), higher camera resolution (>640×480 px) and shorter picture/sound delay (<100 ms) were associated with increased speech perception scores. Scores were strongly dependent on the speaker but were not influenced by physical properties of the camera optics or the full screen mode. There is a significant median gain of +8.5%pts (p = 0.009) in speech perception for all 21 CI-users if visual cues are additionally shown. CI users with poor open set speech perception scores (n = 11) showed the greatest benefit under combined audio-visual presentation (median speech perception +11.8%pts, p = 0.032). Conclusion Webcameras have the potential to improve telecommunication of hearing-impaired individuals. PMID:23359119

  19. Under-resourced speech recognition based on the speech manifold

    CSIR Research Space (South Africa)

    Sahraeian, R

    2015-09-01

    Full Text Available Conventional acoustic modeling involves estimating many parameters to effectively model feature distributions. The sparseness of speech and text data, however, degrades the reliability of the estimation process and makes speech recognition a...

  20. The role of periodicity in perceiving speech in quiet and in background noise.

    Science.gov (United States)

    Steinmetzger, Kurt; Rosen, Stuart

    2015-12-01

    The ability of normal-hearing listeners to perceive sentences in quiet and in background noise was investigated in a variety of conditions mixing the presence and absence of periodicity (i.e., voicing) in both target and masker. Experiment 1 showed that in quiet, aperiodic noise-vocoded speech and speech with a natural amount of periodicity were equally intelligible, while fully periodic speech was much harder to understand. In Experiments 2 and 3, speech reception thresholds for these targets were measured in the presence of four different maskers: speech-shaped noise, harmonic complexes with a dynamically varying F0 contour, and 10 Hz amplitude-modulated versions of both. For experiment 2, results of experiment 1 were used to identify conditions with equal intelligibility in quiet, while in experiment 3 target intelligibility in quiet was near ceiling. In the presence of a masker, periodicity in the target speech mattered little, but listeners strongly benefited from periodicity in the masker. Substantial fluctuating-masker benefits required the target speech to be almost perfectly intelligible in quiet. In summary, results suggest that the ability to exploit periodicity cues may be an even more important factor when attempting to understand speech embedded in noise than the ability to benefit from masker fluctuations.

  1. The Relationship Between Spectral Modulation Detection and Speech Recognition: Adult Versus Pediatric Cochlear Implant Recipients.

    Science.gov (United States)

    Gifford, René H; Noble, Jack H; Camarata, Stephen M; Sunderhaus, Linsey W; Dwyer, Robert T; Dawant, Benoit M; Dietrich, Mary S; Labadie, Robert F

    2018-01-01

    Adult cochlear implant (CI) recipients demonstrate a reliable relationship between spectral modulation detection and speech understanding. Prior studies documenting this relationship have focused on postlingually deafened adult CI recipients, leaving an open question regarding the relationship between spectral resolution and speech understanding for adults and children with prelingual onset of deafness. Here, we report CI performance on the measures of speech recognition and spectral modulation detection for 578 CI recipients including 477 postlingual adults, 65 prelingual adults, and 36 prelingual pediatric CI users. The results demonstrated a significant correlation between spectral modulation detection and various measures of speech understanding for 542 adult CI recipients. For 36 pediatric CI recipients, however, there was no significant correlation between spectral modulation detection and speech understanding in quiet or in noise, nor was spectral modulation detection significantly correlated with listener age or age at implantation. These findings suggest that pediatric CI recipients might not depend upon spectral resolution for speech understanding in the same manner as adult CI recipients. It is possible that pediatric CI users are making use of different cues, such as those contained within the temporal envelope, to achieve high levels of speech understanding. Further investigation of the relationship between spectral and temporal resolution and speech recognition is warranted to describe the underlying mechanisms driving peripheral auditory processing in pediatric CI users.
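    Spectral modulation detection is commonly measured with spectrally rippled noise. The sketch below generates such a stimulus under assumed parameters (ripple density, depth, tone spacing, frequency range); it only illustrates the class of stimulus and is not the exact signal used with these CI recipients.

```python
import numpy as np

def spectral_ripple_noise(fs=44100, dur=0.5, ripples_per_octave=1.0,
                          depth_db=10.0, f_lo=350.0, f_hi=5600.0,
                          n_tones=200, seed=0):
    """Spectrally rippled noise built from many random-phase tones.

    A sinusoidal (in dB) spectral envelope is imposed across log frequency;
    ripples_per_octave is the spectral modulation rate and depth_db the
    peak-to-valley depth. All values here are illustrative defaults.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(int(fs * dur)) / fs
    freqs = np.geomspace(f_lo, f_hi, n_tones)
    octaves = np.log2(freqs / f_lo)
    ripple_phase = rng.uniform(0, 2 * np.pi)
    amp_db = (depth_db / 2.0) * np.sin(2 * np.pi * ripples_per_octave * octaves + ripple_phase)
    amps = 10.0 ** (amp_db / 20.0)
    tone_phases = rng.uniform(0, 2 * np.pi, n_tones)
    signal = sum(a * np.sin(2 * np.pi * f * t + p)
                 for f, a, p in zip(freqs, amps, tone_phases))
    return signal / np.max(np.abs(signal))
```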

  2. Speech Alarms Pilot Study

    Science.gov (United States)

    Sandor, A.; Moses, H. R.

    2016-01-01

    Currently on the International Space Station (ISS) and other space vehicles Caution & Warning (C&W) alerts are represented with various auditory tones that correspond to the type of event. This system relies on the crew's ability to remember what each tone represents in a high stress, high workload environment when responding to the alert. Furthermore, crew receive training on these alerts a year or more in advance of the mission, which makes remembering the semantic meaning of the alerts more difficult. The current system works for missions conducted close to Earth where ground operators can assist as needed. On long duration missions, however, crews will need to work off-nominal events autonomously. There is evidence that speech alarms may be easier and faster to recognize, especially during an off-nominal event. The Information Presentation Directed Research Project (FY07-FY09) funded by the Human Research Program included several studies investigating C&W alerts. The studies evaluated tone alerts currently in use with NASA flight deck displays along with candidate speech alerts. A follow-on study used four types of speech alerts to investigate how quickly various types of auditory alerts with and without a speech component - either at the beginning or at the end of the tone - can be identified. Even though crew were familiar with the tone alert from training or direct mission experience, alerts starting with a speech component were identified faster than alerts starting with a tone. The current study replicated the results from the previous study in a more rigorous experimental design to determine if the candidate speech alarms are ready for transition to operations or if more research is needed. Four types of alarms (caution, warning, fire, and depressurization) were presented to participants in both tone and speech formats in laboratory settings and later in the Human Exploration Research Analog (HERA). In the laboratory study, the alerts were presented by software and participants were

  3. Intelligibility of speech of children with speech and sound disorders

    OpenAIRE

    Ivetac, Tina

    2014-01-01

    The purpose of this study is to examine speech intelligibility of children with primary speech and sound disorders aged 3 to 6 years in everyday life. The research problem is based on the degree to which parents or guardians, immediate family members (sister, brother, grandparents), extended family members (aunt, uncle, cousin), child's friends, other acquaintances, child's teachers and strangers understand the speech of children with speech sound disorders. We examined whether the level ...

  4. Robust Speech/Non-Speech Classification in Heterogeneous Multimedia Content

    NARCIS (Netherlands)

    Huijbregts, M.A.H.; de Jong, Franciska M.G.

    In this paper we present a speech/non-speech classification method that allows high quality classification without the need to know in advance what kinds of audible non-speech events are present in an audio recording and that does not require a single parameter to be tuned on in-domain data. Because

  5. Motor Training: Comparison of Visual and Auditory Coded Proprioceptive Cues

    Directory of Open Access Journals (Sweden)

    Philip Jepson

    2012-05-01

    Full Text Available Self-perception of body posture and movement is achieved through multi-sensory integration, particularly the utilisation of vision, and proprioceptive information derived from muscles and joints. Disruption to these processes can occur following a neurological accident, such as stroke, leading to sensory and physical impairment. Rehabilitation can be helped through use of augmented visual and auditory biofeedback to stimulate neuro-plasticity, but the effective design and application of feedback, particularly in the auditory domain, is non-trivial. Simple auditory feedback was tested by comparing the stepping accuracy of normal subjects when given a visual spatial target (step length) and an auditory temporal target (step duration). A baseline measurement of step length and duration was taken using optical motion capture. Subjects (n=20) took 20 ‘training’ steps (baseline ±25%) using either an auditory target (950 Hz tone, bell-shaped gain envelope) or a visual target (spot marked on the floor) and were then asked to replicate the target step (length or duration corresponding to training) with all feedback removed. Visual cues yielded a mean percentage error of 11.5% (SD ± 7.0%); auditory cues, a mean percentage error of 12.9% (SD ± 11.8%). Visual cues elicit a high degree of accuracy both in training and follow-up un-cued tasks; despite the novelty of the auditory cues present for subjects, the mean accuracy of subjects approached that for visual cues, and initial results suggest a limited amount of practice using auditory cues can improve performance.

  6. Visual cues for data mining

    Science.gov (United States)

    Rogowitz, Bernice E.; Rabenhorst, David A.; Gerth, John A.; Kalin, Edward B.

    1996-04-01

    This paper describes a set of visual techniques, based on principles of human perception and cognition, which can help users analyze and develop intuitions about tabular data. Collections of tabular data are widely available, including, for example, multivariate time series data, customer satisfaction data, stock market performance data, multivariate profiles of companies and individuals, and scientific measurements. In our approach, we show how visual cues can help users perform a number of data mining tasks, including identifying correlations and interaction effects, finding clusters and understanding the semantics of cluster membership, identifying anomalies and outliers, and discovering multivariate relationships among variables. These cues are derived from psychological studies on perceptual organization, visual search, perceptual scaling, and color perception. These visual techniques are presented as a complement to the statistical and algorithmic methods more commonly associated with these tasks, and provide an interactive interface for the human analyst.

  7. Eliciting nicotine craving with virtual smoking cues.

    Science.gov (United States)

    Gamito, Pedro; Oliveira, Jorge; Baptista, André; Morais, Diogo; Lopes, Paulo; Rosa, Pedro; Santos, Nuno; Brito, Rodrigo

    2014-08-01

    Craving is a strong desire to consume that emerges in every case of substance addiction. Previous studies have shown that eliciting craving with an exposure cues protocol can be a useful option for the treatment of nicotine dependence. Thus, the main goal of this study was to develop a virtual platform in order to induce craving in smokers. Fifty-five undergraduate students were randomly assigned to two different virtual environments: high arousal contextual cues and low arousal contextual cues scenarios (17 smokers with low nicotine dependency were excluded). An eye-tracker system was used to evaluate attention toward these cues. Eye fixation on smoking-related cues differed between smokers and nonsmokers, indicating that smokers focused more often on smoking-related cues than nonsmokers. Self-reports of craving are in agreement with these results and suggest a significant increase in craving after exposure to smoking cues. In sum, these data support the use of virtual environments for eliciting craving.

  8. Experience with a second language affects the use of fundamental frequency in speech segmentation

    Science.gov (United States)

    Broersma, Mirjam; Cho, Taehong; Kim, Sahyang; Martínez-García, Maria Teresa; Connell, Katrina

    2017-01-01

    This study investigates whether listeners’ experience with a second language learned later in life affects their use of fundamental frequency (F0) as a cue to word boundaries in the segmentation of an artificial language (AL), particularly when the cues to word boundaries conflict between the first language (L1) and second language (L2). F0 signals phrase-final (and thus word-final) boundaries in French but word-initial boundaries in English. Participants were functionally monolingual French listeners, functionally monolingual English listeners, bilingual L1-English L2-French listeners, and bilingual L1-French L2-English listeners. They completed the AL-segmentation task with F0 signaling word-final boundaries or without prosodic cues to word boundaries (monolingual groups only). After listening to the AL, participants completed a forced-choice word-identification task in which the foils were either non-words or part-words. The results show that the monolingual French listeners, but not the monolingual English listeners, performed better in the presence of F0 cues than in the absence of such cues. Moreover, bilingual status modulated listeners’ use of F0 cues to word-final boundaries, with bilingual French listeners performing less accurately than monolingual French listeners on both word types but with bilingual English listeners performing more accurately than monolingual English listeners on non-words. These findings not only confirm that speech segmentation is modulated by the L1, but also newly demonstrate that listeners’ experience with the L2 (French or English) affects their use of F0 cues in speech segmentation. This suggests that listeners’ use of prosodic cues to word boundaries is adaptive and non-selective, and can change as a function of language experience. PMID:28738093

  9. Functional connectivity between face-movement and speech-intelligibility areas during auditory-only speech perception.

    Science.gov (United States)

    Schall, Sonja; von Kriegstein, Katharina

    2014-01-01

    It has been proposed that internal simulation of the talking face of visually-known speakers facilitates auditory speech recognition. One prediction of this view is that brain areas involved in auditory-only speech comprehension interact with visual face-movement sensitive areas, even under auditory-only listening conditions. Here, we test this hypothesis using connectivity analyses of functional magnetic resonance imaging (fMRI) data. Participants (17 normal participants, 17 developmental prosopagnosics) first learned six speakers via brief voice-face or voice-occupation training (…). Overall, the present findings indicate that learned visual information is integrated into the analysis of auditory-only speech and that this integration results from the interaction of task-relevant face-movement and auditory speech-sensitive areas.

  10. Innovative Speech Reconstructive Surgery

    OpenAIRE

    Hashem Shemshadi

    2003-01-01

    Proper speech functioning in human beings depends on precise coordination and timing balances in a series of complex neuromuscular movements and actions: starting from the prime energy source of air expelled from the respiratory system; delivery of such air to trigger the vocal cords; swift shaping of this phonatory episode into a comprehensible sound through resonance; and the final coordination of all head and neck structures to elicit final speech in ...

  11. The chairman's speech

    International Nuclear Information System (INIS)

    Allen, A.M.

    1986-01-01

    The paper contains a transcript of a speech by the chairman of the UKAEA, to mark the publication of the 1985/6 annual report. The topics discussed in the speech include: the Chernobyl accident and its effect on public attitudes to nuclear power, management and disposal of radioactive waste, the operation of UKAEA as a trading fund, and the UKAEA development programmes. The development programmes include work on the following: fast reactor technology, thermal reactors, reactor safety, health and safety aspects of water cooled reactors, the Joint European Torus, and under-lying research. (U.K.)

  12. The minor third communicates sadness in speech, mirroring its use in music.

    Science.gov (United States)

    Curtis, Meagan E; Bharucha, Jamshed J

    2010-06-01

    There is a long history of attempts to explain why music is perceived as expressing emotion. The relationship between pitches serves as an important cue for conveying emotion in music. The musical interval referred to as the minor third is generally thought to convey sadness. We reveal that the minor third also occurs in the pitch contour of speech conveying sadness. Bisyllabic speech samples conveying four emotions were recorded by 9 actresses. Acoustic analyses revealed that the relationship between the 2 salient pitches of the sad speech samples tended to approximate a minor third. Participants rated the speech samples for perceived emotion, and the use of numerous acoustic parameters as cues for emotional identification was modeled using regression analysis. The minor third was the most reliable cue for identifying sadness. Additional participants rated musical intervals for emotion, and their ratings verified the historical association between the musical minor third and sadness. These findings support the theory that human vocal expressions and music share an acoustic code for communicating sadness.
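    As a worked example of the pitch relationship involved, the interval between two fundamental frequencies can be expressed in semitones; a minor third corresponds to roughly three semitones (a frequency ratio near 6:5). The frequencies in the sketch below are illustrative only, not values from the recorded speech samples.

```python
import numpy as np

def interval_in_semitones(f0_low, f0_high):
    """Musical interval between two pitches: 12 * log2(f_high / f_low) semitones."""
    return 12.0 * np.log2(f0_high / f0_low)

# A minor third is about 3 semitones (frequency ratio near 6:5).
print(interval_in_semitones(220.0, 264.0))   # ~3.16 semitones
```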

  13. Visualizing structures of speech expressiveness

    DEFF Research Database (Denmark)

    Herbelin, Bruno; Jensen, Karl Kristoffer; Graugaard, Lars

    2008-01-01

    Speech is both beautiful and informative. In this work, a conceptual study of speech is undertaken, through investigation of the Tower of Babel, the archetypal phonemes, and the reasons for the use of language, in order to create an artistic work investigating the nature of speech. … The artwork is presented at the Re:New festival in May 2008.

  14. Experience-Dependency of Reliance on Local Visual and Idiothetic Cues for Spatial Representations Created in the Absence of Distal Information

    Directory of Open Access Journals (Sweden)

    Fabian Draht

    2017-06-01

    Full Text Available Spatial encoding in the hippocampus is based on a range of different input sources. To generate spatial representations, reliable sensory cues from the external environment are integrated with idiothetic cues, derived from self-movement, that enable path integration and directional perception. In this study, we examined to what extent idiothetic cues significantly contribute to spatial representations and navigation: we recorded place cells while rodents navigated towards two visually identical chambers in 180° orientation via two different paths in darkness and in the absence of reliable auditory or olfactory cues. Our goal was to generate a conflict between local visual and direction-specific information, and then to assess which strategy was prioritized in different learning phases. We observed that, in the absence of distal cues, place fields are initially controlled by local visual cues that override idiothetic cues, but that with multiple exposures to the paradigm, spaced at intervals of days, idiothetic cues become increasingly implemented in generating an accurate spatial representation. Taken together, these data support that, in the absence of distal cues, local visual cues are prioritized in the generation of context-specific spatial representations through place cells, whereby idiothetic cues are deemed unreliable. With cumulative exposures to the environments, the animal learns to attend to subtle idiothetic cues to resolve the conflict between visual and direction-specific information.

  15. Experience-Dependency of Reliance on Local Visual and Idiothetic Cues for Spatial Representations Created in the Absence of Distal Information.

    Science.gov (United States)

    Draht, Fabian; Zhang, Sijie; Rayan, Abdelrahman; Schönfeld, Fabian; Wiskott, Laurenz; Manahan-Vaughan, Denise

    2017-01-01

    Spatial encoding in the hippocampus is based on a range of different input sources. To generate spatial representations, reliable sensory cues from the external environment are integrated with idiothetic cues, derived from self-movement, that enable path integration and directional perception. In this study, we examined to what extent idiothetic cues significantly contribute to spatial representations and navigation: we recorded place cells while rodents navigated towards two visually identical chambers in 180° orientation via two different paths in darkness and in the absence of reliable auditory or olfactory cues. Our goal was to generate a conflict between local visual and direction-specific information, and then to assess which strategy was prioritized in different learning phases. We observed that, in the absence of distal cues, place fields are initially controlled by local visual cues that override idiothetic cues, but that with multiple exposures to the paradigm, spaced at intervals of days, idiothetic cues become increasingly implemented in generating an accurate spatial representation. Taken together, these data support that, in the absence of distal cues, local visual cues are prioritized in the generation of context-specific spatial representations through place cells, whereby idiothetic cues are deemed unreliable. With cumulative exposures to the environments, the animal learns to attend to subtle idiothetic cues to resolve the conflict between visual and direction-specific information.

  16. Speech networks at rest and in action: interactions between functional brain networks controlling speech production

    Science.gov (United States)

    Fuertinger, Stefan

    2015-01-01

    Speech production is one of the most complex human behaviors. Although brain activation during speaking has been well investigated, our understanding of interactions between the brain regions and neural networks remains scarce. We combined seed-based interregional correlation analysis with graph theoretical analysis of functional MRI data during the resting state and sentence production in healthy subjects to investigate the interface and topology of functional networks originating from the key brain regions controlling speech, i.e., the laryngeal/orofacial motor cortex, inferior frontal and superior temporal gyri, supplementary motor area, cingulate cortex, putamen, and thalamus. During both resting and speaking, the interactions between these networks were bilaterally distributed and centered on the sensorimotor brain regions. However, speech production preferentially recruited the inferior parietal lobule (IPL) and cerebellum into the large-scale network, suggesting the importance of these regions in facilitation of the transition from the resting state to speaking. Furthermore, the cerebellum (lobule VI) was the most prominent region showing functional influences on speech-network integration and segregation. Although networks were bilaterally distributed, interregional connectivity during speaking was stronger in the left vs. right hemisphere, which may have underlined a more homogeneous overlap between the examined networks in the left hemisphere. Among these, the laryngeal motor cortex (LMC) established a core network that fully overlapped with all other speech-related networks, determining the extent of network interactions. Our data demonstrate complex interactions of large-scale brain networks controlling speech production and point to the critical role of the LMC, IPL, and cerebellum in the formation of speech production network. PMID:25673742

  17. Speech networks at rest and in action: interactions between functional brain networks controlling speech production.

    Science.gov (United States)

    Simonyan, Kristina; Fuertinger, Stefan

    2015-04-01

    Speech production is one of the most complex human behaviors. Although brain activation during speaking has been well investigated, our understanding of interactions between the brain regions and neural networks remains scarce. We combined seed-based interregional correlation analysis with graph theoretical analysis of functional MRI data during the resting state and sentence production in healthy subjects to investigate the interface and topology of functional networks originating from the key brain regions controlling speech, i.e., the laryngeal/orofacial motor cortex, inferior frontal and superior temporal gyri, supplementary motor area, cingulate cortex, putamen, and thalamus. During both resting and speaking, the interactions between these networks were bilaterally distributed and centered on the sensorimotor brain regions. However, speech production preferentially recruited the inferior parietal lobule (IPL) and cerebellum into the large-scale network, suggesting the importance of these regions in facilitation of the transition from the resting state to speaking. Furthermore, the cerebellum (lobule VI) was the most prominent region showing functional influences on speech-network integration and segregation. Although networks were bilaterally distributed, interregional connectivity during speaking was stronger in the left vs. right hemisphere, which may have underlined a more homogeneous overlap between the examined networks in the left hemisphere. Among these, the laryngeal motor cortex (LMC) established a core network that fully overlapped with all other speech-related networks, determining the extent of network interactions. Our data demonstrate complex interactions of large-scale brain networks controlling speech production and point to the critical role of the LMC, IPL, and cerebellum in the formation of speech production network. Copyright © 2015 the American Physiological Society.

  18. Seeing the talker’s face supports executive processing of speech in steady state noise

    Directory of Open Access Journals (Sweden)

    Sushmit eMishra

    2013-11-01

    Full Text Available Listening to speech in noise depletes cognitive resources, affecting speech processing. The present study investigated how remaining resources or cognitive spare capacity (CSC) can be deployed by young adults with normal hearing. We administered a test of CSC (CSCT; Mishra et al., 2013) along with a battery of established cognitive tests to 20 participants with normal hearing. In the CSCT, lists of two-digit numbers were presented with and without visual cues in quiet, as well as in steady-state and speech-like noise at a high intelligibility level. In low load conditions, two numbers were recalled according to instructions inducing executive processing (updating, inhibition) and in high load conditions the participants were additionally instructed to recall one extra number, which was always the first item in the list. In line with previous findings, results showed that CSC was sensitive to memory load and executive function but generally not related to working memory capacity. Furthermore, CSCT scores in quiet were lowered by visual cues, probably due to distraction. In steady-state noise, the presence of visual cues improved CSCT scores, probably by enabling better encoding. Contrary to our expectation, CSCT performance was disrupted more in steady-state than speech-like noise, although only without visual cues, possibly because selective attention could be used to ignore the speech-like background and provide an enriched representation of target items in working memory similar to that obtained in quiet. This interpretation is supported by a consistent association between CSCT scores and updating skills.

  19. Seeing the talker's face supports executive processing of speech in steady state noise.

    Science.gov (United States)

    Mishra, Sushmit; Lunner, Thomas; Stenfelt, Stefan; Rönnberg, Jerker; Rudner, Mary

    2013-01-01

    Listening to speech in noise depletes cognitive resources, affecting speech processing. The present study investigated how remaining resources or cognitive spare capacity (CSC) can be deployed by young adults with normal hearing. We administered a test of CSC (CSCT; Mishra et al., 2013) along with a battery of established cognitive tests to 20 participants with normal hearing. In the CSCT, lists of two-digit numbers were presented with and without visual cues in quiet, as well as in steady-state and speech-like noise at a high intelligibility level. In low load conditions, two numbers were recalled according to instructions inducing executive processing (updating, inhibition) and in high load conditions the participants were additionally instructed to recall one extra number, which was always the first item in the list. In line with previous findings, results showed that CSC was sensitive to memory load and executive function but generally not related to working memory capacity (WMC). Furthermore, CSCT scores in quiet were lowered by visual cues, probably due to distraction. In steady-state noise, the presence of visual cues improved CSCT scores, probably by enabling better encoding. Contrary to our expectation, CSCT performance was disrupted more in steady-state than speech-like noise, although only without visual cues, possibly because selective attention could be used to ignore the speech-like background and provide an enriched representation of target items in working memory similar to that obtained in quiet. This interpretation is supported by a consistent association between CSCT scores and updating skills.

  20. Seeing the talker’s face supports executive processing of speech in steady state noise

    Science.gov (United States)

    Mishra, Sushmit; Lunner, Thomas; Stenfelt, Stefan; Rönnberg, Jerker; Rudner, Mary

    2013-01-01

    Listening to speech in noise depletes cognitive resources, affecting speech processing. The present study investigated how remaining resources or cognitive spare capacity (CSC) can be deployed by young adults with normal hearing. We administered a test of CSC (CSCT; Mishra et al., 2013) along with a battery of established cognitive tests to 20 participants with normal hearing. In the CSCT, lists of two-digit numbers were presented with and without visual cues in quiet, as well as in steady-state and speech-like noise at a high intelligibility level. In low load conditions, two numbers were recalled according to instructions inducing executive processing (updating, inhibition) and in high load conditions the participants were additionally instructed to recall one extra number, which was always the first item in the list. In line with previous findings, results showed that CSC was sensitive to memory load and executive function but generally not related to working memory capacity (WMC). Furthermore, CSCT scores in quiet were lowered by visual cues, probably due to distraction. In steady-state noise, the presence of visual cues improved CSCT scores, probably by enabling better encoding. Contrary to our expectation, CSCT performance was disrupted more in steady-state than speech-like noise, although only without visual cues, possibly because selective attention could be used to ignore the speech-like background and provide an enriched representation of target items in working memory similar to that obtained in quiet. This interpretation is supported by a consistent association between CSCT scores and updating skills. PMID:24324411

  1. Modern Tools in Patient-Centred Speech Therapy for Romanian Language

    Directory of Open Access Journals (Sweden)

    Mirela Danubianu

    2016-03-01

    Full Text Available The most common way to communicate with those around us is speech. Suffering from a speech disorder can have negative social effects: from leaving individuals with low confidence and morale to problems with social interaction and the ability to live independently as adults. The speech therapy intervention is a complex process having particular objectives such as: discovery and identification of the speech disorder and directing the therapy to correction, recovery, compensation, adaptation and social integration of patients. Computer-based Speech Therapy systems are a real help for therapists by creating a special learning environment. The Romanian language is a phonetic one, with special linguistic particularities. This paper aims to present a few computer-based speech therapy systems developed for the treatment of various speech disorders specific to the Romanian language.

  2. A treat for the eyes. An eye-tracking study on children's attention to unhealthy and healthy food cues in media content.

    Science.gov (United States)

    Spielvogel, Ines; Matthes, Jörg; Naderer, Brigitte; Karsay, Kathrin

    2018-06-01

    Based on cue reactivity theory, food cues embedded in media content can lead to physiological and psychological responses in children. Research suggests that unhealthy food cues are represented more extensively and interactively in children's media environments than healthy ones. However, it is not clear to date whether children react differently to unhealthy compared to healthy food cues. In an experimental study with 56 children (55.4% girls; M age = 8.00, SD = 1.58), we used eye-tracking to determine children's attention to unhealthy and healthy food cues embedded in a narrative cartoon movie. Besides varying the food type (i.e., healthy vs. unhealthy), we also manipulated the integration levels of food cues with characters (i.e., level of food integration; no interaction vs. handling vs. consumption), and we assessed children's individual susceptibility factors by measuring the impact of their hunger level. Our results indicated that unhealthy food cues attract children's visual attention to a larger extent than healthy cues. However, their initial visual interest did not differ between unhealthy and healthy food cues. Furthermore, an increase in the level of food integration led to an increase in visual attention. Our findings showed no moderating impact of hunger. We conclude that especially unhealthy food cues with an interactive connection trigger cue reactivity in children. Copyright © 2018 Elsevier Ltd. All rights reserved.

  3. The Influence of Surface and Deep Cues on Primary and Secondary School Students' Assessment of Relevance in Web Menus

    Science.gov (United States)

    Rouet, Jean-Francois; Ros, Christine; Goumi, Antonine; Macedo-Rouet, Monica; Dinet, Jerome

    2011-01-01

    Two experiments investigated primary and secondary school students' Web menu selection strategies using simulated Web search tasks. It was hypothesized that students' selections of websites depend on their perception and integration of multiple relevance cues. More specifically, students should be able to disentangle superficial cues (e.g.,…

  4. Subliminal Cueing of Selection Behavior in a Virtual Environment

    OpenAIRE

    Aranyi, Gabor; Kouider, Sid; Lindsay, Alan; Prins, Hielke; Ahmed, Imtiaj; Jacucci, Giulio; Negri, Paolo; Gamberini, Luciano; Pizzi, David; Cavazza, Marc

    2014-01-01

    The performance of current graphics engines makes it possible to incorporate subliminal cues within virtual environments (VEs), providing an additional way of communication, fully integrated with the exploration of a virtual scene. In order to advance the application of subliminal information in this area, it is necessary to explore how techniques previously reported as rendering information subliminal in the psychological literature can be successfully implemented in VEs. Previous lite...

  5. Multisensory speech perception without the left superior temporal sulcus.

    Science.gov (United States)

    Baum, Sarah H; Martin, Randi C; Hamilton, A Cris; Beauchamp, Michael S

    2012-09-01

    Converging evidence suggests that the left superior temporal sulcus (STS) is a critical site for multisensory integration of auditory and visual information during speech perception. We report a patient, SJ, who suffered a stroke that damaged the left temporo-parietal area, resulting in mild anomic aphasia. Structural MRI showed complete destruction of the left middle and posterior STS, as well as damage to adjacent areas in the temporal and parietal lobes. Surprisingly, SJ demonstrated preserved multisensory integration measured with two independent tests. First, she perceived the McGurk effect, an illusion that requires integration of auditory and visual speech. Second, her perception of morphed audiovisual speech with ambiguous auditory or visual information was significantly influenced by the opposing modality. To understand the neural basis for this preserved multisensory integration, blood-oxygen level dependent functional magnetic resonance imaging (BOLD fMRI) was used to examine brain responses to audiovisual speech in SJ and 23 healthy age-matched controls. In controls, bilateral STS activity was observed. In SJ, no activity was observed in the damaged left STS but in the right STS, more cortex was active in SJ than in any of the normal controls. Further, the amplitude of the BOLD response in right STS response to McGurk stimuli was significantly greater in SJ than in controls. The simplest explanation of these results is a reorganization of SJ's cortical language networks such that the right STS now subserves multisensory integration of speech. Copyright © 2012 Elsevier Inc. All rights reserved.

  6. Workshop: Welcoming speech

    International Nuclear Information System (INIS)

    Lummerzheim, D.

    1994-01-01

    The welcoming speech underlines the fact that any validation process starting with calculation methods and ending with studies on the long-term behaviour of a repository system can only be effected through laboratory, field and natural-analogue studies. The use of natural analogues (NA) is to secure the biosphere and to verify whether this safety really exists. (HP) [de

  7. Hearing speech in music.

    Science.gov (United States)

    Ekström, Seth-Reino; Borg, Erik

    2011-01-01

    The masking effect of a piano composition, played at different speeds and in different octaves, on speech-perception thresholds was investigated in 15 normal-hearing and 14 moderately-hearing-impaired subjects. Running speech (just follow conversation, JFC) testing and use of hearing aids increased the everyday validity of the findings. A comparison was made with standard audiometric noises [International Collegium of Rehabilitative Audiology (ICRA) noise and speech spectrum-filtered noise (SPN)]. All masking sounds, music or noise, were presented at the same equivalent sound level (50 dBA). The results showed a significant effect of piano performance speed and octave (P<.01); low octave and fast tempo had the largest effect, and high octave and slow tempo the smallest. Music had a lower masking effect than did ICRA noise with two or six speakers at normal vocal effort (P<.01) and SPN (P<.05). Subjects with hearing loss had higher masked thresholds than the normal-hearing subjects (P<.01), but there were smaller differences between masking conditions (P<.01). It is pointed out that music offers an interesting opportunity for studying masking under realistic conditions, where spectral and temporal features can be varied independently. The results have implications for composing music with vocal parts, designing acoustic environments and creating a balance between speech perception and privacy in social settings.

  8. Hearing speech in music

    Directory of Open Access Journals (Sweden)

    Seth-Reino Ekström

    2011-01-01

    Full Text Available The masking effect of a piano composition, played at different speeds and in different octaves, on speech-perception thresholds was investigated in 15 normal-hearing and 14 moderately-hearing-impaired subjects. Running speech (just follow conversation, JFC) testing and use of hearing aids increased the everyday validity of the findings. A comparison was made with standard audiometric noises [International Collegium of Rehabilitative Audiology (ICRA) noise and speech spectrum-filtered noise (SPN)]. All masking sounds, music or noise, were presented at the same equivalent sound level (50 dBA). The results showed a significant effect of piano performance speed and octave (P<.01). Low octave and fast tempo had the largest effect; and high octave and slow tempo, the smallest. Music had a lower masking effect than did ICRA noise with two or six speakers at normal vocal effort (P<.01) and SPN (P<.05). Subjects with hearing loss had higher masked thresholds than the normal-hearing subjects (P<.01), but there were smaller differences between masking conditions (P<.01). It is pointed out that music offers an interesting opportunity for studying masking under realistic conditions, where spectral and temporal features can be varied independently. The results have implications for composing music with vocal parts, designing acoustic environments and creating a balance between speech perception and privacy in social settings.

  9. Free Speech Yearbook 1979.

    Science.gov (United States)

    Kane, Peter E., Ed.

    The seven articles in this collection deal with theoretical and practical freedom of speech issues. Topics covered are: the United States Supreme Court, motion picture censorship, and the color line; judicial decision making; the established scientific community's suppression of the ideas of Immanuel Velikovsky; the problems of avant-garde jazz,…

  10. Nobel peace speech

    Directory of Open Access Journals (Sweden)

    Joshua FRYE

    2017-07-01

    Full Text Available The Nobel Peace Prize has long been considered the premier peace prize in the world. According to Geir Lundestad, Secretary of the Nobel Committee, of the 300-some peace prizes awarded worldwide, “none is in any way as well known and as highly respected as the Nobel Peace Prize” (Lundestad, 2001). Nobel peace speech is a unique and significant international site of public discourse committed to articulating the universal grammar of peace. Spanning over 100 years of sociopolitical history on the world stage, Nobel Peace Laureates richly represent an important cross-section of domestic and international issues increasingly germane to many publics. Communication scholars’ interest in this rhetorical genre has increased in the past decade. Yet, the norm has been to analyze a single speech artifact from a prestigious or controversial winner rather than examine the collection of speeches for generic commonalities of import. In this essay, we analyze the discourse of Nobel peace speech inductively and argue that the organizing principle of the Nobel peace speech genre is the repetitive form of normative liberal principles and values that function as rhetorical topoi. These topoi include freedom and justice and appeal to the inviolable, inborn right of human beings to exercise certain political and civil liberties and the expectation of equality of protection from totalitarian and tyrannical abuses. The significance of this essay to contemporary communication theory is to expand our theoretical understanding of rhetoric’s role in the maintenance and development of an international and cross-cultural vocabulary for the grammar of peace.

  11. Awareness of rhythm patterns in speech and music in children with specific language impairments

    Directory of Open Access Journals (Sweden)

    Ruth eCumming

    2015-12-01

    Full Text Available Children with specific language impairments (SLIs) show impaired perception and production of language, and also show impairments in perceiving auditory cues to rhythm (amplitude rise time [ART] and sound duration) and in tapping to a rhythmic beat. Here we explore potential links between language development and rhythm perception in 45 children with SLI and 50 age-matched controls. We administered three rhythmic tasks, a musical beat detection task, a tapping-to-music task, and a novel music/speech task, which varied rhythm and pitch cues independently or together in both speech and music. Via low-pass filtering, the music sounded as though it was played from a low-quality radio and the speech sounded as though it was muffled (heard behind the door). We report data for all of the SLI children (N = 45, IQ varying), as well as for two independent subgroupings with intact IQ. One subgroup, Pure SLI, had intact phonology and reading (N=16); the other, SLI PPR (N=15), had impaired phonology and reading. When IQ varied (all SLI children), we found significant group differences in all the rhythmic tasks. For the Pure SLI group, there were rhythmic impairments in the tapping task only. For children with SLI and poor phonology (SLI PPR), group differences were found in all of the filtered speech/music AXB tasks. We conclude that difficulties with rhythmic cues in both speech and music are present in children with SLIs, but that some rhythmic measures are more sensitive than others. The data are interpreted within a ‘prosodic phrasing’ hypothesis, and we discuss the potential utility of rhythmic and musical interventions in remediating speech and language difficulties in children.

  12. Retrieval-induced forgetting and interference between cues:Training a cue-outcome association attenuates retrieval by alternative cues

    OpenAIRE

    Ortega-Castro, Nerea; Vadillo Nistal, Miguel

    2013-01-01

    Some researchers have attempted to determine whether situations in which a single cue is paired with several outcomes (A-B, A-C interference or interference between outcomes) involve the same learning and retrieval mechanisms as situations in which several cues are paired with a single outcome (A-B, C-B interference or interference between cues). Interestingly, current research on a related effect, which is known as retrieval-induced forgetting, can illuminate this debate. Most retrieval-indu...

  13. Integration

    DEFF Research Database (Denmark)

    Emerek, Ruth

    2004-01-01

    The contribution discusses the different conceptions of integration in Denmark, and what may be understood by successful integration.

  14. Metaheuristic applications to speech enhancement

    CERN Document Server

    Kunche, Prajna

    2016-01-01

    This book serves as a basic reference for those interested in the application of metaheuristics to speech enhancement. The major goal of the book is to explain the basic concepts of optimization methods and their use in heuristic optimization in speech enhancement to scientists, practicing engineers, and academic researchers in speech processing. The authors discuss why it has been a challenging problem for researchers to develop new enhancement algorithms that aid in the quality and intelligibility of degraded speech. They present powerful optimization methods to speech enhancement that can help to solve the noise reduction problems. Readers will be able to understand the fundamentals of speech processing as well as the optimization techniques, how the speech enhancement algorithms are implemented by utilizing optimization methods, and will be given the tools to develop new algorithms. The authors also provide a comprehensive literature survey regarding the topic.

  15. A glimpsing account of the role of temporal fine structure information in speech recognition.

    Science.gov (United States)

    Apoux, Frédéric; Healy, Eric W

    2013-01-01

    Many behavioral studies have reported a significant decrease in intelligibility when the temporal fine structure (TFS) of a sound mixture is replaced with noise or tones (i.e., vocoder processing). This finding has led to the conclusion that TFS information is critical for speech recognition in noise. How the normal auditory system takes advantage of the original TFS, however, remains unclear. Three experiments on the role of TFS in noise are described. All three experiments measured speech recognition in various backgrounds while manipulating the envelope, TFS, or both. One experiment tested the hypothesis that vocoder processing may artificially increase the apparent importance of TFS cues. Another experiment evaluated the relative contribution of the target and masker TFS by disturbing only the TFS of the target or that of the masker. Finally, a last experiment evaluated the relative contribution of envelope and TFS information. In contrast to previous studies, however, the original envelope and TFS were both preserved - to some extent - in all conditions. Overall, the experiments indicate a limited influence of TFS and suggest that little speech information is extracted from the TFS. Concomitantly, these experiments confirm that most speech information is carried by the temporal envelope in real-world conditions. When interpreted within the framework of the glimpsing model, the results of these experiments suggest that TFS is primarily used as a grouping cue to select the time-frequency regions corresponding to the target speech signal.
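    Envelope and temporal fine structure are commonly separated per frequency band with the analytic (Hilbert) signal, which is the kind of decomposition these manipulations rely on. A hedged sketch follows; the band count and edges are assumptions rather than the authors' parameters.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def envelope_tfs_split(signal, fs, n_bands=16, f_lo=80.0, f_hi=7000.0):
    """Per-band temporal envelope and temporal fine structure via the analytic signal.

    Returns (envelopes, fine_structures), each of shape (n_bands, len(signal)).
    Band count and edges are illustrative choices.
    """
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)
    envelopes, fine_structures = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        analytic = hilbert(filtfilt(b, a, signal))
        envelopes.append(np.abs(analytic))                    # slow amplitude envelope
        fine_structures.append(np.cos(np.angle(analytic)))    # unit-amplitude TFS carrier
    return np.array(envelopes), np.array(fine_structures)

# Summing envelopes[i] * fine_structures[i] over bands approximately reconstructs
# the input; replacing the TFS carriers with noise or tones gives vocoded speech.
```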

  16. Unvoiced Speech Recognition Using Tissue-Conductive Acoustic Sensor

    Directory of Open Access Journals (Sweden)

    Heracleous Panikos

    2007-01-01

    Full Text Available We present the use of stethoscope and silicon NAM (nonaudible murmur) microphones in automatic speech recognition. NAM microphones are special acoustic sensors, which are attached behind the talker's ear and can capture not only normal (audible) speech, but also very quietly uttered speech (nonaudible murmur). As a result, NAM microphones can be applied in automatic speech recognition systems when privacy is desired in human-machine communication. Moreover, NAM microphones show robustness against noise and they might be used in special systems (speech recognition, speech transform, etc.) for sound-impaired people. Using adaptation techniques and a small amount of training data, we achieved for a 20 k dictation task a 93.9% word accuracy for nonaudible murmur recognition in a clean environment. In this paper, we also investigate nonaudible murmur recognition in noisy environments and the effect of the Lombard reflex on nonaudible murmur recognition. We also propose three methods to integrate audible speech and nonaudible murmur recognition using a stethoscope NAM microphone with very promising results.

  17. Unvoiced Speech Recognition Using Tissue-Conductive Acoustic Sensor

    Directory of Open Access Journals (Sweden)

    Hiroshi Saruwatari

    2007-01-01

    Full Text Available We present the use of stethoscope and silicon NAM (nonaudible murmur) microphones in automatic speech recognition. NAM microphones are special acoustic sensors, which are attached behind the talker's ear and can capture not only normal (audible) speech, but also very quietly uttered speech (nonaudible murmur). As a result, NAM microphones can be applied in automatic speech recognition systems when privacy is desired in human-machine communication. Moreover, NAM microphones show robustness against noise and they might be used in special systems (speech recognition, speech transform, etc.) for sound-impaired people. Using adaptation techniques and a small amount of training data, we achieved for a 20 k dictation task a 93.9% word accuracy for nonaudible murmur recognition in a clean environment. In this paper, we also investigate nonaudible murmur recognition in noisy environments and the effect of the Lombard reflex on nonaudible murmur recognition. We also propose three methods to integrate audible speech and nonaudible murmur recognition using a stethoscope NAM microphone with very promising results.

  18. Speaker's voice as a memory cue.

    Science.gov (United States)

    Campeanu, Sandra; Craik, Fergus I M; Alain, Claude

    2015-02-01

    Speaker's voice occupies a central role as the cornerstone of auditory social interaction. Here, we review the evidence suggesting that speaker's voice constitutes an integral context cue in auditory memory. Investigation into the nature of voice representation as a memory cue is essential to understanding auditory memory and the neural correlates which underlie it. Evidence from behavioral and electrophysiological studies suggest that while specific voice reinstatement (i.e., same speaker) often appears to facilitate word memory even without attention to voice at study, the presence of a partial benefit of similar voices between study and test is less clear. In terms of explicit memory experiments utilizing unfamiliar voices, encoding methods appear to play a pivotal role. Voice congruency effects have been found when voice is specifically attended at study (i.e., when relatively shallow, perceptual encoding takes place). These behavioral findings coincide with neural indices of memory performance such as the parietal old/new recollection effect and the late right frontal effect. The former distinguishes between correctly identified old words and correctly identified new words, and reflects voice congruency only when voice is attended at study. Characterization of the latter likely depends upon voice memory, rather than word memory. There is also evidence to suggest that voice effects can be found in implicit memory paradigms. However, the presence of voice effects appears to depend greatly on the task employed. Using a word identification task, perceptual similarity between study and test conditions is, like for explicit memory tests, crucial. In addition, the type of noise employed appears to have a differential effect. While voice effects have been observed when white noise is used at both study and test, using multi-talker babble does not confer the same results. In terms of neuroimaging research modulations, characterization of an implicit memory effect

  19. When to Take a Gesture Seriously: On How We Use and Prioritize Communicative Cues.

    Science.gov (United States)

    Gunter, Thomas C; Weinbrenner, J E Douglas

    2017-08-01

    When people talk, their speech is often accompanied by gestures. Although it is known that co-speech gestures can influence face-to-face communication, it is currently unclear to what extent they are actively used and under which premises they are prioritized to facilitate communication. We investigated these open questions in two experiments that varied how pointing gestures disambiguate the utterances of an interlocutor. Participants, whose event-related brain responses were measured, watched a video, where an actress was interviewed about, for instance, classical literature (e.g., Goethe and Shakespeare). While responding, the actress pointed systematically to the left side to refer to, for example, Goethe, or to the right to refer to Shakespeare. Her final statement was ambiguous and combined with a pointing gesture. The P600 pattern found in Experiment 1 revealed that, when pointing was unreliable, gestures were only monitored for their cue validity and not used for reference tracking related to the ambiguity. However, when pointing was a valid cue (Experiment 2), it was used for reference tracking, as indicated by a reduced N400 for pointing. In summary, these findings suggest that a general prioritization mechanism is in use that constantly monitors and evaluates the use of communicative cues against communicative priors on the basis of accumulated error information.

  20. Evidence for a perception of prosodic cues in bat communication: contact call classification by Megaderma lyra.

    Science.gov (United States)

    Janssen, Simone; Schmidt, Sabine

    2009-07-01

    The perception of prosodic cues in human speech may be rooted in mechanisms common to mammals. The present study explores to what extent bats use rhythm and frequency, which typically carry prosodic information in human speech, for the classification of communication call series. Using a two-alternative, forced-choice procedure, we trained Megaderma lyra to discriminate between synthetic contact call series differing in frequency, rhythm on the level of calls, and rhythm on the level of call series, and measured the classification performance for stimuli differing in only one, or two, of the above parameters. A comparison with predictions from models based on one, combinations of two, or all, parameters revealed that the bats based their decision predominantly on frequency and in addition on rhythm on the level of call series, whereas rhythm on the level of calls was not taken into account in this paradigm. Moreover, frequency and rhythm on the level of call series were evaluated independently. Our results show that parameters corresponding to prosodic cues in human languages are perceived and evaluated by bats. Thus, these necessary prerequisites for communication via prosodic structures in mammals evolved long before human speech.

  1. A frequency bin-wise nonlinear masking algorithm in convolutive mixtures for speech segregation.

    Science.gov (United States)

    Chi, Tai-Shih; Huang, Ching-Wen; Chou, Wen-Sheng

    2012-05-01

    A frequency bin-wise nonlinear masking algorithm is proposed in the spectrogram domain for speech segregation in convolutive mixtures. The contributive weight from each speech source to a time-frequency unit of the mixture spectrogram is estimated by a nonlinear function based on location cues. For each sound source, a non-binary mask is formed from the estimated weights and is multiplied with the mixture spectrogram to extract the sound. Head-related transfer functions (HRTFs) are used to simulate convolutive sound mixtures perceived by listeners. Simulation results show the proposed method outperforms the convolutive independent component analysis and degenerate unmixing estimation technique methods in almost all test conditions.
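
    A minimal sketch of the general idea of non-binary (soft) time-frequency masking, assuming the per-bin location cue is something like an interaural level difference and using a logistic function as a stand-in for the paper's nonlinear weight estimator:

      import numpy as np

      def soft_masks_from_location_cue(cue, sharpness=4.0):
          """Map a per-bin location cue (e.g., interaural level difference in dB,
          negative = source 1, positive = source 2) to two soft masks via a
          logistic weighting function (hypothetical stand-in for the paper's
          nonlinear weight estimator)."""
          w2 = 1.0 / (1.0 + np.exp(-sharpness * cue))  # weight toward source 2
          w1 = 1.0 - w2                                 # weight toward source 1
          return w1, w2

      def segregate(mixture_spec, cue):
          """Apply non-binary masks to the mixture spectrogram (freq x time)."""
          w1, w2 = soft_masks_from_location_cue(cue)
          return w1 * mixture_spec, w2 * mixture_spec

      # toy example: 257 frequency bins x 100 frames
      rng = np.random.default_rng(0)
      mix = np.abs(rng.standard_normal((257, 100)))
      ild = rng.normal(0.0, 3.0, size=(257, 100))  # fake interaural level differences
      s1_est, s2_est = segregate(mix, ild)
      print(s1_est.shape, s2_est.shape)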

  2. The impact of brief restriction to articulation on children's subsequent speech production.

    Science.gov (United States)

    Seidl, Amanda; Brosseau-Lapré, Françoise; Goffman, Lisa

    2018-02-01

    This project explored whether disruption of articulation during listening impacts subsequent speech production in 4-yr-olds with and without speech sound disorder (SSD). During novel word learning, typically-developing children showed effects of articulatory disruption as revealed by larger differences between two acoustic cues to a sound contrast, but children with SSD were unaffected by articulatory disruption. Findings suggest that, when typically developing 4-yr-olds experience an articulatory disruption during a listening task, the children's subsequent production is affected. Children with SSD show less influence of articulatory experience during perception, which could be the result of impaired or attenuated ties between perception and articulation.

  3. The influence of spectral characteristics of early reflections on speech intelligibility

    DEFF Research Database (Denmark)

    Arweiler, Iris; Buchholz, Jörg

    2011-01-01

    The auditory system takes advantage of early reflections (ERs) in a room by integrating them with the direct sound (DS) and thereby increasing the effective speech level. In the present paper the benefit from realistic ERs on speech intelligibility in diffuse speech-shaped noise was investigated...... ascribed to their altered spectrum compared to the DS and to the filtering by the torso, head, and pinna. No binaural processing other than a binaural summation effect could be observed....

  4. Targeting extinction and reconsolidation mechanisms to combat the impact of drug cues on addiction.

    Science.gov (United States)

    Taylor, Jane R; Olausson, Peter; Quinn, Jennifer J; Torregrossa, Mary M

    2009-01-01

    Drug addiction is a progressive and compulsive disorder, where recurrent craving and relapse to drug-seeking occur even after long periods of abstinence. A major contributing factor to relapse is drug-associated cues. Here we review behavioral and pharmacological studies outlining novel methods of effective and persistent reductions in cue-induced relapse behavior in animal models. We focus on extinction and reconsolidation of cue-drug associations as the memory processes that are the most likely targets for interventions. Extinction involves the formation of new inhibitory memories rather than memory erasure; thus, it should be possible to facilitate the extinction of cue-drug memories to reduce relapse. We propose that context-dependency of extinction might be altered by mnemonic agents, thereby enhancing the efficacy of cue-exposure therapy as treatment strategy. In contrast, interfering with memory reconsolidation processes can disrupt the integrity or strength of specific cue-drug memories. Reconsolidation is argued to be a distinct process that occurs over a brief time period after memory is reactivated/retrieved - when the memory becomes labile and vulnerable to disruption. Reconsolidation is thought to be an independent, perhaps opposing, process to extinction and disruption of reconsolidation has recently been shown to directly affect subsequent cue-drug memory retrieval in an animal model of relapse. We hypothesize that a combined approach aimed at both enhancing the consolidation of cue-drug extinction and interfering with the reconsolidation of cue-drug memories will have a greater potential for persistently inhibiting cue-induced relapse than either treatment alone.

  5. Within-subjects comparison of the HiRes and Fidelity120 speech processing strategies: speech perception and its relation to place-pitch sensitivity.

    Science.gov (United States)

    Donaldson, Gail S; Dawson, Patricia K; Borden, Lamar Z

    2011-01-01

    Previous studies have confirmed that current steering can increase the number of discriminable pitches available to many cochlear implant (CI) users; however, the ability to perceive additional pitches has not been linked to improved speech perception. The primary goals of this study were to determine (1) whether adult CI users can achieve higher levels of spectral cue transmission with a speech processing strategy that implements current steering (Fidelity120) than with a predecessor strategy (HiRes) and, if so, (2) whether the magnitude of improvement can be predicted from individual differences in place-pitch sensitivity. A secondary goal was to determine whether Fidelity120 supports higher levels of speech recognition in noise than HiRes. A within-subjects repeated measures design evaluated speech perception performance with Fidelity120 relative to HiRes in 10 adult CI users. Subjects used the novel strategy (either HiRes or Fidelity120) for 8 wks during the main study; a subset of five subjects used Fidelity120 for three additional months after the main study. Speech perception was assessed for the spectral cues related to vowel F1 frequency, vowel F2 frequency, and consonant place of articulation; overall transmitted information for vowels and consonants; and sentence recognition in noise. Place-pitch sensitivity was measured for electrode pairs in the apical, middle, and basal regions of the implanted array using a psychophysical pitch-ranking task. With one exception, there was no effect of strategy (HiRes versus Fidelity120) on the speech measures tested, either during the main study (N = 10) or after extended use of Fidelity120 (N = 5). The exception was a small but significant advantage for HiRes over Fidelity120 for consonant perception during the main study. Examination of individual subjects' data revealed that 3 of 10 subjects demonstrated improved perception of one or more spectral cues with Fidelity120 relative to HiRes after 8 wks or longer

  6. Speech Motor Control in Fluent and Dysfluent Speech Production of an Individual with Apraxia of Speech and Broca's Aphasia

    Science.gov (United States)

    van Lieshout, Pascal H. H. M.; Bose, Arpita; Square, Paula A.; Steele, Catriona M.

    2007-01-01

    Apraxia of speech (AOS) is typically described as a motor-speech disorder with clinically well-defined symptoms, but without a clear understanding of the underlying problems in motor control. A number of studies have compared the speech of subjects with AOS to the fluent speech of controls, but only a few have included speech movement data and if…

  7. Post-cueing deficits with maintained cueing benefits in patients with Parkinson's disease dementia

    Directory of Open Access Journals (Sweden)

    Susanne eGräber

    2014-11-01

    Full Text Available In Parkinson's disease (PD) internal cueing mechanisms are impaired, leading to symptoms such as hypokinesia. However, external cues can improve movement execution by using cortical resources. These cortical processes can be affected by cognitive decline in dementia. It is still unclear how dementia in PD influences external cueing. We investigated a group of 25 PD patients with dementia (PDD) and 25 non-demented PD patients (PDnD), matched by age, sex and disease duration, in a simple reaction time (SRT) task using an additional acoustic cue. PDD patients benefited from the additional cue to a similar magnitude as PDnD patients. However, withdrawal of the cue led to a significantly increased reaction time in the PDD group compared to the PDnD patients. Our results indicate that even PDD patients can benefit from strategies using external cue presentation, but the process of cognitive worsening can reduce the effect when cues are withdrawn.

  8. Cue-reactors: individual differences in cue-induced craving after food or smoking abstinence.

    Science.gov (United States)

    Mahler, Stephen V; de Wit, Harriet

    2010-11-10

    Pavlovian conditioning plays a critical role in both drug addiction and binge eating. Recent animal research suggests that certain individuals are highly sensitive to conditioned cues, whether they signal food or drugs. Are certain humans also more reactive to both food and drug cues? We examined cue-induced craving for both cigarettes and food, in the same individuals (n = 15 adult smokers). Subjects viewed smoking-related or food-related images after abstaining from either smoking or eating. Certain individuals reported strong cue-induced craving after both smoking and food cues. That is, subjects who reported strong cue-induced craving for cigarettes also rated stronger cue-induced food craving. In humans, like in nonhumans, there may be a "cue-reactive" phenotype, consisting of individuals who are highly sensitive to conditioned stimuli. This finding extends recent reports from nonhuman studies. Further understanding this subgroup of smokers may allow clinicians to individually tailor therapies for smoking cessation.

  9. Visible propagation from invisible exogenous cueing.

    Science.gov (United States)

    Lin, Zhicheng; Murray, Scott O

    2013-09-20

    Perception and performance are affected not just by what we see but also by what we do not see: inputs that escape our awareness. While conscious processing and unconscious processing have been assumed to be separate and independent, here we report the propagation of unconscious exogenous cueing as determined by conscious motion perception. In a paradigm combining masked exogenous cueing and apparent motion, we show that, when an onset cue was rendered invisible, the unconscious exogenous cueing effect traveled, manifesting at uncued locations (4° apart) in accordance with conscious perception of visual motion; the effect diminished when the cue-to-target distance was 8° apart. In contrast, conscious exogenous cueing manifested at both distances. Further evidence reveals that the unconscious and conscious nonretinotopic effects could not be explained by an attentional gradient, nor by bottom-up, energy-based motion mechanisms; rather, they were subserved by top-down, tracking-based motion mechanisms. We thus term these effects mobile cueing. Taken together, unconscious mobile cueing effects (a) demonstrate a previously unknown degree of flexibility of unconscious exogenous attention; (b) embody a simultaneous dissociation and association of attention and consciousness, in which exogenous attention can occur without cue awareness ("dissociation"), yet at the same time its effect is contingent on conscious motion tracking ("association"); and (c) underscore the interaction of conscious and unconscious processing, providing evidence for an unconscious effect that is not automatic but controlled.

  10. A Framework for Speech Enhancement with Ad Hoc Microphone Arrays

    DEFF Research Database (Denmark)

    Tavakoli, Vincent Mohammad; Jensen, Jesper Rindom; Christensen, Mads Græsbøll

    2016-01-01

    Speech enhancement is vital for improved listening practices. Ad hoc microphone arrays are promising assets for this purpose. Most well-established enhancement techniques with conventional arrays can be adapted into ad hoc scenarios. Despite recent efforts to introduce various ad hoc speech...... enhancement apparatus, a common framework for integration of conventional methods into this new scheme is still missing. This paper establishes such an abstraction based on inter and intra sub-array speech coherencies. Along with measures for signal quality at the input of sub-arrays, a measure of coherency...... is proposed both for sub-array selection in local enhancement approaches, and also for selecting a proper global reference when more than one sub-array is used. Proposed methods within this framework are evaluated with regard to quantitative and qualitative measures, including array gains, the speech...

  11. Hybrid methodological approach to context-dependent speech recognition

    Directory of Open Access Journals (Sweden)

    Dragiša Mišković

    2017-01-01

    Full Text Available Although the importance of contextual information in speech recognition has been acknowledged for a long time now, it has remained clearly underutilized even in state-of-the-art speech recognition systems. This article introduces a novel, methodologically hybrid approach to the research question of context-dependent speech recognition in human–machine interaction. To the extent that it is hybrid, the approach integrates aspects of both statistical and representational paradigms. We extend the standard statistical pattern-matching approach with a cognitively inspired and analytically tractable model with explanatory power. This methodological extension allows for accounting for contextual information which is otherwise unavailable in speech recognition systems, and using it to improve post-processing of recognition hypotheses. The article introduces an algorithm for evaluation of recognition hypotheses, illustrates it for concrete interaction domains, and discusses its implementation within two prototype conversational agents.
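
    A minimal sketch of context-aware post-processing of recognition hypotheses, assuming an N-best list with combined acoustic/language-model scores and a hypothetical keyword-overlap context model; the article's evaluation algorithm is more elaborate, so treat this purely as an illustration of score re-ranking:

      from dataclasses import dataclass

      @dataclass
      class Hypothesis:
          text: str
          asr_score: float  # combined acoustic + language-model log score

      def context_score(hyp: str, context_keywords: set[str]) -> float:
          """Hypothetical representational component: reward hypotheses that
          mention entities expected in the current interaction domain."""
          words = set(hyp.lower().split())
          return float(len(words & context_keywords))

      def rerank(nbest: list[Hypothesis], context_keywords: set[str], alpha: float = 0.5):
          """Interpolate the statistical ASR score with the context score and
          return hypotheses ordered by the combined score."""
          scored = [(h.asr_score + alpha * context_score(h.text, context_keywords), h)
                    for h in nbest]
          return [h for _, h in sorted(scored, key=lambda p: p[0], reverse=True)]

      nbest = [Hypothesis("set an alarm for nine", -12.1),
               Hypothesis("set a farm for nine", -11.9)]
      print(rerank(nbest, {"alarm", "timer", "clock"})[0].text)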

  12. The left dorsolateral prefrontal cortex and caudate pathway: New evidence for cue-induced craving of smokers.

    Science.gov (United States)

    Yuan, Kai; Yu, Dahua; Bi, Yanzhi; Wang, Ruonan; Li, Min; Zhang, Yajuan; Dong, Minghao; Zhai, Jinquan; Li, Yangding; Lu, Xiaoqi; Tian, Jie

    2017-09-01

    Although activation of the prefrontal cortex (PFC) and the striatum has been found in smoking cue-induced craving tasks, whether and how the functional interactions and white matter integrity between these brain regions contribute to craving processing during smoking cue exposure remains unknown. Twenty-five young male smokers and 26 age- and gender-matched nonsmokers participated in a smoking cue-reactivity task. Craving-related brain activation was extracted, and psychophysiological interaction (PPI) analysis was used to specify the PFC-efferent pathways that contributed to smoking cue-induced craving. Diffusion tensor imaging (DTI) and probabilistic tractography were used to explore whether fiber connectivity strength facilitated functional coupling of the circuit with smoking cue-induced craving. The PPI analysis revealed negative functional coupling of the left dorsolateral prefrontal cortex (DLPFC) and the caudate during the smoking cue-induced craving task, which positively correlated with the craving score. Neither significant activation nor functional connectivity in the smoking cue exposure task was detected in nonsmokers. DTI analyses revealed that fiber tract integrity negatively correlated with functional coupling in the DLPFC-caudate pathway and with activation of the caudate induced by the smoking cue in smokers. Moreover, the relationship between the fiber connectivity integrity of the left DLPFC-caudate pathway and smoking cue-induced caudate activation can be fully mediated by the functional coupling strength of this circuit in smokers. The present study highlighted the left DLPFC-caudate pathway in smoking cue-induced craving in smokers, which may reflect top-down prefrontal modulation of striatal reward processing during smoking cue-induced craving. Hum Brain Mapp 38:4644-4656, 2017. © 2017 Wiley Periodicals, Inc.

  13. [Integrity].

    Science.gov (United States)

    Gómez Rodríguez, Rafael Ángel

    2014-01-01

    To say that someone possesses integrity is to claim that the person responds in a largely predictable way to specific situations, and that he or she can judge prudently and act correctly. There is a close interrelationship between integrity and autonomy, and autonomy rests on the deeper moral claim of all humans to integrity of the person. Integrity has two senses of significance for medical ethics: one refers to the integrity of the person in its bodily, psychosocial and intellectual elements; in the second sense, integrity is a virtue. Another facet of integrity of the person is the integrity of the values we cherish and espouse. The physician must be a person of integrity if the integrity of the patient is to be safeguarded. Autonomy has reduced violations in the past, but the character and virtues of the physician are the ultimate safeguard of the patient's autonomy. A very important field in medicine is scientific research. It is the character of the investigator that determines the moral quality of research. The problem arises when legitimate self-interests are replaced by selfish ones, particularly when human subjects are involved. The final safeguard of the moral quality of research is the character and conscience of the investigator. Teaching must be relevant in the scientific field, but the most effective way to teach virtue ethics is through the example of a respected scientist.

  14. Audiovisual Temporal Perception in Aging: The Role of Multisensory Integration and Age-Related Sensory Loss.

    Science.gov (United States)

    Brooks, Cassandra J; Chan, Yu Man; Anderson, Andrew J; McKendrick, Allison M

    2018-01-01

    Within each sensory modality, age-related deficits in temporal perception contribute to the difficulties older adults experience when performing everyday tasks. Since perceptual experience is inherently multisensory, older adults also face the added challenge of appropriately integrating or segregating the auditory and visual cues present in our dynamic environment into coherent representations of distinct objects. As such, many studies have investigated how older adults perform when integrating temporal information across audition and vision. This review covers both direct judgments about temporal information (the sound-induced flash illusion, temporal order, perceived synchrony, and temporal rate discrimination) and judgments regarding stimuli containing temporal information (the audiovisual bounce effect and speech perception). Although an age-related increase in integration has been demonstrated on a variety of tasks, research specifically investigating the ability of older adults to integrate temporal auditory and visual cues has produced disparate results. In this short review, we explore what factors could underlie these divergent findings. We conclude that both task-specific differences and age-related sensory loss play a role in the reported disparity in age-related effects on the integration of auditory and visual temporal information.

  15. Audiovisual Temporal Perception in Aging: The Role of Multisensory Integration and Age-Related Sensory Loss

    Science.gov (United States)

    Brooks, Cassandra J.; Chan, Yu Man; Anderson, Andrew J.; McKendrick, Allison M.

    2018-01-01

    Within each sensory modality, age-related deficits in temporal perception contribute to the difficulties older adults experience when performing everyday tasks. Since perceptual experience is inherently multisensory, older adults also face the added challenge of appropriately integrating or segregating the auditory and visual cues present in our dynamic environment into coherent representations of distinct objects. As such, many studies have investigated how older adults perform when integrating temporal information across audition and vision. This review covers both direct judgments about temporal information (the sound-induced flash illusion, temporal order, perceived synchrony, and temporal rate discrimination) and judgments regarding stimuli containing temporal information (the audiovisual bounce effect and speech perception). Although an age-related increase in integration has been demonstrated on a variety of tasks, research specifically investigating the ability of older adults to integrate temporal auditory and visual cues has produced disparate results. In this short review, we explore what factors could underlie these divergent findings. We conclude that both task-specific differences and age-related sensory loss play a role in the reported disparity in age-related effects on the integration of auditory and visual temporal information. PMID:29867415

  16. More than a face: A unified theoretical perspective on nonverbal social cue processing in social anxiety

    Directory of Open Access Journals (Sweden)

    Eva eGilboa-Schechtman

    2013-12-01

    Full Text Available Processing of nonverbal social cues (NVSCs) is essential to interpersonal functioning and is particularly relevant to models of social anxiety. This article provides a review of the literature on NVSC processing from the perspective of social rank and affiliation biobehavioral systems, based on functional analysis of human sociality. We examine the potential of this framework for integrating cognitive, interpersonal, and evolutionary accounts of social anxiety. We argue that NVSCs are uniquely suited to rapid and effective conveyance of emotional, motivational, and trait information and that various channels are differentially effective in transmitting such information. First, we review studies on perception of NVSCs through face, voice, and body. We begin with studies that utilized information processing or imaging paradigms to assess NVSC perception. This research demonstrated that social anxiety is associated with biased attention to, and interpretation of, emotional facial expressions and emotional prosody. Findings regarding body and posture remain scarce. Next, we review studies on NVSC expression, which pinpointed links between social anxiety and disturbances in eye gaze, facial expressivity, and vocal properties of spontaneous and planned speech. Again, links between social anxiety and posture were understudied. Although cognitive, interpersonal, and evolutionary theories have described different pathways to social anxiety, all three models focus on interrelations among cognition, subjective experience, and social behavior. NVSC processing and production comprise the juncture where these theories intersect. In light of the conceptualizations emerging from the review, we highlight several directions for future research including focus on NVSCs as indexing reactions to changes in belongingness and social rank, the moderating role of gender, and the therapeutic opportunities offered by embodied cognition to treat social anxiety.

  17. More than a face: a unified theoretical perspective on nonverbal social cue processing in social anxiety

    Science.gov (United States)

    Gilboa-Schechtman, Eva; Shachar-Lavie, Iris

    2013-01-01

    Processing of nonverbal social cues (NVSCs) is essential to interpersonal functioning and is particularly relevant to models of social anxiety. This article provides a review of the literature on NVSC processing from the perspective of social rank and affiliation biobehavioral systems (ABSs), based on functional analysis of human sociality. We examine the potential of this framework for integrating cognitive, interpersonal, and evolutionary accounts of social anxiety. We argue that NVSCs are uniquely suited to rapid and effective conveyance of emotional, motivational, and trait information and that various channels are differentially effective in transmitting such information. First, we review studies on perception of NVSCs through face, voice, and body. We begin with studies that utilized information processing or imaging paradigms to assess NVSC perception. This research demonstrated that social anxiety is associated with biased attention to, and interpretation of, emotional facial expressions (EFEs) and emotional prosody. Findings regarding body and posture remain scarce. Next, we review studies on NVSC expression, which pinpointed links between social anxiety and disturbances in eye gaze, facial expressivity, and vocal properties of spontaneous and planned speech. Again, links between social anxiety and posture were understudied. Although cognitive, interpersonal, and evolutionary theories have described different pathways to social anxiety, all three models focus on interrelations among cognition, subjective experience, and social behavior. NVSC processing and production comprise the juncture where these theories intersect. In light of the conceptualizations emerging from the review, we highlight several directions for future research including focus on NVSCs as indexing reactions to changes in belongingness and social rank, the moderating role of gender, and the therapeutic opportunities offered by embodied cognition to treat social anxiety. PMID:24427129

  18. Predicting automatic speech recognition performance over communication channels from instrumental speech quality and intelligibility scores

    NARCIS (Netherlands)

    Gallardo, L.F.; Möller, S.; Beerends, J.

    2017-01-01

    The performance of automatic speech recognition based on coded-decoded speech heavily depends on the quality of the transmitted signals, determined by channel impairments. This paper examines relationships between speech recognition performance and measurements of speech quality and intelligibility

  19. Hearing faces: how the infant brain matches the face it sees with the speech it hears.

    Science.gov (United States)

    Bristow, Davina; Dehaene-Lambertz, Ghislaine; Mattout, Jeremie; Soares, Catherine; Gliga, Teodora; Baillet, Sylvain; Mangin, Jean-François

    2009-05-01

    Speech is not a purely auditory signal. From around 2 months of age, infants are able to correctly match the vowel they hear with the appropriate articulating face. However, there is no behavioral evidence of integrated audiovisual perception until 4 months of age, at the earliest, when an illusory percept can be created by the fusion of the auditory stimulus and of the facial cues (McGurk effect). To understand how infants initially match the articulatory movements they see with the sounds they hear, we recorded high-density ERPs in response to auditory vowels that followed a congruent or incongruent silently articulating face in 10-week-old infants. In a first experiment, we determined that auditory-visual integration occurs during the early stages of perception as in adults. The mismatch response was similar in timing and in topography whether the preceding vowels were presented visually or aurally. In the second experiment, we studied audiovisual integration in the linguistic (vowel perception) and nonlinguistic (gender perception) domain. We observed a mismatch response for both types of change at similar latencies. Their topographies were significantly different demonstrating that cross-modal integration of these features is computed in parallel by two different networks. Indeed, brain source modeling revealed that phoneme and gender computations were lateralized toward the left and toward the right hemisphere, respectively, suggesting that each hemisphere possesses an early processing bias. We also observed repetition suppression in temporal regions and repetition enhancement in frontal regions. These results underscore how complex and structured is the human cortical organization which sustains communication from the first weeks of life on.

  20. Plant responsiveness to root-root communication of stress cues.

    Science.gov (United States)

    Falik, Omer; Mordoch, Yonat; Ben-Natan, Daniel; Vanunu, Miriam; Goldstein, Oron; Novoplansky, Ariel

    2012-07-01

    Phenotypic plasticity is based on the organism's ability to perceive, integrate and respond to multiple signals and cues informative of environmental opportunities and perils. A growing body of evidence demonstrates that plants are able to adapt to imminent threats by perceiving cues emitted from their damaged neighbours. Here, the hypothesis was tested that unstressed plants are able to perceive and respond to stress cues emitted from their drought- and osmotically stressed neighbours and to induce stress responses in additional unstressed plants. Split-root Pisum sativum, Cynodon dactylon, Digitaria sanguinalis and Stenotaphrum secundatum plants were subjected to osmotic stress or drought while sharing one of their rooting volumes with an unstressed neighbour, which in turn shared its other rooting volume with additional unstressed neighbours. Following the kinetics of stomatal aperture allowed testing for stress responses in both the stressed plants and their unstressed neighbours. In both P. sativum plants and the three wild clonal grasses, infliction of osmotic stress or drought caused stomatal closure in both the stressed plants and in their unstressed neighbours. While both continuous osmotic stress and drought induced prolonged stomatal closure and limited acclimation in stressed plants, their unstressed neighbours habituated to the stress cues and opened their stomata 3-24 h after the beginning of stress induction. The results demonstrate a novel type of plant communication, by which plants might be able to increase their readiness to probable future osmotic and drought stresses. Further work is underway to decipher the identity and mode of operation of the involved communication vectors and to assess the potential ecological costs and benefits of emitting and perceiving drought and osmotic stress cues under various ecological scenarios.

  1. Contextual cueing by global features

    Science.gov (United States)

    Kunar, Melina A.; Flusberg, Stephen J.; Wolfe, Jeremy M.

    2008-01-01

    In visual search tasks, attention can be guided to a target item, appearing amidst distractors, on the basis of simple features (e.g. find the red letter among green). Chun and Jiang’s (1998) “contextual cueing” effect shows that RTs are also speeded if the spatial configuration of items in a scene is repeated over time. In these studies we ask if global properties of the scene can speed search (e.g. if the display is mostly red, then the target is at location X). In Experiment 1a, the overall background color of the display predicted the target location. Here the predictive color could appear 0, 400 or 800 msec in advance of the search array. Mean RTs are faster in predictive than in non-predictive conditions. However, there is little improvement in search slopes. The global color cue did not improve search efficiency. Experiments 1b-1f replicate this effect using different predictive properties (e.g. background orientation/texture, stimuli color etc.). The results show a strong RT effect of predictive background but (at best) only a weak improvement in search efficiency. A strong improvement in efficiency was found, however, when the informative background was presented 1500 msec prior to the onset of the search stimuli and when observers were given explicit instructions to use the cue (Experiment 2). PMID:17355043

  2. Speech is Golden

    DEFF Research Database (Denmark)

    Juel Henrichsen, Peter

    2014-01-01

    Most of the Danish municipalities are ready to begin to adopt automatic speech recognition, but at the same time remain nervous following a long series of bad business cases in the recent past. Complaints are voiced over costly licences and low service levels, typical effects of a de facto monopoly on the supply side. The present article reports on a new public action strategy which has taken shape in the course of 2013-14. While Denmark is a small language area, our public sector is well organised and has considerable purchasing power. Across this past year, Danish local authorities have organised around the speech technology challenge; they have formulated a number of joint questions and new requirements to be met by suppliers and have deliberately worked towards formulating tendering material which will allow fair competition. Public researchers have contributed to this work, including the author...

  3. A configural dominant account of contextual cueing : configural cues are stronger than colour cues

    OpenAIRE

    Kunar, Melina A.; Johnston, Rebecca; Sweetman, Hollie

    2013-01-01

    Previous work has shown that reaction times to find a target in displays that have been repeated are faster than those for displays that have never been seen before. This learning effect, termed “contextual cueing” (CC), has been shown using contexts such as the configuration of the distractors in the display and the background colour. However, it is not clear how these two contexts interact to facilitate search. We investigated this here by comparing the strengths of these two cues when they...

  4. Retrieval of bilingual autobiographical memories: effects of cue language and cue imageability.

    Science.gov (United States)

    Mortensen, Linda; Berntsen, Dorthe; Bohn, Ocke-Schwen

    2015-01-01

    An important issue in theories of bilingual autobiographical memory is whether linguistically encoded memories are represented in language-specific stores or in a common language-independent store. Previous research has found that autobiographical memory retrieval is facilitated when the language of the cue is the same as the language of encoding, consistent with language-specific memory stores. The present study examined whether this language congruency effect is influenced by cue imageability. Danish-English bilinguals retrieved autobiographical memories in response to Danish and English high- or low-imageability cues. Retrieval latencies were shorter to Danish than English cues and shorter to high- than low-imageability cues. Importantly, the cue language effect was stronger for low- than high-imageability cues. To examine the relationship between cue language and the language of internal retrieval, participants identified the language in which the memories were internally retrieved. More memories were retrieved when the cue language was the same as the internal language than when the cue was in the other language, and more memories were identified as being internally retrieved in Danish than English, regardless of the cue language. These results provide further evidence for language congruency effects in bilingual memory and suggest that this effect is influenced by cue imageability.

  5. Multilevel Analysis in Analyzing Speech Data

    Science.gov (United States)

    Guddattu, Vasudeva; Krishna, Y.

    2011-01-01

    The speech produced by human vocal tract is a complex acoustic signal, with diverse applications in phonetics, speech synthesis, automatic speech recognition, speaker identification, communication aids, speech pathology, speech perception, machine translation, hearing research, rehabilitation and assessment of communication disorders and many…

  6. Speech-Language Therapy (For Parents)

    Science.gov (United States)

    A KidsHealth guide for parents on speech-language therapy, covering speech disorders, language disorders, and feeding disorders.

  7. An oscillopathic approach to developmental dyslexia: From genes to speech processing.

    Science.gov (United States)

    Jiménez-Bravo, Miguel; Marrero, Victoria; Benítez-Burraco, Antonio

    2017-06-30

    Developmental dyslexia is a heterogeneous condition entailing problems with reading and spelling. Several genes have been linked or associated with the disease, many of which contribute to the development and function of brain areas important for auditory and phonological processing. Nonetheless, a clear link between genes, the brain, and the symptoms of dyslexia is still pending. The goal of this paper is to contribute to bridging this gap. With this aim, we have focused on how the dyslexic brain fails to process speech sounds and reading cues. We have adopted an oscillatory perspective, according to which dyslexia may result from a deficient integration of different brain rhythms during reading/spelling tasks. Moreover, we show that some candidate genes for this condition are related to brain rhythms. This fresh approach is expected not only to provide a better understanding of the aetiology and the clinical presentation of developmental dyslexia, but also to allow an earlier and more accurate diagnosis of the disease. Copyright © 2017 Elsevier B.V. All rights reserved.

  8. The cue is key : design for real-life remembering

    NARCIS (Netherlands)

    Hoven, van den E.A.W.H.; Eggen, J.H.

    2014-01-01

    This paper aims to put the memory cue in the spotlight. We will show how memory cues are incorporated in the area of interaction design. The focus will be on external memory cues: cues that exist outside the human mind but have an internal effect on memory reconstruction. Examples of external cues

  9. [Improving speech comprehension using a new cochlear implant speech processor].

    Science.gov (United States)

    Müller-Deile, J; Kortmann, T; Hoppe, U; Hessel, H; Morsnowski, A

    2009-06-01

    The aim of this multicenter clinical field study was to assess the benefits of the new Freedom 24 sound processor for cochlear implant (CI) users implanted with the Nucleus 24 cochlear implant system. The study included 48 postlingually profoundly deaf experienced CI users who demonstrated speech comprehension performance with their current speech processor on the Oldenburg sentence test (OLSA) in quiet conditions of at least 80% correct scores and who were able to perform adaptive speech threshold testing using the OLSA in noisy conditions. Following baseline measures of speech comprehension performance with their current speech processor, subjects were upgraded to the Freedom 24 speech processor. After a take-home trial period of at least 2 weeks, subject performance was evaluated by measuring the speech reception threshold with the Freiburg multisyllabic word test and speech intelligibility with the Freiburg monosyllabic word test at 50 dB and 70 dB in the sound field. The results demonstrated highly significant benefits for speech comprehension with the new speech processor. Significant benefits for speech comprehension were also demonstrated with the new speech processor when tested in competing background noise. In contrast, use of the Abbreviated Profile of Hearing Aid Benefit (APHAB) did not prove to be a suitably sensitive assessment tool for comparative subjective self-assessment of hearing benefits with each processor. Use of the preprocessing algorithm known as adaptive dynamic range optimization (ADRO) in the Freedom 24 led to additional improvements over the standard upgrade map for speech comprehension in quiet and showed equivalent performance in noise. Through use of the preprocessing beam-forming algorithm BEAM, subjects demonstrated a highly significant improved signal-to-noise ratio for speech comprehension thresholds (i.e., signal-to-noise ratio for 50% speech comprehension scores) when tested with an adaptive procedure using the Oldenburg

  10. Neurophysiology of speech differences in childhood apraxia of speech.

    Science.gov (United States)

    Preston, Jonathan L; Molfese, Peter J; Gumkowski, Nina; Sorcinelli, Andrea; Harwood, Vanessa; Irwin, Julia R; Landi, Nicole

    2014-01-01

    Event-related potentials (ERPs) were recorded during a picture naming task of simple and complex words in children with typical speech and with childhood apraxia of speech (CAS). Results reveal reduced amplitude prior to speaking complex (multisyllabic) words relative to simple (monosyllabic) words for the CAS group over the right hemisphere during a time window thought to reflect phonological encoding of word forms. Group differences were also observed prior to production of spoken tokens regardless of word complexity during a time window just prior to speech onset (thought to reflect motor planning/programming). Results suggest differences in pre-speech neurolinguistic processes.

  11. Distributed acoustic cues for caller identity in macaque vocalization.

    Science.gov (United States)

    Fukushima, Makoto; Doyle, Alex M; Mullarkey, Matthew P; Mishkin, Mortimer; Averbeck, Bruno B

    2015-12-01

    Individual primates can be identified by the sound of their voice. Macaques have demonstrated an ability to discern conspecific identity from a harmonically structured 'coo' call. Voice recognition presumably requires the integrated perception of multiple acoustic features. However, it is unclear how this is achieved, given considerable variability across utterances. Specifically, the extent to which information about caller identity is distributed across multiple features remains elusive. We examined these issues by recording and analysing a large sample of calls from eight macaques. Single acoustic features, including fundamental frequency, duration and Wiener entropy, were informative but unreliable for the statistical classification of caller identity. A combination of multiple features, however, allowed for highly accurate caller identification. A regularized classifier that learned to identify callers from the modulation power spectrum of calls found that specific regions of spectral-temporal modulation were informative for caller identification. These ranges are related to acoustic features such as the call's fundamental frequency and FM sweep direction. We further found that the low-frequency spectrotemporal modulation component contained an indexical cue to the caller's body size. Thus, cues for caller identity are distributed across identifiable spectrotemporal components corresponding to laryngeal and supralaryngeal components of vocalizations, and the integration of those cues can enable highly reliable caller identification. Our results demonstrate a clear acoustic basis by which individual macaque vocalizations can be recognized.
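
    A hedged sketch of the statistical point being made: single acoustic features classify callers poorly, while a regularized classifier over several features does much better. The synthetic features (F0, duration, Wiener entropy), their distributions, and the scikit-learn pipeline below are stand-ins, not the authors' modulation-power-spectrum analysis.

      import numpy as np
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler

      rng = np.random.default_rng(1)
      n_calls, n_callers = 400, 8
      caller = rng.integers(0, n_callers, n_calls)
      # stand-in per-call features: fundamental frequency, duration, Wiener entropy
      X = np.column_stack([
          600 + 30 * caller + rng.normal(0, 40, n_calls),       # F0 (Hz)
          0.3 + 0.02 * caller + rng.normal(0, 0.05, n_calls),   # duration (s)
          rng.normal(-2.0, 0.5, n_calls),                       # Wiener entropy (log)
      ])

      # L2-regularized multinomial classifier over standardized features
      clf = make_pipeline(StandardScaler(),
                          LogisticRegression(penalty="l2", C=1.0, max_iter=1000))
      print("single feature (F0) accuracy:",
            cross_val_score(clf, X[:, [0]], caller, cv=5).mean())
      print("all features accuracy:",
            cross_val_score(clf, X, caller, cv=5).mean())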

  12. Noise and pitch interact during the cortical segregation of concurrent speech.

    Science.gov (United States)

    Bidelman, Gavin M; Yellamsetty, Anusha

    2017-08-01

    Behavioral studies reveal listeners exploit intrinsic differences in voice fundamental frequency (F0) to segregate concurrent speech sounds, the so-called "F0-benefit." A more favorable signal-to-noise ratio (SNR) in the environment, an extrinsic acoustic factor, similarly benefits the parsing of simultaneous speech. Here, we examined the neurobiological substrates of these two cues in the perceptual segregation of concurrent speech mixtures. We recorded event-related brain potentials (ERPs) while listeners performed a speeded double-vowel identification task. Listeners heard two concurrent vowels whose F0 differed by zero or four semitones, presented in either clean (no noise) or noise-degraded (+5 dB SNR) conditions. Behaviorally, listeners were more accurate in correctly identifying both vowels for larger F0 separations, but the F0-benefit was more pronounced at more favorable SNRs (i.e., a pitch × SNR interaction). Analysis of the ERPs revealed that only the P2 wave (∼200 ms) showed a similar F0 × SNR interaction as behavior and was correlated with listeners' perceptual F0-benefit. Neural classifiers applied to the ERPs further suggested that speech sounds are segregated neurally within 200 ms based on SNR, whereas segregation based on pitch occurs later in time (400-700 ms). The earlier timing of extrinsic SNR-based compared to intrinsic F0-based segregation implies that the cortical extraction of speech from noise is more efficient than differentiating speech based on pitch cues alone, which may recruit additional cortical processes. Findings indicate that noise and pitch differences interact relatively early in cerebral cortex and that the brain arrives at the identities of concurrent speech mixtures as early as ∼200 ms. Copyright © 2017 Elsevier B.V. All rights reserved.

  13. Satisficing in split-second decision making is characterized by strategic cue discounting.

    Science.gov (United States)

    Oh, Hanna; Beck, Jeffrey M; Zhu, Pingping; Sommer, Marc A; Ferrari, Silvia; Egner, Tobias

    2016-12-01

    Much of our real-life decision making is bounded by uncertain information, limitations in cognitive resources, and a lack of time to allocate to the decision process. It is thought that humans overcome these limitations through satisficing, fast but "good-enough" heuristic decision making that prioritizes some sources of information (cues) while ignoring others. However, the decision-making strategies we adopt under uncertainty and time pressure, for example during emergencies that demand split-second choices, are presently unknown. To characterize these decision strategies quantitatively, the present study examined how people solve a novel multicue probabilistic classification task under varying time pressure, by tracking shifts in decision strategies using variational Bayesian inference. We found that under low time pressure, participants correctly weighted and integrated all available cues to arrive at near-optimal decisions. With increasingly demanding, subsecond time pressures, however, participants systematically discounted a subset of the cue information by dropping the least informative cue(s) from their decision making process. Thus, the human cognitive apparatus copes with uncertainty and severe time pressure by adopting a "drop-the-worst" cue decision making strategy that minimizes cognitive time and effort investment while preserving the consideration of the most diagnostic cue information, thus maintaining "good-enough" accuracy. This advance in our understanding of satisficing strategies could form the basis of predicting human choices in high time pressure scenarios. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
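
    A compact sketch of the "drop-the-worst" strategy described above, assuming binary cues combined with log-odds validity weights; the weights and the number of cues discounted under time pressure are illustrative assumptions, not the study's fitted parameters.

      import numpy as np

      def decide(cue_values, cue_weights, n_drop=0):
          """Integrate binary cue evidence (+1/-1) with log-odds weights,
          optionally discounting the n_drop least informative cues first
          ('drop-the-worst' satisficing under time pressure)."""
          order = np.argsort(np.abs(cue_weights))  # least informative cues first
          keep = order[n_drop:]
          evidence = np.dot(cue_values[keep], cue_weights[keep])
          return 1 if evidence > 0 else 0

      weights = np.array([2.0, 1.2, 0.9, 0.8])  # assumed cue validities (log odds)
      cues = np.array([-1, +1, +1, +1])         # strongest cue disagrees with three weaker ones
      print(decide(cues, weights, n_drop=0))    # full integration -> 1
      print(decide(cues, weights, n_drop=2))    # two weakest cues discounted -> 0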

  14. Are computers effective lie detectors? A meta-analysis of linguistic cues to deception.

    Science.gov (United States)

    Hauch, Valerie; Blandón-Gitlin, Iris; Masip, Jaume; Sporer, Siegfried L

    2015-11-01

    This meta-analysis investigates linguistic cues to deception and whether these cues can be detected with computer programs. We integrated operational definitions for 79 cues from 44 studies where software had been used to identify linguistic deception cues. These cues were allocated to six research questions. As expected, the meta-analyses demonstrated that, relative to truth-tellers, liars experienced greater cognitive load, expressed more negative emotions, distanced themselves more from events, expressed fewer sensory-perceptual words, and referred less often to cognitive processes. However, liars were not more uncertain than truth-tellers. These effects were moderated by event type, involvement, emotional valence, intensity of interaction, motivation, and other moderators. Although the overall effect size was small, theory-driven predictions for certain cues received support. These findings not only further our knowledge about the usefulness of linguistic cues to detect deception with computers in applied settings but also elucidate the relationship between language and deception. © 2014 by the Society for Personality and Social Psychology, Inc.
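
    For readers unfamiliar with how such cue-level effects are pooled across studies, the sketch below shows a standard inverse-variance (fixed-effect) aggregation of per-study effect sizes; the numbers are invented, and the actual meta-analysis used more elaborate moderator models.

      import numpy as np

      def fixed_effect_meta(d, var):
          """Inverse-variance weighted mean effect size (fixed-effect model)
          and its standard error."""
          w = 1.0 / np.asarray(var)
          d_bar = np.sum(w * np.asarray(d)) / np.sum(w)
          se = np.sqrt(1.0 / np.sum(w))
          return d_bar, se

      # made-up per-study Cohen's d values for one linguistic cue
      d = np.array([0.21, 0.15, 0.30, 0.05])
      var = np.array([0.02, 0.05, 0.04, 0.03])
      print(fixed_effect_meta(d, var))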

  15. Speech endpoint detection with non-language speech sounds for generic speech processing applications

    Science.gov (United States)

    McClain, Matthew; Romanowski, Brian

    2009-05-01

    Non-language speech sounds (NLSS) are sounds produced by humans that do not carry linguistic information. Examples of these sounds are coughs, clicks, breaths, and filled pauses such as "uh" and "um" in English. NLSS are prominent in conversational speech, but can be a significant source of errors in speech processing applications. Traditionally, these sounds are ignored by speech endpoint detection algorithms, where speech regions are identified in the audio signal prior to processing. The ability to filter NLSS as a pre-processing step can significantly enhance the performance of many speech processing applications, such as speaker identification, language identification, and automatic speech recognition. In order to be used in all such applications, NLSS detection must be performed without the use of language models that provide knowledge of the phonology and lexical structure of speech. This is especially relevant to situations where the languages used in the audio are not known a priori. We present the results of preliminary experiments using data from American and British English speakers, in which segments of audio are classified as language speech sounds (LSS) or NLSS using a set of acoustic features designed for language-agnostic NLSS detection and a hidden Markov model (HMM) to model speech generation. The results of these experiments indicate that the features and model used are capable of detecting certain types of NLSS, such as breaths and clicks, while detection of other types of NLSS, such as filled pauses, will require future research.
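
    A hedged sketch of the classification setup described: one generative model per class (LSS vs. NLSS) and a per-segment likelihood comparison. The hmmlearn library, the Gaussian HMM topology, and the MFCC-like random feature frames are assumptions for illustration, not the authors' implementation.

      import numpy as np
      from hmmlearn import hmm  # assumed available; one generative HMM per class

      def train_class_hmm(features, n_states=3):
          """Fit a Gaussian HMM to stacked feature frames (n_frames x n_features)."""
          model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
          model.fit(features)
          return model

      def classify_segment(segment, lss_hmm, nlss_hmm):
          """Label a segment as language speech (LSS) or non-language speech (NLSS)
          by comparing log-likelihoods under the two class models."""
          return "LSS" if lss_hmm.score(segment) > nlss_hmm.score(segment) else "NLSS"

      rng = np.random.default_rng(2)
      lss_train = rng.normal(0.0, 1.0, (500, 13))   # stand-in MFCC-like frames for speech
      nlss_train = rng.normal(1.5, 1.2, (500, 13))  # stand-in frames for breaths, clicks, etc.
      lss_hmm, nlss_hmm = train_class_hmm(lss_train), train_class_hmm(nlss_train)
      print(classify_segment(rng.normal(1.5, 1.2, (40, 13)), lss_hmm, nlss_hmm))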

  16. Comparison of Gated Audiovisual Speech Identification in Elderly Hearing Aid Users and Elderly Normal-Hearing Individuals

    Science.gov (United States)

    Lidestam, Björn; Rönnberg, Jerker

    2016-01-01

    The present study compared elderly hearing aid (EHA) users (n = 20) with elderly normal-hearing (ENH) listeners (n = 20) in terms of isolation points (IPs, the shortest time required for correct identification of a speech stimulus) and accuracy of audiovisual gated speech stimuli (consonants, words, and final words in highly and less predictable sentences) presented in silence. In addition, we compared the IPs of audiovisual speech stimuli from the present study with auditory ones extracted from a previous study, to determine the impact of the addition of visual cues. Both participant groups achieved ceiling levels in terms of accuracy in the audiovisual identification of gated speech stimuli; however, the EHA group needed longer IPs for the audiovisual identification of consonants and words. The benefit of adding visual cues to auditory speech stimuli was more evident in the EHA group, as audiovisual presentation significantly shortened the IPs for consonants, words, and final words in less predictable sentences; in the ENH group, audiovisual presentation only shortened the IPs for consonants and words. In conclusion, although the audiovisual benefit was greater for EHA group, this group had inferior performance compared with the ENH group in terms of IPs when supportive semantic context was lacking. Consequently, EHA users needed the initial part of the audiovisual speech signal to be longer than did their counterparts with normal hearing to reach the same level of accuracy in the absence of a semantic context. PMID:27317667

  17. Abortion and compelled physician speech.

    Science.gov (United States)

    Orentlicher, David

    2015-01-01

    Informed consent mandates for abortion providers may infringe the First Amendment's freedom of speech. On the other hand, they may reinforce the physician's duty to obtain informed consent. Courts can promote both doctrines by ensuring that compelled physician speech pertains to medical facts about abortion rather than abortion ideology and that compelled speech is truthful and not misleading. © 2015 American Society of Law, Medicine & Ethics, Inc.

  18. Prosody production networks are modulated by sensory cues and social context.

    Science.gov (United States)

    Klasen, Martin; von Marschall, Clara; Isman, Güldehen; Zvyagintsev, Mikhail; Gur, Ruben C; Mathiak, Klaus

    2018-03-05

    The neurobiology of emotional prosody production is not well investigated. In particular, the effects of cues and social context are not known. The present study sought to differentiate cued from free emotion generation and the effect of social feedback from a human listener. Online speech filtering enabled fMRI during prosodic communication in 30 participants. Emotional vocalizations were a) free, b) auditorily cued, c) visually cued, or d) with interactive feedback. In addition to distributed language networks, cued emotions increased activity in auditory and - in case of visual stimuli - visual cortex. Responses were larger in pSTG at the right hemisphere and the ventral striatum when participants were listened to and received feedback from the experimenter. Sensory, language, and reward networks contributed to prosody production and were modulated by cues and social context. The right pSTG is a central hub for communication in social interactions - in particular for interpersonal evaluation of vocal emotions.

  19. The Accuracy Enhancing Effect of Biasing Cues

    NARCIS (Netherlands)

    W. Vanhouche (Wouter); S.M.J. van Osselaer (Stijn)

    2009-01-01

    Extrinsic cues such as price and irrelevant attributes have been shown to bias consumers' product judgments. Results in this article replicate those findings in pretrial judgments but show that such biasing cues can improve quality judgments at a later point in time. Initially biasing

  20. Auditory Emotional Cues Enhance Visual Perception

    Science.gov (United States)

    Zeelenberg, Rene; Bocanegra, Bruno R.

    2010-01-01

    Recent studies show that emotional stimuli impair performance to subsequently presented neutral stimuli. Here we show a cross-modal perceptual enhancement caused by emotional cues. Auditory cue words were followed by a visually presented neutral target word. Two-alternative forced-choice identification of the visual target was improved by…

  1. Cue Reliance in L2 Written Production

    Science.gov (United States)

    Wiechmann, Daniel; Kerz, Elma

    2014-01-01

    Second language learners reach expert levels in relative cue weighting only gradually. On the basis of ensemble machine learning models fit to naturalistic written productions of German advanced learners of English and expert writers, we set out to reverse engineer differences in the weighting of multiple cues in a clause linearization problem. We…
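    The cue-weighting logic sketched in this abstract can be illustrated with a toy ensemble model: fit a classifier to the linearization choices and read relative cue weights off its feature importances. The features, simulated data, and weights below are invented for illustration and are not the authors' actual predictors:

    ```python
    # Illustrative sketch: infer relative cue weights for a binary clause-ordering
    # choice from the feature importances of a random forest (features are made up).
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    n = 500
    X = np.column_stack([
        rng.normal(size=n),   # e.g., constituent-length difference
        rng.normal(size=n),   # e.g., information-structure cue
        rng.normal(size=n),   # e.g., animacy cue
    ])
    # Simulated choices dominated by the first cue
    y = (1.5 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n) > 0).astype(int)

    model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    print(dict(zip(["length", "info_structure", "animacy"],
                   model.feature_importances_.round(2))))
    ```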

  2. Contextual Cueing Effects across the Lifespan

    Science.gov (United States)

    Merrill, Edward C.; Conners, Frances A.; Roskos, Beverly; Klinger, Mark R.; Klinger, Laura Grofer

    2013-01-01

    The authors evaluated age-related variations in contextual cueing, which reflects the extent to which visuospatial regularities can facilitate search for a target. Previous research produced inconsistent results regarding contextual cueing effects in young children and in older adults, and no study has investigated the phenomenon across the life…

  3. Cues for haptic perception of compliance

    NARCIS (Netherlands)

    Bergmann Tiest, W.M.; Kappers, A.M.L.

    2009-01-01

    For the perception of the hardness of compliant materials, several cues are available. In this paper, the relative roles of force/displacement and surface deformation cues are investigated. We have measured discrimination thresholds with silicone rubber stimuli of differing thickness and compliance.

  4. Speech Recognition on Mobile Devices

    DEFF Research Database (Denmark)

    Tan, Zheng-Hua; Lindberg, Børge

    2010-01-01

    The enthusiasm of deploying automatic speech recognition (ASR) on mobile devices is driven both by remarkable advances in ASR technology and by the demand for efficient user interfaces on such devices as mobile phones and personal digital assistants (PDAs). This chapter presents an overview of ASR in the mobile context, covering motivations, challenges, fundamental techniques and applications. Three ASR architectures are introduced: embedded speech recognition, distributed speech recognition and network speech recognition. Their pros and cons and implementation issues are discussed. Applications within…

  5. How rats combine temporal cues.

    Science.gov (United States)

    Guilhardi, Paulo; Keen, Richard; MacInnis, Mika L M; Church, Russell M

    2005-05-31

    The procedures for classical and operant conditioning, and for many timing procedures, involve the delivery of reinforcers that may be related to the time of previous reinforcers and responses, and to the time of onsets and terminations of stimuli. The behavior resulting from such procedures can be described as bouts of responding that occur in some pattern at some rate. A packet theory of timing and conditioning is described that accounts for such behavior under a wide range of procedures. Applications include the food searching by rats in Skinner boxes under conditions of fixed and random reinforcement, brief and sustained stimuli, and several response-food contingencies. The approach is used to describe how multiple cues from reinforcers and stimuli combine to determine the rate and pattern of response bouts.

  6. Contribution of Binaural Masking Release to Improved Speech Intelligibility for different Masker types.

    Science.gov (United States)

    Sutojo, Sarinah; van de Par, Steven; Schoenmaker, Esther

    2018-06-01

    In situations with competing talkers or in the presence of masking noise, speech intelligibility can be improved by spatially separating the target speaker from the interferers. This advantage is generally referred to as spatial release from masking (SRM) and different mechanisms have been suggested to explain it. One proposed mechanism to benefit from spatial cues is the binaural masking release, which is purely stimulus driven. According to this mechanism, the spatial benefit results from differences in the binaural cues of target and masker, which need to appear simultaneously in time and frequency to improve the signal detection. In an alternative proposed mechanism, the differences in the interaural cues improve the segregation of auditory streams, a process that involves top-down processing rather than being purely stimulus driven. Other than the cues that produce binaural masking release, the interaural cue differences between target and interferer required to improve stream segregation do not have to appear simultaneously in time and frequency. This study is concerned with the contribution of binaural masking release to SRM for three masker types that differ with respect to the amount of energetic masking they exert. Speech intelligibility was measured, employing a stimulus manipulation that inhibits binaural masking release, and analyzed with a metric to account for the number of better-ear glimpses. Results indicate that the contribution of the stimulus-driven binaural masking release plays a minor role while binaural stream segregation and the availability of glimpses in the better ear had a stronger influence on improving the speech intelligibility. This article is protected by copyright. All rights reserved.
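    A better-ear glimpse metric of the kind mentioned here is typically computed by taking short-time SNRs at each ear, selecting the better ear per analysis cell, and counting cells above a local SNR criterion. The sketch below is a simplified broadband, frame-based version (published analyses usually work on a time-frequency grid per auditory filter band); the 3-dB criterion and 20-ms frame are assumptions:

    ```python
    import numpy as np

    def glimpse_count(target_l, target_r, masker_l, masker_r,
                      fs=16000, frame_s=0.02, criterion_db=3.0):
        """Count frames in which the better ear's short-time SNR exceeds the
        criterion. Inputs are equal-length numpy arrays (target and masker,
        one per ear); a broadband, frame-based simplification of a glimpsing metric."""
        hop = int(frame_s * fs)

        def frame_power(x):
            n_frames = len(x) // hop
            return np.array([np.mean(x[i * hop:(i + 1) * hop] ** 2) + 1e-12
                             for i in range(n_frames)])

        snr_l = 10 * np.log10(frame_power(target_l) / frame_power(masker_l))
        snr_r = 10 * np.log10(frame_power(target_r) / frame_power(masker_r))
        better_ear_snr = np.maximum(snr_l, snr_r)   # take the better ear per frame
        return int(np.sum(better_ear_snr > criterion_db))
    ```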

  7. Role of short-time acoustic temporal fine structure cues in sentence recognition for normal-hearing listeners.

    Science.gov (United States)

    Hou, Limin; Xu, Li

    2018-02-01

    Short-time processing was employed to manipulate the amplitude, bandwidth, and temporal fine structure (TFS) in sentences. Fifty-two native-English-speaking, normal-hearing listeners participated in four sentence-recognition experiments. Results showed that recovered envelope (E) played an important role in speech recognition when the bandwidth was > 1 equivalent rectangular bandwidth. Removing TFS drastically reduced sentence recognition. Preserving TFS greatly improved sentence recognition when amplitude information was available at a rate ≥ 10 Hz (i.e., time segment ≤ 100 ms). Therefore, the short-time TFS facilitates speech perception together with the recovered E and works with the coarse amplitude cues to provide useful information for speech recognition.
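    Envelope (E) and TFS are conventionally separated with the Hilbert transform: the magnitude of the analytic signal gives E and the cosine of its instantaneous phase gives the TFS carrier. A minimal single-band sketch of this standard decomposition (studies like this one apply it per auditory filter band and within short time segments):

    ```python
    import numpy as np
    from scipy.signal import hilbert

    def envelope_and_tfs(band_signal):
        """Split one band-limited signal into its Hilbert envelope and TFS."""
        analytic = hilbert(band_signal)
        envelope = np.abs(analytic)          # slow amplitude cue (E)
        tfs = np.cos(np.angle(analytic))     # fast fine-structure carrier
        return envelope, tfs

    # A TFS-only stimulus for this band keeps the carrier and flattens the envelope;
    # vocoder-style E-only stimuli instead modulate a noise or tone carrier with E.
    ```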

  8. Kin-informative recognition cues in ants

    DEFF Research Database (Denmark)

    Nehring, Volker; Evison, Sophie E F; Santorelli, Lorenzo A

    2011-01-01

    behaviour is thought to be rare in one of the classic examples of cooperation--social insect colonies--because the colony-level costs of individual selfishness select against cues that would allow workers to recognize their closest relatives. In accord with this, previous studies of wasps and ants have found little or no kin information in recognition cues. Here, we test the hypothesis that social insects do not have kin-informative recognition cues by investigating the recognition cues and relatedness of workers from four colonies of the ant Acromyrmex octospinosus. Contrary to the theoretical prediction, we show that the cuticular hydrocarbons of ant workers in all four colonies are informative enough to allow full-sisters to be distinguished from half-sisters with a high accuracy. These results contradict the hypothesis of non-heritable recognition cues and suggest that there is more potential…

  9. Multiscale Cues Drive Collective Cell Migration

    Science.gov (United States)

    Nam, Ki-Hwan; Kim, Peter; Wood, David K.; Kwon, Sunghoon; Provenzano, Paolo P.; Kim, Deok-Ho

    2016-07-01

    To investigate complex biophysical relationships driving directed cell migration, we developed a biomimetic platform that allows perturbation of microscale geometric constraints with concomitant nanoscale contact guidance architectures. This permits us to elucidate the influence, and parse out the relative contribution, of multiscale features, and define how these physical inputs are jointly processed with oncogenic signaling. We demonstrate that collective cell migration is profoundly enhanced by the addition of contact guidance cues when not otherwise constrained. However, while nanoscale cues promoted migration in all cases, microscale directed migration cues are dominant as the geometric constraint narrows, a behavior that is well explained by stochastic diffusion anisotropy modeling. Further, oncogene activation (i.e. mutant PIK3CA) resulted in profoundly increased migration where extracellular multiscale directed migration cues and intrinsic signaling synergistically conspire to greatly outperform normal cells or any extracellular guidance cues in isolation.

  10. Current trends in multilingual speech processing

    Indian Academy of Sciences (India)

    2016-08-26

    Keywords: speech-to-speech translation; language identification. … interest owing to two strong driving forces. Firstly, technical advances in speech recognition and synthesis are posing new challenges and opportunities to researchers.

  11. Neural entrainment to rhythmically-presented auditory, visual and audio-visual speech in children

    Directory of Open Access Journals (Sweden)

    Alan James Power

    2012-07-01

    Full Text Available Auditory cortical oscillations have been proposed to play an important role in speech perception. It is suggested that the brain may take temporal ‘samples’ of information from the speech stream at different rates, phase-resetting ongoing oscillations so that they are aligned with similar frequency bands in the input (‘phase locking’). Information from these frequency bands is then bound together for speech perception. To date, there are no explorations of neural phase-locking and entrainment to speech input in children. However, it is clear from studies of language acquisition that infants use both visual speech information and auditory speech information in learning. In order to study neural entrainment to speech in typically-developing children, we use a rhythmic entrainment paradigm (underlying 2 Hz or delta rate) based on repetition of the syllable ba, presented in either the auditory modality alone, the visual modality alone, or as auditory-visual speech (via a talking head). To ensure attention to the task, children aged 13 years were asked to press a button as fast as possible when the ba stimulus violated the rhythm for each stream type. Rhythmic violation depended on delaying the occurrence of a ba in the isochronous stream. Neural entrainment was demonstrated for all stream types, and individual differences in standardized measures of language processing were related to auditory entrainment at the theta rate. Further, there was significant modulation of the preferred phase of auditory entrainment in the theta band when visual speech cues were present, indicating cross-modal phase resetting. The rhythmic entrainment paradigm developed here offers a method for exploring individual differences in oscillatory phase locking during development. In particular, a method for assessing neural entrainment and cross-modal phase resetting would be useful for exploring developmental learning difficulties thought to involve temporal sampling…
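    Entrainment of this kind is commonly quantified by inter-trial phase coherence (ITC) and the mean "preferred" phase at the stimulation rate. A bare-bones sketch, assuming an epoched EEG array of shape (trials, samples); the function and variable names are illustrative, not the authors' pipeline:

    ```python
    import numpy as np

    def phase_locking(epochs, fs, freq):
        """Inter-trial phase coherence and preferred phase at one frequency.
        epochs: array of shape (n_trials, n_samples); fs: sampling rate in Hz."""
        n = epochs.shape[1]
        spectrum = np.fft.rfft(epochs, axis=1)
        bin_idx = int(round(freq * n / fs))          # FFT bin nearest the target rate
        phases = np.angle(spectrum[:, bin_idx])
        mean_vector = np.mean(np.exp(1j * phases))
        itc = np.abs(mean_vector)                    # 0 = random phase, 1 = perfect locking
        preferred_phase = np.angle(mean_vector)
        return itc, preferred_phase
    ```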

  12. Contribution of auditory working memory to speech understanding in mandarin-speaking cochlear implant users.

    Science.gov (United States)

    Tao, Duoduo; Deng, Rui; Jiang, Ye; Galvin, John J; Fu, Qian-Jie; Chen, Bing

    2014-01-01

    of voice pitch cues (albeit poorly coded by the CI) did not influence the relationship between working memory and speech perception.

  13. Adaptation to delayed auditory feedback induces the temporal recalibration effect in both speech perception and production.

    Science.gov (United States)

    Yamamoto, Kosuke; Kawabata, Hideaki

    2014-12-01

    We ordinarily speak fluently, even though our perceptions of our own voices are disrupted by various environmental acoustic properties. The underlying mechanism of speech is supposed to monitor the temporal relationship between speech production and the perception of auditory feedback, as suggested by a reduction in speech fluency when the speaker is exposed to delayed auditory feedback (DAF). While many studies have reported that DAF influences speech motor processing, its relationship to the temporal tuning effect on multimodal integration, or temporal recalibration, remains unclear. We investigated whether the temporal aspects of both speech perception and production change due to adaptation to the delay between the motor sensation and the auditory feedback. This is a well-used method of inducing temporal recalibration. Participants continually read texts with specific DAF times in order to adapt to the delay. Then, they judged the simultaneity between the motor sensation and the vocal feedback. We measured the rates of speech with which participants read the texts in both the exposure and re-exposure phases. We found that exposure to DAF changed both the rate of speech and the simultaneity judgment, that is, participants' speech gained fluency. Although we also found that a delay of 200 ms appeared to be most effective in decreasing the rates of speech and shifting the distribution on the simultaneity judgment, there was no correlation between these measurements. These findings suggest that both speech motor production and multimodal perception are adaptive to temporal lag but are processed in distinct ways.

  14. Do taste expectations mediate the impact of quality cues on consumers’ choice of chicken?

    DEFF Research Database (Denmark)

    Marian, Livia; Thøgersen, John; Krystallis Krontalis, Athanasios

    The conjoint design was a metric traditional conjoint approach based on an additive model, where 405 respondents had to rate their willingness to buy and expectations regarding taste of each one of the nine different chickens on scales from 0 to 10. It was thus possible to conduct two different conjoint analyses, in order to determine the impact of the quality cues on buying intention on the one hand, and on the expected taste on the other hand. In these models, quality cues are initial variables, while expected taste and willingness to buy are both outcome variables. The two models are then integrated when testing whether or not expected taste mediates the effects of quality cues on willingness to buy. Hence, in the mediational model, quality cues are initial variables, willingness to buy is the outcome and expected taste is the mediator. The most effective way to do the mediation analysis is still…
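    The mediational logic described here (quality cue to expected taste to willingness to buy) corresponds to the standard regression-based product-of-coefficients approach. A schematic sketch with hypothetical column names; significance of the indirect effect would normally be assessed by bootstrapping:

    ```python
    # Sketch of a simple mediation decomposition (product-of-coefficients style);
    # 'cue', 'expected_taste', and 'willingness_to_buy' are hypothetical column names.
    import pandas as pd
    import statsmodels.formula.api as smf

    def mediation_paths(df: pd.DataFrame):
        a = smf.ols("expected_taste ~ cue", data=df).fit().params["cue"]            # cue -> mediator
        model_y = smf.ols("willingness_to_buy ~ cue + expected_taste", data=df).fit()
        b = model_y.params["expected_taste"]                                         # mediator -> outcome
        c_prime = model_y.params["cue"]                                              # direct effect
        c = smf.ols("willingness_to_buy ~ cue", data=df).fit().params["cue"]         # total effect
        return {"indirect (a*b)": a * b, "direct (c')": c_prime, "total (c)": c}
    ```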

  15. Electrophysiological and hemodynamic mismatch responses in rats listening to human speech syllables.

    Directory of Open Access Journals (Sweden)

    Mahdi Mahmoudzadeh

    Full Text Available Speech is a complex auditory stimulus which is processed according to several time-scales. Whereas consonant discrimination is required to resolve rapid acoustic events, voice perception relies on slower cues. Humans, right from preterm ages, are particularly efficient at encoding temporal cues. To compare the capacities of preterms to those observed in other mammals, we tested anesthetized adult rats by using exactly the same paradigm as that used in preterm neonates. We simultaneously recorded neural (using ECoG) and hemodynamic responses (using fNIRS) to series of human speech syllables and investigated the brain response to a change of consonant (ba vs. ga) and to a change of voice (male vs. female). Both methods revealed concordant results, although ECoG measures were more sensitive than fNIRS. Responses to syllables were bilateral, but with marked right-hemispheric lateralization. Responses to voice changes were observed with both methods, while only ECoG was sensitive to consonant changes. These results suggest that rats more effectively processed the speech envelope than fine temporal cues, in contrast with human preterm neonates, in whom the opposite effects were observed. Cross-species comparisons constitute a very valuable tool to define the singularities of the human brain and species-specific bias that may help human infants to learn their native language.

  16. Don't speak too fast! Processing of fast rate speech in children with specific language impairment.

    Directory of Open Access Journals (Sweden)

    Hélène Guiraud

    Full Text Available Perception of speech rhythm requires the auditory system to track temporal envelope fluctuations, which carry syllabic and stress information. Reduced sensitivity to rhythmic acoustic cues has been evidenced in children with Specific Language Impairment (SLI), impeding syllabic parsing and speech decoding. Our study investigated whether these children experience specific difficulties processing fast rate speech as compared with typically developing (TD) children. Sixteen French children with SLI (8-13 years old) with mainly expressive phonological disorders and with preserved comprehension and 16 age-matched TD children performed a judgment task on sentences produced 1) at normal rate, 2) at fast rate or 3) time-compressed. Sensitivity index (d') to semantically incongruent sentence-final words was measured. Overall, children with SLI perform significantly worse than TD children. Importantly, as revealed by the significant Group × Speech Rate interaction, children with SLI find it more challenging than TD children to process both naturally and artificially accelerated speech. The two groups do not significantly differ in normal rate speech processing. In agreement with rhythm-processing deficits in atypical language development, our results suggest that children with SLI face difficulties adjusting to rapid speech rate. These findings are interpreted in light of temporal sampling and prosodic phrasing frameworks and of oscillatory mechanisms underlying speech perception.

  17. Don't speak too fast! Processing of fast rate speech in children with specific language impairment.

    Science.gov (United States)

    Guiraud, Hélène; Bedoin, Nathalie; Krifi-Papoz, Sonia; Herbillon, Vania; Caillot-Bascoul, Aurélia; Gonzalez-Monge, Sibylle; Boulenger, Véronique

    2018-01-01

    Perception of speech rhythm requires the auditory system to track temporal envelope fluctuations, which carry syllabic and stress information. Reduced sensitivity to rhythmic acoustic cues has been evidenced in children with Specific Language Impairment (SLI), impeding syllabic parsing and speech decoding. Our study investigated whether these children experience specific difficulties processing fast rate speech as compared with typically developing (TD) children. Sixteen French children with SLI (8-13 years old) with mainly expressive phonological disorders and with preserved comprehension and 16 age-matched TD children performed a judgment task on sentences produced 1) at normal rate, 2) at fast rate or 3) time-compressed. Sensitivity index (d') to semantically incongruent sentence-final words was measured. Overall, children with SLI perform significantly worse than TD children. Importantly, as revealed by the significant Group × Speech Rate interaction, children with SLI find it more challenging than TD children to process both naturally and artificially accelerated speech. The two groups do not significantly differ in normal rate speech processing. In agreement with rhythm-processing deficits in atypical language development, our results suggest that children with SLI face difficulties adjusting to rapid speech rate. These findings are interpreted in light of temporal sampling and prosodic phrasing frameworks and of oscillatory mechanisms underlying speech perception.
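    The sensitivity index d′ reported here is computed from hit and false-alarm rates as d′ = z(H) − z(FA), with extreme proportions adjusted to avoid infinite z-scores. A minimal sketch (the 1/(2N) correction and the example counts are illustrative):

    ```python
    from scipy.stats import norm

    def d_prime(hits, misses, false_alarms, correct_rejections):
        """d' = z(hit rate) - z(false-alarm rate), with a standard
        1/(2N) adjustment for rates of exactly 0 or 1."""
        def rate(k, n):
            return min(max(k / n, 1 / (2 * n)), 1 - 1 / (2 * n))
        h = rate(hits, hits + misses)
        fa = rate(false_alarms, false_alarms + correct_rejections)
        return norm.ppf(h) - norm.ppf(fa)

    print(round(d_prime(18, 2, 4, 16), 2))  # hypothetical counts -> 2.12
    ```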

  18. The influence of spectral and spatial characteristics of early reflections on speech intelligibility

    DEFF Research Database (Denmark)

    Arweiler, Iris; Buchholz, Jörg; Dau, Torsten

    The auditory system employs different strategies to facilitate speech intelligibility in complex listening conditions. One of them is the integration of early reflections (ERs) with the direct sound (DS) to increase the effective speech level. So far the underlying mechanisms of ER processing have … of listeners that speech intelligibility improved with added ER energy, but less than with added DS energy. An efficiency factor was introduced to quantify this effect. The difference in speech intelligibility could be mainly ascribed to the differences in the spectrum between the speech signals. … binaural). The direction-dependency could be explained by the spectral changes introduced by the pinna, head, and torso. The results will be important with regard to the influence of signal processing strategies in modern hearing aids on speech intelligibility, because they might alter the spectral…

  19. Studies of Speech Disorders in Schizophrenia. History and State-of-the-art

    Directory of Open Access Journals (Sweden)

    Shedovskiy E. F.

    2015-08-01

    Full Text Available The article reviews studies of speech disorders in schizophrenia. The authors trace the historical course of this research and characterize its main areas: the psychopathological (speech disorders as psychopathological symptoms, their description and taxonomy), the psychological (neuro- and pathopsychological perspectives), and, separately, a number of modern foreign works covering a variety of approaches to the study of speech disorders in endogenous mental disorders. Disorders and features of speech are among the most striking manifestations of schizophrenia, along with impaired thinking (Savitskaya A. V., Mikirtumov B. E.). For all the variety of symptoms, speech disorders in schizophrenia can be classified and organized. The few clinical-psychological studies of speech activity in schizophrenia include work on the generation of standard speech utterances, features of the verbal associative process, and speed parameters of speech utterances. Special attention is given to integrated research in the mainstream of biological psychiatry and genetic trends. It is shown that, over more than half a century, the distinctive character of speech pathology in schizophrenia has received some coverage in the psychiatric and psychological literature and continues to generate interest within a modern integrated multidisciplinary approach.

  20. Multimodal Speech Capture System for Speech Rehabilitation and Learning.

    Science.gov (United States)

    Sebkhi, Nordine; Desai, Dhyey; Islam, Mohammad; Lu, Jun; Wilson, Kimberly; Ghovanloo, Maysam

    2017-11-01

    Speech-language pathologists (SLPs) are trained to correct the articulation of people diagnosed with motor speech disorders by analyzing articulators' motion and assessing speech outcome while patients speak. To assist SLPs in this task, we present the multimodal speech capture system (MSCS), which records and displays the kinematics of key speech articulators, the tongue and lips, along with voice, using unobtrusive methods. The collected speech modalities (tongue motion, lip gestures, and voice) are visualized not only in real time to provide patients with instant feedback but also offline to allow SLPs to perform post-analysis of articulators' motion, particularly the tongue, with its prominent but hardly visible role in articulation. We describe the MSCS hardware and software components and demonstrate its basic visualization capabilities with a healthy individual repeating the words "Hello World." A proof-of-concept prototype has been successfully developed for this purpose and will be used in future clinical studies to evaluate its potential impact on accelerating speech rehabilitation by enabling patients to speak naturally. Pattern-matching algorithms to be applied to the collected data can provide patients with quantitative and objective feedback on their speech performance, unlike current methods, which are mostly subjective and may vary from one SLP to another.