integrate speech cues: Topics by WorldWideScience.org

Sample records for integrate speech cues

Speech cues contribute to audiovisual spatial integration.

Directory of Open Access Journals (Sweden)

Christopher W Bishop

Speech is the most important form of human communication but ambient sounds and competing talkers often degrade its acoustics. Fortunately, the brain can use visual information, especially its highly precise spatial information, to improve speech comprehension in noisy environments. Previous studies have demonstrated that audiovisual integration depends strongly on spatiotemporal factors. However, some integrative phenomena such as McGurk interference persist even with gross spatial disparities, suggesting that spatial alignment is not necessary for robust integration of audiovisual place-of-articulation cues. It is therefore unclear how speech cues interact with audiovisual spatial integration mechanisms. Here, we combine two well-established psychophysical phenomena, the McGurk effect and the ventriloquist's illusion, to explore this dependency. Our results demonstrate that conflicting spatial cues may not interfere with audiovisual integration of speech, but conflicting speech cues can impede integration in space. This suggests a direct but asymmetrical influence between ventral 'what' and dorsal 'where' pathways.
Modeling the Development of Audiovisual Cue Integration in Speech Perception.

Science.gov (United States)

Getz, Laura M; Nordeen, Elke R; Vrabic, Sarah C; Toscano, Joseph C

2017-03-21

Adult speech perception is generally enhanced when information is provided from multiple modalities. In contrast, infants do not appear to benefit from combining auditory and visual speech information early in development. This is true despite the fact that both modalities are important to speech comprehension even at early stages of language acquisition. How then do listeners learn how to process auditory and visual information as part of a unified signal? In the auditory domain, statistical learning processes provide an excellent mechanism for acquiring phonological categories. Is this also true for the more complex problem of acquiring audiovisual correspondences, which require the learner to integrate information from multiple modalities? In this paper, we present simulations using Gaussian mixture models (GMMs) that learn cue weights and combine cues on the basis of their distributional statistics. First, we simulate the developmental process of acquiring phonological categories from auditory and visual cues, asking whether simple statistical learning approaches are sufficient for learning multi-modal representations. Second, we use this time course information to explain audiovisual speech perception in adult perceivers, including cases where auditory and visual input are mismatched. Overall, we find that domain-general statistical learning techniques allow us to model the developmental trajectory of audiovisual cue integration in speech, and in turn, allow us to better understand the mechanisms that give rise to unified percepts based on multiple cues.
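The cue-combination idea in this abstract can be illustrated with a small sketch. It is not the authors' simulation: the category means and standard deviations below are hypothetical, the two Gaussians per cue are fixed rather than learned, and cue reliability is approximated by the separation of the category means relative to their spread. The sketch only shows how distributional statistics can determine how much each cue contributes to a unified percept, including when auditory and visual input are mismatched.

```python
import math

def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

# Hypothetical (mean, sd) of each cue for two phonological categories.
# The auditory cue separates the categories more sharply than the visual cue.
CATEGORIES = {
    "b": {"auditory": (0.0, 1.0), "visual": (0.0, 2.0)},
    "d": {"auditory": (3.0, 1.0), "visual": (3.0, 2.0)},
}

def cue_reliability(cue):
    """Separation of the two category means in pooled-SD units."""
    mean_b, sd_b = CATEGORIES["b"][cue]
    mean_d, sd_d = CATEGORIES["d"][cue]
    pooled_sd = math.sqrt((sd_b ** 2 + sd_d ** 2) / 2.0)
    return abs(mean_b - mean_d) / pooled_sd

def posterior(auditory, visual):
    """Posterior over categories, assuming equal priors and independent cues."""
    likelihood = {}
    for label, cues in CATEGORIES.items():
        likelihood[label] = (
            gaussian_pdf(auditory, *cues["auditory"])
            * gaussian_pdf(visual, *cues["visual"])
        )
    total = sum(likelihood.values())
    return {label: value / total for label, value in likelihood.items()}

if __name__ == "__main__":
    print("reliability:", cue_reliability("auditory"), cue_reliability("visual"))
    print("congruent  :", posterior(auditory=0.0, visual=0.0))
    print("mismatched :", posterior(auditory=0.0, visual=3.0))
```

With these assumed statistics, the mismatched input is still categorized mainly according to the auditory cue, because that cue's distributions overlap less. Widening the auditory distributions would shift the outcome toward the visual cue.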
Should visual speech cues (speechreading) be considered when fitting hearing aids?

Science.gov (United States)

Grant, Ken

2002-05-01

When talker and listener are face-to-face, visual speech cues become an important part of the communication environment, and yet, these cues are seldom considered when designing hearing aids. Models of auditory-visual speech recognition highlight the importance of complementary versus redundant speech information for predicting auditory-visual recognition performance. Thus, for hearing aids to work optimally when visual speech cues are present, it is important to know whether the cues provided by amplification and the cues provided by speechreading complement each other. In this talk, data will be reviewed that show nonmonotonicity between auditory-alone speech recognition and auditory-visual speech recognition, suggesting that efforts designed solely to improve auditory-alone recognition may not always result in improved auditory-visual recognition. Data will also be presented showing that one of the most important speech cues for enhancing auditory-visual speech recognition performance, voicing, is often the cue that benefits least from amplification.
Audiovisual Cues and Perceptual Learning of Spectrally Distorted Speech

Science.gov (United States)

Pilling, Michael; Thomas, Sharon

2011-01-01

Two experiments investigate the effectiveness of audiovisual (AV) speech cues (cues derived from both seeing and hearing a talker speak) in facilitating perceptual learning of spectrally distorted speech. Speech was distorted through an eight channel noise-vocoder which shifted the spectral envelope of the speech signal to simulate the properties…
Segmentation cues in conversational speech: Robust semantics and fragile phonotactics

Directory of Open Access Journals (Sweden)

Laurence White

2012-10-01

Multiple cues influence listeners’ segmentation of connected speech into words, but most previous studies have used stimuli elicited in careful readings rather than natural conversation. Discerning word boundaries in conversational speech may differ from the laboratory setting. In particular, a speaker’s articulatory effort – hyperarticulation vs hypoarticulation (H&H) – may vary according to communicative demands, suggesting a compensatory relationship whereby acoustic-phonetic cues are attenuated when other information sources strongly guide segmentation. We examined how listeners’ interpretation of segmentation cues is affected by speech style (spontaneous conversation vs read), using cross-modal identity priming. To elicit spontaneous stimuli, we used a map task in which speakers discussed routes around stylised landmarks. These landmarks were two-word phrases in which the strength of potential segmentation cues – semantic likelihood and cross-boundary diphone phonotactics – was systematically varied. Landmark-carrying utterances were transcribed and later re-recorded as read speech. Independent of speech style, we found an interaction between cue valence (favourable/unfavourable) and cue type (phonotactics/semantics). Thus, there was an effect of semantic plausibility, but no effect of cross-boundary phonotactics, indicating that the importance of phonotactic segmentation may have been overstated in studies where lexical information was artificially suppressed. These patterns were unaffected by whether the stimuli were elicited in a spontaneous or read context, even though the difference in speech styles was evident in a main effect. Durational analyses suggested speaker-driven cue trade-offs congruent with an H&H account, but these modulations did not impact on listener behaviour. We conclude that previous research exploiting read speech is reliable in indicating the primacy of lexically-based cues in the segmentation of natural
The Neural Basis of Speech Perception through Lipreading and Manual Cues: Evidence from Deaf Native Users of Cued Speech

Science.gov (United States)

Aparicio, Mario; Peigneux, Philippe; Charlier, Brigitte; Balériaux, Danielle; Kavec, Martin; Leybaert, Jacqueline

2017-01-01

We present here the first neuroimaging data for perception of Cued Speech (CS) by deaf adults who are native users of CS. CS is a visual mode of communicating a spoken language through a set of manual cues which accompany lipreading and disambiguate it. With CS, sublexical units of the oral language are conveyed clearly and completely through the visual modality without requiring hearing. The comparison of neural processing of CS in deaf individuals with processing of audiovisual (AV) speech in normally hearing individuals represents a unique opportunity to explore the similarities and differences in neural processing of an oral language delivered in a visuo-manual vs. an AV modality. The study included deaf adult participants who were early CS users and native hearing users of French who process speech audiovisually. Words were presented in an event-related fMRI design. Three conditions were presented to each group of participants. The deaf participants saw CS words (manual + lipread), words presented as manual cues alone, and words presented to be lipread without manual cues. The hearing group saw AV spoken words, audio-alone and lipread-alone. Three findings are highlighted. First, the middle and superior temporal gyrus (excluding Heschl’s gyrus) and left inferior frontal gyrus pars triangularis constituted a common, amodal neural basis for AV and CS perception. Second, integration was inferred in posterior parts of superior temporal sulcus for audio and lipread information in AV speech, but in the occipito-temporal junction, including MT/V5, for the manual cues and lipreading in CS. Third, the perception of manual cues showed a much greater overlap with the regions activated by CS (manual + lipreading) than lipreading alone did. This supports the notion that manual cues play a larger role than lipreading for CS processing. The present study contributes to a better understanding of the role of manual cues as support of visual speech perception in the framework
Experience with speech sounds is not necessary for cue trading by budgerigars (Melopsittacus undulatus).

Directory of Open Access Journals (Sweden)

Mary Flaherty

The influence of experience with human speech sounds on speech perception in budgerigars, vocal mimics whose speech exposure can be tightly controlled in a laboratory setting, was measured. Budgerigars were divided into groups that differed in auditory exposure and then tested on a cue-trading identification paradigm with synthetic speech. Phonetic cue trading is a perceptual phenomenon observed when changes on one cue dimension are offset by changes in another cue dimension while still maintaining the same phonetic percept. The current study examined whether budgerigars would trade the cues of voice onset time (VOT) and the first formant onset frequency when identifying syllable-initial stop consonants and whether this would be influenced by exposure to speech sounds. There were four exposure groups: No speech exposure (completely isolated), Passive speech exposure (regular exposure to human speech), and two Speech-trained groups. After the exposure period, all budgerigars were tested for phonetic cue trading using operant conditioning procedures. Birds were trained to peck keys in response to different synthetic speech sounds that began with "d" or "t" and varied in VOT and frequency of the first formant at voicing onset. Once training performance criteria were met, budgerigars were presented with the entire intermediate series, including ambiguous sounds. Responses on these trials were used to determine which speech cues were used, whether a trading relation between VOT and the onset frequency of the first formant was present, and whether speech exposure had an influence on perception. Cue trading was found in all birds and these results were largely similar to those of a group of humans. Results indicated that prior speech experience was not a requirement for cue trading by budgerigars. The results are consistent with theories that explain phonetic cue trading in terms of a rich auditory encoding of the speech signal.
The contribution of dynamic visual cues to audiovisual speech perception.

Science.gov (United States)

Jaekl, Philip; Pesquita, Ana; Alsius, Agnes; Munhall, Kevin; Soto-Faraco, Salvador

2015-08-01

Seeing a speaker's facial gestures can significantly improve speech comprehension, especially in noisy environments. However, the nature of the visual information from the speaker's facial movements that is relevant for this enhancement is still unclear. Like auditory speech signals, visual speech signals unfold over time and contain both dynamic configural information and luminance-defined local motion cues, two information sources that are thought to engage anatomically and functionally separate visual systems. Whereas some past studies have highlighted the importance of local, luminance-defined motion cues in audiovisual speech perception, the contribution of dynamic configural information signalling changes in form over time has not yet been assessed. We therefore attempted to single out the contribution of dynamic configural information to audiovisual speech processing. To this aim, we measured word identification performance in noise using unimodal auditory stimuli, and with audiovisual stimuli. In the audiovisual condition, speaking faces were presented as point light displays achieved via motion capture of the original talker. Point light displays could be isoluminant, to minimise the contribution of effective luminance-defined local motion information, or with added luminance contrast, allowing the combined effect of dynamic configural cues and local motion cues. Audiovisual enhancement was found in both the isoluminant and contrast-based luminance conditions compared to an auditory-only condition, demonstrating, for the first time, the specific contribution of dynamic configural cues to audiovisual speech improvement. These findings imply that globally processed changes in a speaker's facial shape contribute significantly towards the perception of articulatory gestures and the analysis of audiovisual speech.
Preschoolers Benefit from Visually Salient Speech Cues

Science.gov (United States)

Lalonde, Kaylah; Holt, Rachael Frush

2015-01-01

Purpose: This study explored visual speech influence in preschoolers using 3 developmentally appropriate tasks that vary in perceptual difficulty and task demands. They also examined developmental differences in the ability to use visually salient speech cues and visual phonological knowledge. Method: Twelve adults and 27 typically developing 3-…
Suprasegmental lexical stress cues in visual speech can guide spoken-word recognition

NARCIS (Netherlands)

Jesse, A.; McQueen, J.M.

2014-01-01

Visual cues to the individual segments of speech and to sentence prosody guide speech recognition. The present study tested whether visual suprasegmental cues to the stress patterns of words can also constrain recognition. Dutch listeners use acoustic suprasegmental cues to lexical stress (changes
Suprasegmental lexical stress cues in visual speech can guide spoken-word recognition

OpenAIRE

Jesse, A.; McQueen, J.

2014-01-01

Visual cues to the individual segments of speech and to sentence prosody guide speech recognition. The present study tested whether visual suprasegmental cues to the stress patterns of words can also constrain recognition. Dutch listeners use acoustic suprasegmental cues to lexical stress (changes in duration, amplitude, and pitch) in spoken-word recognition. We asked here whether they can also use visual suprasegmental cues. In two categorization experiments, Dutch participants saw a speaker...
Use of amplitude modulation cues recovered from frequency modulation for cochlear implant users when original speech cues are severely degraded.

Science.gov (United States)

Won, Jong Ho; Shim, Hyun Joon; Lorenzi, Christian; Rubinstein, Jay T

2014-06-01

Won et al. (J Acoust Soc Am 132:1113-1119, 2012) reported that cochlear implant (CI) speech processors generate amplitude-modulation (AM) cues recovered from broadband speech frequency modulation (FM) and that CI users can use these cues for speech identification in quiet. The present study was designed to extend this finding for a wide range of listening conditions, where the original speech cues were severely degraded by manipulating either the acoustic signals or the speech processor. The manipulation of the acoustic signals included the presentation of background noise, simulation of reverberation, and amplitude compression. The manipulation of the speech processor included changing the input dynamic range and the number of channels. For each of these conditions, multiple levels of speech degradation were tested. Speech identification was measured for CI users and compared for stimuli having both AM and FM information (intact condition) or FM information only (FM condition). Each manipulation degraded speech identification performance for both intact and FM conditions. Performance for the intact and FM conditions became similar for stimuli having the most severe degradations. Identification performance generally overlapped for the intact and FM conditions. Moreover, identification performance for the FM condition was better than chance performance even at the maximum level of distortion. Finally, significant correlations were found between speech identification scores for the intact and FM conditions. Altogether, these results suggest that despite poor frequency selectivity, CI users can make efficient use of AM cues recovered from speech FM in difficult listening situations.
Perception of Speech Modulation Cues by 6-Month-Old Infants

Science.gov (United States)

Cabrera, Laurianne; Bertoncini, Josiane; Lorenzi, Christian

2013-01-01

Purpose: The capacity of 6-month-old infants to discriminate a voicing contrast (/aba/--/apa/) on the basis of "amplitude modulation (AM) cues" and "frequency modulation (FM) cues" was evaluated. Method: Several vocoded speech conditions were designed to either degrade FM cues in 4 or 32 bands or degrade AM in 32 bands. Infants…
Audiovisual cues benefit recognition of accented speech in noise but not perceptual adaptation.

Science.gov (United States)

Banks, Briony; Gowen, Emma; Munro, Kevin J; Adank, Patti

2015-01-01

Perceptual adaptation allows humans to recognize different varieties of accented speech. We investigated whether perceptual adaptation to accented speech is facilitated if listeners can see a speaker's facial and mouth movements. In Study 1, participants listened to sentences in a novel accent and underwent a period of training with audiovisual or audio-only speech cues, presented in quiet or in background noise. A control group also underwent training with visual-only (speech-reading) cues. We observed no significant difference in perceptual adaptation between any of the groups. To address a number of remaining questions, we carried out a second study using a different accent, speaker and experimental design, in which participants listened to sentences in a non-native (Japanese) accent with audiovisual or audio-only cues, without separate training. Participants' eye gaze was recorded to verify that they looked at the speaker's face during audiovisual trials. Recognition accuracy was significantly better for audiovisual than for audio-only stimuli; however, no statistical difference in perceptual adaptation was observed between the two modalities. Furthermore, Bayesian analysis suggested that the data supported the null hypothesis. Our results suggest that although the availability of visual speech cues may be immediately beneficial for recognition of unfamiliar accented speech in noise, it does not improve perceptual adaptation.
Brain networks engaged in audiovisual integration during speech perception revealed by persistent homology-based network filtration.

Science.gov (United States)

Kim, Heejung; Hahm, Jarang; Lee, Hyekyoung; Kang, Eunjoo; Kang, Hyejin; Lee, Dong Soo

2015-05-01

The human brain naturally integrates audiovisual information to improve speech perception. However, in noisy environments, understanding speech is difficult and may require much effort. Although the brain network is supposed to be engaged in speech perception, it is unclear how speech-related brain regions are connected during natural bimodal audiovisual or unimodal speech perception with counterpart irrelevant noise. To investigate the topological changes of speech-related brain networks at all possible thresholds, we used a persistent homological framework through hierarchical clustering, such as single linkage distance, to analyze the connected component of the functional network during speech perception using functional magnetic resonance imaging. For speech perception, bimodal (audio-visual speech cue) or unimodal speech cues with counterpart irrelevant noise (auditory white-noise or visual gum-chewing) were delivered to 15 subjects. In terms of positive relationship, similar connected components were observed in bimodal and unimodal speech conditions during filtration. However, during speech perception with congruent audiovisual stimuli, tighter couplings of the left anterior temporal gyrus-anterior insula component and the right premotor-visual components were observed than in the auditory or visual speech cue conditions, respectively. Interestingly, visual speech is perceived under white noise by tight negative coupling in the left inferior frontal region-right anterior cingulate, left anterior insula, and bilateral visual regions, including right middle temporal gyrus, right fusiform components. In conclusion, the speech brain network is tightly positively or negatively connected, and can reflect efficient or effortful processes during natural audiovisual integration or lip-reading, respectively, in speech perception.
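The filtration described in this abstract can be sketched on a toy network. This is an illustration under stated assumptions, not the authors' pipeline: the four-node distance matrix is invented, the distances are assumed to be something like one minus the correlation between regional signals, and the helper names `count_components` and `component_barcode` are hypothetical. The sketch counts connected components at every threshold, which is the quantity that single-linkage clustering tracks as nodes merge.

```python
def count_components(dist, threshold):
    """Number of connected components when edges with distance <= threshold are kept."""
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(n)})

def component_barcode(dist):
    """Component count at threshold 0 and at every distinct pairwise distance."""
    n = len(dist)
    distances = sorted({dist[i][j] for i in range(n) for j in range(i + 1, n)})
    thresholds = [0.0] + [d for d in distances if d > 0.0]
    return [(t, count_components(dist, t)) for t in thresholds]

# Invented symmetric distance matrix for four regions (assumption for illustration).
TOY_DISTANCES = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.5, 0.9],
    [0.9, 0.5, 0.0, 0.2],
    [0.9, 0.9, 0.2, 0.0],
]

if __name__ == "__main__":
    for threshold, components in component_barcode(TOY_DISTANCES):
        print(f"threshold {threshold:.1f}: {components} component(s)")
```

Scanning all thresholds in this way avoids committing to a single arbitrary cutoff: two conditions can be compared by how early or late their components merge.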
Acoustic cues identifying phonetic transitions for speech segmentation

CSIR Research Space (South Africa)

Van Niekerk, DR

2008-11-01

The quality of corpus-based text-to-speech (TTS) systems depends strongly on the consistency of boundary placements during phonetic alignments. Expert human transcribers use visually represented acoustic cues in order to consistently place...
The role of continuous low-frequency harmonicity cues for interrupted speech perception in bimodal hearing.

Science.gov (United States)

Oh, Soo Hee; Donaldson, Gail S; Kong, Ying-Yee

2016-04-01

Low-frequency acoustic cues have been shown to enhance speech perception by cochlear-implant users, particularly when target speech occurs in a competing background. The present study examined the extent to which a continuous representation of low-frequency harmonicity cues contributes to bimodal benefit in simulated bimodal listeners. Experiment 1 examined the benefit of restoring a continuous temporal envelope to the low-frequency ear while the vocoder ear received a temporally interrupted stimulus. Experiment 2 examined the effect of providing continuous harmonicity cues in the low-frequency ear as compared to restoring a continuous temporal envelope in the vocoder ear. Findings indicate that bimodal benefit for temporally interrupted speech increases when continuity is restored to either or both ears. The primary benefit appears to stem from the continuous temporal envelope in the low-frequency region providing additional phonetic cues related to manner and F1 frequency; a secondary contribution is provided by low-frequency harmonicity cues when a continuous representation of the temporal envelope is present in the low-frequency ear or in both ears. The continuous temporal envelope and harmonicity cues of low-frequency speech are thought to support bimodal benefit by facilitating identification of word and syllable boundaries, and by restoring partial phonetic cues that occur during gaps in the temporally interrupted stimulus.
Neural networks supporting audiovisual integration for speech: A large-scale lesion study.

Science.gov (United States)

Hickok, Gregory; Rogalsky, Corianne; Matchin, William; Basilakos, Alexandra; Cai, Julia; Pillay, Sara; Ferrill, Michelle; Mickelsen, Soren; Anderson, Steven W; Love, Tracy; Binder, Jeffrey; Fridriksson, Julius

2018-06-01

Auditory and visual speech information are often strongly integrated, resulting in perceptual enhancements for audiovisual (AV) speech over audio alone and sometimes yielding compelling illusory fusion percepts when AV cues are mismatched, the McGurk-MacDonald effect. Previous research has identified three candidate regions thought to be critical for AV speech integration: the posterior superior temporal sulcus (STS), early auditory cortex, and the posterior inferior frontal gyrus. We assess the causal involvement of these regions (and others) in the first large-scale (N = 100) lesion-based study of AV speech integration. Two primary findings emerged. First, behavioral performance and lesion maps for AV enhancement and illusory fusion measures indicate that classic metrics of AV speech integration are not necessarily measuring the same process. Second, lesions involving superior temporal auditory, lateral occipital visual, and multisensory zones in the STS are the most disruptive to AV speech integration. Further, when AV speech integration fails, the nature of the failure (auditory vs visual capture) can be predicted from the location of the lesions. These findings show that AV speech processing is supported by unimodal auditory and visual cortices as well as multimodal regions such as the STS at their boundary. Motor-related frontal regions do not appear to play a role in AV speech integration.
The role of reverberation-related binaural cues in the externalization of speech

DEFF Research Database (Denmark)

Catic, Jasmina; Santurette, Sébastien; Dau, Torsten

2015-01-01

The perception of externalization of speech sounds was investigated with respect to the monaural and binaural cues available at the listeners’ ears in a reverberant environment. Individualized binaural room impulse responses (BRIRs) were used to simulate externalized sound sources via headphones. The measured BRIRs were subsequently modified such that the proportion of the response containing binaural vs monaural information was varied. Normal-hearing listeners were presented with speech sounds convolved with such modified BRIRs. Monaural reverberation cues were found to be sufficient...
The role of reverberation-related binaural cues in the externalization of speech.

Science.gov (United States)

Catic, Jasmina; Santurette, Sébastien; Dau, Torsten

2015-08-01

The perception of externalization of speech sounds was investigated with respect to the monaural and binaural cues available at the listeners' ears in a reverberant environment. Individualized binaural room impulse responses (BRIRs) were used to simulate externalized sound sources via headphones. The measured BRIRs were subsequently modified such that the proportion of the response containing binaural vs monaural information was varied. Normal-hearing listeners were presented with speech sounds convolved with such modified BRIRs. Monaural reverberation cues were found to be sufficient for the externalization of a lateral sound source. In contrast, for a frontal source, an increased amount of binaural cues from reflections was required in order to obtain well externalized sound images. It was demonstrated that the interaction between the interaural cues of the direct sound and the reverberation strongly affects the perception of externalization. An analysis of the short-term binaural cues showed that the amount of fluctuations of the binaural cues corresponded well to the externalization ratings obtained in the listening tests. The results further suggested that the precedence effect is involved in the auditory processing of the dynamic binaural cues that are utilized for externalization perception.
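The headphone simulation mentioned in this abstract rests on convolving a source signal with one impulse response per ear. The sketch below shows only that step, in plain Python. The three-sample "BRIRs" and the input signal are invented placeholders, far shorter than measured room responses, and the function names are hypothetical; the study's modification of the binaural and monaural portions of the response is not reproduced.

```python
def convolve(signal, impulse_response):
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def render_binaural(signal, brir_left, brir_right):
    """Return (left, right) headphone signals for a mono source."""
    return convolve(signal, brir_left), convolve(signal, brir_right)

# Placeholder data (assumptions for illustration only).
MONO_SIGNAL = [1.0, 2.0, 3.0]
BRIR_LEFT = [1.0, 0.0, 0.5]    # direct sound plus one later reflection
BRIR_RIGHT = [0.0, 0.8, 0.25]  # delayed, attenuated direct sound plus reflection

if __name__ == "__main__":
    left, right = render_binaural(MONO_SIGNAL, BRIR_LEFT, BRIR_RIGHT)
    print("left :", left)
    print("right:", right)
```

In this toy case the right-ear response starts one sample later and is weaker, so the rendered signals differ in both timing and level between the ears. Differences of that kind are what binaural cues refer to.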

Learning to Match Auditory and Visual Speech Cues: Social Influences on Acquisition of Phonological Categories

Science.gov (United States)

Altvater-Mackensen, Nicole; Grossmann, Tobias

2015-01-01

Infants' language exposure largely involves face-to-face interactions providing acoustic and visual speech cues but also social cues that might foster language learning. Yet, both audiovisual speech information and social information have so far received little attention in research on infants' early language development. Using a preferential…
Cross-language differences in cue use for speech segmentation

NARCIS (Netherlands)

Tyler, M.D.; Cutler, A.

2009-01-01

Two artificial-language learning experiments directly compared English, French, and Dutch listeners' use of suprasegmental cues for continuous-speech segmentation. In both experiments, listeners heard unbroken sequences of consonant-vowel syllables, composed of recurring three- and four-syllable
Training the Brain to Weight Speech Cues Differently: A Study of Finnish Second-language Users of English

Science.gov (United States)

Ylinen, Sari; Uther, Maria; Latvala, Antti; Vepsalainen, Sara; Iverson, Paul; Akahane-Yamada, Reiko; Naatanen, Risto

2010-01-01

Foreign-language learning is a prime example of a task that entails perceptual learning. The correct comprehension of foreign-language speech requires the correct recognition of speech sounds. The most difficult speech-sound contrasts for foreign-language learners often are the ones that have multiple phonetic cues, especially if the cues are…
Assessment of Spectral and Temporal Resolution in Cochlear Implant Users Using Psychoacoustic Discrimination and Speech Cue Categorization.

Science.gov (United States)

Winn, Matthew B; Won, Jong Ho; Moon, Il Joon

This study was conducted to measure auditory perception by cochlear implant users in the spectral and temporal domains, using tests of either categorization (using speech-based cues) or discrimination (using conventional psychoacoustic tests). The authors hypothesized that traditional nonlinguistic tests assessing spectral and temporal auditory resolution would correspond to speech-based measures assessing specific aspects of phonetic categorization assumed to depend on spectral and temporal auditory resolution. The authors further hypothesized that speech-based categorization performance would ultimately be a superior predictor of speech recognition performance, because of the fundamental nature of speech recognition as categorization. Nineteen cochlear implant listeners and 10 listeners with normal hearing participated in a suite of tasks that included spectral ripple discrimination, temporal modulation detection, and syllable categorization, which was split into a spectral cue-based task (targeting the /ba/-/da/ contrast) and a timing cue-based task (targeting the /b/-/p/ and /d/-/t/ contrasts). Speech sounds were manipulated to contain specific spectral or temporal modulations (formant transitions or voice onset time, respectively) that could be categorized. Categorization responses were quantified using logistic regression to assess perceptual sensitivity to acoustic phonetic cues. Word recognition testing was also conducted for cochlear implant listeners. Cochlear implant users were generally less successful at utilizing both spectral and temporal cues for categorization compared with listeners with normal hearing. For the cochlear implant listener group, spectral ripple discrimination was significantly correlated with the categorization of formant transitions; both were correlated with better word recognition. Temporal modulation detection using 100- and 10-Hz-modulated noise was not correlated either with the cochlear implant subjects' categorization of
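Quantifying categorization responses with logistic regression, as this abstract describes, can be sketched as follows. The response counts are invented, the continuum is assumed to be voice onset time in milliseconds, and the fit uses plain gradient ascent on the log-likelihood rather than any particular statistics package, so it should be read as an illustration and not as the authors' analysis. A steeper fitted slope corresponds to sharper categorization, that is, greater perceptual sensitivity to the cue.

```python
import math

def fit_logistic(x, successes, trials, steps=5000, learning_rate=0.5):
    """Fit P(response) = 1 / (1 + exp(-(intercept + slope * x))) by gradient ascent."""
    mean = sum(x) / len(x)
    scale = max(abs(v - mean) for v in x) or 1.0
    z = [(v - mean) / scale for v in x]  # centre and scale for stable steps
    b0 = b1 = 0.0
    total = sum(trials)
    for _ in range(steps):
        g0 = g1 = 0.0
        for zi, k, n in zip(z, successes, trials):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * zi)))
            residual = k - n * p  # observed minus expected count
            g0 += residual
            g1 += residual * zi
        b0 += learning_rate * g0 / total
        b1 += learning_rate * g1 / total
    slope = b1 / scale
    intercept = b0 - slope * mean
    return intercept, slope

def category_boundary(intercept, slope):
    """Cue value at which both responses are equally likely."""
    return -intercept / slope

# Invented data: number of /p/ responses out of 10 at each VOT step (ms).
VOT_MS = [0, 10, 20, 30, 40, 50, 60]
P_RESPONSES = [0, 1, 2, 5, 8, 9, 10]
TRIALS = [10] * len(VOT_MS)

if __name__ == "__main__":
    intercept, slope = fit_logistic(VOT_MS, P_RESPONSES, TRIALS)
    print("slope per ms:", slope)
    print("boundary (ms):", category_boundary(intercept, slope))
```

Comparing fitted slopes across listeners or listener groups is one way to express how reliably a given acoustic cue is used.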
Crossmodal and incremental perception of audiovisual cues to emotional speech.

Science.gov (United States)

Barkhuysen, Pashiera; Krahmer, Emiel; Swerts, Marc

2010-01-01

In this article we report on two experiments about the perception of audiovisual cues to emotional speech. The article addresses two questions: (1) how do visual cues from a speaker's face to emotion relate to auditory cues, and (2) what is the recognition speed for various facial cues to emotion? Both experiments reported below are based on tests with video clips of emotional utterances collected via a variant of the well-known Velten method. More specifically, we recorded speakers who displayed positive or negative emotions, which were congruent or incongruent with the (emotional) lexical content of the uttered sentence. In order to test this, we conducted two experiments. The first experiment is a perception experiment in which Czech participants, who do not speak Dutch, rate the perceived emotional state of Dutch speakers in a bimodal (audiovisual) or a unimodal (audio- or vision-only) condition. It was found that incongruent emotional speech leads to significantly more extreme perceived emotion scores than congruent emotional speech, where the difference between congruent and incongruent emotional speech is larger for the negative than for the positive conditions. Interestingly, the largest overall differences between congruent and incongruent emotions were found for the audio-only condition, which suggests that posing an incongruent emotion has a particularly strong effect on the spoken realization of emotions. The second experiment uses a gating paradigm to test the recognition speed for various emotional expressions from a speaker's face. In this experiment participants were presented with the same clips as in the first experiment, but this time presented vision-only. The clips were shown in successive segments (gates) of increasing duration. Results show that participants are surprisingly accurate in their recognition of the various emotions, as they already reach high recognition scores in the first gate (after only 160 ms). Interestingly, the recognition scores
When Meaning Is Not Enough: Distributional and Semantic Cues to Word Categorization in Child Directed Speech.

Science.gov (United States)

Feijoo, Sara; Muñoz, Carmen; Amadó, Anna; Serrat, Elisabet

2017-01-01

One of the most important tasks in first language development is assigning words to their grammatical category. The Semantic Bootstrapping Hypothesis postulates that, in order to accomplish this task, children are guided by a neat correspondence between semantic and grammatical categories, since nouns typically refer to objects and verbs to actions. It is this correspondence that guides children's initial word categorization. Other approaches, on the other hand, suggest that children might make use of distributional cues and word contexts to accomplish the word categorization task. According to such approaches, the Semantic Bootstrapping assumption has an important limitation, as it might not be true that all the nouns that children hear refer to specific objects or people. To explore this, we carried out two studies based on analyses of children's linguistic input. We analyzed child-directed speech addressed to four children under the age of 2;6, taken from the CHILDES database. The corpora were selected from the Manchester corpus. The corpora from the four selected children contained a total of 10,681 word types and 364,196 word tokens. In our first study, discriminant analyses were performed using semantic cues alone. The results show that many of the nouns found in parents' speech do not relate to specific objects and that semantic information alone might not be sufficient for successful word categorization. Given that there must be an additional source of information which, alongside semantics, might assist young learners in word categorization, our second study explores the availability of both distributional and semantic cues in child-directed speech. Our results confirm that this combination might yield better results for word categorization. These results are in line with theories that suggest the need for an integration of multiple cues from different sources in language development.
Common cues to emotion in the dynamic facial expressions of speech and song.

Science.gov (United States)

Livingstone, Steven R; Thompson, William F; Wanderley, Marcelo M; Palmer, Caroline

2015-01-01

Speech and song are universal forms of vocalization that may share aspects of emotional expression. Research has focused on parallels in acoustic features, overlooking facial cues to emotion. In three experiments, we compared moving facial expressions in speech and song. In Experiment 1, vocalists spoke and sang statements each with five emotions. Vocalists exhibited emotion-dependent movements of the eyebrows and lip corners that transcended speech-song differences. Vocalists' jaw movements were coupled to their acoustic intensity, exhibiting differences across emotion and speech-song. Vocalists' emotional movements extended beyond vocal sound to include large sustained expressions, suggesting a communicative function. In Experiment 2, viewers judged silent videos of vocalists' facial expressions prior to, during, and following vocalization. Emotional intentions were identified accurately for movements during and after vocalization, suggesting that these movements support the acoustic message. Experiment 3 compared emotional identification in voice-only, face-only, and face-and-voice recordings. Emotion judgements for voice-only singing were poorly identified, yet were accurate for all other conditions, confirming that facial expressions conveyed emotion more accurately than the voice in song, yet were equivalent in speech. Collectively, these findings highlight broad commonalities in the facial cues to emotion in speech and song, yet highlight differences in perception and acoustic-motor production.
Prediction and constraint in audiovisual speech perception

Science.gov (United States)

Peelle, Jonathan E.; Sommers, Mitchell S.

2015-01-01

During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing precision of prediction. Electrophysiological studies demonstrate oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to auditory information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration, supported
Prediction and constraint in audiovisual speech perception.

Science.gov (United States)

Peelle, Jonathan E; Sommers, Mitchell S

2015-07-01

During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing the precision of prediction. Electrophysiological studies demonstrate that oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to acoustic information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration
Suprasegmental lexical stress cues in visual speech can guide spoken-word recognition.

Science.gov (United States)

Jesse, Alexandra; McQueen, James M

2014-01-01

Visual cues to the individual segments of speech and to sentence prosody guide speech recognition. The present study tested whether visual suprasegmental cues to the stress patterns of words can also constrain recognition. Dutch listeners use acoustic suprasegmental cues to lexical stress (changes in duration, amplitude, and pitch) in spoken-word recognition. We asked here whether they can also use visual suprasegmental cues. In two categorization experiments, Dutch participants saw a speaker say fragments of word pairs that were segmentally identical but differed in their stress realization (e.g., 'ca-vi from cavia "guinea pig" vs. 'ka-vi from kaviaar "caviar"). Participants were able to distinguish between these pairs from seeing a speaker alone. Only the presence of primary stress in the fragment, not its absence, was informative. Participants were able to distinguish visually primary from secondary stress on first syllables, but only when the fragment-bearing target word carried phrase-level emphasis. Furthermore, participants distinguished fragments with primary stress on their second syllable from those with secondary stress on their first syllable (e.g., pro-'jec from projector "projector" vs. 'pro-jec from projectiel "projectile"), independently of phrase-level emphasis. Seeing a speaker thus contributes to spoken-word recognition by providing suprasegmental information about the presence of primary lexical stress.
When speaker identity is unavoidable: Neural processing of speaker identity cues in natural speech.

Science.gov (United States)

Tuninetti, Alba; Chládková, Kateřina; Peter, Varghese; Schiller, Niels O; Escudero, Paola

2017-11-01

Speech sound acoustic properties vary largely across speakers and accents. When perceiving speech, adult listeners normally disregard non-linguistic variation caused by speaker or accent differences, in order to comprehend the linguistic message, e.g. to correctly identify a speech sound or a word. Here we tested whether the process of normalizing speaker and accent differences, facilitating the recognition of linguistic information, is found at the level of neural processing, and whether it is modulated by the listeners' native language. In a multi-deviant oddball paradigm, native and nonnative speakers of Dutch were exposed to naturally-produced Dutch vowels varying in speaker, sex, accent, and phoneme identity. Unexpectedly, the analysis of mismatch negativity (MMN) amplitudes elicited by each type of change shows a large degree of early perceptual sensitivity to non-linguistic cues. This finding on perception of naturally-produced stimuli contrasts with previous studies examining the perception of synthetic stimuli wherein adult listeners automatically disregard acoustic cues to speaker identity. The present finding bears relevance to speech normalization theories, suggesting that at an unattended level of processing, listeners are indeed sensitive to changes in fundamental frequency in natural speech tokens. Copyright © 2017 Elsevier Inc. All rights reserved.
Treatment of Speech Anxiety by Cue-Controlled Relaxation and Desensitization with Professional and Paraprofessional Counselors

Science.gov (United States)

Russell, Richard K.; Wise, Fred

1976-01-01

This investigation compared the relative effectiveness of group-administered cue-controlled relaxation and group systematic desensitization in the treatment of speech anxiety. Also examined was the role of professional versus paraprofessional counselors in implementing the treatment program. A description of the cue-controlled relaxation technique…
Phonetic Category Cues in Adult-Directed Speech: Evidence from Three Languages with Distinct Vowel Characteristics

Science.gov (United States)

Pons, Ferran; Biesanz, Jeremy C.; Kajikawa, Sachiyo; Fais, Laurel; Narayan, Chandan R.; Amano, Shigeaki; Werker, Janet F.

2012-01-01

Using an artificial language learning manipulation, Maye, Werker, and Gerken (2002) demonstrated that infants' speech sound categories change as a function of the distributional properties of the input. In a recent study, Werker et al. (2007) showed that Infant-directed Speech (IDS) input contains reliable acoustic cues that support distributional…
Discriminating individually considerate and authoritarian leaders by speech activity cues

OpenAIRE

Feese, Sebastian; Muaremi, Amir; Arnrich, Bert; Tröster, Gerhard; Meyer, Bertolt; Jonas, Klaus

2011-01-01

Effective leadership can increase team performance; however, up to now the influence of specific micro-level behavioral patterns on team performance has been unclear. At the same time, current behavior observation methods in social psychology mostly rely on manual video annotations that impede research. In our work, we follow a sensor-based approach to automatically extract speech activity cues to discriminate individualized considerate from authoritarian leadership. On a subset of 35 selected...
Visual-vestibular cue integration for heading perception: applications of optimal cue integration theory.

Science.gov (United States)

Fetsch, Christopher R; Deangelis, Gregory C; Angelaki, Dora E

2010-05-01

The perception of self-motion is crucial for navigation, spatial orientation and motor control. In particular, estimation of one's direction of translation, or heading, relies heavily on multisensory integration in most natural situations. Visual and nonvisual (e.g., vestibular) information can be used to judge heading, but each modality alone is often insufficient for accurate performance. It is not surprising, then, that visual and vestibular signals converge frequently in the nervous system, and that these signals interact in powerful ways at the level of behavior and perception. Early behavioral studies of visual-vestibular interactions consisted mainly of descriptive accounts of perceptual illusions and qualitative estimation tasks, often with conflicting results. In contrast, cue integration research in other modalities has benefited from the application of rigorous psychophysical techniques, guided by normative models that rest on the foundation of ideal-observer analysis and Bayesian decision theory. Here we review recent experiments that have attempted to harness these so-called optimal cue integration models for the study of self-motion perception. Some of these studies used nonhuman primate subjects, enabling direct comparisons between behavioral performance and simultaneously recorded neuronal activity. The results indicate that humans and monkeys can integrate visual and vestibular heading cues in a manner consistent with optimal integration theory, and that single neurons in the dorsal medial superior temporal area show striking correlates of the behavioral effects. This line of research and other applications of normative cue combination models should continue to shed light on mechanisms of self-motion perception and the neuronal basis of multisensory integration.
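As a rough illustration of the optimal (maximum-likelihood) cue integration idea reviewed above, the sketch below combines a visual and a vestibular heading estimate by weighting each in proportion to its reliability (inverse variance). The function name `combine_cues` and all numbers are hypothetical and are not taken from the review; independent Gaussian noise on each cue is assumed.

```python
def combine_cues(mu_vis, sigma_vis, mu_vest, sigma_vest):
    """Reliability-weighted combination of two independent Gaussian cues.

    Returns the combined estimate, its standard deviation, and the visual weight.
    """
    # Reliability is defined as inverse variance.
    r_vis = 1.0 / sigma_vis ** 2
    r_vest = 1.0 / sigma_vest ** 2
    # Each cue is weighted by its share of the total reliability.
    w_vis = r_vis / (r_vis + r_vest)
    mu = w_vis * mu_vis + (1.0 - w_vis) * mu_vest
    # Reliabilities add, so the combined estimate is never noisier
    # than the better single cue.
    sigma = (1.0 / (r_vis + r_vest)) ** 0.5
    return mu, sigma, w_vis


if __name__ == "__main__":
    # Hypothetical headings in degrees: a precise visual cue, a noisier vestibular cue.
    print(combine_cues(mu_vis=4.0, sigma_vis=2.0, mu_vest=0.0, sigma_vest=4.0))
```

Under these assumptions the combined estimate is pulled toward the more reliable cue, and its predicted variability is lower than that of either cue alone, which is the behavioral signature such studies test for.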
Causal inference of asynchronous audiovisual speech

Directory of Open Access Journals (Sweden)

John F Magnotti

2013-11-01

Full Text Available During speech perception, humans integrate auditory information from the voice with visual information from the face. This multisensory integration increases perceptual precision, but only if the two cues come from the same talker; this requirement has been largely ignored by current models of speech perception. We describe a generative model of multisensory speech perception that includes this critical step of determining the likelihood that the voice and face information have a common cause. A key feature of the model is that it is based on a principled analysis of how an observer should solve this causal inference problem using the asynchrony between two cues and the reliability of the cues. This allows the model to make predictions about the behavior of subjects performing a synchrony judgment task, predictive power that does not exist in other approaches, such as post hoc fitting of Gaussian curves to behavioral data. We tested the model predictions against the performance of 37 subjects performing a synchrony judgment task viewing audiovisual speech under a variety of manipulations, including varying asynchronies, intelligibility, and visual cue reliability. The causal inference model outperformed the Gaussian model across two experiments, providing a better fit to the behavioral data with fewer parameters. Because the causal inference model is derived from a principled understanding of the task, model parameters are directly interpretable in terms of stimulus and subject properties.
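The causal inference step described above can be sketched in a few lines. This is a minimal illustration, not the authors' fitted model: it assumes that asynchronies from a common cause and from separate causes are both Gaussian (narrow and broad, respectively), that the measured asynchrony is corrupted by Gaussian sensory noise, and every parameter value below is invented for the example.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def p_common_cause(x_ms, prior_common=0.5, mean_c1=0.0, sd_c1=60.0,
                   mean_c2=0.0, sd_c2=250.0, sd_noise=40.0):
    """Posterior probability that voice and face share a cause,
    given a measured audiovisual asynchrony x_ms (hypothetical parameters)."""
    # Sensory noise broadens both asynchrony distributions.
    sd1 = math.hypot(sd_c1, sd_noise)
    sd2 = math.hypot(sd_c2, sd_noise)
    like_common = prior_common * normal_pdf(x_ms, mean_c1, sd1)
    like_separate = (1.0 - prior_common) * normal_pdf(x_ms, mean_c2, sd2)
    return like_common / (like_common + like_separate)


def judged_synchronous(x_ms, **params):
    """Report 'synchronous' when a common cause is the more probable explanation."""
    return p_common_cause(x_ms, **params) > 0.5
```

In this sketch, raising `sd_noise` (a less reliable cue) widens the range of asynchronies judged synchronous, which is the kind of interpretable parameter effect the abstract refers to.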
Comparison of Cue-Controlled Desensitization, Rational Restructuring, and a Credible Placebo in the Treatment of Speech Anxiety.

Science.gov (United States)

Lent, Robert W.; And Others

1981-01-01

The efficacy of cue-controlled desensitization and systematic rational restructuring was compared with a placebo method and a waiting-list control in reducing public speaking and nontargeted anxieties. Cue-controlled desensitization was generally more effective than the other groups in reducing subjective speech anxiety. (Author)
A psychophysical imaging method evidencing auditory cue extraction during speech perception: a group analysis of auditory classification images.

Science.gov (United States)

Varnet, Léo; Knoblauch, Kenneth; Serniclaes, Willy; Meunier, Fanny; Hoen, Michel

2015-01-01

Although there is a large consensus regarding the involvement of specific acoustic cues in speech perception, the precise mechanisms underlying the transformation from continuous acoustical properties into discrete perceptual units remain undetermined. This gap in knowledge is partially due to the lack of a turnkey solution for isolating critical speech cues from natural stimuli. In this paper, we describe a psychoacoustic imaging method known as the Auditory Classification Image technique that allows experimenters to estimate the relative importance of time-frequency regions in categorizing natural speech utterances in noise. Importantly, this technique enables the testing of hypotheses on the listening strategies of participants at the group level. We exemplify this approach by identifying the acoustic cues involved in da/ga categorization with two phonetic contexts, Al- or Ar-. The application of Auditory Classification Images to our group of 16 participants revealed significant critical regions on the second and third formant onsets, as predicted by the literature, as well as an unexpected temporal cue on the first formant. Finally, through a cluster-based nonparametric test, we demonstrate that this method is sufficiently sensitive to detect fine modifications of the classification strategies between different utterances of the same phoneme.
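To make the classification-image idea concrete, here is a minimal sketch using the simplest classical reverse-correlation estimate: the mean noise field on trials with one response minus the mean noise field on trials with the other. The abstract does not specify the estimator, so this is a stand-in and not necessarily the authors' procedure; the simulated observer, the 16-bin "time-frequency" template, and all function names are hypothetical.

```python
import random


def classification_image(noise_fields, responses):
    """Per-bin mean noise on True-response trials minus mean on False-response trials."""
    n_bins = len(noise_fields[0])
    sums = {True: [0.0] * n_bins, False: [0.0] * n_bins}
    counts = {True: 0, False: 0}
    for field, resp in zip(noise_fields, responses):
        counts[resp] += 1
        for i, value in enumerate(field):
            sums[resp][i] += value
    return [sums[True][i] / counts[True] - sums[False][i] / counts[False]
            for i in range(n_bins)]


def simulate_observer(template, n_trials, seed=0):
    """Simulated listener: responds True when template-weighted noise plus
    internal noise is positive (a hypothetical linear observer)."""
    rng = random.Random(seed)
    fields, responses = [], []
    for _ in range(n_trials):
        field = [rng.gauss(0.0, 1.0) for _ in template]
        decision = sum(t * x for t, x in zip(template, field)) + rng.gauss(0.0, 1.0)
        fields.append(field)
        responses.append(decision > 0)
    return fields, responses


# Hypothetical template: bins 2-3 favor one response, bins 8-9 the other.
TEMPLATE = [0, 0, 1, 1, 0, 0, 0, 0, -1, -1, 0, 0, 0, 0, 0, 0]
```

With enough trials, the estimated image is large and positive in the bins the simulated observer weights positively, large and negative where it weights negatively, and near zero elsewhere, which is how critical regions are read off such an image.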
Speech-specificity of two audiovisual integration effects

DEFF Research Database (Denmark)

Eskelund, Kasper; Tuomainen, Jyrki; Andersen, Tobias

2010-01-01

Seeing the talker’s articulatory mouth movements can influence the auditory speech percept both in speech identification and detection tasks. Here we show that these audiovisual integration effects also occur for sine wave speech (SWS), which is an impoverished speech signal that naïve observers often fail to perceive as speech. While audiovisual integration in the identification task only occurred when observers were informed of the speech-like nature of SWS, integration occurred in the detection task both for informed and naïve observers. This shows that both speech-specific and general mechanisms underlie audiovisual integration of speech.
Atypical audiovisual speech integration in infants at risk for autism.

Directory of Open Access Journals (Sweden)

Jeanne A Guiraud

Full Text Available The language difficulties often seen in individuals with autism might stem from an inability to integrate audiovisual information, a skill important for language development. We investigated whether 9-month-old siblings of older children with autism, who are at an increased risk of developing autism, are able to integrate audiovisual speech cues. We used an eye-tracker to record where infants looked when shown a screen displaying two faces of the same model, where one face is articulating /ba/ and the other /ga/, with one face congruent with the syllable sound being presented simultaneously, the other face incongruent. This method was successful in showing that infants at low risk can integrate audiovisual speech: they looked for the same amount of time at the mouths in both the fusible visual /ga/ - audio /ba/ and the congruent visual /ba/ - audio /ba/ displays, indicating that the auditory and visual streams fuse into a McGurk-type of syllabic percept in the incongruent condition. It also showed that low-risk infants could perceive a mismatch between auditory and visual cues: they looked longer at the mouth in the mismatched, non-fusible visual /ba/ - audio /ga/ display compared with the congruent visual /ga/ - audio /ga/ display, demonstrating that they perceive an uncommon, and therefore interesting, speech-like percept when looking at the incongruent mouth (repeated ANOVA: displays x fusion/mismatch conditions interaction: F(1,16) = 17.153, p = 0.001). The looking behaviour of high-risk infants did not differ according to the type of display, suggesting difficulties in matching auditory and visual information (repeated ANOVA, displays x conditions interaction: F(1,25) = 0.09, p = 0.767), in contrast to low-risk infants (repeated ANOVA: displays x conditions x low/high-risk groups interaction: F(1,41) = 4.466, p = 0.041). In some cases this reduced ability might lead to the poor communication skills characteristic of autism.

Cueing musical emotions: An empirical analysis of 24-piece sets by Bach and Chopin documents parallels with emotional speech.

Science.gov (United States)

Poon, Matthew; Schutz, Michael

2015-01-01

Acoustic cues such as pitch height and timing are effective at communicating emotion in both music and speech. Numerous experiments altering musical passages have shown that higher and faster melodies generally sound "happier" than lower and slower melodies, findings consistent with corpus analyses of emotional speech. However, equivalent corpus analyses of complex time-varying cues in music are less common, due in part to the challenges of assembling an appropriate corpus. Here, we describe a novel, score-based exploration of the use of pitch height and timing in a set of "balanced" major and minor key compositions. Our analysis included all 24 Preludes and 24 Fugues from Bach's Well-Tempered Clavier (book 1), as well as all 24 of Chopin's Preludes for piano. These three sets are balanced with respect to both modality (major/minor) and key chroma ("A," "B," "C," etc.). Consistent with predictions derived from speech, we found major-key (nominally "happy") pieces to be two semitones higher in pitch height and 29% faster than minor-key (nominally "sad") pieces. This demonstrates that our balanced corpus of major and minor key pieces uses low-level acoustic cues for emotion in a manner consistent with speech. A series of post hoc analyses illustrate interesting trade-offs, with sets featuring greater emphasis on timing distinctions between modalities exhibiting the least pitch distinction, and vice-versa. We discuss these findings in the broader context of speech-music research, as well as recent scholarship exploring the historical evolution of cue use in Western music.
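For readers who want the reported differences as ratios, the short sketch below converts a pitch offset in semitones to a frequency ratio and a "percent faster" figure to a tempo ratio. It assumes twelve-tone equal temperament for the semitone conversion, which the abstract does not state; the function names are hypothetical.

```python
def semitones_to_frequency_ratio(semitones):
    """Frequency ratio for a pitch interval, assuming 12-tone equal temperament."""
    return 2.0 ** (semitones / 12.0)


def percent_faster_to_tempo_ratio(percent):
    """Tempo ratio implied by 'percent faster' (e.g. 29 means 1.29 times the rate)."""
    return 1.0 + percent / 100.0


if __name__ == "__main__":
    print(semitones_to_frequency_ratio(2))    # major-key vs minor-key pitch height
    print(percent_faster_to_tempo_ratio(29))  # major-key vs minor-key timing
```

Under that assumption, a two-semitone difference corresponds to a frequency ratio of roughly 1.12, a smaller proportional change than the 1.29 tempo ratio.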
A psychophysical imaging method evidencing auditory cue extraction during speech perception: a group analysis of auditory classification images.

Directory of Open Access Journals (Sweden)

Léo Varnet

Full Text Available Although there is a large consensus regarding the involvement of specific acoustic cues in speech perception, the precise mechanisms underlying the transformation from continuous acoustical properties into discrete perceptual units remain undetermined. This gap in knowledge is partially due to the lack of a turnkey solution for isolating critical speech cues from natural stimuli. In this paper, we describe a psychoacoustic imaging method known as the Auditory Classification Image technique that allows experimenters to estimate the relative importance of time-frequency regions in categorizing natural speech utterances in noise. Importantly, this technique enables the testing of hypotheses on the listening strategies of participants at the group level. We exemplify this approach by identifying the acoustic cues involved in da/ga categorization with two phonetic contexts, Al- or Ar-. The application of Auditory Classification Images to our group of 16 participants revealed significant critical regions on the second and third formant onsets, as predicted by the literature, as well as an unexpected temporal cue on the first formant. Finally, through a cluster-based nonparametric test, we demonstrate that this method is sufficiently sensitive to detect fine modifications of the classification strategies between different utterances of the same phoneme.
Spectral integration in speech and non-speech sounds

Science.gov (United States)

Jacewicz, Ewa

2005-04-01

Spectral integration (or formant averaging) was proposed in vowel perception research to account for the observation that a reduction of the intensity of one of two closely spaced formants (as in /u/) produced a predictable shift in vowel quality [Delattre et al., Word 8, 195-210 (1952)]. A related observation was reported in psychoacoustics, indicating that when the components of a two-tone periodic complex differ in amplitude and frequency, its perceived pitch is shifted toward that of the more intense tone [Helmholtz, App. XIV (1875/1948)]. Subsequent research in both fields focused on the frequency interval that separates these two spectral components, in an attempt to determine the size of the bandwidth for spectral integration to occur. This talk will review the accumulated evidence for and against spectral integration within the hypothesized limit of 3.5 Bark for static and dynamic signals in speech perception and psychoacoustics. Based on similarities in the processing of speech and non-speech sounds, it is suggested that spectral integration may reflect a general property of the auditory system. A larger frequency bandwidth, possibly close to 3.5 Bark, may be utilized in integrating acoustic information, including speech, complex signals, or sound quality of a violin.
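The 3.5 Bark limit mentioned above can be checked for any pair of spectral components once frequencies are converted to the Bark scale. The sketch below assumes Traunmüller's (1990) approximation, without its low- and high-frequency corrections; the abstract does not say which conversion is used, and the function names and example formant values are hypothetical.

```python
def hz_to_bark(freq_hz):
    """Approximate critical-band rate in Bark (Traunmüller 1990, uncorrected form)."""
    return 26.81 * freq_hz / (1960.0 + freq_hz) - 0.53


def within_integration_band(f_low_hz, f_high_hz, limit_bark=3.5):
    """True when two components are separated by no more than limit_bark."""
    return abs(hz_to_bark(f_high_hz) - hz_to_bark(f_low_hz)) <= limit_bark


if __name__ == "__main__":
    # Hypothetical formant pairs: closely spaced vs widely spaced.
    print(within_integration_band(350.0, 700.0))
    print(within_integration_band(300.0, 2300.0))
```

On this conversion, the first pair falls inside the hypothesized integration bandwidth and the second falls well outside it.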
PRACTICING SPEECH THERAPY INTERVENTION FOR SOCIAL INTEGRATION OF CHILDREN WITH SPEECH DISORDERS

Directory of Open Access Journals (Sweden)

Martin Ofelia POPESCU

2016-11-01

Full Text Available The article presents a concise speech correction intervention program for dyslalia, combined with the development of intrapersonal, interpersonal and social integration capacities in children with speech disorders. The program's main objectives are to increase each child's potential for social integration by correcting speech disorders while developing intra- and interpersonal capacities, and to increase the potential of children's groups and the community for social integration by optimizing the socio-relational context of children with speech disorders. The program included 60 children/students with dyslalia (monomorphic and polymorphic) from 11 educational institutions (6 kindergartens and 5 schools/secondary schools) affiliated with the inter-school logopedic centre (CLI) in Targu Jiu city and other areas of Gorj district. The program was implemented under the assumption that a therapeutic-formative intervention that corrects speech disorders and facilitates social integration will, together with the correction of pronunciation disorders, optimize the social integration of children with speech disorders. The results confirm the hypothesis and provide evidence of the intervention program's efficiency.
Electrophysiological evidence for speech-specific audiovisual integration.

Science.gov (United States)

Baart, Martijn; Stekelenburg, Jeroen J; Vroomen, Jean

2014-01-01

Lip-read speech is integrated with heard speech at various neural levels. Here, we investigated the extent to which lip-read induced modulations of the auditory N1 and P2 (measured with EEG) are indicative of speech-specific audiovisual integration, and we explored to what extent the ERPs were modulated by phonetic audiovisual congruency. In order to disentangle speech-specific (phonetic) integration from non-speech integration, we used Sine-Wave Speech (SWS) that was perceived as speech by half of the participants (they were in speech-mode), while the other half was in non-speech mode. Results showed that the N1 obtained with audiovisual stimuli peaked earlier than the N1 evoked by auditory-only stimuli. This lip-read induced speeding up of the N1 occurred for listeners in speech and non-speech mode. In contrast, if listeners were in speech-mode, lip-read speech also modulated the auditory P2, but not if listeners were in non-speech mode, thus revealing speech-specific audiovisual binding. Comparing ERPs for phonetically congruent audiovisual stimuli with ERPs for incongruent stimuli revealed an effect of phonetic stimulus congruency that started at ~200 ms after (in)congruence became apparent. Critically, akin to the P2 suppression, congruency effects were only observed if listeners were in speech mode, and not if they were in non-speech mode. Using identical stimuli, we thus confirm that audiovisual binding involves (partially) different neural mechanisms for sound processing in speech and non-speech mode. © 2013 Published by Elsevier Ltd.
Cueing musical emotions: An empirical analysis of 24-piece sets by Bach and Chopin documents parallels with emotional speech

Directory of Open Access Journals (Sweden)

Matthew Poon

2015-11-01

Full Text Available Acoustic cues such as pitch height and timing are effective at communicating emotion in both music and speech. Numerous experiments altering musical passages have shown that higher and faster melodies generally sound happier than lower and slower melodies, findings consistent with corpus analyses of emotional speech. However, equivalent corpus analyses of complex time-varying cues in music are less common, due in part to the challenges of assembling an appropriate corpus. Here we describe a novel, score-based exploration of the use of pitch height and timing in a set of balanced major and minor key compositions. Our corpus contained all 24 Preludes and 24 Fugues from Bach’s Well Tempered Clavier (book 1), as well as all 24 of Chopin’s Preludes for piano. These three sets are balanced with respect to both modality (major/minor) and key chroma (A, B, C, etc.). Consistent with predictions derived from speech, we found major-key (nominally happy) pieces to be two semitones higher in pitch height and 29% faster than minor-key (nominally sad) pieces. This demonstrates that our balanced corpus of major and minor key pieces uses low-level acoustic cues for emotion in a manner consistent with speech. A series of post-hoc analyses illustrate interesting trade-offs, with
Comparison of Gated Audiovisual Speech Identification in Elderly Hearing Aid Users and Elderly Normal-Hearing Individuals: Effects of Adding Visual Cues to Auditory Speech Stimuli.

Science.gov (United States)

Moradi, Shahram; Lidestam, Björn; Rönnberg, Jerker

2016-06-17

The present study compared elderly hearing aid (EHA) users (n = 20) with elderly normal-hearing (ENH) listeners (n = 20) in terms of isolation points (IPs, the shortest time required for correct identification of a speech stimulus) and accuracy of audiovisual gated speech stimuli (consonants, words, and final words in highly and less predictable sentences) presented in silence. In addition, we compared the IPs of audiovisual speech stimuli from the present study with auditory ones extracted from a previous study, to determine the impact of the addition of visual cues. Both participant groups achieved ceiling levels in terms of accuracy in the audiovisual identification of gated speech stimuli; however, the EHA group needed longer IPs for the audiovisual identification of consonants and words. The benefit of adding visual cues to auditory speech stimuli was more evident in the EHA group, as audiovisual presentation significantly shortened the IPs for consonants, words, and final words in less predictable sentences; in the ENH group, audiovisual presentation only shortened the IPs for consonants and words. In conclusion, although the audiovisual benefit was greater for the EHA group, this group had inferior performance compared with the ENH group in terms of IPs when supportive semantic context was lacking. Consequently, EHA users needed the initial part of the audiovisual speech signal to be longer than did their counterparts with normal hearing to reach the same level of accuracy in the absence of a semantic context. © The Author(s) 2016.
Spectro-temporal cues enhance modulation sensitivity in cochlear implant users

Science.gov (United States)

Zheng, Yi; Escabí, Monty; Litovsky, Ruth Y.

2018-01-01

Although speech understanding is highly variable amongst cochlear implant (CI) subjects, the remarkably high speech recognition performance of many CI users is unexpected and not well understood. Numerous factors, including neural health and degradation of the spectral information in the speech signal of CIs, likely contribute to speech understanding. We studied the ability to use spectro-temporal modulations, which may be critical for speech understanding and discrimination, and hypothesized that CI users adopt a different perceptual strategy than normal-hearing (NH) individuals, whereby they rely more heavily on joint spectro-temporal cues to enhance detection of auditory cues. Modulation detection sensitivity was studied in CI users and NH subjects using broadband “ripple” stimuli that were modulated spectrally, temporally, or jointly, i.e., spectro-temporally. The spectro-temporal modulation transfer functions of CI users and NH subjects were decomposed into spectral and temporal dimensions and compared to those subjects’ spectral-only and temporal-only modulation transfer functions. In CI users, the joint spectro-temporal sensitivity was better than that predicted by spectral-only and temporal-only sensitivity, indicating a heightened spectro-temporal sensitivity. Such an enhancement through the combined integration of spectral and temporal cues was not observed in NH subjects. The unique use of spectro-temporal cues by CI patients can yield benefits for use of cues that are important for speech understanding. This finding has implications for developing sound processing strategies that may rely on joint spectro-temporal modulations to improve speech comprehension of CI users, and the findings of this study may be valuable for developing clinical assessment tools to optimize CI processor performance. PMID:28601530
Spectro-temporal cues enhance modulation sensitivity in cochlear implant users.

Science.gov (United States)

Zheng, Yi; Escabí, Monty; Litovsky, Ruth Y

2017-08-01

Although speech understanding is highly variable amongst cochlear implant (CI) subjects, the remarkably high speech recognition performance of many CI users is unexpected and not well understood. Numerous factors, including neural health and degradation of the spectral information in the speech signal of CIs, likely contribute to speech understanding. We studied the ability to use spectro-temporal modulations, which may be critical for speech understanding and discrimination, and hypothesized that CI users adopt a different perceptual strategy than normal-hearing (NH) individuals, whereby they rely more heavily on joint spectro-temporal cues to enhance detection of auditory cues. Modulation detection sensitivity was studied in CI users and NH subjects using broadband "ripple" stimuli that were modulated spectrally, temporally, or jointly, i.e., spectro-temporally. The spectro-temporal modulation transfer functions of CI users and NH subjects were decomposed into spectral and temporal dimensions and compared to those subjects' spectral-only and temporal-only modulation transfer functions. In CI users, the joint spectro-temporal sensitivity was better than that predicted by spectral-only and temporal-only sensitivity, indicating a heightened spectro-temporal sensitivity. Such an enhancement through the combined integration of spectral and temporal cues was not observed in NH subjects. The unique use of spectro-temporal cues by CI patients can yield benefits for use of cues that are important for speech understanding. This finding has implications for developing sound processing strategies that may rely on joint spectro-temporal modulations to improve speech comprehension of CI users, and the findings of this study may be valuable for developing clinical assessment tools to optimize CI processor performance. Copyright © 2017 Elsevier B.V. All rights reserved.
INTEGRATING MACHINE TRANSLATION AND SPEECH SYNTHESIS COMPONENT FOR ENGLISH TO DRAVIDIAN LANGUAGE SPEECH TO SPEECH TRANSLATION SYSTEM

Directory of Open Access Journals (Sweden)

J. SANGEETHA

2015-02-01

Full Text Available This paper provides an interface between the machine translation and speech synthesis systems for converting English speech to Tamil text in an English-to-Tamil speech-to-speech translation system. The speech translation system consists of three modules: automatic speech recognition, machine translation, and text-to-speech synthesis. Many procedures for integrating speech recognition and machine translation have been proposed, but the speech synthesis component has not yet been evaluated. In this paper, we focus on the integration of machine translation and speech synthesis, and report a subjective evaluation to investigate the impact of speech synthesis, machine translation, and the integration of the machine translation and speech synthesis components. Here we implement a hybrid machine translation approach (a combination of rule-based and statistical machine translation) and a concatenative syllable-based speech synthesis technique. To retain the naturalness and intelligibility of the synthesized speech, Auto Associative Neural Network (AANN) prosody prediction is used in this work. The results of this system investigation demonstrate that the naturalness and intelligibility of the synthesized speech are strongly influenced by the fluency and correctness of the translated text.
[Intermodal timing cues for audio-visual speech recognition].

Science.gov (United States)

Hashimoto, Masahiro; Kumashiro, Masaharu

2004-06-01

The purpose of this study was to investigate the limitations of lip-reading advantages for Japanese young adults by desynchronizing visual and auditory information in speech. In the experiment, audio-visual speech stimuli were presented under the six test conditions: audio-alone, and audio-visually with either 0, 60, 120, 240 or 480 ms of audio delay. The stimuli were the video recordings of a face of a female Japanese speaking long and short Japanese sentences. The intelligibility of the audio-visual stimuli was measured as a function of audio delays in sixteen untrained young subjects. Speech intelligibility under the audio-delay condition of less than 120 ms was significantly better than that under the audio-alone condition. On the other hand, the delay of 120 ms corresponded to the mean mora duration measured for the audio stimuli. The results implied that audio delays of up to 120 ms would not disrupt lip-reading advantage, because visual and auditory information in speech seemed to be integrated on a syllabic time scale. Potential applications of this research include noisy workplace in which a worker must extract relevant speech from all the other competing noises.
Electrophysiological evidence for speech-specific audiovisual integration

NARCIS (Netherlands)

Baart, M.; Stekelenburg, J.J.; Vroomen, J.

2014-01-01

Lip-read speech is integrated with heard speech at various neural levels. Here, we investigated the extent to which lip-read induced modulations of the auditory N1 and P2 (measured with EEG) are indicative of speech-specific audiovisual integration, and we explored to what extent the ERPs were
Multistage audiovisual integration of speech: dissociating identification and detection.

Science.gov (United States)

Eskelund, Kasper; Tuomainen, Jyrki; Andersen, Tobias S

2011-02-01

Speech perception integrates auditory and visual information. This is evidenced by the McGurk illusion where seeing the talking face influences the auditory phonetic percept and by the audiovisual detection advantage where seeing the talking face influences the detectability of the acoustic speech signal. Here, we show that identification of phonetic content and detection can be dissociated as speech-specific and non-specific audiovisual integration effects. To this end, we employed synthetically modified stimuli, sine wave speech (SWS), which is an impoverished speech signal that only observers informed of its speech-like nature recognize as speech. While the McGurk illusion only occurred for informed observers, the audiovisual detection advantage occurred for naïve observers as well. This finding supports a multistage account of audiovisual integration of speech in which the many attributes of the audiovisual speech signal are integrated by separate integration processes.
A chimpanzee recognizes synthetic speech with significantly reduced acoustic cues to phonetic content.

Science.gov (United States)

Heimbauer, Lisa A; Beran, Michael J; Owren, Michael J

2011-07-26

A long-standing debate concerns whether humans are specialized for speech perception, which some researchers argue is demonstrated by the ability to understand synthetic speech with significantly reduced acoustic cues to phonetic content. We tested a chimpanzee (Pan troglodytes) that recognizes 128 spoken words, asking whether she could understand such speech. Three experiments presented 48 individual words, with the animal selecting a corresponding visuographic symbol from among four alternatives. Experiment 1 tested spectrally reduced, noise-vocoded (NV) synthesis, originally developed to simulate input received by human cochlear-implant users. Experiment 2 tested "impossibly unspeechlike" sine-wave (SW) synthesis, which reduces speech to just three moving tones. Although receiving only intermittent and noncontingent reward, the chimpanzee performed well above chance level, including when hearing synthetic versions for the first time. Recognition of SW words was least accurate but improved in experiment 3 when natural words in the same session were rewarded. The chimpanzee was more accurate with NV than SW versions, as were 32 human participants hearing these items. The chimpanzee's ability to spontaneously recognize acoustically reduced synthetic words suggests that experience rather than specialization is critical for speech-perception capabilities that some have suggested are uniquely human. Copyright © 2011 Elsevier Ltd. All rights reserved.
Music expertise shapes audiovisual temporal integration windows for speech, sinewave speech and music

Directory of Open Access Journals (Sweden)

Hwee Ling Lee

2014-08-01

Full Text Available This psychophysics study used musicians as a model to investigate whether musical expertise shapes the temporal integration window for audiovisual speech, sinewave speech or music. Musicians and non-musicians judged the audiovisual synchrony of speech, sinewave analogues of speech, and music stimuli at 13 audiovisual stimulus onset asynchronies (±360, ±300, ±240, ±180, ±120, ±60, and 0 ms). Further, we manipulated the duration of the stimuli by presenting sentences/melodies or syllables/tones. Critically, musicians relative to non-musicians exhibited significantly narrower temporal integration windows for both music and sinewave speech. Further, the temporal integration window for music decreased with the amount of music practice, but not with age of acquisition. In other words, the more musicians practiced piano in the past three years, the more sensitive they became to the temporal misalignment of visual and auditory signals. Collectively, our findings demonstrate that music practicing fine-tunes the audiovisual temporal integration window to various extents depending on the stimulus class. While the effect of piano practicing was most pronounced for music, it also generalized to other stimulus classes such as sinewave speech and to a marginally significant degree to natural speech.
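The 13 stimulus onset asynchronies listed in this abstract form a symmetric grid in 60 ms steps. A minimal sketch of how such a grid could be generated is given below; the function name `make_soas` is hypothetical, and the abstract does not state which sign denotes an auditory lead, so no sign convention is assumed here.

```python
# Hypothetical helper: rebuild the SOA grid described in the abstract
# (+/-360 ms down to 0 ms in 60 ms steps). Names are illustrative only.
STEP_MS = 60
MAX_MS = 360


def make_soas(max_ms=MAX_MS, step_ms=STEP_MS):
    """Return symmetric SOAs in ms, from -max_ms to +max_ms inclusive."""
    return list(range(-max_ms, max_ms + 1, step_ms))


soas = make_soas()
print(len(soas), soas)
```

The grid contains six negative values, six positive values, and zero, which matches the count of 13 reported in the abstract.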
Integration of speech and gesture in aphasia.

Science.gov (United States)

Cocks, Naomi; Byrne, Suzanne; Pritchard, Madeleine; Morgan, Gary; Dipper, Lucy

2018-02-07

Information from speech and gesture is often integrated to comprehend a message. This integration process requires the appropriate allocation of cognitive resources to both the gesture and speech modalities. People with aphasia are likely to find integration of gesture and speech difficult. This is due to a reduction in cognitive resources, a difficulty with resource allocation or a combination of the two. Despite it being likely that people who have aphasia will have difficulty with integration, empirical evidence describing this difficulty is limited. Such a difficulty was found in a single case study by Cocks et al. in 2009, and is replicated here with a greater number of participants. To determine whether individuals with aphasia have difficulties understanding messages in which they have to integrate speech and gesture. Thirty-one participants with aphasia (PWA) and 30 control participants watched videos of an actor communicating a message in three different conditions: verbal only, gesture only, and verbal and gesture message combined. The message related to an action in which the name of the action (e.g., 'eat') was provided verbally and the manner of the action (e.g., hands in a position as though eating a burger) was provided gesturally. Participants then selected a picture that 'best matched' the message conveyed from a choice of four pictures which represented a gesture match only (G match), a verbal match only (V match), an integrated verbal-gesture match (Target) and an unrelated foil (UR). To determine the gain that participants obtained from integrating gesture and speech, a measure of multimodal gain (MMG) was calculated. The PWA were less able to integrate gesture and speech than the control participants and had significantly lower MMG scores. When the PWA had difficulty integrating, they more frequently selected the verbal match. The findings suggest that people with aphasia can have difficulty integrating speech and gesture in order to obtain
Aging and Spectro-Temporal Integration of Speech

Directory of Open Access Journals (Sweden)

John H. Grose

2016-10-01

Full Text Available The purpose of this study was to determine the effects of age on the spectro-temporal integration of speech. The hypothesis was that the integration of speech fragments distributed over frequency, time, and ear of presentation is reduced in older listeners, even for those with good audiometric hearing. Younger, middle-aged, and older listeners (10 per group) with good audiometric hearing participated. They were each tested under seven conditions that encompassed combinations of spectral, temporal, and binaural integration. Sentences were filtered into two bands centered at 500 Hz and 2500 Hz, with criterion bandwidth tailored for each participant. In some conditions, the speech bands were individually square wave interrupted at a rate of 10 Hz. Configurations of uninterrupted, synchronously interrupted, and asynchronously interrupted frequency bands were constructed that constituted speech fragments distributed across frequency, time, and ear of presentation. The over-arching finding was that, for most configurations, performance was not differentially affected by listener age. Although speech intelligibility varied across condition, there was no evidence of performance deficits in older listeners in any condition. This study indicates that age, per se, does not necessarily undermine the ability to integrate fragments of speech dispersed across frequency and time.
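The square-wave interruption described in this abstract can be illustrated with a simple on/off gate. The sketch below rests on assumptions that the abstract does not state: a 50% duty cycle, and "asynchronous" read as the two bands being gated half a cycle out of phase. The function names `gate` and `band_gates` are hypothetical.

```python
def gate(t_s, rate_hz=10.0, phase=0.0, duty=0.5):
    """Square-wave gate: 1.0 while the band is audible, 0.0 while interrupted.

    phase is a fraction of one cycle (0.5 shifts the gate by half a cycle).
    The 50% duty cycle is an assumption, not a value from the abstract.
    """
    cycle_pos = (t_s * rate_hz + phase) % 1.0
    return 1.0 if cycle_pos < duty else 0.0


def band_gates(t_s, asynchronous):
    """Gate values for the low (500 Hz) and high (2500 Hz) bands at time t_s."""
    low = gate(t_s)
    # Assumed reading of "asynchronous": the high band is on while the low band is off.
    high = gate(t_s, phase=0.5 if asynchronous else 0.0)
    return low, high


for t in (0.02, 0.07):
    print(t, band_gates(t, asynchronous=False), band_gates(t, asynchronous=True))
```

Under these assumptions, synchronous gating leaves silent gaps in which neither band is present, whereas asynchronous gating always presents exactly one band, so the listener must combine fragments across both frequency and time.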
Multistage audiovisual integration of speech: dissociating identification and detection

DEFF Research Database (Denmark)

Eskelund, Kasper; Tuomainen, Jyrki; Andersen, Tobias

2011-01-01

Speech perception integrates auditory and visual information. This is evidenced by the McGurk illusion where seeing the talking face influences the auditory phonetic percept and by the audiovisual detection advantage where seeing the talking face influences the detectability of the acoustic speech...... signal. Here we show that identification of phonetic content and detection can be dissociated as speech-specific and non-specific audiovisual integration effects. To this end, we employed synthetically modified stimuli, sine wave speech (SWS), which is an impoverished speech signal that only observers...... informed of its speech-like nature recognize as speech. While the McGurk illusion only occurred for informed observers the audiovisual detection advantage occurred for naïve observers as well. This finding supports a multi-stage account of audiovisual integration of speech in which the many attributes...
Electrophysiological assessment of audiovisual integration in speech perception

DEFF Research Database (Denmark)

Eskelund, Kasper; Dau, Torsten

Speech perception integrates signals from ear and eye. This is witnessed by a wide range of audiovisual integration effects, such as ventriloquism and the McGurk illusion. Some behavioral evidence suggests that audiovisual integration of specific aspects is special for speech perception. However, our...... knowledge of such bimodal integration would be strengthened if the phenomena could be investigated by objective, neutrally based methods. One key question of the present work is if perceptual processing of audiovisual speech can be gauged with a specific signature of neurophysiological activity...... on the auditory speech percept? In two experiments, which both combine behavioral and neurophysiological measures, an uncovering of the relation between perception of faces and of audiovisual integration is attempted. Behavioral findings suggest a strong effect of face perception, whereas the MMN results are less...
Audiovisual integration for speech during mid-childhood: Electrophysiological evidence

Science.gov (United States)

Kaganovich, Natalya; Schumaker, Jennifer

2014-01-01

Previous studies have demonstrated that the presence of visual speech cues reduces the amplitude and latency of the N1 and P2 event-related potential (ERP) components elicited by speech stimuli. However, the developmental trajectory of this effect is not yet fully mapped. We examined ERP responses to auditory, visual, and audiovisual speech in two groups of school-age children (7–8-year-olds and 10–11-year-olds) and in adults. Audiovisual speech led to the attenuation of the N1 and P2 components in all groups of participants, suggesting that the neural mechanisms underlying these effects are functional by early school years. Additionally, while the reduction in N1 was largest over the right scalp, the P2 attenuation was largest over the left and midline scalp. The difference in the hemispheric distribution of the N1 and P2 attenuation supports the idea that these components index at least somewhat disparate neural processes within the context of audiovisual speech perception. PMID:25463815

Treating speech subsystems in childhood apraxia of speech with tactual input: the PROMPT approach.

Science.gov (United States)

Dale, Philip S; Hayden, Deborah A

2013-11-01

Prompts for Restructuring Oral Muscular Phonetic Targets (PROMPT; Hayden, 2004; Hayden, Eigen, Walker, & Olsen, 2010), a treatment approach for the improvement of speech sound disorders in children, uses tactile-kinesthetic-proprioceptive (TKP) cues to support and shape movements of the oral articulators. No research to date has systematically examined the efficacy of PROMPT for children with childhood apraxia of speech (CAS). Four children (ages 3;6 [years;months] to 4;8), all meeting the American Speech-Language-Hearing Association (2007) criteria for CAS, were treated using PROMPT. All children received 8 weeks of 2 × per week treatment, including at least 4 weeks of full PROMPT treatment that included TKP cues. During the first 4 weeks, 2 of the 4 children received treatment that included all PROMPT components except TKP cues. This design permitted both between-subjects and within-subjects comparisons to evaluate the effect of TKP cues. Gains in treatment were measured by standardized tests and by criterion-referenced measures based on the production of untreated probe words, reflecting change in speech movements and auditory perceptual accuracy. All 4 children made significant gains during treatment, but measures of motor speech control and untreated word probes provided evidence for more gain when TKP cues were included. PROMPT as a whole appears to be effective for treating children with CAS, and the inclusion of TKP cues appears to facilitate greater effect.
Cue integration vs. exemplar-based reasoning in multi-attribute decisions from memory: A matter of cue representation

OpenAIRE

Arndt Broeder; Ben R. Newell; Christine Platzer

2010-01-01

Inferences about target variables can be achieved by deliberate integration of probabilistic cues or by retrieving similar cue-patterns (exemplars) from memory. In tasks with cue information presented in on-screen displays, rule-based strategies tend to dominate unless the abstraction of cue-target relations is unfeasible. This dominance has also been demonstrated, surprisingly, in experiments that demanded the retrieval of cue values from memory (M. Persson & J. Rieskamp, 2009). In th...
Tuning Neural Phase Entrainment to Speech.

Science.gov (United States)

Falk, Simone; Lanzilotti, Cosima; Schön, Daniele

2017-08-01

Musical rhythm has a positive impact on subsequent speech processing. However, the neural mechanisms underlying this phenomenon are so far unclear. We investigated whether carryover effects from a preceding musical cue to a speech stimulus result from a continuation of neural phase entrainment to periodicities that are present in both music and speech. Participants listened to and memorized French metrical sentences that contained (quasi-)periodic recurrences of accents and syllables. Speech stimuli were preceded by a rhythmically regular or irregular musical cue. Our results show that the presence of a regular cue modulates neural response as estimated by EEG power spectral density, intertrial coherence, and source analyses at critical frequencies during speech processing compared with the irregular condition. Importantly, intertrial coherences for regular cues were indicative of the participants' success in memorizing the subsequent speech stimuli. These findings underscore the highly adaptive nature of neural phase entrainment across fundamentally different auditory stimuli. They also support current models of neural phase entrainment as a tool of predictive timing and attentional selection across cognitive domains.
Audiovisual integration in children listening to spectrally degraded speech.

Science.gov (United States)

Maidment, David W; Kang, Hi Jee; Stewart, Hannah J; Amitay, Sygal

2015-02-01

The study explored whether visual information improves speech identification in typically developing children with normal hearing when the auditory signal is spectrally degraded. Children (n=69) and adults (n=15) were presented with noise-vocoded sentences from the Children's Co-ordinate Response Measure (Rosen, 2011) in auditory-only or audiovisual conditions. The number of bands was adaptively varied to modulate the degradation of the auditory signal, with the number of bands required for approximately 79% correct identification calculated as the threshold. The youngest children (4- to 5-year-olds) did not benefit from accompanying visual information, in comparison to 6- to 11-year-old children and adults. Audiovisual gain also increased with age in the child sample. The current data suggest that children younger than 6 years of age do not fully utilize visual speech cues to enhance speech perception when the auditory signal is degraded. This evidence not only has implications for understanding the development of speech perception skills in children with normal hearing but may also inform the development of new treatment and intervention strategies that aim to remediate speech perception difficulties in pediatric cochlear implant users.
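The abstract reports a threshold at approximately 79% correct but does not name the adaptive rule used. One common rule that converges near that level is a three-down, one-up staircase, which is assumed here purely for illustration. Such a track settles where three consecutive correct responses are as likely as not, that is where p^3 = 0.5. The function name below is hypothetical.

```python
def n_down_one_up_target(n_down):
    """Proportion correct at which an n-down/1-up track converges.

    At convergence the probability of n_down correct responses in a row
    equals 0.5, so p ** n_down = 0.5 and p = 0.5 ** (1 / n_down).
    """
    return 0.5 ** (1.0 / n_down)


for n in (1, 2, 3):
    print(n, round(n_down_one_up_target(n), 3))
# 1 0.5
# 2 0.707
# 3 0.794
```

Under this assumption, the number of vocoder bands would be reduced after three consecutive correct responses and increased after each error, so that the track hovers around the band count yielding roughly 79% correct.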
Audiovisual integration in speech perception: a multi-stage process

DEFF Research Database (Denmark)

Eskelund, Kasper; Tuomainen, Jyrki; Andersen, Tobias

2011-01-01

investigate whether the integration of auditory and visual speech observed in these two audiovisual integration effects are specific traits of speech perception. We further ask whether audiovisual integration is undertaken in a single processing stage or multiple processing stages....
How is the McGurk effect modulated by Cued Speech in deaf and hearing adults?

OpenAIRE

Bayard, Clémence; Colin, Cécile; Leybaert, Jacqueline

2014-01-01

Speech perception for both hearing and deaf people involves an integrative process between auditory and lip-reading information. In order to disambiguate information from lips, manual cues from Cued Speech may be added. Cued Speech (CS) is a system of manual aids developed to help deaf people to clearly and completely understand speech visually (Cornett, 1967). Within this system, both labial and manual information, as lone input sources, remain ambiguous. Perceivers, therefore, have to combi...
Suppression of the µ rhythm during speech and non-speech discrimination revealed by independent component analysis: implications for sensorimotor integration in speech processing.

Science.gov (United States)

Bowers, Andrew; Saltuklaroglu, Tim; Harkrider, Ashley; Cuellar, Megan

2013-01-01

Constructivist theories propose that articulatory hypotheses about incoming phonetic targets may function to enhance perception by limiting the possibilities for sensory analysis. To provide evidence for this proposal, it is necessary to map ongoing, high-temporal resolution changes in sensorimotor activity (i.e., the sensorimotor μ rhythm) to accurate speech and non-speech discrimination performance (i.e., correct trials). Sixteen participants (15 female and 1 male) were asked to passively listen to or actively identify speech and tone-sweeps in a two-alternative forced-choice discrimination task while the electroencephalograph (EEG) was recorded from 32 channels. The stimuli were presented at signal-to-noise ratios (SNRs) in which discrimination accuracy was high (i.e., 80-100%) and low SNRs producing discrimination performance at chance. EEG data were decomposed using independent component analysis and clustered across participants using principal component methods in EEGLAB. ICA revealed left and right sensorimotor µ components for 14/16 and 13/16 participants respectively that were identified on the basis of scalp topography, spectral peaks, and localization to the precentral and postcentral gyri. Time-frequency analysis of left and right lateralized µ component clusters revealed significant (pFDRspeech discrimination trials relative to chance trials following stimulus offset. Findings are consistent with constructivist, internal model theories proposing that early forward motor models generate predictions about likely phonemic units that are then synthesized with incoming sensory cues during active as opposed to passive processing. Future directions and possible translational value for clinical populations in which sensorimotor integration may play a functional role are discussed.
Audiovisual integration of speech in a patient with Broca's Aphasia

Science.gov (United States)

Andersen, Tobias S.; Starrfelt, Randi

2015-01-01

Lesions to Broca's area cause aphasia characterized by a severe impairment of the ability to speak, with comparatively intact speech perception. However, some studies have found effects on speech perception under adverse listening conditions, indicating that Broca's area is also involved in speech perception. While these studies have focused on auditory speech perception other studies have shown that Broca's area is activated by visual speech perception. Furthermore, one preliminary report found that a patient with Broca's aphasia did not experience the McGurk illusion suggesting that an intact Broca's area is necessary for audiovisual integration of speech. Here we describe a patient with Broca's aphasia who experienced the McGurk illusion. This indicates that an intact Broca's area is not necessary for audiovisual integration of speech. The McGurk illusions this patient experienced were atypical, which could be due to Broca's area having a more subtle role in audiovisual integration of speech. The McGurk illusions of a control subject with Wernicke's aphasia were, however, also atypical. This indicates that the atypical McGurk illusions were due to deficits in speech processing that are not specific to Broca's aphasia. PMID:25972819
Word segmentation with universal prosodic cues.

Science.gov (United States)

Endress, Ansgar D; Hauser, Marc D

2010-09-01

When listening to speech from one's native language, words seem to be well separated from one another, like beads on a string. When listening to a foreign language, in contrast, words seem almost impossible to extract, as if there was only one bead on the same string. This contrast reveals that there are language-specific cues to segmentation. The puzzle, however, is that infants must be endowed with a language-independent mechanism for segmentation, as they ultimately solve the segmentation problem for any native language. Here, we approach the acquisition problem by asking whether there are language-independent cues to segmentation that might be available to even adult learners who have already acquired a native language. We show that adult learners recognize words in connected speech when only prosodic cues to word-boundaries are given from languages unfamiliar to the participants. In both artificial and natural speech, adult English speakers, with no prior exposure to the test languages, readily recognized words in natural languages with critically different prosodic patterns, including French, Turkish and Hungarian. We suggest that, even though languages differ in their sound structures, they carry universal prosodic characteristics. Further, these language-invariant prosodic cues provide a universally accessible mechanism for finding words in connected speech. These cues may enable infants to start acquiring words in any language even before they are fine-tuned to the sound structure of their native language. Copyright © 2010. Published by Elsevier Inc.
Multisensory integration of speech sounds with letters vs. visual speech: only visual speech induces the mismatch negativity

NARCIS (Netherlands)

Stekelenburg, J.J.; Keetels, M.N.; Vroomen, J.H.M.

2018-01-01

Numerous studies have demonstrated that the vision of lip movements can alter the perception of auditory speech syllables (McGurk effect). While there is ample evidence for integration of text and auditory speech, there are only a few studies on the orthographic equivalent of the McGurk effect.
Emotionally conditioning the target-speech voice enhances recognition of the target speech under "cocktail-party" listening conditions.

Science.gov (United States)

Lu, Lingxi; Bao, Xiaohan; Chen, Jing; Qu, Tianshu; Wu, Xihong; Li, Liang

2018-05-01

Under a noisy "cocktail-party" listening condition with multiple people talking, listeners can use various perceptual/cognitive unmasking cues to improve recognition of the target speech against informational speech-on-speech masking. One potential unmasking cue is the emotion expressed in a speech voice, by means of certain acoustical features. However, it was unclear whether emotionally conditioning a target-speech voice that has none of the typical acoustical features of emotions (i.e., an emotionally neutral voice) can be used by listeners for enhancing target-speech recognition under speech-on-speech masking conditions. In this study we examined the recognition of target speech against a two-talker speech masker both before and after the emotionally neutral target voice was paired with a loud female screaming sound that has a marked negative emotional valence. The results showed that recognition of the target speech (especially the first keyword in a target sentence) was significantly improved by emotionally conditioning the target speaker's voice. Moreover, the emotional unmasking effect was independent of the unmasking effect of the perceived spatial separation between the target speech and the masker. Also, (skin conductance) electrodermal responses became stronger after emotional learning when the target speech and masker were perceptually co-located, suggesting an increase of listening efforts when the target speech was informationally masked. These results indicate that emotionally conditioning the target speaker's voice does not change the acoustical parameters of the target-speech stimuli, but the emotionally conditioned vocal features can be used as cues for unmasking target speech.
Audiovisual Integration of Speech in a Patient with Broca’s Aphasia

DEFF Research Database (Denmark)

Andersen, Tobias; Starrfelt, Randi

2015-01-01

...Broca's area is necessary for audiovisual integration of speech. Here we describe a patient with Broca's aphasia who experienced the McGurk illusion. This indicates that an intact Broca's area is not necessary for audiovisual integration of speech. The McGurk illusions this patient experienced were atypical, which could be due to Broca's area having a more subtle role in audiovisual integration of speech. The McGurk illusions of a control subject with Wernicke's aphasia were, however, also atypical. This indicates that the atypical McGurk illusions were due to deficits in speech processing...
Discrimination and streaming of speech sounds based on differences in interaural and spectral cues.

Science.gov (United States)

David, Marion; Lavandier, Mathieu; Grimault, Nicolas; Oxenham, Andrew J

2017-09-01

Differences in spatial cues, including interaural time differences (ITDs), interaural level differences (ILDs) and spectral cues, can lead to stream segregation of alternating noise bursts. It is unknown how effective such cues are for streaming sounds with realistic spectro-temporal variations. In particular, it is not known whether the high-frequency spectral cues associated with elevation remain sufficiently robust under such conditions. To answer these questions, sequences of consonant-vowel tokens were generated and filtered by non-individualized head-related transfer functions to simulate the cues associated with different positions in the horizontal and median planes. A discrimination task showed that listeners could discriminate changes in interaural cues both when the stimulus remained constant and when it varied between presentations. However, discrimination of changes in spectral cues was much poorer in the presence of stimulus variability. A streaming task, based on the detection of repeated syllables in the presence of interfering syllables, revealed that listeners can use both interaural and spectral cues to segregate alternating syllable sequences, despite the large spectro-temporal differences between stimuli. However, only the full complement of spatial cues (ILDs, ITDs, and spectral cues) resulted in obligatory streaming in a task that encouraged listeners to integrate the tokens into a single stream.
Patients with hippocampal amnesia successfully integrate gesture and speech.

Science.gov (United States)

Hilverman, Caitlin; Clough, Sharice; Duff, Melissa C; Cook, Susan Wagner

2018-06-19

During conversation, people integrate information from co-speech hand gestures with information in spoken language. For example, after hearing the sentence, "A piece of the log flew up and hit Carl in the face" while viewing a gesture directed at the nose, people tend to later report that the log hit Carl in the nose (information only in gesture) rather than in the face (information in speech). The cognitive and neural mechanisms that support the integration of gesture with speech are unclear. One possibility is that the hippocampus - known for its role in relational memory and information integration - is necessary for integrating gesture and speech. To test this possibility, we examined how patients with hippocampal amnesia and healthy and brain-damaged comparison participants express information from gesture in a narrative retelling task. Participants watched videos of an experimenter telling narratives that included hand gestures that contained supplementary information. Participants were asked to retell the narratives and their spoken retellings were assessed for the presence of information from gesture. For features that had been accompanied by supplementary gesture, patients with amnesia retold fewer of these features overall and fewer retellings that matched the speech from the narrative. Yet their retellings included features that contained information that had been present uniquely in gesture in amounts that were not reliably different from comparison groups. Thus, a functioning hippocampus is not necessary for gesture-speech integration over short timescales. Providing unique information in gesture may enhance communication for individuals with declarative memory impairment, possibly via non-declarative memory mechanisms. Copyright © 2018. Published by Elsevier Ltd.
Perception and the temporal properties of speech

Science.gov (United States)

Gordon, Peter C.

1991-11-01

Four experiments addressing the role of attention in phonetic perception are reported. The first experiment shows that the relative importance of two cues to the voicing distinction changes when subjects must perform an arithmetic distractor task at the same time as identifying a speech stimulus. The voice onset time cue loses phonetic significance when subjects are distracted, while the F0 onset frequency cue does not. The second experiment shows a similar pattern for two cues to the distinction between the vowels /i/ (as in 'beat') and /I/ (as in 'bit'). Together these experiments indicate that careful attention to speech perception is necessary for strong acoustic cues to achieve their full phonetic impact, while weaker acoustic cues achieve their full phonetic impact without close attention. Experiment 3 shows that this pattern is obtained when the distractor task places little demand on verbal short term memory. Experiment 4 provides a large data set for testing formal models of the role of attention in speech perception. Attention is shown to influence the signal to noise ratio in phonetic encoding. This principle is instantiated in a network model in which the role of attention is to reduce noise in the phonetic encoding of acoustic cues. Implications of this work for understanding speech perception and general theories of the role of attention in perception are discussed.
Brain responses and looking behaviour during audiovisual speech integration in infants predict auditory speech comprehension in the second year of life.

Directory of Open Access Journals (Sweden)

Elena V Kushnerenko

2013-07-01

Full Text Available The use of visual cues during the processing of audiovisual speech is known to be less efficient in children and adults with language difficulties and difficulties are known to be more prevalent in children from low-income populations. In the present study, we followed an economically diverse group of thirty-seven infants longitudinally from 6-9 months to 14-16 months of age. We used eye-tracking to examine whether individual differences in visual attention during audiovisual processing of speech in 6 to 9 month old infants, particularly when processing congruent and incongruent auditory and visual speech cues, might be indicative of their later language development. Twenty-two of these 6-9 month old infants also participated in an event-related potential (ERP) audiovisual task within the same experimental session. Language development was then followed up at the age of 14-16 months, using two measures of language development, the Preschool Language Scale (PLS) and the Oxford Communicative Development Inventory (CDI). The results show that those infants who were less efficient in auditory speech processing at the age of 6-9 months had lower receptive language scores at 14-16 months. A correlational analysis revealed that the pattern of face scanning and ERP responses to audio-visually incongruent stimuli at 6-9 months were both significantly associated with language development at 14-16 months. These findings add to the understanding of individual differences in neural signatures of audiovisual processing and associated looking behaviour in infants.
Audiovisual Integration of Speech in a Patient with Broca’s Aphasia

DEFF Research Database (Denmark)

Andersen, Tobias; Starrfelt, Randi

2015-01-01

Lesions to Broca's area cause aphasia characterized by a severe impairment of the ability to speak, with comparatively intact speech perception. However, some studies have found effects on speech perception under adverse listening conditions, indicating that Broca's area is also involved in speech perception. While these studies have focused on auditory speech perception other studies have shown that Broca's area is activated by visual speech perception. Furthermore, one preliminary report found that a patient with Broca's aphasia did not experience the McGurk illusion suggesting that an intact Broca's area is necessary for audiovisual integration of speech. Here we describe a patient with Broca's aphasia who experienced the McGurk illusion. This indicates that an intact Broca's area is not necessary for audiovisual integration of speech. The McGurk illusions this patient experienced were atypical...
Integration of auditory and visual speech information

NARCIS (Netherlands)

Hall, M.; Smeele, P.M.T.; Kuhl, P.K.

1998-01-01

The integration of auditory and visual speech is observed when modes specify different places of articulation. Influences of auditory variation on integration were examined using consonant identification, plus quality and similarity ratings. Auditory identification predicted auditory-visual
Audiovisual Integration of Speech in a Patient with Broca’s Aphasia

Directory of Open Access Journals (Sweden)

Tobias Søren Andersen

2015-04-01

Full Text Available Lesions to Broca’s area cause aphasia characterised by a severe impairment of the ability to speak, with comparatively intact speech perception. However, some studies have found effects on speech perception under adverse listening conditions, indicating that Broca’s area is also involved in speech perception. While these studies have focused on auditory speech perception other studies have shown that Broca’s area is activated by visual speech perception. Furthermore, one preliminary report found that a patient with Broca’s aphasia did not experience the McGurk illusion suggesting that an intact Broca’s area is necessary for audiovisual integration of speech. Here we describe a patient with Broca’s aphasia who experienced the McGurk illusion. This indicates that an intact Broca’s area is not necessary for audiovisual integration of speech. The McGurk illusions this patient experienced were atypical, which could be due to Broca’s area having a more subtle role in audiovisual integration of speech. The McGurk illusions of a control subject with Wernicke’s aphasia were, however, also atypical. This indicates that the atypical McGurk illusions were due to deficits in speech processing that are not specific to Broca’s aphasia.
Visual cues and listening effort: individual variability.

Science.gov (United States)

Picou, Erin M; Ricketts, Todd A; Hornsby, Benjamin W Y

2011-10-01

To investigate the effect of visual cues on listening effort as well as whether predictive variables such as working memory capacity (WMC) and lipreading ability affect the magnitude of listening effort. Twenty participants with normal hearing were tested using a paired-associates recall task in 2 conditions (quiet and noise) and 2 presentation modalities (audio only [AO] and auditory-visual [AV]). Signal-to-noise ratios were adjusted to provide matched speech recognition across audio-only and AV noise conditions. Also measured were subjective perceptions of listening effort and 2 predictive variables: (a) lipreading ability and (b) WMC. Objective and subjective results indicated that listening effort increased in the presence of noise, but on average the addition of visual cues did not significantly affect the magnitude of listening effort. Although there was substantial individual variability, on average participants who were better lipreaders or had larger WMCs demonstrated reduced listening effort in noise in AV conditions. Overall, the results support the hypothesis that integrating auditory and visual cues requires cognitive resources in some participants. The data indicate that low lipreading ability or low WMC is associated with relatively effortful integration of auditory and visual information in noise.

SynFace—Speech-Driven Facial Animation for Virtual Speech-Reading Support

Directory of Open Access Journals (Sweden)

Giampiero Salvi

2009-01-01

Full Text Available This paper describes SynFace, a supportive technology that aims at enhancing audio-based spoken communication in adverse acoustic conditions by providing the missing visual information in the form of an animated talking head. Firstly, we describe the system architecture, consisting of a 3D animated face model controlled from the speech input by a specifically optimised phonetic recogniser. Secondly, we report on speech intelligibility experiments with focus on multilinguality and robustness to audio quality. The system, already available for Swedish, English, and Flemish, was optimised for German and for Swedish wide-band speech quality available in TV, radio, and Internet communication. Lastly, the paper covers experiments with nonverbal motions driven from the speech signal. It is shown that turn-taking gestures can be used to affect the flow of human-human dialogues. We have focused specifically on two categories of cues that may be extracted from the acoustic signal: prominence/emphasis and interactional cues (turn-taking/back-channelling).
Visual-Auditory Integration during Speech Imitation in Autism

Science.gov (United States)

Williams, Justin H. G.; Massaro, Dominic W.; Peel, Natalie J.; Bosseler, Alexis; Suddendorf, Thomas

2004-01-01

Children with autistic spectrum disorder (ASD) may have poor audio-visual integration, possibly reflecting dysfunctional "mirror neuron" systems which have been hypothesised to be at the core of the condition. In the present study, a computer program, utilizing speech synthesizer software and a "virtual" head (Baldi), delivered speech stimuli for…
An ALE meta-analysis on the audiovisual integration of speech signals.

Science.gov (United States)

Erickson, Laura C; Heeg, Elizabeth; Rauschecker, Josef P; Turkeltaub, Peter E

2014-11-01

The brain improves speech processing through the integration of audiovisual (AV) signals. Situations involving AV speech integration may be crudely dichotomized into those where auditory and visual inputs contain (1) equivalent, complementary signals (validating AV speech) or (2) inconsistent, different signals (conflicting AV speech). This simple framework may allow the systematic examination of broad commonalities and differences between AV neural processes engaged by various experimental paradigms frequently used to study AV speech integration. We conducted an activation likelihood estimation metaanalysis of 22 functional imaging studies comprising 33 experiments, 311 subjects, and 347 foci examining "conflicting" versus "validating" AV speech. Experimental paradigms included content congruency, timing synchrony, and perceptual measures, such as the McGurk effect or synchrony judgments, across AV speech stimulus types (sublexical to sentence). Colocalization of conflicting AV speech experiments revealed consistency across at least two contrast types (e.g., synchrony and congruency) in a network of dorsal stream regions in the frontal, parietal, and temporal lobes. There was consistency across all contrast types (synchrony, congruency, and percept) in the bilateral posterior superior/middle temporal cortex. Although fewer studies were available, validating AV speech experiments were localized to other regions, such as ventral stream visual areas in the occipital and inferior temporal cortex. These results suggest that while equivalent, complementary AV speech signals may evoke activity in regions related to the corroboration of sensory input, conflicting AV speech signals recruit widespread dorsal stream areas likely involved in the resolution of conflicting sensory signals. Copyright © 2014 Wiley Periodicals, Inc.
Gesture and Speech Integration: An Exploratory Study of a Man with Aphasia

Science.gov (United States)

Cocks, Naomi; Sautin, Laetitia; Kita, Sotaro; Morgan, Gary; Zlotowitz, Sally

2009-01-01

Background: In order to comprehend fully a speaker's intention in everyday communication, information is integrated from multiple sources, including gesture and speech. There are no published studies that have explored the impact of aphasia on iconic co-speech gesture and speech integration. Aims: To explore the impact of aphasia on co-speech…
Audio-Visual Speech Perception: A Developmental ERP Investigation

Science.gov (United States)

Knowland, Victoria C. P.; Mercure, Evelyne; Karmiloff-Smith, Annette; Dick, Fred; Thomas, Michael S. C.

2014-01-01

Being able to see a talking face confers a considerable advantage for speech perception in adulthood. However, behavioural data currently suggest that children fail to make full use of these available visual speech cues until age 8 or 9. This is particularly surprising given the potential utility of multiple informational cues during language…
Speech recognition by means of a three-integrated-circuit set

Energy Technology Data Exchange (ETDEWEB)

Zoicas, A.

1983-11-03

The author uses pattern recognition methods for detecting word boundaries, and monitors incoming speech at 12 millisecond intervals. Frequency is divided into eight bands and analysis is achieved in an analogue interface integrated circuit, a pipeline digital processor and a control integrated circuit. Applications are suggested, including speech input to personal computers. 3 references.
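The frame-by-frame band analysis described above can be illustrated in software. The following is only a rough digital stand-in for the analogue front end in the abstract: the 8 kHz sample rate, the equal-width linear bands, the naive DFT, and the function names `band_energies` and `analyse` are all assumptions made for this sketch and are not taken from the source.

```python
import cmath
import math

def band_energies(frame, n_bands=8):
    """Sum DFT power into n_bands equal-width bands (hypothetical band layout)."""
    n = len(frame)
    half = n // 2  # bins from 0 Hz up to (but excluding) the Nyquist frequency
    power = []
    for k in range(half):
        # Naive DFT bin; adequate for short 12 ms frames
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        power.append(abs(s) ** 2)
    width = half // n_bands
    return [sum(power[b * width:(b + 1) * width]) for b in range(n_bands)]

def analyse(signal, sample_rate=8000, frame_ms=12, n_bands=8):
    """Cut the signal into consecutive 12 ms frames and analyse each one."""
    frame_len = sample_rate * frame_ms // 1000
    return [band_energies(signal[start:start + frame_len], n_bands)
            for start in range(0, len(signal) - frame_len + 1, frame_len)]

# Usage: a 1 kHz test tone lasting 120 ms (ten frames of 96 samples each)
tone = [math.sin(2 * math.pi * 1000 * i / 8000) for i in range(960)]
frames = analyse(tone)
loudest_band = max(range(8), key=lambda b: frames[0][b])
```

Each inner list holds eight band energies for one 12 ms frame; a word-boundary detector of the kind the abstract mentions would operate on how these values change from frame to frame.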
Familiar units prevail over statistical cues in word segmentation.

Science.gov (United States)

Poulin-Charronnat, Bénédicte; Perruchet, Pierre; Tillmann, Barbara; Peereman, Ronald

2017-09-01

In language acquisition research, the prevailing position is that listeners exploit statistical cues, in particular transitional probabilities between syllables, to discover words of a language. However, other cues are also involved in word discovery. Assessing the weight learners give to these different cues leads to a better understanding of the processes underlying speech segmentation. The present study evaluated whether adult learners preferentially used known units or statistical cues for segmenting continuous speech. Before the exposure phase, participants were familiarized with part-words of a three-word artificial language. This design allowed the dissociation of the influence of statistical cues and familiar units, with statistical cues favoring word segmentation and familiar units favoring (nonoptimal) part-word segmentation. In Experiment 1, performance in a two-alternative forced choice (2AFC) task between words and part-words revealed part-word segmentation (even though part-words were less cohesive in terms of transitional probabilities and less frequent than words). By contrast, an unfamiliarized group exhibited word segmentation, as usually observed in standard conditions. Experiment 2 used a syllable-detection task to remove the likely contamination of performance by memory and strategy effects in the 2AFC task. Overall, the results suggest that familiar units overrode statistical cues, ultimately questioning the need for computation mechanisms of transitional probabilities (TPs) in natural language speech segmentation.
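Transitional probabilities between syllables, the statistical cue discussed above, can be estimated from adjacent-pair counts. The sketch below is a minimal illustration under assumed inputs: the three-word lexicon, the word order, and the function name `transitional_probabilities` are hypothetical and do not reproduce the study's materials.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """Return {(a, b): P(b | a)} estimated from adjacent syllable pairs."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])  # how often each syllable has a successor
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

# Hypothetical three-word artificial language (not the study's stimuli)
lexicon = {"A": ["tu", "pi", "ro"], "B": ["go", "la", "bu"], "C": ["da", "ko", "ti"]}
word_order = "ABCACBABCACB"
stream = [syllable for word in word_order for syllable in lexicon[word]]

tp = transitional_probabilities(stream)

# Posit a word boundary wherever the transitional probability drops below 1
boundaries = [i + 1 for i, pair in enumerate(zip(stream, stream[1:])) if tp[pair] < 1.0]
```

In this toy stream the within-word transitions all have probability 1.0, while transitions across word boundaries are lower, so a purely statistical learner would place boundaries after every third syllable. Part-words, which straddle those boundaries, are exactly the units with the weaker internal transitions.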
Audiovisual integration of speech falters under high attention demands.

Science.gov (United States)

Alsius, Agnès; Navarra, Jordi; Campbell, Ruth; Soto-Faraco, Salvador

2005-05-10

One of the most commonly cited examples of human multisensory integration occurs during exposure to natural speech, when the vocal and the visual aspects of the signal are integrated in a unitary percept. Audiovisual association of facial gestures and vocal sounds has been demonstrated in nonhuman primates and in prelinguistic children, arguing for a general basis for this capacity. One critical question, however, concerns the role of attention in such multisensory integration. Although both behavioral and neurophysiological studies have converged on a preattentive conceptualization of audiovisual speech integration, this mechanism has rarely been measured under conditions of high attentional load, when the observers' attention resources are depleted. We tested the extent to which audiovisual integration was modulated by the amount of available attentional resources by measuring the observers' susceptibility to the classic McGurk illusion in a dual-task paradigm. The proportion of visually influenced responses was severely, and selectively, reduced if participants were concurrently performing an unrelated visual or auditory task. In contrast with the assumption that crossmodal speech integration is automatic, our results suggest that these multisensory binding processes are subject to attentional demands.
A Causal Inference Model Explains Perception of the McGurk Effect and Other Incongruent Audiovisual Speech.

Directory of Open Access Journals (Sweden)

John F Magnotti

2017-02-01

Full Text Available Audiovisual speech integration combines information from auditory speech (talker's voice) and visual speech (talker's mouth movements) to improve perceptual accuracy. However, if the auditory and visual speech emanate from different talkers, integration decreases accuracy. Therefore, a key step in audiovisual speech perception is deciding whether auditory and visual speech have the same source, a process known as causal inference. A well-known illusion, the McGurk Effect, consists of incongruent audiovisual syllables, such as auditory "ba" + visual "ga" (AbaVga), that are integrated to produce a fused percept ("da"). This illusion raises two fundamental questions: first, given the incongruence between the auditory and visual syllables in the McGurk stimulus, why are they integrated; and second, why does the McGurk effect not occur for other, very similar syllables (e.g., AgaVba). We describe a simplified model of causal inference in multisensory speech perception (CIMS) that predicts the perception of arbitrary combinations of auditory and visual speech. We applied this model to behavioral data collected from 60 subjects perceiving both McGurk and non-McGurk incongruent speech stimuli. The CIMS model successfully predicted both the audiovisual integration observed for McGurk stimuli and the lack of integration observed for non-McGurk stimuli. An identical model without causal inference failed to accurately predict perception for either form of incongruent speech. The CIMS model uses causal inference to provide a computational framework for studying how the brain performs one of its most important tasks, integrating auditory and visual speech cues to allow us to communicate with others.
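The causal inference step described above, deciding whether two cues share a source, can be illustrated with a one-dimensional toy calculation. This is not the CIMS model from the abstract: the Gaussian noise assumptions, the use of the cue disparity alone, the parameter values, and the function names are all hypothetical choices made for this sketch.

```python
import math

def normal_pdf(x, sigma):
    """Density of a zero-mean Gaussian with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def p_common_cause(x_a, x_v, sigma_a, sigma_v, sigma_prior, prior_common=0.5):
    """Posterior probability that auditory and visual cues share one source.

    Toy assumption: only the disparity d = x_a - x_v is used. With one source,
    d reflects sensory noise alone; with two independent sources drawn from a
    zero-mean Gaussian prior, the spread of the sources adds to the variance.
    """
    d = x_a - x_v
    like_common = normal_pdf(d, math.sqrt(sigma_a ** 2 + sigma_v ** 2))
    like_separate = normal_pdf(
        d, math.sqrt(sigma_a ** 2 + sigma_v ** 2 + 2 * sigma_prior ** 2))
    numerator = prior_common * like_common
    return numerator / (numerator + (1 - prior_common) * like_separate)

# Usage: similar cues favour a common cause, very different cues do not
p_similar = p_common_cause(0.0, 0.0, sigma_a=1.0, sigma_v=1.0, sigma_prior=5.0)
p_different = p_common_cause(0.0, 10.0, sigma_a=1.0, sigma_v=1.0, sigma_prior=5.0)
```

In a sketch like this, a percept would be fused only to the extent that the common-cause posterior is high, which mirrors the general idea that some incongruent syllable pairs are integrated and others are not.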
Late development of cue integration is linked to sensory fusion in cortex.

Science.gov (United States)

Dekker, Tessa M; Ban, Hiroshi; van der Velde, Bauke; Sereno, Martin I; Welchman, Andrew E; Nardini, Marko

2015-11-02

Adults optimize perceptual judgements by integrating different types of sensory information [1, 2]. This engages specialized neural circuits that fuse signals from the same [3-5] or different [6] modalities. Whereas young children can use sensory cues independently, adult-like precision gains from cue combination only emerge around ages 10 to 11 years [7-9]. Why does it take so long to make best use of sensory information? Existing data cannot distinguish whether this (1) reflects surprisingly late changes in sensory processing (sensory integration mechanisms in the brain are still developing) or (2) depends on post-perceptual changes (integration in sensory cortex is adult-like, but higher-level decision processes do not access the information) [10]. We tested visual depth cue integration in the developing brain to distinguish these possibilities. We presented children aged 6-12 years with displays depicting depth from binocular disparity and relative motion and made measurements using psychophysics, retinotopic mapping, and pattern classification fMRI. Older children (>10.5 years) showed clear evidence for sensory fusion in V3B, a visual area thought to integrate depth cues in the adult brain [3-5]. By contrast, in younger children ... develop. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.
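The precision gain from cue combination mentioned above is commonly modeled as reliability-weighted averaging of two noisy estimates. The sketch below uses that textbook formulation as an assumption; the abstract does not specify a formula, and the function name and example numbers are hypothetical.

```python
import math

def combine_cues(est_a, sigma_a, est_b, sigma_b):
    """Reliability-weighted average of two independent Gaussian cue estimates."""
    var_a, var_b = sigma_a ** 2, sigma_b ** 2
    weight_a = var_b / (var_a + var_b)  # the less noisy cue gets the larger weight
    combined = weight_a * est_a + (1 - weight_a) * est_b
    combined_sigma = math.sqrt(var_a * var_b / (var_a + var_b))
    return combined, combined_sigma

# Usage: two equally reliable depth cues (arbitrary units)
estimate, sigma = combine_cues(10.0, 2.0, 14.0, 2.0)
```

With two equally reliable cues the combined standard deviation falls by a factor of the square root of two relative to either cue alone. An observer who uses the cues independently, as the abstract reports for younger children, would not show that reduction.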
Toward Speech and Nonverbal Behaviors Integration for Humanoid Robot

Directory of Open Access Journals (Sweden)

Wei Wang

2012-09-01

Full Text Available It is essential to integrate speeches and nonverbal behaviors for a humanoid robot in human-robot interaction. This paper presents an approach using multi-object genetic algorithm to match the speeches and behaviors automatically. Firstly, with humanoid robot's emotion status, we construct a hierarchical structure to link voice characteristics and nonverbal behaviors. Secondly, these behaviors corresponding to speeches are matched and integrated into an action sequence based on genetic algorithm, so the robot can consistently speak and perform emotional behaviors. Our approach takes advantage of relevant knowledge described by psychologists and nonverbal communication. And from experiment results, our ultimate goal, implementing an affective robot to act and speak with partners vividly and fluently, could be achieved.
Effects of noise and audiovisual cues on speech processing in adults with and without ADHD.

Science.gov (United States)

Michalek, Anne M P; Watson, Silvana M; Ash, Ivan; Ringleb, Stacie; Raymer, Anastasia

2014-03-01

This study examined the interplay among internal (e.g. attention, working memory abilities) and external (e.g. background noise, visual information) factors in individuals with and without ADHD. A 2 × 2 × 6 mixed design with correlational analyses was used to compare participant results on a standardized listening in noise sentence repetition task (QuickSin; Killion et al, 2004), presented in an auditory and an audiovisual condition as signal-to-noise ratio (SNR) varied from 25-0 dB and to determine individual differences in working memory capacity and short-term recall. Thirty-eight young adults without ADHD and twenty-five young adults with ADHD. Diagnosis, modality, and signal-to-noise ratio all affected the ability to process speech in noise. The interaction between the diagnosis of ADHD, the presence of visual cues, and the level of noise had an effect on a person's ability to process speech in noise. Conclusion: Young adults with ADHD benefited less from visual information during noise than young adults without ADHD, an effect influenced by working memory abilities.
Acoustic Constraints and Musical Consequences: Exploring Composers' Use of Cues for Musical Emotion

Science.gov (United States)

Schutz, Michael

2017-01-01

Emotional communication in music is based in part on the use of pitch and timing, two cues effective in emotional speech. Corpus analyses of natural speech illustrate that happy utterances tend to be higher and faster than sad. Although manipulations altering melodies show that passages changed to be higher and faster sound happier, corpus analyses of unaltered music paralleling those of natural speech have proven challenging. This partly reflects the importance of modality (i.e., major/minor), a powerful musical cue whose use is decidedly imbalanced in Western music. This imbalance poses challenges for creating musical corpora analogous to existing speech corpora for purposes of analyzing emotion. However, a novel examination of music by Bach and Chopin balanced in modality illustrates that, consistent with predictions from speech, their major key (nominally “happy”) pieces are approximately a major second higher and 29% faster than their minor key pieces (Poon and Schutz, 2015). Although this provides useful evidence for parallels in use of emotional cues between these domains, it raises questions about how composers “trade off” cue differentiation in music, suggesting interesting new potential research directions. This Focused Review places those results in a broader context, highlighting their connections with previous work on the natural use of cues for musical emotion. Together, these observational findings based on unaltered music—widely recognized for its artistic significance—complement previous experimental work systematically manipulating specific parameters. In doing so, they also provide a useful musical counterpart to fruitful studies of the acoustic cues for emotion found in natural speech. PMID:29249997
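To give a sense of scale for "a major second higher and 29% faster", the short calculation below converts the interval to a frequency ratio. It assumes twelve-tone equal temperament, in which a major second spans two semitones; the reference pitch and tempo are hypothetical values chosen only for illustration.

```python
def semitones_to_ratio(semitones):
    """Frequency ratio of an interval in twelve-tone equal temperament."""
    return 2 ** (semitones / 12)

major_second = semitones_to_ratio(2)  # roughly 1.12, i.e. about 12% higher in frequency

# Hypothetical minor-key reference values, for illustration only
minor_pitch_hz = 220.0
minor_tempo_bpm = 100.0

major_pitch_hz = minor_pitch_hz * major_second  # a major second above the reference
major_tempo_bpm = minor_tempo_bpm * 1.29        # 29% faster, as reported in the abstract
```

Under these assumptions the reported pitch difference corresponds to a frequency increase of about 12%, which is smaller in relative terms than the reported 29% tempo difference.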
Acoustic Constraints and Musical Consequences: Exploring Composers' Use of Cues for Musical Emotion

Directory of Open Access Journals (Sweden)

Michael Schutz

2017-11-01

Full Text Available Emotional communication in music is based in part on the use of pitch and timing, two cues effective in emotional speech. Corpus analyses of natural speech illustrate that happy utterances tend to be higher and faster than sad. Although manipulations altering melodies show that passages changed to be higher and faster sound happier, corpus analyses of unaltered music paralleling those of natural speech have proven challenging. This partly reflects the importance of modality (i.e., major/minor), a powerful musical cue whose use is decidedly imbalanced in Western music. This imbalance poses challenges for creating musical corpora analogous to existing speech corpora for purposes of analyzing emotion. However, a novel examination of music by Bach and Chopin balanced in modality illustrates that, consistent with predictions from speech, their major key (nominally “happy”) pieces are approximately a major second higher and 29% faster than their minor key pieces (Poon and Schutz, 2015). Although this provides useful evidence for parallels in use of emotional cues between these domains, it raises questions about how composers “trade off” cue differentiation in music, suggesting interesting new potential research directions. This Focused Review places those results in a broader context, highlighting their connections with previous work on the natural use of cues for musical emotion. Together, these observational findings based on unaltered music, widely recognized for its artistic significance, complement previous experimental work systematically manipulating specific parameters. In doing so, they also provide a useful musical counterpart to fruitful studies of the acoustic cues for emotion found in natural speech.
Exploring the role of brain oscillations in speech perception in noise: Intelligibility of isochronously retimed speech

Directory of Open Access Journals (Sweden)

Vincent Aubanel

2016-08-01

Full Text Available A growing body of evidence shows that brain oscillations track speech. This mechanism is thought to maximise processing efficiency by allocating resources to important speech information, effectively parsing speech into units of appropriate granularity for further decoding. However, some aspects of this mechanism remain unclear. First, while periodicity is an intrinsic property of this physiological mechanism, speech is only quasi-periodic, so it is not clear whether periodicity would present an advantage in processing. Second, it is still a matter of debate which aspect of speech triggers or maintains cortical entrainment, from bottom-up cues such as fluctuations of the amplitude envelope of speech to higher level linguistic cues such as syntactic structure. We present data from a behavioural experiment assessing the effect of isochronous retiming of speech on speech perception in noise. Two types of anchor points were defined for retiming speech, namely syllable onsets and amplitude envelope peaks. For each anchor point type, retiming was implemented at two hierarchical levels, a slow time scale around 2.5 Hz and a fast time scale around 4 Hz. Results show that while any temporal distortion resulted in reduced speech intelligibility, isochronous speech anchored to P-centers (approximated by stressed syllable vowel onsets) was significantly more intelligible than a matched anisochronous retiming, suggesting a facilitative role of periodicity defined on linguistically motivated units in processing speech in noise.
Multisensory integration: the case of a time window of gesture-speech integration.

Science.gov (United States)

Obermeier, Christian; Gunter, Thomas C

2015-02-01

This experiment investigates the integration of gesture and speech from a multisensory perspective. In a disambiguation paradigm, participants were presented with short videos of an actress uttering sentences like "She was impressed by the BALL, because the GAME/DANCE...." The ambiguous noun (BALL) was accompanied by an iconic gesture fragment containing information to disambiguate the noun toward its dominant or subordinate meaning. We used four different temporal alignments between noun and gesture fragment: the identification point (IP) of the noun was either prior to (+120 msec), synchronous with (0 msec), or lagging behind the end of the gesture fragment (-200 and -600 msec). ERPs triggered to the IP of the noun showed significant differences for the integration of dominant and subordinate gesture fragments in the -200, 0, and +120 msec conditions. The outcome of this integration was revealed at the target words. These data suggest a time window for direct semantic gesture-speech integration ranging from at least -200 up to +120 msec. Although the -600 msec condition did not show any signs of direct integration at the homonym, significant disambiguation was found at the target word. An explorative analysis suggested that gesture information was directly integrated at the verb, indicating that there are multiple positions in a sentence where direct gesture-speech integration takes place. Ultimately, this would imply that in natural communication, where a gesture lasts for some time, several aspects of that gesture will have their specific and possibly distinct impact on different positions in an utterance.
Electrophysiological evidence for a self-processing advantage during audiovisual speech integration.

Science.gov (United States)

Treille, Avril; Vilain, Coriandre; Kandel, Sonia; Sato, Marc

2017-09-01

Previous electrophysiological studies have provided strong evidence for early multisensory integrative mechanisms during audiovisual speech perception. From these studies, one unanswered issue is whether hearing our own voice and seeing our own articulatory gestures facilitate speech perception, possibly through a better processing and integration of sensory inputs with our own sensory-motor knowledge. The present EEG study examined the impact of self-knowledge during the perception of auditory (A), visual (V) and audiovisual (AV) speech stimuli that were previously recorded from the participant or from a speaker he/she had never met. Audiovisual interactions were estimated by comparing N1 and P2 auditory evoked potentials during the bimodal condition (AV) with the sum of those observed in the unimodal conditions (A + V). In line with previous EEG studies, our results revealed an amplitude decrease of P2 auditory evoked potentials in AV compared to A + V conditions. Crucially, a temporal facilitation of N1 responses was observed during the visual perception of self speech movements compared to those of another speaker. This facilitation was negatively correlated with the saliency of visual stimuli. These results provide evidence for a temporal facilitation of the integration of auditory and visual speech signals when the visual situation involves our own speech gestures.
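The additive-model comparison described above (AV versus A + V) can be sketched as follows. This is a minimal illustration, not the authors' analysis pipeline: the function names, the peak-picking approach, and any time windows a caller passes in are assumptions, and the ERPs are taken to be one-dimensional arrays sampled at the times in `times`.

```python
import numpy as np


def additive_model_difference(erp_av, erp_a, erp_v):
    """Difference wave AV - (A + V); non-zero values suggest AV interaction."""
    return np.asarray(erp_av, dtype=float) - (
        np.asarray(erp_a, dtype=float) + np.asarray(erp_v, dtype=float)
    )


def peak_in_window(erp, times, t_min, t_max, polarity):
    """Return (amplitude, latency) of the peak inside [t_min, t_max].

    polarity = -1 searches for a negative peak (e.g. N1),
    polarity = +1 searches for a positive peak (e.g. P2).
    """
    erp = np.asarray(erp, dtype=float)
    times = np.asarray(times, dtype=float)
    mask = (times >= t_min) & (times <= t_max)
    idx = np.argmax(polarity * erp[mask])
    return erp[mask][idx], times[mask][idx]
```

Comparing the peak amplitude and latency of the AV waveform with those of the summed A + V waveform gives the kind of amplitude decrease (P2) or latency shift (N1) the abstract refers to.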
Integrated Phoneme Subspace Method for Speech Feature Extraction

Directory of Open Access Journals (Sweden)

Park Hyunsin

2009-01-01

Full Text Available Speech feature extraction has been a key focus in robust speech recognition research. In this work, we discuss data-driven linear feature transformations applied to feature vectors in the logarithmic mel-frequency filter bank domain. Transformations are based on principal component analysis (PCA), independent component analysis (ICA), and linear discriminant analysis (LDA). Furthermore, this paper introduces a new feature extraction technique that collects the correlation information among phoneme subspaces and reconstructs feature space for representing phonemic information efficiently. The proposed speech feature vector is generated by projecting an observed vector onto an integrated phoneme subspace (IPS) based on PCA or ICA. The performance of the new feature was evaluated for isolated word speech recognition. The proposed method provided higher recognition accuracy than conventional methods in clean and reverberant environments.
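As a rough illustration of a data-driven linear transformation of this kind, the sketch below applies plain PCA to a matrix of log mel filter bank features (one row per frame). It is not the integrated phoneme subspace method itself, which the abstract does not specify in detail; the function name `pca_transform` and the feature dimensions used by a caller are assumptions.

```python
import numpy as np


def pca_transform(features, n_components):
    """Project feature vectors onto their leading principal components.

    features: array of shape (n_frames, n_filters), e.g. log mel energies.
    Returns the projected features, the projection basis, and the mean.
    """
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)
    centered = features - mean
    # Covariance across filter bank channels (columns are variables).
    cov = np.cov(centered, rowvar=False)
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    basis = eigvecs[:, order]
    return centered @ basis, basis, mean
```

A new frame `x` would be transformed with the same basis and mean, `(x - mean) @ basis`, so that training and test features live in the same subspace.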
Sequence analysis in multilevel models. A study on different sources of patient cues in medical consultations.

Science.gov (United States)

Del Piccolo, Lidia; Mazzi, Maria Angela; Dunn, Graham; Sandri, Marco; Zimmermann, Christa

2007-12-01

The aims of the study were to explore the importance of macro (patient, physician, consultation) and micro (doctor-patient speech sequences) variables in promoting patient cues (unsolicited new information or expressions of feelings), and to describe the methodological implications related to the study of speech sequences. Patient characteristics, a consultation index of partnership and doctor-patient speech sequences were recorded for 246 primary care consultations in six primary care surgeries in Verona, Italy. Homogeneity and stationarity conditions of speech sequences allowed the creation of a hierarchy of multilevel logit models including micro and macro level variables, with the presence/absence of cues as the dependent variable. We found that emotional distress of the patient increased cues and that cues appeared among other patient expressions and were preceded by physicians' facilitations and handling of emotion. Partnership, in terms of open-ended inquiry, active listening skills and handling of emotion by the physician and active participation by the patient throughout the consultation, reduced cue frequency.

Effects of a brisk walk on blood pressure responses to the Stroop, a speech task and a smoking cue among temporarily abstinent smokers.

Science.gov (United States)

Taylor, Adrian; Katomeri, Magdalena

2006-01-01

A review and meta-analysis by Hamer et al. (2006) showed that a single session of exercise can attenuate post-exercise blood pressure (BP) responses to stress, but no studies examined the effects among smokers or with brisk walking. Healthy volunteers (n=60), averaging 28 years of age and smoking 15 cigarettes daily, abstained from smoking for 2 h before being randomly assigned to a 15-min brisk semi-self-paced walk or passive control condition. Subject characteristics, typical smoking cue-elicited cravings and BP were assessed at baseline. After each condition, BP was assessed before and after three psycho-social stressors were carried out: (1) computerised Stroop word-colour interference task, (2) speech task and (3) only handling a lit cigarette. A two-way mixed ANCOVA (controlling for baseline) revealed a significant overall interaction effect for time by condition for both systolic blood pressure (SBP) and diastolic blood pressure (DBP). Univariate ANCOVAs (to compare between-groups post-stressor BP, controlling for pre-stressor BP) revealed that exercise attenuated systolic BP and diastolic BP responses to the Stroop and speech tasks and SBP to the lit cigarette equivalent to an attenuated SBP and DBP of up to 3.8 mmHg. Post-exercise attenuation effects were moderated by resting blood pressure and self-reported smoking cue-elicited craving. Effects were strongest among those with higher blood pressure and smokers who reported typically stronger cravings when faced with smoking cues. Blood pressure responses to the lit cigarette were not associated with responses to the Stroop and speech task. A self-paced 15-min walk can reduce smokers' SBP and DBP responses to stress, of a magnitude similar on average to non-smokers.
Integration of asynchronous knowledge sources in a novel speech recognition framework

OpenAIRE

Van hamme, Hugo

2008-01-01

Van hamme H., "Integration of asynchronous knowledge sources in a novel speech recognition framework", Proceedings ITRW on speech analysis and processing for knowledge discovery, 4 pp., June 2008, Aalborg, Denmark.
Cue integration vs. exemplar-based reasoning in multi-attribute decisions from memory

Directory of Open Access Journals (Sweden)

Arndt Broeder

2010-08-01

Full Text Available Inferences about target variables can be achieved by deliberate integration of probabilistic cues or by retrieving similar cue-patterns (exemplars) from memory. In tasks with cue information presented in on-screen displays, rule-based strategies tend to dominate unless the abstraction of cue-target relations is unfeasible. This dominance has also been demonstrated, surprisingly, in experiments that demanded the retrieval of cue values from memory (M. Persson and J. Rieskamp, 2009). In three modified replications involving a fictitious disease, binary cue values were represented either by alternative symptoms (e.g., fever vs. hypothermia) or by symptom presence vs. absence (e.g., fever vs. no fever). The former representation might hinder cue abstraction. The cues were predictive of the severity of the disease, and participants had to infer in each trial which of two patients was sicker. Both experiments replicated the rule-dominance with present-absent cues but yielded higher percentages of exemplar-based strategies with alternative cues. The experiments demonstrate that a change in cue representation may induce a dramatic shift from rule-based to exemplar-based reasoning in formally identical tasks.
Deficits in audiovisual speech perception in normal aging emerge at the level of whole-word recognition.

Science.gov (United States)

Stevenson, Ryan A; Nelms, Caitlin E; Baum, Sarah H; Zurkovsky, Lilia; Barense, Morgan D; Newhouse, Paul A; Wallace, Mark T

2015-01-01

Over the next 2 decades, a dramatic shift in the demographics of society will take place, with a rapid growth in the population of older adults. One of the most common complaints with healthy aging is a decreased ability to successfully perceive speech, particularly in noisy environments. In such noisy environments, the presence of visual speech cues (i.e., lip movements) provides striking benefits for speech perception and comprehension, but previous research suggests that older adults gain less from such audiovisual integration than their younger peers. To determine at what processing level these behavioral differences arise in healthy-aging populations, we administered a speech-in-noise task to younger and older adults. We compared the perceptual benefits of having speech information available in both the auditory and visual modalities and examined both phoneme and whole-word recognition across varying levels of signal-to-noise ratio. For whole-word recognition, older adults relative to younger adults showed greater multisensory gains at intermediate SNRs but reduced benefit at low SNRs. By contrast, at the phoneme level both younger and older adults showed approximately equivalent increases in multisensory gain as signal-to-noise ratio decreased. Collectively, the results provide important insights into both the similarities and differences in how older and younger adults integrate auditory and visual speech cues in noisy environments and help explain some of the conflicting findings in previous studies of multisensory speech perception in healthy aging. These novel findings suggest that audiovisual processing is intact at more elementary levels of speech perception in healthy-aging populations and that deficits begin to emerge only at the more complex word-recognition level of speech signals. Copyright © 2015 Elsevier Inc. All rights reserved.
Is Birdsong More Like Speech or Music?

Science.gov (United States)

Shannon, Robert V

2016-04-01

Music and speech share many acoustic cues but not all are equally important. For example, harmonic pitch is essential for music but not for speech. When birds communicate, is their song more like speech or music? A new study contrasting pitch and spectral patterns shows that birds perceive their song more like humans perceive speech. Copyright © 2016 Elsevier Ltd. All rights reserved.
Integrating speech in time depends on temporal expectancies and attention.

Science.gov (United States)

Scharinger, Mathias; Steinberg, Johanna; Tavano, Alessandro

2017-08-01

Sensory information that unfolds in time, such as in speech perception, relies on efficient chunking mechanisms in order to yield optimally-sized units for further processing. Whether or not two successive acoustic events receive a one-unit or a two-unit interpretation seems to depend on the fit between their temporal extent and a stipulated temporal window of integration. However, there is ongoing debate on how flexible this temporal window of integration should be, especially for the processing of speech sounds. Furthermore, there is no direct evidence of whether attention may modulate the temporal constraints on the integration window. For this reason, we here examine how different word durations, which lead to different temporal separations of sound onsets, interact with attention. In an Electroencephalography (EEG) study, participants actively and passively listened to words where word-final consonants were occasionally omitted. Words had either a natural duration or were artificially prolonged in order to increase the separation of speech sound onsets. Omission responses to incomplete speech input, originating in left temporal cortex, decreased when the critical speech sound was separated from previous sounds by more than 250 msec, i.e., when the separation was larger than the stipulated temporal window of integration (125-150 msec). Attention, on the other hand, only increased omission responses for stimuli with natural durations. We complemented the event-related potential (ERP) analyses by a frequency-domain analysis on the stimulus presentation rate. Notably, the power of stimulation frequency showed the same duration and attention effects as the omission responses. We interpret these findings against the background of existing research on temporal integration windows and further suggest that our findings may be accounted for within the framework of predictive coding. Copyright © 2017 Elsevier Ltd. All rights reserved.
Encoding Specificity and Nonverbal Cue Context: An Expansion of Episodic Memory Research.

Science.gov (United States)

Woodall, W. Gill; Folger, Joseph P.

1981-01-01

Reports two studies demonstrating the ability of nonverbal contextual cues to act as retrieval mechanisms for co-occurring language. Suggests that visual contextual cues, such as speech primacy and motor primacy gestures, can access linguistic target information. Motor primacy cues are shown to act as stronger retrieval cues. (JMF)
Working Memory and Speech Recognition in Noise under Ecologically Relevant Listening Conditions: Effects of Visual Cues and Noise Type among Adults with Hearing Loss

Science.gov (United States)

Miller, Christi W.; Stewart, Erin K.; Wu, Yu-Hsiang; Bishop, Christopher; Bentler, Ruth A.; Tremblay, Kelly

2017-01-01

Purpose: This study evaluated the relationship between working memory (WM) and speech recognition in noise with different noise types as well as in the presence of visual cues. Method: Seventy-six adults with bilateral, mild to moderately severe sensorineural hearing loss (mean age: 69 years) participated. Using a cross-sectional design, 2…
Testing the influence of external and internal cues on smoking motivation using a community sample.

Science.gov (United States)

Litvin, Erika B; Brandon, Thomas H

2010-02-01

Exposing smokers to either external cues (e.g., pictures of cigarettes) or internal cues (e.g., negative affect induction) can induce urge to smoke and other behavioral and physiological responses. However, little is known about whether the two types of cues interact when presented in close proximity, as is likely the case in the real world. Additionally, potential moderators of cue reactivity have rarely been examined. Finally, few cue-reactivity studies have used representative samples of smokers. In a randomized 2 x 2 crossed factorial between-subjects design, the current study tested the effects of a negative affect cue intended to produce anxiety (speech preparation task) and an external smoking cue on urge and behavioral reactivity in a community sample of adult smokers (N = 175), and whether trait impulsivity moderated the effects. Both types of cues produced main effects on urges to smoke, despite the speech task failing to increase anxiety significantly. The speech task increased smoking urge related to anticipation of negative affect relief, whereas the external smoking cues increased urges related to anticipation of pleasure; however, the cues did not interact. Impulsivity measures predicted urge and other smoking-related variables, but did not moderate cue-reactivity. Results suggest independent rather than synergistic effects of these contributors to smoking motivation. (PsycINFO Database Record (c) 2010 APA, all rights reserved).
Automatic discrimination between laughter and speech

NARCIS (Netherlands)

Truong, K.; Leeuwen, D. van

2007-01-01

Emotions can be recognized by audible paralinguistic cues in speech. By detecting these paralinguistic cues, which can consist of laughter, a trembling voice, coughs, changes in the intonation contour etc., information about the speaker's state and emotion can be revealed. This paper describes the
Causal inference and temporal predictions in audiovisual perception of speech and music.

Science.gov (United States)

Noppeney, Uta; Lee, Hwee Ling

2018-03-31

To form a coherent percept of the environment, the brain must integrate sensory signals emanating from a common source but segregate those from different sources. Temporal regularities are prominent cues for multisensory integration, particularly for speech and music perception. In line with models of predictive coding, we suggest that the brain adapts an internal model to the statistical regularities in its environment. This internal model enables cross-sensory and sensorimotor temporal predictions as a mechanism to arbitrate between integration and segregation of signals from different senses. © 2018 New York Academy of Sciences.
Turn-taking cue delays in human-robot communication

NARCIS (Netherlands)

Cuijpers, R. H.; Van Den Goor, V. J.P.

2017-01-01

Fluent communication between a human and a robot relies on the use of effective turn-taking cues. In human speech, staying silent after a sequence of utterances is usually accompanied by an explicit turn-yielding cue to signal the end of a turn. Here we study the effect of the timing of four
Interactional convergence in conversational storytelling: when reported speech is a cue of alignment and/or affiliation

Directory of Open Access Journals (Sweden)

Mathilde Guardiola

2013-10-01

Full Text Available This paper investigates how and when interactional convergence is established by participants in conversation. We analyze sequences of storytelling using an original method which combines Conversation Analysis and a corpus-based approach. In storytelling, the participant in the position of listener is expected to produce either generic or specific responses adapted to the storyteller’s narrative. The listener’s behavior, produced within the current activity, is a cue of his or her interactional alignment. We show here that the listener can produce a specific type of (aligned) response which we term a reported speech utterance in echo. The participant who is not telling the story is nonetheless able to animate the characters, while reversing the usual asymmetric roles of storyteller and listener. The use of this device is a way for the listener to display a stance toward the events told by the storyteller. If the listener’s stance is congruent with that of the storyteller, this reveals a high degree of affiliation between the participants. We present seventeen extracts from a collection of 94 instances of echo reported speech which we examined using the concepts of alignment and affiliation in order to show how different kinds of convergent sequences are constructed. We demonstrate that this phenomenon is mainly used by the listener in order to align and affiliate with the storyteller by means of reformulative or overbidding Echo Reported Speech. We also show that in affiliative sequences, reported speech can be used by the listener in a humorous way in order to temporarily disalign. This disalignment constitutes a potential starting point for an oblique sequence, which, if accepted and continued by the storyteller, gives rise to a highly convergent sequence.
Psychoacoustic cues to emotion in speech prosody and music.

Science.gov (United States)

Coutinho, Eduardo; Dibben, Nicola

2013-01-01

There is strong evidence of shared acoustic profiles common to the expression of emotions in music and speech, yet relatively limited understanding of the specific psychoacoustic features involved. This study combined a controlled experiment and computational modelling to investigate the perceptual codes associated with the expression of emotion in the acoustic domain. The empirical stage of the study provided continuous human ratings of emotions perceived in excerpts of film music and natural speech samples. The computational stage created a computer model that retrieves the relevant information from the acoustic stimuli and makes predictions about the emotional expressiveness of speech and music close to the responses of human subjects. We show that a significant part of the listeners' second-by-second reported emotions to music and speech prosody can be predicted from a set of seven psychoacoustic features: loudness, tempo/speech rate, melody/prosody contour, spectral centroid, spectral flux, sharpness, and roughness. The implications of these results are discussed in the context of cross-modal similarities in the communication of emotion in the acoustic domain.
Validating a Method to Assess Lipreading, Audiovisual Gain, and Integration During Speech Reception With Cochlear-Implanted and Normal-Hearing Subjects Using a Talking Head.

Science.gov (United States)

Schreitmüller, Stefan; Frenken, Miriam; Bentz, Lüder; Ortmann, Magdalene; Walger, Martin; Meister, Hartmut

benefitted by AV over unimodal speech as indexed by calculations of the measures visual enhancement and auditory enhancement (each p < 0.001). Both groups efficiently integrated complementary auditory and visual speech features as indexed by calculations of the measure integration enhancement (each p < 0.005). Given the good agreement between results from literature and the outcome of supplementing an existing validated auditory test with synthetic visual cues, the introduced method can be considered an interesting candidate for clinical and scientific applications to assess measures important for AV SR in a standardized manner. This could be beneficial for optimizing the diagnosis and treatment of individual listening and communication disorders, such as cochlear implantation.
What Information Is Necessary for Speech Categorization? Harnessing Variability in the Speech Signal by Integrating Cues Computed Relative to Expectations

Science.gov (United States)

McMurray, Bob; Jongman, Allard

2011-01-01

Most theories of categorization emphasize how continuous perceptual information is mapped to categories. However, equally important are the informational assumptions of a model, the type of information subserving this mapping. This is crucial in speech perception where the signal is variable and context dependent. This study assessed the…
Functional neuroanatomy of gesture-speech integration in children varies with individual differences in gesture processing.

Science.gov (United States)

Demir-Lira, Özlem Ece; Asaridou, Salomi S; Raja Beharelle, Anjali; Holt, Anna E; Goldin-Meadow, Susan; Small, Steven L

2018-03-08

Gesture is an integral part of children's communicative repertoire. However, little is known about the neurobiology of speech and gesture integration in the developing brain. We investigated how 8- to 10-year-old children processed gesture that was essential to understanding a set of narratives. We asked whether the functional neuroanatomy of gesture-speech integration varies as a function of (1) the content of speech, and/or (2) individual differences in how gesture is processed. When gestures provided missing information not present in the speech (i.e., disambiguating gesture; e.g., "pet" + flapping palms = bird), the presence of gesture led to increased activity in inferior frontal gyri, the right middle temporal gyrus, and the left superior temporal gyrus, compared to when gesture provided redundant information (i.e., reinforcing gesture; e.g., "bird" + flapping palms = bird). This pattern of activation was found only in children who were able to successfully integrate gesture and speech behaviorally, as indicated by their performance on post-test story comprehension questions. Children who did not glean meaning from gesture did not show differential activation across the two conditions. Our results suggest that the brain activation pattern for gesture-speech integration in children overlaps with, but is broader than, the pattern in adults performing the same task. Overall, our results provide a possible neurobiological mechanism that could underlie children's increasing ability to integrate gesture and speech over childhood, and account for individual differences in that integration. © 2018 John Wiley & Sons Ltd.
Detection vs. selection: integration of genetic, epigenetic and environmental cues in fluctuating environments.

Science.gov (United States)

McNamara, John M; Dall, Sasha R X; Hammerstein, Peter; Leimar, Olof

2016-10-01

There are many inputs during development that influence an organism's fit to current or upcoming environments. These include genetic effects, transgenerational epigenetic influences, environmental cues and developmental noise, which are rarely investigated in the same formal framework. We study an analytically tractable evolutionary model, in which cues are integrated to determine mature phenotypes in fluctuating environments. Environmental cues received during development and by the mother as an adult act as detection-based (individually observed) cues. The mother's phenotype and a quantitative genetic effect act as selection-based cues (they correlate with environmental states after selection). We specify when such cues are complementary and tend to be used together, and when using the most informative cue will predominate. Thus, we extend recent analyses of the evolutionary implications of subsets of these effects by providing a general diagnosis of the conditions under which detection and selection-based influences on development are likely to evolve and coexist. © 2016 John Wiley & Sons Ltd/CNRS.
Improving Understanding of Emotional Speech Acoustic Content

Science.gov (United States)

Tinnemore, Anna

Children with cochlear implants show deficits in identifying emotional intent of utterances without facial or body language cues. A known limitation to cochlear implants is the inability to accurately portray the fundamental frequency contour of speech which carries the majority of information needed to identify emotional intent. Without reliable access to the fundamental frequency, other methods of identifying vocal emotion, if identifiable, could be used to guide therapies for training children with cochlear implants to better identify vocal emotion. The current study analyzed recordings of adults speaking neutral sentences with a set array of emotions in a child-directed and adult-directed manner. The goal was to identify acoustic cues that contribute to emotion identification that may be enhanced in child-directed speech, but are also present in adult-directed speech. Results of this study showed that there were significant differences in the variation of the fundamental frequency, the variation of intensity, and the rate of speech among emotions and between intended audiences.
Real-Time Lane Detection on Suburban Streets Using Visual Cue Integration

Directory of Open Access Journals (Sweden)

Shehan Fernando

2014-04-01

Full Text Available The detection of lane boundaries on suburban streets using images obtained from video constitutes a challenging task. This is mainly due to the difficulties associated with estimating the complex geometric structure of lane boundaries, the quality of lane markings as a result of wear, occlusions by traffic, and shadows caused by road-side trees and structures. Most of the existing techniques for lane boundary detection employ a single visual cue and will only work under certain conditions and where there are clear lane markings. Also, better results are achieved when there are no other on-road objects present. This paper extends our previous work and discusses a novel lane boundary detection algorithm specifically addressing the abovementioned issues through the integration of two visual cues. The first visual cue is based on stripe-like features found on lane lines extracted using a two-dimensional symmetric Gabor filter. The second visual cue is based on a texture characteristic determined using the entropy measure of the predefined neighbourhood around a lane boundary line. The visual cues are then integrated using a rule-based classifier which incorporates a modified sequential covering algorithm to improve robustness. To separate lane boundary lines from other similar features, a road mask is generated using road chromaticity values estimated from CIE L*a*b* colour transformation. Extraneous points around lane boundary lines are then removed by an outlier removal procedure based on studentized residuals. The lane boundary lines are then modelled with Bezier spline curves. To validate the algorithm, extensive experimental evaluation was carried out on suburban streets and the results are presented.

Neural dynamics of audiovisual speech integration under variable listening conditions: an individual participant analysis.

Science.gov (United States)

Altieri, Nicholas; Wenger, Michael J

2013-01-01

Speech perception engages both auditory and visual modalities. Limitations of traditional accuracy-only approaches in the investigation of audiovisual speech perception have motivated the use of new methodologies. In an audiovisual speech identification task, we utilized capacity (Townsend and Nozawa, 1995), a dynamic measure of efficiency, to quantify audiovisual integration. Capacity was used to compare RT distributions from audiovisual trials to RT distributions from auditory-only and visual-only trials across three listening conditions: clear auditory signal, S/N ratio of -12 dB, and S/N ratio of -18 dB. The purpose was to obtain EEG recordings in conjunction with capacity to investigate how a late ERP co-varies with integration efficiency. Results showed efficient audiovisual integration for low auditory S/N ratios, but inefficient audiovisual integration when the auditory signal was clear. The ERP analyses showed evidence for greater audiovisual amplitude compared to the unisensory signals for lower auditory S/N ratios (higher capacity/efficiency) compared to the high S/N ratio (low capacity/inefficient integration). The data are consistent with an interactive framework of integration, where auditory recognition is influenced by speech-reading as a function of signal clarity.
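For first-terminating (OR) designs, the capacity coefficient of Townsend and Nozawa compares integrated hazard functions of the response-time (RT) distributions: C(t) = H_AV(t) / (H_A(t) + H_V(t)), where H(t) = -ln S(t) and S(t) is the survivor function P(RT > t). Values above 1 suggest efficient integration and values below 1 suggest limited capacity. The sketch below estimates C(t) from the empirical survivor function; the function names and the toy RT values are assumptions, and the abstract does not say which estimator the authors used.

```python
import numpy as np

def cumulative_hazard(rts, t):
    """Estimate H(t) = -ln S(t) from the empirical survivor function."""
    rts = np.asarray(rts, dtype=float)
    survivor = np.mean(rts > t)   # S(t) = proportion of RTs longer than t
    return -np.log(survivor)

def capacity_or(rt_av, rt_a, rt_v, t):
    """Capacity coefficient C(t) = H_AV(t) / (H_A(t) + H_V(t))."""
    return cumulative_hazard(rt_av, t) / (
        cumulative_hazard(rt_a, t) + cumulative_hazard(rt_v, t)
    )

# Toy RTs in arbitrary units, chosen so that C(2.5) equals 1
c = capacity_or([1, 1, 2, 3], [1, 2, 3, 4], [1, 2, 3, 4], t=2.5)
```

The estimate is only defined at times t where some responses in every condition are still pending, because the logarithm of a zero survivor proportion is infinite.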
Zebra finches are sensitive to prosodic features of human speech.

Science.gov (United States)

Spierings, Michelle J; ten Cate, Carel

2014-07-22

Variation in pitch, amplitude and rhythm adds crucial paralinguistic information to human speech. Such prosodic cues can reveal information about the meaning or emphasis of a sentence or the emotional state of the speaker. To examine the hypothesis that sensitivity to prosodic cues is language independent and not human specific, we tested prosody perception in a controlled experiment with zebra finches. Using a go/no-go procedure, subjects were trained to discriminate between speech syllables arranged in XYXY patterns with prosodic stress on the first syllable and XXYY patterns with prosodic stress on the final syllable. To systematically determine the salience of the various prosodic cues (pitch, duration and amplitude) to the zebra finches, they were subjected to five tests with different combinations of these cues. The zebra finches generalized the prosodic pattern to sequences that consisted of new syllables and used prosodic features over structural ones to discriminate between stimuli. This strong sensitivity to the prosodic pattern was maintained when only a single prosodic cue was available. The change in pitch was treated as more salient than changes in the other prosodic features. These results show that zebra finches are sensitive to the same prosodic cues known to affect human speech perception. © 2014 The Author(s) Published by the Royal Society. All rights reserved.
Audio-visual speech perception in infants and toddlers with Down syndrome, fragile X syndrome, and Williams syndrome.

Science.gov (United States)

D'Souza, Dean; D'Souza, Hana; Johnson, Mark H; Karmiloff-Smith, Annette

2016-08-01

Typically-developing (TD) infants can construct unified cross-modal percepts, such as a speaking face, by integrating auditory-visual (AV) information. This skill is a key building block upon which higher-level skills, such as word learning, are built. Because word learning is seriously delayed in most children with neurodevelopmental disorders, we assessed the hypothesis that this delay partly results from a deficit in integrating AV speech cues. AV speech integration has rarely been investigated in neurodevelopmental disorders, and never previously in infants. We probed for the McGurk effect, which occurs when the auditory component of one sound (/ba/) is paired with the visual component of another sound (/ga/), leading to the perception of an illusory third sound (/da/ or /tha/). We measured AV integration in 95 infants/toddlers with Down, fragile X, or Williams syndrome, whom we matched on Chronological and Mental Age to 25 TD infants. We also assessed a more basic AV perceptual ability: sensitivity to matching vs. mismatching AV speech stimuli. Infants with Williams syndrome failed to demonstrate a McGurk effect, indicating poor AV speech integration. Moreover, while the TD children discriminated between matching and mismatching AV stimuli, none of the other groups did, hinting at a basic deficit or delay in AV speech processing, which is likely to constrain subsequent language development. Copyright © 2016 Elsevier Inc. All rights reserved.
A causal test of the motor theory of speech perception: a case of impaired speech production and spared speech perception.

Science.gov (United States)

Stasenko, Alena; Bonn, Cory; Teghipco, Alex; Garcea, Frank E; Sweet, Catherine; Dombovy, Mary; McDonough, Joyce; Mahon, Bradford Z

2015-01-01

The debate about the causal role of the motor system in speech perception has been reignited by demonstrations that motor processes are engaged during the processing of speech sounds. Here, we evaluate which aspects of auditory speech processing are affected, and which are not, in a stroke patient with dysfunction of the speech motor system. We found that the patient showed a normal phonemic categorical boundary when discriminating two non-words that differ by a minimal pair (e.g., ADA-AGA). However, using the same stimuli, the patient was unable to identify or label the non-word stimuli (using a button-press response). A control task showed that he could identify speech sounds by speaker gender, ruling out a general labelling impairment. These data suggest that while the motor system is not causally involved in perception of the speech signal, it may be used when other cues (e.g., meaning, context) are not available.
Using auditory-visual speech to probe the basis of noise-impaired consonant-vowel perception in dyslexia and auditory neuropathy

Science.gov (United States)

Ramirez, Joshua; Mann, Virginia

2005-08-01

Both dyslexics and auditory neuropathy (AN) subjects show inferior consonant-vowel (CV) perception in noise, relative to controls. To better understand these impairments, natural acoustic speech stimuli that were masked in speech-shaped noise at various intensities were presented to dyslexic, AN, and control subjects either in isolation or accompanied by visual articulatory cues. AN subjects were expected to benefit from the pairing of visual articulatory cues and auditory CV stimuli, provided that their speech perception impairment reflects a relatively peripheral auditory disorder. Assuming that dyslexia reflects a general impairment of speech processing rather than a disorder of audition, dyslexics were not expected to similarly benefit from an introduction of visual articulatory cues. The results revealed an increased effect of noise masking on the perception of isolated acoustic stimuli by both dyslexic and AN subjects. More importantly, dyslexics showed less effective use of visual articulatory cues in identifying masked speech stimuli and lower visual baseline performance relative to AN subjects and controls. Last, a significant positive correlation was found between reading ability and the ameliorating effect of visual articulatory cues on speech perception in noise. These results suggest that some reading impairments may stem from a central deficit of speech processing.
Visual speech information: a help or hindrance in perceptual processing of dysarthric speech.

Science.gov (United States)

Borrie, Stephanie A

2015-03-01

This study investigated the influence of visual speech information on perceptual processing of neurologically degraded speech. Fifty listeners identified spastic dysarthric speech under both audio (A) and audiovisual (AV) conditions. Condition comparisons revealed that the addition of visual speech information enhanced processing of the neurologically degraded input in terms of (a) acuity (percent phonemes correct) of vowels and consonants and (b) recognition (percent words correct) of predictive and nonpredictive phrases. Listeners exploited stress-based segmentation strategies more readily in AV conditions, suggesting that the perceptual benefit associated with adding visual speech information to the auditory signal (the AV advantage) has both segmental and suprasegmental origins. Results also revealed that the magnitude of the AV advantage can be predicted, to some degree, by the extent to which an individual utilizes syllabic stress cues to inform word recognition in AV conditions. Findings inform the development of a listener-specific model of speech perception that applies to processing of dysarthric speech in everyday communication contexts.
The influence of masker type on early reflection processing and speech intelligibility (L)

DEFF Research Database (Denmark)

Arweiler, Iris; Buchholz, Jörg M.; Dau, Torsten

2013-01-01

Arweiler and Buchholz [J. Acoust. Soc. Am. 130, 996-1005 (2011)] showed that, while the energy of early reflections (ERs) in a room improves speech intelligibility, the benefit is smaller than that provided by the energy of the direct sound (DS). In terms of integration of ERs and DS, binaural listening did not provide a benefit from ERs apart from a binaural energy summation, such that monaural auditory processing could account for the data. However, a diffuse speech-shaped noise (SSN) was used in the speech intelligibility experiments, which does not provide distinct binaural cues to the auditory system. In the present study, the monaural and binaural benefit from ERs for speech intelligibility was investigated using three directional maskers presented from 90° azimuth: an SSN, a multi-talker babble, and a reversed two-talker masker. For normal-hearing as well as hearing-impaired listeners...
An analysis of machine translation and speech synthesis in speech-to-speech translation system

OpenAIRE

Hashimoto, K.; Yamagishi, J.; Byrne, W.; King, S.; Tokuda, K.

2011-01-01

This paper provides an analysis of the impacts of machine translation and speech synthesis on speech-to-speech translation systems. The speech-to-speech translation system consists of three components: speech recognition, machine translation and speech synthesis. Many techniques for integration of speech recognition and machine translation have been proposed. However, speech synthesis has not yet been considered. Therefore, in this paper, we focus on machine translation and speech synthesis, ...
Evaluating the effects of delivering integrated kinesthetic and tactile cues to individuals with unilateral hemiparetic stroke during overground walking.

Science.gov (United States)

Afzal, Muhammad Raheel; Pyo, Sanghun; Oh, Min-Kyun; Park, Young Sook; Yoon, Jungwon

2018-04-16

Integration of kinesthetic and tactile cues for application to post-stroke gait rehabilitation is a novel concept which needs to be explored. The combined provision of haptic cues may result in collective improvement of gait parameters such as symmetry, balance and muscle activation patterns. Our proposed integrated cue system can offer a cost-effective and voluntary gait training experience for rehabilitation of subjects with unilateral hemiparetic stroke. Ten post-stroke ambulatory subjects participated in a 10 m walking trial while utilizing the haptic cues (either alone or integrated application), at their preferred and increased gait speeds. In the system, a haptic cane device (HCD) provided kinesthetic perception and a vibrotactile feedback device (VFD) provided a tactile cue on the paretic leg for gait modification. Balance, gait symmetry and muscle activity were analyzed to identify the benefits of utilizing the proposed system. When using kinesthetic cues, either alone or integrated with a tactile cue, an increase in the percentage of non-paretic peak activity in the paretic muscles was observed at the preferred gait speed (vastus medialis obliquus: p [...] kinesthetic cue, at their preferred gait speed (p < 0.001, partial η² = 0.702). When combining haptic cues, the subjects walked at their preferred gait speed with increased temporal stance symmetry and paretic muscle activity affecting their balance. Similar improvements were observed at higher gait speeds. The efficacy of the proposed system is influenced by gait speed. Improvements were observed at a 20% increased gait speed, whereas a plateau effect was observed at a 40% increased gait speed. These results imply that integration of haptic cues may benefit post-stroke gait rehabilitation by inducing simultaneous improvements in gait symmetry and muscle activity.
Interactional convergence in conversational storytelling: when reported speech is a cue of alignment and/or affiliation.

Science.gov (United States)

Guardiola, Mathilde; Bertrand, Roxane

2013-01-01

This paper investigates how and when interactional convergence is established by participants in conversation. We analyze sequences of storytelling using an original method that combines Conversation Analysis and a corpus-based approach. In storytelling, the participant in the position of "listener" is expected to produce either generic or specific responses adapted to the storyteller's narrative. The listener's behavior produced within the current activity is a cue of his/her interactional alignment. We show here that the listener can produce a specific type of (aligned) response, which we term a reported speech utterance in echo. The participant who is not telling the story is nonetheless able to animate the characters, while reversing the usual asymmetric roles of storyteller and listener. The use of this device is a way for the listener to display his/her stance toward the events told by the storyteller. If the listener's stance is congruent with that of the storyteller, this reveals a high degree of affiliation between the participants. We present seventeen excerpts from a collection of 94 instances of Echo Reported Speech (ERS) which we examined using the concepts of alignment and affiliation in order to show how different kinds of convergent sequences are constructed. We demonstrate that this phenomenon is mainly used by the listener to align and affiliate with the storyteller by means of reformulative, enumerative, or overbidding ERS. We also show that in affiliative sequences, reported speech can be used by the listener in a humorous way in order to temporarily disalign. This disalignment constitutes a potential starting point for an oblique sequence, which, if accepted and continued by the storyteller, gives rise to a highly convergent sequence.
Transcranial Magnetic Stimulation over Left Inferior Frontal and Posterior Temporal Cortex Disrupts Gesture-Speech Integration.

Science.gov (United States)

Zhao, Wanying; Riggs, Kevin; Schindler, Igor; Holle, Henning

2018-02-21

Language and action naturally occur together in the form of cospeech gestures, and there is now convincing evidence that listeners display a strong tendency to integrate semantic information from both domains during comprehension. A contentious question, however, has been which brain areas are causally involved in this integration process. In previous neuroimaging studies, left inferior frontal gyrus (IFG) and posterior middle temporal gyrus (pMTG) have emerged as candidate areas; however, it is currently not clear whether these areas are causally or merely epiphenomenally involved in gesture-speech integration. In the present series of experiments, we directly tested for a potential critical role of IFG and pMTG by observing the effect of disrupting activity in these areas using transcranial magnetic stimulation in a mixed gender sample of healthy human volunteers. The outcome measure was performance on a Stroop-like gesture task (Kelly et al., 2010a), which provides a behavioral index of gesture-speech integration. Our results provide clear evidence that disrupting activity in IFG and pMTG selectively impairs gesture-speech integration, suggesting that both areas are causally involved in the process. These findings are consistent with the idea that these areas play a joint role in gesture-speech integration, with IFG regulating strategic semantic access via top-down signals acting upon temporal storage areas. SIGNIFICANCE STATEMENT Previous neuroimaging studies suggest an involvement of inferior frontal gyrus and posterior middle temporal gyrus in gesture-speech integration, but findings have been mixed and due to methodological constraints did not allow inferences of causality. By adopting a virtual lesion approach involving transcranial magnetic stimulation, the present study provides clear evidence that both areas are causally involved in combining semantic information arising from gesture and speech. These findings support the view that, rather than being
Severe Speech Sound Disorders: An Integrated Multimodal Intervention

Science.gov (United States)

King, Amie M.; Hengst, Julie A.; DeThorne, Laura S.

2013-01-01

Purpose: This study introduces an integrated multimodal intervention (IMI) and examines its effectiveness for the treatment of persistent and severe speech sound disorders (SSD) in young children. The IMI is an activity-based intervention that focuses simultaneously on increasing the "quantity" of a child's meaningful productions of target words…
Within-subjects comparison of the HiRes and Fidelity120 speech processing strategies: speech perception and its relation to place-pitch sensitivity.

Science.gov (United States)

Donaldson, Gail S; Dawson, Patricia K; Borden, Lamar Z

2011-01-01

Previous studies have confirmed that current steering can increase the number of discriminable pitches available to many cochlear implant (CI) users; however, the ability to perceive additional pitches has not been linked to improved speech perception. The primary goals of this study were to determine (1) whether adult CI users can achieve higher levels of spectral cue transmission with a speech processing strategy that implements current steering (Fidelity120) than with a predecessor strategy (HiRes) and, if so, (2) whether the magnitude of improvement can be predicted from individual differences in place-pitch sensitivity. A secondary goal was to determine whether Fidelity120 supports higher levels of speech recognition in noise than HiRes. A within-subjects repeated measures design evaluated speech perception performance with Fidelity120 relative to HiRes in 10 adult CI users. Subjects used the novel strategy (either HiRes or Fidelity120) for 8 wks during the main study; a subset of five subjects used Fidelity120 for three additional months after the main study. Speech perception was assessed for the spectral cues related to vowel F1 frequency, vowel F2 frequency, and consonant place of articulation; overall transmitted information for vowels and consonants; and sentence recognition in noise. Place-pitch sensitivity was measured for electrode pairs in the apical, middle, and basal regions of the implanted array using a psychophysical pitch-ranking task. With one exception, there was no effect of strategy (HiRes versus Fidelity120) on the speech measures tested, either during the main study (N = 10) or after extended use of Fidelity120 (N = 5). The exception was a small but significant advantage for HiRes over Fidelity120 for consonant perception during the main study. Examination of individual subjects' data revealed that 3 of 10 subjects demonstrated improved perception of one or more spectral cues with Fidelity120 relative to HiRes after 8 wks or longer
Degradation of labial information modifies audiovisual speech perception in cochlear-implanted children.

Science.gov (United States)

Huyse, Aurélie; Berthommier, Frédéric; Leybaert, Jacqueline

2013-01-01

The aim of the present study was to examine audiovisual speech integration in cochlear-implanted children and in normally hearing children exposed to degraded auditory stimuli. Previous studies have shown that speech perception in cochlear-implanted users is biased toward the visual modality when audition and vision provide conflicting information. Our main question was whether an experimentally designed degradation of the visual speech cue would increase the importance of audition in the response pattern. The impact of auditory proficiency was also investigated. A group of 31 children with cochlear implants and a group of 31 normally hearing children matched for chronological age were recruited. All children with cochlear implants had profound congenital deafness and had used their implants for at least 2 years. Participants had to perform an /aCa/ consonant-identification task in which stimuli were presented randomly in three conditions: auditory only, visual only, and audiovisual (congruent and incongruent McGurk stimuli). In half of the experiment, the visual speech cue was normal; in the other half (visual reduction) a degraded visual signal was presented, aimed at preventing lipreading of good quality. The normally hearing children received a spectrally reduced speech signal (simulating the input delivered by the cochlear implant). First, performance in visual-only and in congruent audiovisual modalities were decreased, showing that the visual reduction technique used here was efficient at degrading lipreading. Second, in the incongruent audiovisual trials, visual reduction led to a major increase in the number of auditory based responses in both groups. Differences between proficient and nonproficient children were found in both groups, with nonproficient children's responses being more visual and less auditory than those of proficient children. Further analysis revealed that differences between visually clear and visually reduced conditions and between
Voice-associated static face image releases speech from informational masking.

Science.gov (United States)

Gao, Yayue; Cao, Shuyang; Qu, Tianshu; Wu, Xihong; Li, Haifeng; Zhang, Jinsheng; Li, Liang

2014-06-01

In noisy, multipeople talking environments such as a cocktail party, listeners can use various perceptual and/or cognitive cues to improve recognition of target speech against masking, particularly informational masking. Previous studies have shown that temporally prepresented voice cues (voice primes) improve recognition of target speech against speech masking but not noise masking. This study investigated whether static face image primes that have become target-voice associated (i.e., facial images linked through associative learning with voices reciting the target speech) can be used by listeners to unmask speech. The results showed that in 32 normal-hearing younger adults, temporally prepresenting a voice-priming sentence with the same voice reciting the target sentence significantly improved the recognition of target speech that was masked by irrelevant two-talker speech. When a person's face photograph image became associated with the voice reciting the target speech by learning, temporally prepresenting the target-voice-associated face image significantly improved recognition of target speech against speech masking, particularly for the last two keywords in the target sentence. Moreover, speech-recognition performance under the voice-priming condition was significantly correlated to that under the face-priming condition. The results suggest that learned facial information on talker identity plays an important role in identifying the target-talker's voice and facilitating selective attention to the target-speech stream against the masking-speech stream. © 2014 The Institute of Psychology, Chinese Academy of Sciences and Wiley Publishing Asia Pty Ltd.
Integrating Automatic Speech Recognition and Machine Translation for Better Translation Outputs

DEFF Research Database (Denmark)

Liyanapathirana, Jeevanthi

... translations, combining machine translation with computer assisted translation has drawn attention in current research. This combines two prospects: the opportunity of ensuring high quality translation along with a significant performance gain. Automatic Speech Recognition (ASR) is another important area, which provides important functionality in language processing and natural language understanding tasks. In this work we integrate automatic speech recognition and machine translation in parallel. We aim to avoid manual typing of possible translations as dictating the translation would take less time ... to the n-best list rescoring, we also use word graphs with the expectation of arriving at a tighter integration of ASR and MT models. Integration methods include constraining ASR models using language and translation models of MT, and vice versa. We currently develop and experiment with different methods...
The minor third communicates sadness in speech, mirroring its use in music.

Science.gov (United States)

Curtis, Meagan E; Bharucha, Jamshed J

2010-06-01

There is a long history of attempts to explain why music is perceived as expressing emotion. The relationship between pitches serves as an important cue for conveying emotion in music. The musical interval referred to as the minor third is generally thought to convey sadness. We reveal that the minor third also occurs in the pitch contour of speech conveying sadness. Bisyllabic speech samples conveying four emotions were recorded by 9 actresses. Acoustic analyses revealed that the relationship between the 2 salient pitches of the sad speech samples tended to approximate a minor third. Participants rated the speech samples for perceived emotion, and the use of numerous acoustic parameters as cues for emotional identification was modeled using regression analysis. The minor third was the most reliable cue for identifying sadness. Additional participants rated musical intervals for emotion, and their ratings verified the historical association between the musical minor third and sadness. These findings support the theory that human vocal expressions and music share an acoustic code for communicating sadness.
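The interval between two pitches can be expressed in semitones as 12·log2(f_high / f_low). A minor third spans three semitones, which corresponds to a frequency ratio of about 1.189 in equal temperament and 6:5 in just intonation. The sketch below applies this to a pair of fundamental frequencies; the function names, the half-semitone tolerance, and the example frequencies are illustrative assumptions and are not values taken from the study.

```python
import math

def interval_semitones(f_low_hz, f_high_hz):
    """Size of the interval between two frequencies, in semitones."""
    return 12.0 * math.log2(f_high_hz / f_low_hz)

def is_near_minor_third(f1_hz, f2_hz, tolerance=0.5):
    """True if the two pitches lie within `tolerance` semitones of a minor third."""
    low, high = sorted((f1_hz, f2_hz))
    return abs(interval_semitones(low, high) - 3.0) <= tolerance

# A descending step from 264 Hz to 220 Hz has a 6:5 ratio (just minor third)
sad_like = is_near_minor_third(264.0, 220.0)
# 220 Hz to 330 Hz is a perfect fifth, about seven semitones
fifth = is_near_minor_third(220.0, 330.0)
```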
Man-system interface based on automatic speech recognition: integration to a virtual control desk

Energy Technology Data Exchange (ETDEWEB)

Jorge, Carlos Alexandre F.; Mol, Antonio Carlos A.; Pereira, Claudio M.N.A.; Aghina, Mauricio Alves C. [Instituto de Engenharia Nuclear (IEN/CNEN-RJ), Rio de Janeiro, RJ (Brazil)]; Nomiya, Diogo V. [Universidade Federal do Rio de Janeiro (UFRJ), RJ (Brazil)]

2009-07-01

This work reports the implementation of a man-system interface based on automatic speech recognition and its integration into a virtual nuclear power plant control desk. The latter aims to reproduce a real control desk using virtual reality technology, for operator training and ergonomic evaluation purposes. An automatic speech recognition system was developed to serve as a new interface with users, replacing the computer keyboard and mouse. Users can operate this virtual control desk in front of a computer monitor or a projection screen through spoken commands. The automatic speech recognition interface is based on a well-known signal processing technique, cepstral analysis, and on artificial neural networks. The speech recognition interface is described, along with its integration with the virtual control desk, and results are presented.
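As a rough illustration of cepstral analysis, the real cepstrum of a signal frame is the inverse Fourier transform of its log magnitude spectrum. The sketch below computes it for one frame with NumPy. The Hamming window, the small constant added before the logarithm, and the synthetic test frame are assumptions; the abstract does not specify which cepstral features (for example, mel-frequency coefficients) were fed to the neural networks.

```python
import numpy as np

def real_cepstrum(frame, eps=1e-10):
    """Real cepstrum of one frame: IFFT of the log magnitude spectrum."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hamming(len(frame))    # taper to reduce spectral leakage
    spectrum = np.fft.rfft(windowed)             # one-sided spectrum of a real signal
    log_mag = np.log(np.abs(spectrum) + eps)     # eps avoids log(0)
    return np.fft.irfft(log_mag, n=len(frame))   # back to the quefrency domain

# Synthetic frame: 256 samples of a 100 Hz tone at an assumed 8 kHz sampling rate
n = np.arange(256)
frame = np.sin(2.0 * np.pi * 100.0 * n / 8000.0)
cepstrum = real_cepstrum(frame)
```

In a recognizer of this kind, the first few cepstral coefficients of each frame would typically form the feature vector passed to the classifier.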
Man-system interface based on automatic speech recognition: integration to a virtual control desk

International Nuclear Information System (INIS)

Jorge, Carlos Alexandre F.; Mol, Antonio Carlos A.; Pereira, Claudio M.N.A.; Aghina, Mauricio Alves C.; Nomiya, Diogo V.

2009-01-01

This work reports the implementation of a man-system interface based on automatic speech recognition and its integration into a virtual nuclear power plant control desk. The latter aims to reproduce a real control desk using virtual reality technology, for operator training and ergonomic evaluation purposes. An automatic speech recognition system was developed to serve as a new interface with users, replacing the computer keyboard and mouse. Users can operate this virtual control desk in front of a computer monitor or a projection screen through spoken commands. The automatic speech recognition interface is based on a well-known signal processing technique, cepstral analysis, and on artificial neural networks. The speech recognition interface is described, along with its integration with the virtual control desk, and results are presented.
The early maximum likelihood estimation model of audiovisual integration in speech perception

DEFF Research Database (Denmark)

Andersen, Tobias

2015-01-01

Speech perception is facilitated by seeing the articulatory mouth movements of the talker. This is due to perceptual audiovisual integration, which also causes the McGurk-MacDonald illusion, and for which a comprehensive computational account is still lacking. Decades of research have largely ... integration to speech perception along with three model variations. In early MLE, integration is based on a continuous internal representation before categorization, which can make the model more parsimonious by imposing constraints that reflect experimental designs. The study also shows that cross-validation can evaluate models of audiovisual integration based on typical data sets taking both goodness-of-fit and model flexibility into account. All models were tested on a published data set previously used for testing the FLMP. Cross-validation favored the early MLE while more conventional error measures...
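In maximum likelihood accounts of cue integration generally, each cue is weighted by its reliability (inverse variance), and the combined estimate has lower variance than either cue alone. The sketch below shows this textbook Gaussian case for one auditory and one visual estimate on a continuous internal dimension. It is not the early MLE model of the paper, which additionally maps the continuous representation onto response categories; the function name and the example numbers are assumptions.

```python
def mle_combine(mu_a, var_a, mu_v, var_v):
    """Reliability-weighted combination of two independent Gaussian cues."""
    w_a = (1.0 / var_a) / (1.0 / var_a + 1.0 / var_v)  # auditory weight
    w_v = 1.0 - w_a                                     # visual weight
    mu_av = w_a * mu_a + w_v * mu_v                     # combined estimate
    var_av = (var_a * var_v) / (var_a + var_v)          # combined variance
    return mu_av, var_av

# Auditory cue at 0 (variance 1) and visual cue at 10 (variance 4):
# the more reliable auditory cue dominates the combined estimate.
mu_av, var_av = mle_combine(0.0, 1.0, 10.0, 4.0)
```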

Bayesian integration of position and orientation cues in perception of biological and non-biological dynamic forms

Directory of Open Access Journals (Sweden)

Steven Matthew Thurman

2014-02-01

Full Text Available Visual form analysis is fundamental to shape perception and likely plays a central role in perception of more complex dynamic shapes, such as moving objects or biological motion. Two primary form-based cues serve to represent the overall shape of an object: the spatial position and the orientation of locations along the boundary of the object. However, it is unclear how the visual system integrates these two sources of information in dynamic form analysis, and in particular how the brain resolves ambiguities due to sensory uncertainty and/or cue conflict. In the current study, we created animations of sparsely-sampled dynamic objects (human walkers or rotating squares) composed of oriented Gabor patches in which orientation could either coincide or conflict with information provided by position cues. When the cues were incongruent, we found a characteristic trade-off between position and orientation information whereby position cues increasingly dominated perception as the relative uncertainty of orientation increased and vice versa. Furthermore, we found no evidence for differences in the visual processing of biological and non-biological objects, casting doubt on the claim that biological motion may be specialized in the human brain, at least in specific terms of form analysis. To explain these behavioral results quantitatively, we adopt a probabilistic template-matching model that uses Bayesian inference within local modules to estimate object shape separately from either spatial position or orientation signals. The outputs of the two modules are integrated with weights that reflect individual estimates of subjective cue reliability, and integrated over time to produce a decision about the perceived dynamics of the input data. Results of this model provided a close fit to the behavioral data, suggesting a mechanism in the human visual system that approximates rational Bayesian inference to integrate position and orientation signals in dynamic
Effects of Audio-Visual Integration on the Detection of Masked Speech and Non-Speech Sounds

Science.gov (United States)

Eramudugolla, Ranmalee; Henderson, Rachel; Mattingley, Jason B.

2011-01-01

Integration of simultaneous auditory and visual information about an event can enhance our ability to detect that event. This is particularly evident in the perception of speech, where the articulatory gestures of the speaker's lips and face can significantly improve the listener's detection and identification of the message, especially when that…
SPEECH ACT ANALYSIS OF IGBO UTTERANCES IN FUNERAL ...

African Journals Online (AJOL)

Dean SPGS NAU

In other words, a speech act is a .... relationship with that one single person and to share those memories ... identifies four conditions or rules for the effective performance of a ... In other words, the rules establish a system for the ... 54 shaped by the interplay of particular speech acts and non verbal cues. ..... Retrieved from.
Integration of multiple intraguild predator cues for oviposition decisions by a predatory mite

Science.gov (United States)

Walzer, Andreas; Schausberger, Peter

2012-01-01

In mutual intraguild predation (IGP), the role of individual guild members is strongly context dependent and, during ontogeny, can shift from an intraguild (IG) prey to a food competitor or to an IG predator. Consequently, recognition of an offspring's predator is more complex for IG than classic prey females. Thus, IG prey females should be able to modulate their oviposition decisions by integrating multiple IG predator cues and by experience. Using a guild of plant-inhabiting predatory mites sharing the spider mite Tetranychus urticae as prey and passing through ontogenetic role shifts in mutual IGP, we assessed the effects of single and combined direct cues of the IG predator Amblyseius andersoni (eggs and traces left by a female on the substrate) on prey patch selection and oviposition behaviour of naïve and IG predator-experienced IG prey females of Phytoseiulus persimilis. The IG prey females preferentially resided in patches without predator cues when the alternative patch contained traces of predator females or the cue combination. Preferential egg placement in patches without predator cues was only apparent in the choice situation with the cue combination. Experience increased the responsiveness of females exposed to the IG predator cue combination, indicated by immediate selection of the prey patch without predator cues and almost perfect oviposition avoidance in patches with the cue combination. We argue that the evolution of the ability of IG prey females to evaluate offspring's IGP risk accurately is driven by the irreversibility of oviposition and the functionally complex relationships between predator guild members. PMID:23264692
Iconic Gestures for Robot Avatars, Recognition and Integration with Speech

Science.gov (United States)

Bremner, Paul; Leonards, Ute

2016-01-01

Co-verbal gestures are an important part of human communication, improving its efficiency and efficacy for information conveyance. One possible means by which such multi-modal communication might be realized remotely is through the use of a tele-operated humanoid robot avatar. Such avatars have been previously shown to enhance social presence and operator salience. We present a motion tracking based tele-operation system for the NAO robot platform that allows direct transmission of speech and gestures produced by the operator. To assess the capabilities of this system for transmitting multi-modal communication, we have conducted a user study that investigated if robot-produced iconic gestures are comprehensible, and are integrated with speech. Robot performed gesture outcomes were compared directly to those for gestures produced by a human actor, using a within participant experimental design. We show that iconic gestures produced by a tele-operated robot are understood by participants when presented alone, almost as well as when produced by a human. More importantly, we show that gestures are integrated with speech when presented as part of a multi-modal communication equally well for human and robot performances. PMID:26925010
Iconic Gestures for Robot Avatars, Recognition and Integration with Speech

Directory of Open Access Journals (Sweden)

Paul Adam Bremner

2016-02-01

Full Text Available Co-verbal gestures are an important part of human communication, improving its efficiency and efficacy for information conveyance. One possible means by which such multi-modal communication might be realised remotely is through the use of a tele-operated humanoid robot avatar. Such avatars have been previously shown to enhance social presence and operator salience. We present a motion tracking based tele-operation system for the NAO robot platform that allows direct transmission of speech and gestures produced by the operator. To assess the capabilities of this system for transmitting multi-modal communication, we have conducted a user study that investigated if robot-produced iconic gestures are comprehensible, and are integrated with speech. Robot performed gesture outcomes were compared directly to those for gestures produced by a human actor, using a within participant experimental design. We show that iconic gestures produced by a tele-operated robot are understood by participants when presented alone, almost as well as when produced by a human. More importantly, we show that gestures are integrated with speech when presented as part of a multi-modal communication equally well for human and robot performances.
Intelligibility for Binaural Speech with Discarded Low-SNR Speech Components.

Science.gov (United States)

Schoenmaker, Esther; van de Par, Steven

2016-01-01

Speech intelligibility in multitalker settings improves when the target speaker is spatially separated from the interfering speakers. A factor that may contribute to this improvement is the improved detectability of target-speech components due to binaural interaction in analogy to the Binaural Masking Level Difference (BMLD). This would allow listeners to hear target speech components within specific time-frequency intervals that have a negative SNR, similar to the improvement in the detectability of a tone in noise when these contain disparate interaural difference cues. To investigate whether these negative-SNR target-speech components indeed contribute to speech intelligibility, a stimulus manipulation was performed where all target components were removed when local SNRs were smaller than a certain criterion value. It can be expected that for sufficiently high criterion values target speech components will be removed that do contribute to speech intelligibility. For spatially separated speakers, assuming that a BMLD-like detection advantage contributes to intelligibility, degradation in intelligibility is expected already at criterion values below 0 dB SNR. However, for collocated speakers it is expected that higher criterion values can be applied without impairing speech intelligibility. Results show that degradation of intelligibility for separated speakers is only seen for criterion values of 0 dB and above, indicating a negligible contribution of a BMLD-like detection advantage in multitalker settings. These results show that the spatial benefit is related to a spatial separation of speech components at positive local SNRs rather than to a BMLD-like detection improvement for speech components at negative local SNRs.
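The stimulus manipulation described in this abstract, removing target components wherever the local SNR falls below a criterion, can be sketched roughly as follows. This is an illustrative sketch only: it assumes the target and masker are already available as per-unit power values on a common time-frequency grid, and the function names and data layout are hypothetical rather than taken from the study.

```python
import math


def local_snr_db(target_power, masker_power):
    """Local SNR in dB for one time-frequency unit (powers are non-negative)."""
    if target_power <= 0.0:
        return float("-inf")  # no target energy: treat as infinitely poor SNR
    if masker_power <= 0.0:
        return float("inf")   # no masker energy: treat as infinitely good SNR
    return 10.0 * math.log10(target_power / masker_power)


def discard_low_snr(target, masker, criterion_db):
    """Zero every target unit whose local SNR is below criterion_db.

    target and masker are lists of rows (time frames), each row a list of
    power values per frequency channel, with identical shapes.
    """
    kept = []
    for target_row, masker_row in zip(target, masker):
        kept.append([
            t if local_snr_db(t, m) >= criterion_db else 0.0
            for t, m in zip(target_row, masker_row)
        ])
    return kept
```

With a criterion of 0 dB, only units where the target is at least as strong as the masker survive; lowering the criterion retains progressively more negative-SNR units, which is the range the study probes.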
Sound frequency affects speech emotion perception: results from congenital amusia.

Science.gov (United States)

Lolli, Sydney L; Lewenstein, Ari D; Basurto, Julian; Winnik, Sean; Loui, Psyche

2015-01-01

Congenital amusics, or "tone-deaf" individuals, show difficulty in perceiving and producing small pitch differences. While amusia has marked effects on music perception, its impact on speech perception is less clear. Here we test the hypothesis that individual differences in pitch perception affect judgment of emotion in speech, by applying low-pass filters to spoken statements of emotional speech. A norming study was first conducted on Mechanical Turk to ensure that the intended emotions from the Macquarie Battery for Evaluation of Prosody were reliably identifiable by US English speakers. The most reliably identified emotional speech samples were used in Experiment 1, in which subjects performed a psychophysical pitch discrimination task, and an emotion identification task under low-pass and unfiltered speech conditions. Results showed a significant correlation between pitch-discrimination threshold and emotion identification accuracy for low-pass filtered speech, with amusics (defined here as those with a pitch discrimination threshold >16 Hz) performing worse than controls. This relationship with pitch discrimination was not seen in unfiltered speech conditions. Given the dissociation between low-pass filtered and unfiltered speech conditions, we inferred that amusics may be compensating for poorer pitch perception by using speech cues that are filtered out in this manipulation. To assess this potential compensation, Experiment 2 was conducted using high-pass filtered speech samples intended to isolate non-pitch cues. No significant correlation was found between pitch discrimination and emotion identification accuracy for high-pass filtered speech. Results from these experiments suggest an influence of low frequency information in identifying emotional content of speech.
Integrating cues of social interest and voice pitch in men's preferences for women's voices.

Science.gov (United States)

Jones, Benedict C; Feinberg, David R; Debruine, Lisa M; Little, Anthony C; Vukovic, Jovana

2008-04-23

Most previous studies of vocal attractiveness have focused on preferences for physical characteristics of voices such as pitch. Here we examine the content of vocalizations in interaction with such physical traits, finding that vocal cues of social interest modulate the strength of men's preferences for raised pitch in women's voices. Men showed stronger preferences for raised pitch when judging the voices of women who appeared interested in the listener than when judging the voices of women who appeared relatively disinterested in the listener. These findings show that voice preferences are not determined solely by physical properties of voices and that men integrate information about voice pitch and the degree of social interest expressed by women when forming voice preferences. Women's preferences for raised pitch in women's voices were not modulated by cues of social interest, suggesting that the integration of cues of social interest and voice pitch when men judge the attractiveness of women's voices may reflect adaptations that promote efficient allocation of men's mating effort.
Gesture-speech integration in children with specific language impairment.

Science.gov (United States)

Mainela-Arnold, Elina; Alibali, Martha W; Hostetter, Autumn B; Evans, Julia L

2014-11-01

Previous research suggests that speakers are especially likely to produce manual communicative gestures when they have relative ease in thinking about the spatial elements of what they are describing, paired with relative difficulty organizing those elements into appropriate spoken language. Children with specific language impairment (SLI) exhibit poor expressive language abilities together with within-normal-range nonverbal IQs. This study investigated whether weak spoken language abilities in children with SLI influence their reliance on gestures to express information. We hypothesized that these children would rely on communicative gestures to express information more often than their age-matched typically developing (TD) peers, and that they would sometimes express information in gestures that they do not express in the accompanying speech. Participants were 15 children with SLI (aged 5;6-10;0) and 18 age-matched TD controls. Children viewed a wordless cartoon and retold the story to a listener unfamiliar with the story. Children's gestures were identified and coded for meaning using a previously established system. Speech-gesture combinations were coded as redundant if the information conveyed in speech and gesture was the same, and non-redundant if the information conveyed in speech was different from the information conveyed in gesture. Children with SLI produced more gestures than children in the TD group; however, the likelihood that speech-gesture combinations were non-redundant did not differ significantly across the SLI and TD groups. In both groups, younger children were significantly more likely to produce non-redundant speech-gesture combinations than older children. The gesture-speech integration system functions similarly in children with SLI and TD, but children with SLI rely more on gesture to help formulate, conceptualize or express the messages they want to convey. This provides motivation for future research examining whether interventions
Crossmodal deficit in dyslexic children: practice affects the neural timing of letter-speech sound integration

Directory of Open Access Journals (Sweden)

Gojko Žarić

2015-06-01

Full Text Available A failure to build solid letter-speech sound associations may contribute to reading impairments in developmental dyslexia. Whether this reduced neural integration of letters and speech sounds changes over time within individual children and how this relates to behavioral gains in reading skills remains unknown. In this research, we examined changes in event-related potential (ERP) measures of letter-speech sound integration over a 6-month period during which 9-year-old dyslexic readers (n=17) followed training in letter-speech sound coupling alongside their regular reading curriculum. We presented the Dutch spoken vowels /a/ and /o/ as standard and deviant stimuli in one auditory and two audiovisual oddball conditions. In one audiovisual condition (AV0), the letter ‘a’ was presented simultaneously with the vowels, while in the other (AV200) it preceded vowel onset by 200 ms. Prior to the training (T1), dyslexic readers showed the expected pattern of typical auditory mismatch responses, together with the absence of letter-speech sound effects in a late negativity (LN) window. After the training (T2), our results showed earlier (and enhanced) crossmodal effects in the LN window. Most interestingly, earlier LN latency at T2 was significantly related to higher behavioral accuracy in letter-speech sound coupling. On a more general level, the timing of the earlier mismatch negativity (MMN) in the simultaneous condition (AV0) measured at T1 was significantly related to reading fluency at both T1 and T2 as well as to reading gains. Our findings suggest that the reduced neural integration of letters and speech sounds in dyslexic children may show moderate improvement with reading instruction and training and that behavioral improvements relate especially to individual differences in the timing of this neural integration.
Predicting Intelligibility Gains in Dysarthria through Automated Speech Feature Analysis

Science.gov (United States)

Fletcher, Annalise R.; Wisler, Alan A.; McAuliffe, Megan J.; Lansford, Kaitlin L.; Liss, Julie M.

2017-01-01

Purpose: Behavioral speech modifications have variable effects on the intelligibility of speakers with dysarthria. In the companion article, a significant relationship was found between measures of speakers' baseline speech and their intelligibility gains following cues to speak louder and reduce rate (Fletcher, McAuliffe, Lansford, Sinex, &…
A Longitudinal Assessment of Early Childhood Education with Integrated Speech Therapy for Children with Significant Language Impairment in Germany

Science.gov (United States)

Ullrich, Dieter; Ullrich, Katja; Marten, Magret

2014-01-01

Background: In Lower Saxony, Germany, pre-school children with language- and speech-deficits have the opportunity to access kindergartens with integrated language-/speech therapy prior to attending primary school, both regular or with integrated speech therapy. It is unknown whether these early childhood education treatments are helpful and…
Comparing the influence of spectro-temporal integration in computational speech segregation

DEFF Research Database (Denmark)

Bentsen, Thomas; May, Tobias; Kressner, Abigail Anne

2016-01-01

The goal of computational speech segregation systems is to automatically segregate a target speaker from interfering maskers. Typically, these systems include a feature extraction stage in the front-end and a classification stage in the back-end. A spectro-temporal integration strategy can...... be applied in either the front-end, using the so-called delta features, or in the back-end, using a second classifier that exploits the posterior probability of speech from the first classifier across a spectro-temporal window. This study systematically analyzes the influence of such stages on segregation...... metric that comprehensively predicts computational segregation performance and correlates well with intelligibility. The outcome of this study could help to identify the most effective spectro-temporal integration strategy for computational segregation systems....
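For readers unfamiliar with delta features, the sketch below shows the regression formula commonly used in speech front-ends to summarise how each feature changes across neighbouring frames. It is an assumption that this form resembles what the study used: the abstract does not give the exact definition, and the function name, window width, and edge handling here are illustrative choices.

```python
def delta_features(frames, width=2):
    """Regression-based delta of each feature dimension across time.

    frames is a list of feature vectors (one list of floats per time frame).
    Frames beyond either end are replaced by the nearest edge frame.
    """
    num_frames = len(frames)
    denominator = 2.0 * sum(n * n for n in range(1, width + 1))
    deltas = []
    for t in range(num_frames):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                ahead = frames[min(t + n, num_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denominator)
        deltas.append(row)
    return deltas
```

A front-end that follows this approach would typically append the deltas to the original features before classification; applying the same idea along the frequency axis gives a spectral counterpart.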
Integration of visual and inertial cues in perceived heading of self-motion

NARCIS (Netherlands)

Winkel, K.N. de; Weesie, H.M.; Werkhoven, P.J.; Groen, E.L.

2010-01-01

In the present study, we investigated whether the perception of heading of linear self-motion can be explained by Maximum Likelihood Integration (MLI) of visual and non-visual sensory cues. MLI predicts smaller variance for multisensory judgments compared to unisensory judgments. Nine participants
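The variance prediction mentioned in this abstract can be made concrete with a short sketch. It assumes the standard Maximum Likelihood Integration result for two independent Gaussian cues, in which the combined variance is the product of the unisensory variances divided by their sum; the function name and the example values in the note below are hypothetical and are not data from the study.

```python
def mli_predicted_variance(var_visual, var_inertial):
    """Predicted variance of the combined heading estimate under MLI.

    Assumes independent Gaussian noise on each cue, so that
    var_combined = (var_visual * var_inertial) / (var_visual + var_inertial).
    """
    return (var_visual * var_inertial) / (var_visual + var_inertial)
```

For example, unisensory variances of 4.0 and 4.0 (arbitrary units) give a predicted multisensory variance of 2.0, which is below either unisensory value; comparing such predictions with observed multisensory variance is one way to test whether MLI accounts for the judgments.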
Seeing the talker's face supports executive processing of speech in steady state noise.

Science.gov (United States)

Mishra, Sushmit; Lunner, Thomas; Stenfelt, Stefan; Rönnberg, Jerker; Rudner, Mary

2013-01-01

Listening to speech in noise depletes cognitive resources, affecting speech processing. The present study investigated how remaining resources or cognitive spare capacity (CSC) can be deployed by young adults with normal hearing. We administered a test of CSC (CSCT; Mishra et al., 2013) along with a battery of established cognitive tests to 20 participants with normal hearing. In the CSCT, lists of two-digit numbers were presented with and without visual cues in quiet, as well as in steady-state and speech-like noise at a high intelligibility level. In low load conditions, two numbers were recalled according to instructions inducing executive processing (updating, inhibition) and in high load conditions the participants were additionally instructed to recall one extra number, which was always the first item in the list. In line with previous findings, results showed that CSC was sensitive to memory load and executive function but generally not related to working memory capacity (WMC). Furthermore, CSCT scores in quiet were lowered by visual cues, probably due to distraction. In steady-state noise, the presence of visual cues improved CSCT scores, probably by enabling better encoding. Contrary to our expectation, CSCT performance was disrupted more in steady-state than speech-like noise, although only without visual cues, possibly because selective attention could be used to ignore the speech-like background and provide an enriched representation of target items in working memory similar to that obtained in quiet. This interpretation is supported by a consistent association between CSCT scores and updating skills.
Sonority's Effect as a Surface Cue on Lexical Speech Perception of Children With Cochlear Implants.

Science.gov (United States)

Hamza, Yasmeen; Okalidou, Areti; Kyriafinis, George; van Wieringen, Astrid

2018-03-06

Sonority is the relative perceptual prominence/loudness of speech sounds of the same length, stress, and pitch. Children with cochlear implants (CIs), with restored audibility and relatively intact temporal processing, are expected to benefit from the perceptual prominence cues of highly sonorous sounds. Sonority also influences lexical access through the sonority-sequencing principle (SSP), a grammatical phonotactic rule, which facilitates the recognition and segmentation of syllables within speech. The more nonsonorous the onset of a syllable is, the larger is the degree of sonority rise to the nucleus, and the more optimal the SSP. Children with CIs may experience hindered or delayed development of the language-learning rule SSP, as a result of their deprived/degraded auditory experience. The purpose of the study was to explore sonority's role in speech perception and lexical access of prelingually deafened children with CIs. A case-control study with 15 children with CIs, 25 normal-hearing children (NHC), and 50 normal-hearing adults was conducted, using a lexical identification task of novel, nonreal CV-CV words taught via fast mapping. The CV-CV words were constructed according to four sonority conditions, entailing syllables with sonorous onsets/less optimal SSP (SS) and nonsonorous onsets/optimal SSP (NS) in all combinations, that is, SS-SS, SS-NS, NS-SS, and NS-NS. Outcome measures were accuracy and reaction times (RTs). A subgroup analysis of 12 children with CIs pair matched to 12 NHC on hearing age aimed to study the effect of oral-language exposure period on the sonority-related performance. The children groups showed similar accuracy performance, overall and across all the sonority conditions. However, within-group comparisons showed that the children with CIs scored more accurately on the SS-SS condition relative to the NS-NS and NS-SS conditions, while the NHC performed equally well across all conditions. Additionally, adult-comparable accuracy
Assessing the contribution of binaural cues for apparent source width perception via a functional model

DEFF Research Database (Denmark)

Käsbach, Johannes; Hahmann, Manuel; May, Tobias

2016-01-01

In echoic conditions, sound sources are not perceived as point sources but appear to be expanded. The expansion in the horizontal dimension is referred to as apparent source width (ASW). To elicit this perception, the auditory system has access to fluctuations of binaural cues, the interaural time...... a statistical representation of ITDs and ILDs based on percentiles integrated over time and frequency. The model’s performance was evaluated against psychoacoustic data obtained with noise, speech and music signals in loudspeaker-based experiments. A robust model prediction of ASW was achieved using a cross...
The Selective Cue Integration Framework: A Theory of Postidentification Witness Confidence Assessment

Science.gov (United States)

Charman, Steve D.; Carlucci, Marianna; Vallano, Jon; Gregory, Amy Hyman

2010-01-01

The current manuscript proposes a theory of how witnesses assess their confidence following a lineup identification, called the selective cue integration framework (SCIF). Drawing from past research on the postidentification feedback effect, the SCIF details a three-stage process of confidence assessment that is based largely on a…
Seeing the talker’s face supports executive processing of speech in steady state noise

Directory of Open Access Journals (Sweden)

Sushmit Mishra

2013-11-01

Full Text Available Listening to speech in noise depletes cognitive resources, affecting speech processing. The present study investigated how remaining resources or cognitive spare capacity (CSC) can be deployed by young adults with normal hearing. We administered a test of CSC (CSCT; Mishra et al., 2013) along with a battery of established cognitive tests to 20 participants with normal hearing. In the CSCT, lists of two-digit numbers were presented with and without visual cues in quiet, as well as in steady-state and speech-like noise at a high intelligibility level. In low load conditions, two numbers were recalled according to instructions inducing executive processing (updating, inhibition) and in high load conditions the participants were additionally instructed to recall one extra number, which was always the first item in the list. In line with previous findings, results showed that CSC was sensitive to memory load and executive function but generally not related to working memory capacity. Furthermore, CSCT scores in quiet were lowered by visual cues, probably due to distraction. In steady-state noise, the presence of visual cues improved CSCT scores, probably by enabling better encoding. Contrary to our expectation, CSCT performance was disrupted more in steady-state than speech-like noise, although only without visual cues, possibly because selective attention could be used to ignore the speech-like background and provide an enriched representation of target items in working memory similar to that obtained in quiet. This interpretation is supported by a consistent association between CSCT scores and updating skills.

Seeing the talker’s face supports executive processing of speech in steady state noise

Science.gov (United States)

Mishra, Sushmit; Lunner, Thomas; Stenfelt, Stefan; Rönnberg, Jerker; Rudner, Mary

2013-01-01

Listening to speech in noise depletes cognitive resources, affecting speech processing. The present study investigated how remaining resources or cognitive spare capacity (CSC) can be deployed by young adults with normal hearing. We administered a test of CSC (CSCT; Mishra et al., 2013) along with a battery of established cognitive tests to 20 participants with normal hearing. In the CSCT, lists of two-digit numbers were presented with and without visual cues in quiet, as well as in steady-state and speech-like noise at a high intelligibility level. In low load conditions, two numbers were recalled according to instructions inducing executive processing (updating, inhibition) and in high load conditions the participants were additionally instructed to recall one extra number, which was always the first item in the list. In line with previous findings, results showed that CSC was sensitive to memory load and executive function but generally not related to working memory capacity (WMC). Furthermore, CSCT scores in quiet were lowered by visual cues, probably due to distraction. In steady-state noise, the presence of visual cues improved CSCT scores, probably by enabling better encoding. Contrary to our expectation, CSCT performance was disrupted more in steady-state than speech-like noise, although only without visual cues, possibly because selective attention could be used to ignore the speech-like background and provide an enriched representation of target items in working memory similar to that obtained in quiet. This interpretation is supported by a consistent association between CSCT scores and updating skills. PMID:24324411
Speech Acquisition and Automatic Speech Recognition for Integrated Spacesuit Audio Systems

Science.gov (United States)

Huang, Yiteng; Chen, Jingdong; Chen, Shaoyan

2010-01-01

A voice-command human-machine interface system has been developed for spacesuit extravehicular activity (EVA) missions. A multichannel acoustic signal processing method has been created for distant speech acquisition in noisy and reverberant environments. This technology reduces noise by exploiting differences in the statistical nature of signal (i.e., speech) and noise that exist in the spatial and temporal domains. As a result, the automatic speech recognition (ASR) accuracy can be improved to the level at which crewmembers would find the speech interface useful. The developed speech human/machine interface will enable both crewmember usability and operational efficiency. It offers a fast rate of data/text entry and a small overall size, and it can be lightweight. In addition, this design will free the hands and eyes of a suited crewmember. The system components and steps include beamforming/multi-channel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, model adaptation, ASR HMM (Hidden Markov Model) training, and ASR decoding. A state-of-the-art phoneme recognizer can obtain an accuracy rate of 65 percent when the training and testing data are free of noise. When it is used in spacesuits, the rate drops to about 33 percent. With the developed microphone array speech-processing technologies, the performance is improved and the phoneme recognition accuracy rate rises to 44 percent. The recognizer can be further improved by combining the microphone array and HMM model adaptation techniques and using speech samples collected from inside spacesuits. In addition, arithmetic complexity models for the major HMM-based ASR components were developed. They can help real-time ASR system designers select proper tasks when facing constraints in computational resources.
The Perception of "Sine-Wave Speech" by Adults with Developmental Dyslexia.

Science.gov (United States)

Rosner, Burton S.; Talcott, Joel B.; Witton, Caroline; Hogg, James D.; Richardson, Alexandra J.; Hansen, Peter C.; Stein, John F.

2003-01-01

"Sine-wave speech" sentences contain only four frequency-modulated sine waves, lacking many acoustic cues present in natural speech. Adults with (n=19) and without (n=14) dyslexia were asked to reproduce orally sine-wave utterances in successive trials. Results suggest comprehension of sine-wave sentences is impaired in some adults with…
Effectiveness of an Integrated Phonological Awareness Approach for Children with Childhood Apraxia of Speech (CAS)

Science.gov (United States)

McNeill, Brigid C.; Gillon, Gail T.; Dodd, Barbara

2009-01-01

This study investigated the effectiveness of an integrated phonological awareness approach for children with childhood apraxia of speech (CAS). Change in speech, phonological awareness, letter knowledge, word decoding, and spelling skills were examined. A controlled multiple single-subject design was employed. Twelve children aged 4-7 years with…
Cue Integration in Dynamic Decision Making (Integration des indices dans la Prise de Decision Dynamique)

Science.gov (United States)

2010-07-01

e.g., pulling the goalie in hockey). Outcomes are thought of as changes from a reference point- the reference point is itself changed by how the...challenges and training related issues can be pulled from the literature. These are discussed below. General Themes Cue integration involves the use...from our understanding of the term "pattern recognition", a number of general themes, challenges and training related issues can be pulled from the
Training to Improve Hearing Speech in Noise: Biological Mechanisms

Science.gov (United States)

Song, Judy H.; Skoe, Erika; Banai, Karen

2012-01-01

We investigated training-related improvements in listening in noise and the biological mechanisms mediating these improvements. Training-related malleability was examined using a program that incorporates cognitively based listening exercises to improve speech-in-noise perception. Before and after training, auditory brainstem responses to a speech syllable were recorded in quiet and multitalker noise from adults who ranged in their speech-in-noise perceptual ability. Controls did not undergo training but were tested at intervals equivalent to the trained subjects. Trained subjects exhibited significant improvements in speech-in-noise perception that were retained 6 months later. Subcortical responses in noise demonstrated training-related enhancements in the encoding of pitch-related cues (the fundamental frequency and the second harmonic), particularly for the time-varying portion of the syllable that is most vulnerable to perceptual disruption (the formant transition region). Subjects with the largest strength of pitch encoding at pretest showed the greatest perceptual improvement. Controls exhibited neither neurophysiological nor perceptual changes. We provide the first demonstration that short-term training can improve the neural representation of cues important for speech-in-noise perception. These results implicate and delineate biological mechanisms contributing to learning success, and they provide a conceptual advance to our understanding of the kind of training experiences that can influence sensory processing in adulthood. PMID:21799207
Large Scale Functional Brain Networks Underlying Temporal Integration of Audio-Visual Speech Perception: An EEG Study.

Science.gov (United States)

Kumar, G Vinodh; Halder, Tamesh; Jaiswal, Amit K; Mukherjee, Abhishek; Roy, Dipanjan; Banerjee, Arpan

2016-01-01

Observable lip movements of the speaker influence perception of auditory speech. A classical example of this influence is reported by listeners who perceive an illusory (cross-modal) speech sound (McGurk-effect) when presented with incongruent audio-visual (AV) speech stimuli. Recent neuroimaging studies of AV speech perception accentuate the role of frontal, parietal, and the integrative brain sites in the vicinity of the superior temporal sulcus (STS) for multisensory speech perception. However, whether and how the network across the whole brain participates in multisensory perception processing remains an open question. We posit that large-scale functional connectivity among the neural populations situated in distributed brain sites may provide valuable insights into the processing and fusing of AV speech. Varying the psychophysical parameters in tandem with electroencephalogram (EEG) recordings, we exploited the trial-by-trial perceptual variability of incongruent audio-visual (AV) speech stimuli to identify the characteristics of the large-scale cortical network that facilitates multisensory perception during synchronous and asynchronous AV speech. We evaluated the spectral landscape of EEG signals during multisensory speech perception at varying AV lags. Functional connectivity dynamics for all sensor pairs were computed using the time-frequency global coherence, the vector sum of pairwise coherence changes over time. During synchronous AV speech, we observed enhanced global gamma-band coherence and decreased alpha and beta-band coherence underlying cross-modal (illusory) perception compared to unisensory perception around a temporal window of 300-600 ms following onset of stimuli. During asynchronous speech stimuli, a global broadband coherence was observed during cross-modal perception at earlier times along with pre-stimulus decreases of lower frequency power, e.g., alpha rhythms for positive AV lags and theta rhythms for negative AV lags. Thus, our
Replacing Maladaptive Speech with Verbal Labeling Responses: An Analysis of Generalized Responding.

Science.gov (United States)

Foxx, R. M.; And Others

1988-01-01

Three mentally handicapped students (aged 13, 36, and 40) with maladaptive speech received training to answer questions with verbal labels. The results of their cues-pause-point training showed that the students replaced their maladaptive speech with correct labels (answers) to questions in the training setting and three generalization settings.…
Speech Perception as a Multimodal Phenomenon

OpenAIRE

Rosenblum, Lawrence D.

2008-01-01

Speech perception is inherently multimodal. Visual speech (lip-reading) information is used by all perceivers and readily integrates with auditory speech. Imaging research suggests that the brain treats auditory and visual speech similarly. These findings have led some researchers to consider that speech perception works by extracting amodal information that takes the same form across modalities. From this perspective, speech integration is a property of the input information itself. Amodal s...
Auditory, Visual and Audiovisual Speech Processing Streams in Superior Temporal Sulcus.

Science.gov (United States)

Venezia, Jonathan H; Vaden, Kenneth I; Rong, Feng; Maddox, Dale; Saberi, Kourosh; Hickok, Gregory

2017-01-01

The human superior temporal sulcus (STS) is responsive to visual and auditory information, including sounds and facial cues during speech recognition. We investigated the functional organization of STS with respect to modality-specific and multimodal speech representations. Twenty younger adult participants were instructed to perform an oddball detection task and were presented with auditory, visual, and audiovisual speech stimuli, as well as auditory and visual nonspeech control stimuli in a block fMRI design. Consistent with a hypothesized anterior-posterior processing gradient in STS, auditory, visual and audiovisual stimuli produced the largest BOLD effects in anterior, posterior and middle STS (mSTS), respectively, based on whole-brain, linear mixed effects and principal component analyses. Notably, the mSTS exhibited preferential responses to multisensory stimulation, as well as speech compared to nonspeech. Within the mid-posterior and mSTS regions, response preferences changed gradually from visual, to multisensory, to auditory moving posterior to anterior. Post hoc analysis of visual regions in the posterior STS revealed that a single subregion bordering the mSTS was insensitive to differences in low-level motion kinematics yet distinguished between visual speech and nonspeech based on multi-voxel activation patterns. These results suggest that auditory and visual speech representations are elaborated gradually within anterior and posterior processing streams, respectively, and may be integrated within the mSTS, which is sensitive to more abstract speech information within and across presentation modalities. The spatial organization of STS is consistent with processing streams that are hypothesized to synthesize perceptual speech representations from sensory signals that provide convergent information from visual and auditory modalities.
Recognizing intentions in infant-directed speech: evidence for universals.

Science.gov (United States)

Bryant, Gregory A; Barrett, H Clark

2007-08-01

In all languages studied to date, distinct prosodic contours characterize different intention categories of infant-directed (ID) speech. This vocal behavior likely exists universally as a species-typical trait, but little research has examined whether listeners can accurately recognize intentions in ID speech using only vocal cues, without access to semantic information. We recorded native-English-speaking mothers producing four intention categories of utterances (prohibition, approval, comfort, and attention) as both ID and adult-directed (AD) speech, and we then presented the utterances to Shuar adults (South American hunter-horticulturalists). Shuar subjects were able to reliably distinguish ID from AD speech and were able to reliably recognize the intention categories in both types of speech, although performance was significantly better with ID speech. This is the first demonstration that adult listeners in an indigenous, nonindustrialized, and nonliterate culture can accurately infer intentions from both ID speech and AD speech in a language they do not speak.
Integrating Music Therapy Services and Speech-Language Therapy Services for Children with Severe Communication Impairments: A Co-Treatment Model

Science.gov (United States)

Geist, Kamile; McCarthy, John; Rodgers-Smith, Amy; Porter, Jessica

2008-01-01

Documenting how music therapy can be integrated with speech-language therapy services for children with communication delay is not evident in the literature. In this article, a collaborative model with procedures, experiences, and communication outcomes of integrating music therapy with the existing speech-language services is given. Using…
Common variation in the autism risk gene CNTNAP2, brain structural connectivity and multisensory speech integration.

Science.gov (United States)

Ross, Lars A; Del Bene, Victor A; Molholm, Sophie; Jae Woo, Young; Andrade, Gizely N; Abrahams, Brett S; Foxe, John J

2017-11-01

Three lines of evidence motivated this study. 1) CNTNAP2 variation is associated with autism risk and speech-language development. 2) CNTNAP2 variations are associated with differences in white matter (WM) tracts comprising the speech-language circuitry. 3) Children with autism show impairment in multisensory speech perception. Here, we asked whether an autism risk-associated CNTNAP2 single nucleotide polymorphism in neurotypical adults was associated with multisensory speech perception performance, and whether such a genotype-phenotype association was mediated through white matter tract integrity in speech-language circuitry. Risk genotype at rs7794745 was associated with decreased benefit from visual speech and lower fractional anisotropy (FA) in several WM tracts (right precentral gyrus, left anterior corona radiata, right retrolenticular internal capsule). These structural connectivity differences were found to mediate the effect of genotype on audiovisual speech perception, shedding light on possible pathogenic pathways in autism and biological sources of inter-individual variation in audiovisual speech processing in neurotypicals. Copyright © 2017 Elsevier Inc. All rights reserved.
Integrating Information from Speech and Physiological Signals to Achieve Emotional Sensitivity

DEFF Research Database (Denmark)

Kim, Jonghwa; André, Elisabeth; Rehm, Matthias

2005-01-01

Recently, there has been a significant amount of work on the recognition of emotions from speech and biosignals. Most approaches to emotion recognition so far concentrate on a single modality and do not take advantage of the fact that an integrated multimodal analysis may help to resolve...
Subcortical encoding of speech cues in children with attention deficit hyperactivity disorder.

Science.gov (United States)

Jafari, Zahra; Malayeri, Saeed; Rostami, Reza

2015-02-01

There is little information about processing of nonspeech and speech stimuli at the subcortical level in individuals with attention deficit hyperactivity disorder (ADHD). The auditory brainstem response (ABR) provides information about the function of the auditory brainstem pathways. We aim to investigate the subcortical function in neural encoding of click and speech stimuli in children with ADHD. The subjects include 50 children with ADHD and 34 typically developing (TD) children between the ages of 8 and 12 years. Click ABR (cABR) and speech ABR (sABR) with 40 ms synthetic /da/ syllable stimulus were recorded. Latencies of cABR in waves of III and V and duration of V-Vn (P⩽0.027), and latencies of sABR in waves A, D, E, F and O and duration of V-A (P⩽0.034) were significantly longer in children with ADHD than in TD children. There were no apparent differences in components of the sustained frequency following response (FFR). We conclude that children with ADHD have deficits in temporal neural encoding of both nonspeech and speech stimuli. There is a common dysfunction in the processing of click and speech stimuli at the brainstem level in children with suspected ADHD. Copyright © 2015. Published by Elsevier Ireland Ltd.
Audiovisual speech integration in the superior temporal region is dysfunctional in dyslexia.

Science.gov (United States)

Ye, Zheng; Rüsseler, Jascha; Gerth, Ivonne; Münte, Thomas F

2017-07-25

Dyslexia is an impairment of reading and spelling that affects both children and adults even after many years of schooling. Dyslexic readers have deficits in the integration of auditory and visual inputs but the neural mechanisms of the deficits are still unclear. This fMRI study examined the neural processing of auditorily presented German numbers 0-9 and videos of lip movements of a German native speaker voicing numbers 0-9 in unimodal (auditory or visual) and bimodal (always congruent) conditions in dyslexic readers and their matched fluent readers. We confirmed results of previous studies that the superior temporal gyrus/sulcus plays a critical role in audiovisual speech integration: fluent readers showed greater superior temporal activations for combined audiovisual stimuli than auditory-/visual-only stimuli. Importantly, such an enhancement effect was absent in dyslexic readers. Moreover, the auditory network (bilateral superior temporal regions plus medial PFC) was dynamically modulated during audiovisual integration in fluent, but not in dyslexic readers. These results suggest that superior temporal dysfunction may underlie poor audiovisual speech integration in readers with dyslexia. Copyright © 2017 IBRO. Published by Elsevier Ltd. All rights reserved.
Development in children's interpretation of pitch cues to emotions.

Science.gov (United States)

Quam, Carolyn; Swingley, Daniel

2012-01-01

Young infants respond to positive and negative speech prosody (A. Fernald, 1993), yet 4-year-olds rely on lexical information when it conflicts with paralinguistic cues to approval or disapproval (M. Friend, 2003). This article explores this surprising phenomenon, testing one hundred eighteen 2- to 5-year-olds' use of isolated pitch cues to emotions in interactive tasks. Only 4- to 5-year-olds consistently interpreted exaggerated, stereotypically happy or sad pitch contours as evidence that a puppet had succeeded or failed to find his toy (Experiment 1) or was happy or sad (Experiments 2, 3). Two- and 3-year-olds exploited facial and body-language cues in the same task. The authors discuss the implications of this late-developing use of pitch cues to emotions, relating them to other functions of pitch. © 2011 The Authors. Child Development © 2011 Society for Research in Child Development, Inc.
The Interaction between Prosody and Meaning in Second Language Speech Production

Science.gov (United States)

Jackson, Carrie N.; O'Brien, Mary Grantham

2011-01-01

Research has shown that English and German native speakers use prosodic cues during speech production to convey the intended meaning of an utterance. However, little is known about whether American L2 learners of German also use such cues during L2 production. The present study shows that intermediate-level L2 learners of German (English L1) use…
What Iconic Gesture Fragments Reveal about Gesture-Speech Integration: When Synchrony Is Lost, Memory Can Help

Science.gov (United States)

Obermeier, Christian; Holle, Henning; Gunter, Thomas C.

2011-01-01

The present series of experiments explores several issues related to gesture-speech integration and synchrony during sentence processing. To be able to more precisely manipulate gesture-speech synchrony, we used gesture fragments instead of complete gestures, thereby avoiding the usual long temporal overlap of gestures with their coexpressive…
Speech-specific audiovisual perception affects identification but not detection of speech

DEFF Research Database (Denmark)

Eskelund, Kasper; Andersen, Tobias

Speech perception is audiovisual as evidenced by the McGurk effect in which watching incongruent articulatory mouth movements can change the phonetic auditory speech percept. This type of audiovisual integration may be specific to speech or be applied to all stimuli in general. To investigate...... of audiovisual integration specific to speech perception. However, the results of Tuomainen et al. might have been influenced by another effect. When observers were naïve, they had little motivation to look at the face. When informed, they knew that the face was relevant for the task and this could increase...... visual detection task. In our first experiment, observers presented with congruent and incongruent audiovisual sine-wave speech stimuli showed a McGurk effect only when informed of the speech nature of the stimulus. Performance on the secondary visual task was very good, thus supporting the finding...

Comparison of different speech tasks among adults who stutter and adults who do not stutter

Directory of Open Access Journals (Sweden)

Ana Paula Ritto

2016-03-01

Full Text Available OBJECTIVES: In this study, we compared the performance of both fluent speakers and people who stutter in three different speaking situations: monologue speech, oral reading and choral reading. This study follows the assumption that the neuromotor control of speech can be influenced by external auditory stimuli in both speakers who stutter and speakers who do not stutter. METHOD: Seventeen adults who stutter and seventeen adults who do not stutter were assessed in three speaking tasks: monologue, oral reading (solo reading aloud) and choral reading (reading in unison with the evaluator). Speech fluency and rate were measured for each task. RESULTS: The participants who stuttered had a lower frequency of stuttering during choral reading than during monologue and oral reading. CONCLUSIONS: According to the dual premotor system model, choral speech enhanced fluency by providing external cues for the timing of each syllable, compensating for deficient internal cues.
Availability of binaural cues for pediatric bilateral cochlear implant recipients.

Science.gov (United States)

Sheffield, Sterling W; Haynes, David S; Wanna, George B; Labadie, Robert F; Gifford, René H

2015-03-01

Bilateral implant recipients theoretically have access to binaural cues. Research in postlingually deafened adults with cochlear implants (CIs) indicates minimal evidence for true binaural hearing. Congenitally deafened children who experience spatial hearing with bilateral CIs, however, might perceive binaural cues in the CI signal differently. There is limited research examining binaural hearing in children with CIs, and the few published studies are limited by the use of unrealistic speech stimuli and background noise. The purposes of this study were to (1) replicate our previous study of binaural hearing in postlingually deafened adults with AzBio sentences in prelingually deafened children with the pediatric version of the AzBio sentences, and (2) replicate previous studies of binaural hearing in children with CIs using more open-set sentences and more realistic background noise (i.e., multitalker babble). The study was a within-participant, repeated-measures design. The study sample consisted of 14 children with bilateral CIs with at least 25 mo of listening experience. Speech recognition was assessed using sentences presented in multitalker babble at a fixed signal-to-noise ratio. Test conditions included speech at 0° with noise presented at 0° (S0N0), on the side of the first CI (90° or 270°) (S0N1stCI), and on the side of the second CI (S0N2ndCI) as well as speech presented at 0° with noise presented semidiffusely from eight speakers at 45° intervals. Estimates of summation, head shadow, squelch, and spatial release from masking were calculated. Results of test conditions commonly reported in the literature (S0N0, S0N1stCI, S0N2ndCI) are consistent with results from previous research in adults and children with bilateral CIs, showing minimal summation and squelch but typical head shadow and spatial release from masking. However, bilateral benefit over the better CI with speech at 0° was much larger with semidiffuse noise. Congenitally deafened
Newborn infants' sensitivity to perceptual cues to lexical and grammatical words.

Science.gov (United States)

Shi, R; Werker, J F; Morgan, J L

1999-09-30

In our study newborn infants were presented with lists of lexical and grammatical words prepared from natural maternal speech. The results show that newborns are able to categorically discriminate these sets of words based on a constellation of perceptual cues that distinguish them. This general ability to detect and categorically discriminate sets of words on the basis of multiple acoustic and phonological cues may provide a perceptual base that can help older infants bootstrap into the acquisition of grammatical categories and syntactic structure.
The effect of a concurrent working memory task and temporal offsets on the integration of auditory and visual speech information.

Science.gov (United States)

Buchan, Julie N; Munhall, Kevin G

2012-01-01

Audiovisual speech perception is an everyday occurrence of multisensory integration. Conflicting visual speech information can influence the perception of acoustic speech (namely the McGurk effect), and auditory and visual speech are integrated over a rather wide range of temporal offsets. This research examined whether the addition of a concurrent cognitive load task would affect the audiovisual integration in a McGurk speech task and whether the cognitive load task would cause more interference at increasing offsets. The amount of integration was measured by the proportion of responses in incongruent trials that did not correspond to the audio (McGurk response). An eye-tracker was also used to examine whether the amount of temporal offset and the presence of a concurrent cognitive load task would influence gaze behavior. Results from this experiment show a very modest but statistically significant decrease in the number of McGurk responses when subjects also perform a cognitive load task, and that this effect is relatively constant across the various temporal offsets. Participants' gaze behavior was also influenced by the addition of a cognitive load task. Gaze was less centralized on the face, less time was spent looking at the mouth and more time was spent looking at the eyes, when a concurrent cognitive load task was added to the speech task.
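The integration measure described in this abstract (the proportion of incongruent-trial responses that do not match the audio) can be illustrated with a minimal sketch. The function name, the trial representation as (audio, response) pairs, and the example syllables are assumptions made for illustration; they are not taken from the study.

```python
def mcgurk_proportion(trials):
    """Return the share of incongruent trials whose response differs from the audio.

    `trials` is a hypothetical representation: an iterable of
    (audio_token, response_token) pairs, one per incongruent trial.
    """
    trials = list(trials)
    if not trials:
        raise ValueError("at least one incongruent trial is required")
    # A response counts as a McGurk response when it is not the auditory token.
    non_audio = sum(1 for audio, response in trials if response != audio)
    return non_audio / len(trials)


# Illustrative data only: auditory /ba/ paired with a conflicting visual syllable.
example_trials = [("ba", "da"), ("ba", "ba"), ("ba", "ga"), ("ba", "da")]
print(mcgurk_proportion(example_trials))  # 0.75
```

Computing this value separately for each temporal offset and load condition would give the kind of comparison the abstract reports, under the same assumptions.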
Perceptual stimulus-A Bayesian-based integration of multi-visual-cue approach and its application

Institute of Scientific and Technical Information of China (English)

XUE JianRu; ZHENG NanNing; ZHONG XiaoPin; PING LinJiang

2008-01-01

With the view that a visual cue could be taken as a kind of stimulus, the study of the mechanism in the visual perception process by using visual cues in their probabilistic representation eventually leads to a class of statistical integration of multiple visual cues (IMVC) methods which have been applied widely in perceptual grouping, video analysis, and other basic problems in computer vision. In this paper, a survey on the basic ideas and recent advances of IMVC methods is presented, and much focus is on the models and algorithms of IMVC for video analysis within the framework of Bayesian estimation. Furthermore, two typical problems in video analysis, robust visual tracking and "switching problem" in multi-target tracking (MTT) are taken as test beds to verify a series of Bayesian-based IMVC methods proposed by the authors. Finally, the relations between the statistical IMVC and the visual perception process, as well as potential future research work for IMVC, are discussed.
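As a generic illustration of Bayesian cue integration (not the authors' IMVC algorithms, which the abstract does not specify), the textbook case of independent Gaussian cues with a flat prior reduces to precision-weighted averaging. The function name and the example numbers below are assumptions.

```python
def fuse_gaussian_cues(cues):
    """Fuse independent Gaussian cue estimates of one quantity.

    `cues` is a list of (mean, variance) pairs. Assuming a flat prior and
    independent Gaussian noise, the posterior is Gaussian with
    precision equal to the sum of the cue precisions.
    """
    if not cues:
        raise ValueError("at least one cue is required")
    if any(variance <= 0 for _, variance in cues):
        raise ValueError("variances must be positive")
    precisions = [1.0 / variance for _, variance in cues]
    total_precision = sum(precisions)
    # Each cue is weighted by its reliability (inverse variance).
    fused_mean = sum(mean * p for (mean, _), p in zip(cues, precisions)) / total_precision
    return fused_mean, 1.0 / total_precision


# Illustrative values: a noisy cue (variance 4.0) and a more reliable one (variance 1.0).
mean, variance = fuse_gaussian_cues([(10.0, 4.0), (14.0, 1.0)])
print(mean, variance)
```

The fused estimate lies closer to the more reliable cue, and its variance is smaller than that of either cue alone, which is the basic motivation for combining cues statistically.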
Integration of multiple cues allows threat-sensitive anti-intraguild predator responses in predatory mites

Science.gov (United States)

Walzer, Andreas; Schausberger, Peter

2013-01-01

Intraguild (IG) prey is commonly confronted with multiple IG predator species. However, the IG predation (IGP) risk for prey is not only dependent on the predator species, but also on inherent (intraspecific) characteristics of a given IG predator such as its life-stage, sex or gravidity and the associated prey needs. Thus, IG prey should have evolved the ability to integrate multiple IG predator cues, which should allow both inter- and intraspecific threat-sensitive anti-predator responses. Using a guild of plant-inhabiting predatory mites sharing spider mites as prey, we evaluated the effects of single and combined cues (eggs and/or chemical traces left by a predator female on the substrate) of the low risk IG predator Neoseiulus californicus and the high risk IG predator Amblyseius andersoni on time, distance and path shape parameters of the larval IG prey Phytoseiulus persimilis. IG prey discriminated between traces of the low and high risk IG predator, with and without additional presence of their eggs, indicating interspecific threat-sensitivity. The behavioural changes were manifest in distance moved, activity and path shape of IG prey. The cue combination of traces and eggs of the IG predators conveyed other information than each cue alone, allowing intraspecific threat-sensitive responses by IG prey apparent in changed velocities and distances moved. We argue that graded responses to single and combined IG predator cues are adaptive due to minimization of acceptance errors in IG prey decision making. PMID:23750040
Audio-visual onset differences are used to determine syllable identity for ambiguous audio-visual stimulus pairs.

Science.gov (United States)

Ten Oever, Sanne; Sack, Alexander T; Wheat, Katherine L; Bien, Nina; van Atteveldt, Nienke

2013-01-01

Content and temporal cues have been shown to interact during audio-visual (AV) speech identification. Typically, the most reliable unimodal cue is used more strongly to identify specific speech features; however, visual cues are only used if the AV stimuli are presented within a certain temporal window of integration (TWI). This suggests that temporal cues denote whether unimodal stimuli belong together, that is, whether they should be integrated. It is not known whether temporal cues also provide information about the identity of a syllable. Since spoken syllables have naturally varying AV onset asynchronies, we hypothesize that for suboptimal AV cues presented within the TWI, information about the natural AV onset differences can aid in speech identification. To test this, we presented low-intensity auditory syllables concurrently with visual speech signals, and varied the stimulus onset asynchronies (SOA) of the AV pair, while participants were instructed to identify the auditory syllables. We revealed that specific speech features (e.g., voicing) were identified by relying primarily on one modality (e.g., auditory). Additionally, we showed a wide window in which visual information influenced auditory perception, that seemed even wider for congruent stimulus pairs. Finally, we found a specific response pattern across the SOA range for syllables that were not reliably identified by the unimodal cues, which we explained as the result of the use of natural onset differences between AV speech signals. This indicates that temporal cues not only provide information about the temporal integration of AV stimuli, but additionally convey information about the identity of AV pairs. These results provide a detailed behavioral basis for further neuro-imaging and stimulation studies to unravel the neurofunctional mechanisms of the audio-visual-temporal interplay within speech perception.
Specialization in audiovisual speech perception: a replication study

DEFF Research Database (Denmark)

Eskelund, Kasper; Andersen, Tobias

Speech perception is audiovisual as evidenced by bimodal integration in the McGurk effect. This integration effect may be specific to speech or be applied to all stimuli in general. To investigate this, Tuomainen et al. (2005) used sine-wave speech, which naïve observers may perceive as non......-speech, but hear as speech once informed of the linguistic origin of the signal. Combinations of sine-wave speech and incongruent video of the talker elicited a McGurk effect only for informed observers. This indicates that the audiovisual integration effect is specific to speech perception. However, observers...... that observers did look near the mouth. We conclude that eye-movements did not influence the results of Tuomainen et al. and that their results thus can be taken as evidence of a speech specific mode of audiovisual integration underlying the McGurk illusion....
Multisensory Integration in Cochlear Implant Recipients.

Science.gov (United States)

Stevenson, Ryan A; Sheffield, Sterling W; Butera, Iliza M; Gifford, René H; Wallace, Mark T

Speech perception is inherently a multisensory process involving integration of auditory and visual cues. Multisensory integration in cochlear implant (CI) recipients is a unique circumstance in that the integration occurs after auditory deprivation and the provision of hearing via the CI. Despite the clear importance of multisensory cues for perception, in general, and for speech intelligibility, specifically, the topic of multisensory perceptual benefits in CI users has only recently begun to emerge as an area of inquiry. We review the research that has been conducted on multisensory integration in CI users to date and suggest a number of areas needing further research. The overall pattern of results indicates that many CI recipients show at least some perceptual gain that can be attributable to multisensory integration. The extent of this gain, however, varies based on a number of factors, including age of implantation and specific task being assessed (e.g., stimulus detection, phoneme perception, word recognition). Although both children and adults with CIs obtain audiovisual benefits for phoneme, word, and sentence stimuli, neither group shows demonstrable gain for suprasegmental feature perception. Additionally, only early-implanted children and the highest performing adults obtain audiovisual integration benefits similar to individuals with normal hearing. Increasing age of implantation in children is associated with poorer gains resultant from audiovisual integration, suggesting a sensitive period in development for the brain networks that subserve these integrative functions, as well as length of auditory experience. This finding highlights the need for early detection of and intervention for hearing loss, not only in terms of auditory perception, but also in terms of the behavioral and perceptual benefits of audiovisual processing. Importantly, patterns of auditory, visual, and audiovisual responses suggest that underlying integrative processes may be
Audiovisual Asynchrony Detection in Human Speech

Science.gov (United States)

Maier, Joost X.; Di Luca, Massimiliano; Noppeney, Uta

2011-01-01

Combining information from the visual and auditory senses can greatly enhance intelligibility of natural speech. Integration of audiovisual speech signals is robust even when temporal offsets are present between the component signals. In the present study, we characterized the temporal integration window for speech and nonspeech stimuli with…
Contribution of Binaural Masking Release to Improved Speech Intelligibility for different Masker types.

Science.gov (United States)

Sutojo, Sarinah; van de Par, Steven; Schoenmaker, Esther

2018-06-01

In situations with competing talkers or in the presence of masking noise, speech intelligibility can be improved by spatially separating the target speaker from the interferers. This advantage is generally referred to as spatial release from masking (SRM) and different mechanisms have been suggested to explain it. One proposed mechanism to benefit from spatial cues is the binaural masking release, which is purely stimulus driven. According to this mechanism, the spatial benefit results from differences in the binaural cues of target and masker, which need to appear simultaneously in time and frequency to improve the signal detection. In an alternative proposed mechanism, the differences in the interaural cues improve the segregation of auditory streams, a process that involves top-down processing rather than being purely stimulus driven. Unlike the cues that produce binaural masking release, the interaural cue differences between target and interferer required to improve stream segregation do not have to appear simultaneously in time and frequency. This study is concerned with the contribution of binaural masking release to SRM for three masker types that differ with respect to the amount of energetic masking they exert. Speech intelligibility was measured, employing a stimulus manipulation that inhibits binaural masking release, and analyzed with a metric to account for the number of better-ear glimpses. Results indicate that the contribution of the stimulus-driven binaural masking release plays a minor role while binaural stream segregation and the availability of glimpses in the better ear had a stronger influence on improving the speech intelligibility. This article is protected by copyright. All rights reserved.
Awareness of rhythm patterns in speech and music in children with specific language impairments

Directory of Open Access Journals (Sweden)

Ruth Cumming

2015-12-01

Full Text Available Children with specific language impairments (SLIs) show impaired perception and production of language, and also show impairments in perceiving auditory cues to rhythm (amplitude rise time [ART] and sound duration) and in tapping to a rhythmic beat. Here we explore potential links between language development and rhythm perception in 45 children with SLI and 50 age-matched controls. We administered three rhythmic tasks: a musical beat detection task, a tapping-to-music task, and a novel music/speech task, which varied rhythm and pitch cues independently or together in both speech and music. Via low-pass filtering, the music sounded as though it was played from a low-quality radio and the speech sounded as though it was muffled (heard behind the door). We report data for all of the SLI children (N = 45, IQ varying), as well as for two independent subgroupings with intact IQ. One subgroup, Pure SLI, had intact phonology and reading (N = 16); the other, SLI PPR (N = 15), had impaired phonology and reading. When IQ varied (all SLI children), we found significant group differences in all the rhythmic tasks. For the Pure SLI group, there were rhythmic impairments in the tapping task only. For children with SLI and poor phonology (SLI PPR), group differences were found in all of the filtered speech/music AXB tasks. We conclude that difficulties with rhythmic cues in both speech and music are present in children with SLIs, but that some rhythmic measures are more sensitive than others. The data are interpreted within a ‘prosodic phrasing’ hypothesis, and we discuss the potential utility of rhythmic and musical interventions in remediating speech and language difficulties in children.
Using Zebra-speech to study sequential and simultaneous speech segregation in a cochlear-implant simulation.

Science.gov (United States)

Gaudrain, Etienne; Carlyon, Robert P

2013-01-01

Previous studies have suggested that cochlear implant users may have particular difficulties exploiting opportunities to glimpse clear segments of a target speech signal in the presence of a fluctuating masker. Although it has been proposed that this difficulty is associated with a deficit in linking the glimpsed segments across time, the details of this mechanism are yet to be explained. The present study introduces a method called Zebra-speech developed to investigate the relative contribution of simultaneous and sequential segregation mechanisms in concurrent speech perception, using a noise-band vocoder to simulate cochlear implants. One experiment showed that the saliency of the difference between the target and the masker is a key factor for Zebra-speech perception, as it is for sequential segregation. Furthermore, forward masking played little or no role, confirming that intelligibility was not limited by energetic masking but by across-time linkage abilities. In another experiment, a binaural cue was used to distinguish the target and the masker. It showed that the relative contribution of simultaneous and sequential segregation depended on the spectral resolution, with listeners relying more on sequential segregation when the spectral resolution was reduced. The potential of Zebra-speech as a segregation enhancement strategy for cochlear implants is discussed.
Zebra finches can use positional and transitional cues to distinguish vocal element strings.

Science.gov (United States)

Chen, Jiani; Ten Cate, Carel

2015-08-01

Learning sequences is of great importance to humans and non-human animals. Many motor and mental actions, such as singing in birds and speech processing in humans, rely on sequential learning. At least two mechanisms are considered to be involved in such learning. The chaining theory proposes that learning of sequences relies on memorizing the transitions between adjacent items, while the positional theory suggests that learners encode the items according to their ordinal position in the sequence. Positional learning is assumed to dominate sequential learning. However, human infants exposed to a string of speech sounds can learn transitional (chaining) cues. So far, it is not clear whether birds, an increasingly important model for examining vocal processing, can do this. In this study we use a Go-Nogo design to examine whether zebra finches can use transitional cues to distinguish artificially constructed strings of song elements. Zebra finches were trained with sequences differing in transitional and positional information and next tested with novel strings sharing positional and transitional similarities with the training strings. The results show that they can attend to both transitional and positional cues and that their sequential coding strategies can be biased toward transitional cues depending on the learning context. This article is part of a Special Issue entitled: In Honor of Jerry Hogan. Copyright © 2014 Elsevier B.V. All rights reserved.
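The distinction between transitional (chaining) and positional cues in the abstract above can be illustrated with a small sketch. It is not the authors' analysis: the element labels, the example strings, and the helper names `transitions`, `positions`, and `shared_cues` are all hypothetical, chosen only to show how a test string can share one kind of cue with a training string but not the other.

```python
def transitions(seq):
    """Return the set of adjacent-element pairs (transitional/chaining cues)."""
    return set(zip(seq, seq[1:]))


def positions(seq):
    """Return the set of (ordinal position, element) pairs (positional cues)."""
    return set(enumerate(seq))


def shared_cues(training, test):
    """Count the transitional and positional cues a test string shares with a training string."""
    return {
        "transitional": len(transitions(training) & transitions(test)),
        "positional": len(positions(training) & positions(test)),
    }


# Hypothetical four-element training string; each letter stands for one song element.
training = "ABCD"

# A shifted string keeps the B->C and C->D transitions, but no element keeps its position.
print(shared_cues(training, "BCDA"))

# A string with the middle elements swapped keeps A first and D last,
# but breaks every adjacent transition.
print(shared_cues(training, "ACBD"))
```

A learner relying on chaining would treat the first test string as the more familiar one, while a learner relying on ordinal position would favour the second.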
The role of spectral and temporal cues in voice gender discrimination by normal-hearing listeners and cochlear implant users.

Science.gov (United States)

Fu, Qian-Jie; Chinchilla, Sherol; Galvin, John J

2004-09-01

The present study investigated the relative importance of temporal and spectral cues in voice gender discrimination and vowel recognition by normal-hearing subjects listening to an acoustic simulation of cochlear implant speech processing and by cochlear implant users. In the simulation, the number of speech processing channels ranged from 4 to 32, thereby varying the spectral resolution; the cutoff frequencies of the channels' envelope filters ranged from 20 to 320 Hz, thereby manipulating the available temporal cues. For normal-hearing subjects, results showed that both voice gender discrimination and vowel recognition scores improved as the number of spectral channels was increased. When only 4 spectral channels were available, voice gender discrimination significantly improved as the envelope filter cutoff frequency was increased from 20 to 320 Hz. For all spectral conditions, increasing the amount of temporal information had no significant effect on vowel recognition. Both voice gender discrimination and vowel recognition scores were highly variable among implant users. The performance of cochlear implant listeners was similar to that of normal-hearing subjects listening to comparable speech processing (4-8 spectral channels). The results suggest that both spectral and temporal cues contribute to voice gender discrimination and that temporal cues are especially important for cochlear implant users to identify the voice gender when there is reduced spectral resolution.
Sensory integration dysfunction affects efficacy of speech therapy on children with functional articulation disorders

Directory of Open Access Journals (Sweden)

Tung LC

2013-01-01

Full Text Available Li-Chen Tung,1,# Chin-Kai Lin,2,# Ching-Lin Hsieh,3,4 Ching-Chi Chen,1 Chin-Tsan Huang,1 Chun-Hou Wang5,6 1Department of Physical Medicine and Rehabilitation, Chi Mei Medical Center, Tainan, 2Program of Early Intervention, Department of Early Childhood Education, National Taichung University of Education, Taichung, 3School of Occupational Therapy, College of Medicine, National Taiwan University, Taipei, 4Department of Physical Medicine and Rehabilitation, National Taiwan University Hospital, Taipei, 5School of Physical Therapy, College of Medical Science and Technology, Chung Shan Medical University, Taichung, 6Physical Therapy Room, Chung Shan Medical University Hospital, Taichung, Taiwan. #These authors contributed equally. Background: Articulation disorders in young children are due to defects occurring at a certain stage in sensory and motor development. Some children with functional articulation disorders may also have sensory integration dysfunction (SID). We hypothesized that speech therapy would be less efficacious in children with SID than in those without SID. Hence, the purpose of this study was to compare the efficacy of speech therapy in two groups of children with functional articulation disorders: those without and those with SID. Method: A total of 30 young children with functional articulation disorders were divided into two groups, the no-SID group (15 children) and the SID group (15 children). The number of pronunciation mistakes was evaluated before and after speech therapy. Results: There were no statistically significant differences in age, sex, sibling order, education of parents, and pretest number of mistakes in pronunciation between the two groups (P > 0.05). The mean and standard deviation in the pre- and posttest number of mistakes in pronunciation were 10.5 ± 3.2 and 3.3 ± 3.3 in the no-SID group, and 10.1 ± 2.9 and 6.9 ± 3.5 in the SID group, respectively. Results showed great changes after speech therapy treatment (F
Immediate integration of prosodic information from speech and visual information from pictures in the absence of focused attention: a mismatch negativity study.

Science.gov (United States)

Li, X; Yang, Y; Ren, G

2009-06-16

Language is often perceived together with visual information. Recent experimental evidence indicates that, during spoken language comprehension, the brain can immediately integrate visual information with semantic or syntactic information from speech. Here we used the mismatch negativity to further investigate whether prosodic information from speech could be immediately integrated into a visual scene context, and especially the time course and automaticity of this integration process. Sixteen native speakers of Chinese participated in the study. The materials included Chinese spoken sentences and picture pairs. In the audiovisual situation, relative to the concomitant pictures, the spoken sentence was appropriately accented in the standard stimuli, but inappropriately accented in the two kinds of deviant stimuli. In the purely auditory situation, the spoken sentences were presented without pictures. It was found that the deviants evoked mismatch responses in both audiovisual and purely auditory situations; the mismatch negativity in the purely auditory situation peaked at the same time as, but was weaker than, that evoked by the same deviant speech sounds in the audiovisual situation. This pattern of results suggests immediate integration of prosodic information from speech and visual information from pictures in the absence of focused attention.
Electrophysiological and hemodynamic mismatch responses in rats listening to human speech syllables.

Directory of Open Access Journals (Sweden)

Mahdi Mahmoudzadeh

Full Text Available Speech is a complex auditory stimulus which is processed according to several time-scales. Whereas consonant discrimination is required to resolve rapid acoustic events, voice perception relies on slower cues. Humans, right from preterm ages, are particularly efficient at encoding temporal cues. To compare the capacities of preterms to those observed in other mammals, we tested anesthetized adult rats by using exactly the same paradigm as that used in preterm neonates. We simultaneously recorded neural (using ECoG) and hemodynamic responses (using fNIRS) to series of human speech syllables and investigated the brain response to a change of consonant (ba vs. ga) and to a change of voice (male vs. female). Both methods revealed concordant results, although ECoG measures were more sensitive than fNIRS. Responses to syllables were bilateral, but with marked right-hemispheric lateralization. Responses to voice changes were observed with both methods, while only ECoG was sensitive to consonant changes. These results suggest that rats processed the speech envelope more effectively than fine temporal cues, in contrast with human preterm neonates, in whom the opposite effects were observed. Cross-species comparisons constitute a very valuable tool to define the singularities of the human brain and the species-specific biases that may help human infants to learn their native language.
Temporal predictive mechanisms modulate motor reaction time during initiation and inhibition of speech and hand movement.

Science.gov (United States)

Johari, Karim; Behroozmand, Roozbeh

2017-08-01

Skilled movement is mediated by motor commands executed with extremely fine temporal precision. The question of how the brain incorporates temporal information to perform motor actions has remained unanswered. This study investigated the effect of stimulus temporal predictability on response timing of speech and hand movement. Subjects performed a randomized vowel vocalization or button press task in two counterbalanced blocks in response to temporally-predictable and unpredictable visual cues. Results indicated that speech and hand reaction time was decreased for predictable compared with unpredictable stimuli. This finding suggests that a temporal predictive code is established to capture temporal dynamics of sensory cues in order to produce faster movements in responses to predictable stimuli. In addition, results revealed a main effect of modality, indicating faster hand movement compared with speech. We suggest that this effect is accounted for by the inherent complexity of speech production compared with hand movement. Lastly, we found that movement inhibition was faster than initiation for both hand and speech, suggesting that movement initiation requires a longer processing time to coordinate activities across multiple regions in the brain. These findings provide new insights into the mechanisms of temporal information processing during initiation and inhibition of speech and hand movement. Copyright © 2017 Elsevier B.V. All rights reserved.
The role of visual spatial attention in audiovisual speech perception

DEFF Research Database (Denmark)

Andersen, Tobias; Tiippana, K.; Laarni, J.

2009-01-01

Auditory and visual information is integrated when perceiving speech, as evidenced by the McGurk effect in which viewing an incongruent talking face categorically alters auditory speech perception. Audiovisual integration in speech perception has long been considered automatic and pre-attentive b... from each of the faces and from the voice on the auditory speech percept. We found that directing visual spatial attention towards a face increased the influence of that face on auditory perception. However, the influence of the voice on auditory perception did not change suggesting that audiovisual... integration did not change. Visual spatial attention was also able to select between the faces when lip reading. This suggests that visual spatial attention acts at the level of visual speech perception prior to audiovisual integration and that the effect propagates through audiovisual integration...

Sound of mind : electrophysiological and behavioural evidence for the role of context, variation and informativity in human speech processing

NARCIS (Netherlands)

Nixon, Jessie Sophia

2014-01-01

Spoken communication involves transmission of a message which takes physical form in acoustic waves. Within any given language, acoustic cues pattern in language-specific ways along language-specific acoustic dimensions to create speech sound contrasts. These cues are utilized by listeners to
Noise and pitch interact during the cortical segregation of concurrent speech.

Science.gov (United States)

Bidelman, Gavin M; Yellamsetty, Anusha

2017-08-01

Behavioral studies reveal listeners exploit intrinsic differences in voice fundamental frequency (F0) to segregate concurrent speech sounds, the so-called "F0-benefit." More favorable signal-to-noise ratio (SNR) in the environment, an extrinsic acoustic factor, similarly benefits the parsing of simultaneous speech. Here, we examined the neurobiological substrates of these two cues in the perceptual segregation of concurrent speech mixtures. We recorded event-related brain potentials (ERPs) while listeners performed a speeded double-vowel identification task. Listeners heard two concurrent vowels whose F0 differed by zero or four semitones presented in either clean (no noise) or noise-degraded (+5 dB SNR) conditions. Behaviorally, listeners were more accurate in correctly identifying both vowels for larger F0 separations but F0-benefit was more pronounced at more favorable SNRs (i.e., pitch × SNR interaction). Analysis of the ERPs revealed that only the P2 wave (∼200 ms) showed a similar F0 × SNR interaction as behavior and was correlated with listeners' perceptual F0-benefit. Neural classifiers applied to the ERPs further suggested that speech sounds are segregated neurally within 200 ms based on SNR whereas segregation based on pitch occurs later in time (400-700 ms). The earlier timing of extrinsic SNR compared to intrinsic F0-based segregation implies that the cortical extraction of speech from noise is more efficient than differentiating speech based on pitch cues alone, which may recruit additional cortical processes. Findings indicate that noise and pitch differences interact relatively early in cerebral cortex and that the brain arrives at the identities of concurrent speech mixtures as early as ∼200 ms. Copyright © 2017 Elsevier B.V. All rights reserved.
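For readers unfamiliar with the units in the abstract above, the following sketch converts a semitone separation into a frequency ratio and a decibel SNR into a power ratio. It uses the standard equal-tempered definition of a semitone and the standard decibel definition for power. The 100 Hz base F0 is a hypothetical value for illustration only, since the abstract does not state the vowel F0s, and the function names are likewise made up here.

```python
def semitone_ratio(n_semitones):
    """Frequency ratio for a separation of n equal-tempered semitones."""
    return 2.0 ** (n_semitones / 12.0)


def db_to_power_ratio(db):
    """Signal-to-noise power ratio corresponding to a level difference in dB."""
    return 10.0 ** (db / 10.0)


# Hypothetical base F0 in Hz; not taken from the study.
base_f0 = 100.0

# F0 of the second vowel when it lies four semitones above the first.
second_f0 = base_f0 * semitone_ratio(4)
print(f"Second vowel F0: {second_f0:.1f} Hz")

# Ratio of signal power to noise power at +5 dB SNR.
print(f"Power ratio at +5 dB SNR: {db_to_power_ratio(5):.2f}")
```

Under these definitions a four-semitone separation is roughly a 26% difference in F0, and +5 dB SNR means the speech carries about three times the power of the noise.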
Rhythm Perception and Its Role in Perception and Learning of Dysrhythmic Speech.

Science.gov (United States)

Borrie, Stephanie A; Lansford, Kaitlin L; Barrett, Tyson S

2017-03-01

The perception of rhythm cues plays an important role in recognizing spoken language, especially in adverse listening conditions. Indeed, this has been shown to hold true even when the rhythm cues themselves are dysrhythmic. This study investigates whether expertise in rhythm perception provides a processing advantage for perception (initial intelligibility) and learning (intelligibility improvement) of naturally dysrhythmic speech, dysarthria. Fifty young adults with typical hearing participated in 3 key tests, including a rhythm perception test, a receptive vocabulary test, and a speech perception and learning test, with standard pretest, familiarization, and posttest phases. Initial intelligibility scores were calculated as the proportion of correct pretest words, while intelligibility improvement scores were calculated by subtracting this proportion from the proportion of correct posttest words. Rhythm perception scores predicted intelligibility improvement scores but not initial intelligibility. On the other hand, receptive vocabulary scores predicted initial intelligibility scores but not intelligibility improvement. Expertise in rhythm perception appears to provide an advantage for processing dysrhythmic speech, but a familiarization experience is required for the advantage to be realized. Findings are discussed in relation to the role of rhythm in speech processing and shed light on processing models that consider the consequence of rhythm abnormalities in dysarthria.
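The two outcome measures defined in the abstract above reduce to simple arithmetic, sketched below. The word counts are invented for illustration and the function name `intelligibility_scores` is hypothetical; only the definitions (initial intelligibility as the proportion of correct pretest words, improvement as posttest proportion minus pretest proportion) come from the abstract.

```python
def intelligibility_scores(pre_correct, pre_total, post_correct, post_total):
    """Return (initial intelligibility, intelligibility improvement).

    Initial intelligibility is the proportion of pretest words correct.
    Improvement is the posttest proportion minus the pretest proportion.
    """
    initial = pre_correct / pre_total
    posttest = post_correct / post_total
    return initial, posttest - initial


# Invented counts for one listener: 20 of 80 pretest words and 36 of 80 posttest words correct.
initial, improvement = intelligibility_scores(20, 80, 36, 80)
print(f"Initial intelligibility: {initial:.2f}")
print(f"Intelligibility improvement: {improvement:.2f}")
```

Keeping the two scores separate matters for the reported dissociation, in which rhythm perception predicted the improvement score and receptive vocabulary predicted the initial score.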
SPEECH VISUALIZATION SYSTEM AS A BASIS FOR SPEECH TRAINING AND COMMUNICATION AIDS

Directory of Open Access Journals (Sweden)

Oliana KRSTEVA

1997-09-01

Full Text Available One receives much more information through the visual sense than through the tactile one. However, most visual aids for hearing-impaired persons are not wearable, because it is difficult to make them compact and it is undesirable to obstruct the wearer's vision at all times. Generally, it is difficult to obtain integrated patterns from a single mathematical transform of signals, such as a Fourier transform. To obtain an integrated pattern, speech parameters should be carefully extracted by an analysis suited to each parameter, and a visual pattern that can be intuitively understood by anyone must be synthesized from them. Successful integration of speech parameters will not disturb the understanding of individual features, so that the system can be used for speech training and communication.
Internet Video Telephony Allows Speech Reading by Deaf Individuals and Improves Speech Perception by Cochlear Implant Users

Science.gov (United States)

Mantokoudis, Georgios; Dähler, Claudia; Dubach, Patrick; Kompis, Martin; Caversaccio, Marco D.; Senn, Pascal

2013-01-01

Objective: To analyze speech reading through Internet video calls by profoundly hearing-impaired individuals and cochlear implant (CI) users. Methods: Speech reading skills of 14 deaf adults and 21 CI users were assessed using the Hochmair Schulz Moser (HSM) sentence test. We presented video simulations using different video resolutions (1280×720, 640×480, 320×240, 160×120 px), frame rates (30, 20, 10, 7, 5 frames per second (fps)), speech velocities (three different speakers), webcameras (Logitech Pro9000, C600 and C500) and image/sound delays (0–500 ms). All video simulations were presented with and without sound and in two screen sizes. Additionally, scores for live Skype™ video connection and live face-to-face communication were assessed. Results: Higher frame rate (>7 fps), higher camera resolution (>640×480 px) and shorter picture/sound delay (<100 ms) were associated with increased speech perception scores. Scores were strongly dependent on the speaker but were not influenced by physical properties of the camera optics or the full screen mode. There is a significant median gain of +8.5%pts (p = 0.009) in speech perception for all 21 CI-users if visual cues are additionally shown. CI users with poor open set speech perception scores (n = 11) showed the greatest benefit under combined audio-visual presentation (median speech perception +11.8%pts, p = 0.032). Conclusion: Webcameras have the potential to improve telecommunication of hearing-impaired individuals. PMID:23359119
Internet video telephony allows speech reading by deaf individuals and improves speech perception by cochlear implant users.

Directory of Open Access Journals (Sweden)

Georgios Mantokoudis

Full Text Available OBJECTIVE: To analyze speech reading through Internet video calls by profoundly hearing-impaired individuals and cochlear implant (CI) users. METHODS: Speech reading skills of 14 deaf adults and 21 CI users were assessed using the Hochmair Schulz Moser (HSM) sentence test. We presented video simulations using different video resolutions (1280 × 720, 640 × 480, 320 × 240, 160 × 120 px), frame rates (30, 20, 10, 7, 5 frames per second (fps)), speech velocities (three different speakers), webcameras (Logitech Pro9000, C600 and C500) and image/sound delays (0-500 ms). All video simulations were presented with and without sound and in two screen sizes. Additionally, scores for live Skype™ video connection and live face-to-face communication were assessed. RESULTS: Higher frame rate (>7 fps), higher camera resolution (>640 × 480 px) and shorter picture/sound delay (<100 ms) were associated with increased speech perception scores. Scores were strongly dependent on the speaker but were not influenced by physical properties of the camera optics or the full screen mode. There is a significant median gain of +8.5%pts (p = 0.009) in speech perception for all 21 CI-users if visual cues are additionally shown. CI users with poor open set speech perception scores (n = 11) showed the greatest benefit under combined audio-visual presentation (median speech perception +11.8%pts, p = 0.032). CONCLUSION: Webcameras have the potential to improve telecommunication of hearing-impaired individuals.
Relative cue encoding in the context of sophisticated models of categorization: Separating information from categorization.

Science.gov (United States)

Apfelbaum, Keith S; McMurray, Bob

2015-08-01

Traditional studies of human categorization often treat the processes of encoding features and cues as peripheral to the question of how stimuli are categorized. However, in domains where the features and cues are less transparent, how information is encoded prior to categorization may constrain our understanding of the architecture of categorization. This is particularly true in speech perception, where acoustic cues to phonological categories are ambiguous and influenced by multiple factors. Here, it is crucial to consider the joint contributions of the information in the input and the categorization architecture. We contrasted accounts that argue for raw acoustic information encoding with accounts that posit that cues are encoded relative to expectations, and investigated how two categorization architectures (exemplar models and back-propagation parallel distributed processing models) deal with each kind of information. Relative encoding, akin to predictive coding, is a form of noise reduction, so it can be expected to improve model accuracy; however, like predictive coding, the use of relative encoding in speech perception by humans is controversial, so results are compared to patterns of human performance, rather than on the basis of overall accuracy. We found that, for both classes of models, in the vast majority of parameter settings, relative cues greatly helped the models approximate human performance. This suggests that expectation-relative processing is a crucial precursor step in phoneme categorization, and that understanding the information content is essential to understanding categorization processes.
Integration of Distinct Objects in Visual Working Memory Depends on Strong Objecthood Cues Even for Different-Dimension Conjunctions.

Science.gov (United States)

Balaban, Halely; Luria, Roy

2016-05-01

What makes an integrated object in visual working memory (WM)? Past evidence suggested that WM holds all features of multidimensional objects together, but struggles to integrate color-color conjunctions. This difficulty was previously attributed to a challenge in same-dimension integration, but here we argue that it arises from the integration of 2 distinct objects. To test this, we examined the integration of distinct different-dimension features (a colored square and a tilted bar). We monitored the contralateral delay activity, an event-related potential component sensitive to the number of objects in WM. The results indicated that color and orientation belonging to distinct objects in a shared location were not integrated in WM (Experiment 1), even following a common fate Gestalt cue (Experiment 2). These conjunctions were better integrated in a less demanding task (Experiment 3), and in the original WM task, but with a less individuating version of the original stimuli (Experiment 4). Our results identify the critical factor in WM integration at same- versus separate-objects, rather than at same- versus different-dimensions. Compared with the perfect integration of an object's features, the integration of several objects is demanding, and depends on an interaction between the grouping cues and task demands, among other factors. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Functional connectivity between face-movement and speech-intelligibility areas during auditory-only speech perception.

Science.gov (United States)

Schall, Sonja; von Kriegstein, Katharina

2014-01-01

It has been proposed that internal simulation of the talking face of visually-known speakers facilitates auditory speech recognition. One prediction of this view is that brain areas involved in auditory-only speech comprehension interact with visual face-movement sensitive areas, even under auditory-only listening conditions. Here, we test this hypothesis using connectivity analyses of functional magnetic resonance imaging (fMRI) data. Participants (17 normal participants, 17 developmental prosopagnosics) first learned six speakers via brief voice-face or voice-occupation training (comprehension. Overall, the present findings indicate that learned visual information is integrated into the analysis of auditory-only speech and that this integration results from the interaction of task-relevant face-movement and auditory speech-sensitive areas.
Human phoneme recognition depending on speech-intrinsic variability.

Science.gov (United States)

Meyer, Bernd T; Jürgens, Tim; Wesker, Thorsten; Brand, Thomas; Kollmeier, Birger

2010-11-01

The influence of different sources of speech-intrinsic variation (speaking rate, effort, style and dialect or accent) on human speech perception was investigated. In listening experiments with 16 listeners, confusions of consonant-vowel-consonant (CVC) and vowel-consonant-vowel (VCV) sounds in speech-weighted noise were analyzed. Experiments were based on the OLLO logatome speech database, which was designed for a man-machine comparison. It contains utterances spoken by 50 speakers from five dialect/accent regions and covers several intrinsic variations. By comparing results depending on intrinsic and extrinsic variations (i.e., different levels of masking noise), the degradation induced by variabilities can be expressed in terms of the SNR. The spectral level distance between the respective speech segment and the long-term spectrum of the masking noise was found to be a good predictor for recognition rates, while phoneme confusions were influenced by the distance to spectrally close phonemes. An analysis based on transmitted information of articulatory features showed that voicing and manner of articulation are comparatively robust cues in the presence of intrinsic variations, whereas the coding of place is more degraded. The database and detailed results have been made available for comparisons between human speech recognition (HSR) and automatic speech recognizers (ASR).
Do We Perceive Others Better than Ourselves? A Perceptual Benefit for Noise-Vocoded Speech Produced by an Average Speaker.

Directory of Open Access Journals (Sweden)

William L Schuerman

Full Text Available In different tasks involving action perception, performance has been found to be facilitated when the presented stimuli were produced by the participants themselves rather than by another participant. These results suggest that the same mental representations are accessed during both production and perception. However, with regard to spoken word perception, evidence also suggests that listeners' representations for speech reflect the input from their surrounding linguistic community rather than their own idiosyncratic productions. Furthermore, speech perception is heavily influenced by indexical cues that may lead listeners to frame their interpretations of incoming speech signals with regard to speaker identity. In order to determine whether word recognition evinces similar self-advantages as found in action perception, it was necessary to eliminate indexical cues from the speech signal. We therefore asked participants to identify noise-vocoded versions of Dutch words that were based on either their own recordings or those of a statistically average speaker. The majority of participants were more accurate for the average speaker than for themselves, even after taking into account differences in intelligibility. These results suggest that the speech representations accessed during perception of noise-vocoded speech are more reflective of the input of the speech community, and hence that speech perception is not necessarily based on representations of one's own speech.
Individual differences in speech-in-noise perception parallel neural speech processing and attention in preschoolers

Science.gov (United States)

Thompson, Elaine C.; Carr, Kali Woodruff; White-Schwoch, Travis; Otto-Meyer, Sebastian; Kraus, Nina

2016-01-01

From bustling classrooms to unruly lunchrooms, school settings are noisy. To learn effectively in the unwelcome company of numerous distractions, children must clearly perceive speech in noise. In older children and adults, speech-in-noise perception is supported by sensory and cognitive processes, but the correlates underlying this critical listening skill in young children (3–5 year olds) remain undetermined. Employing a longitudinal design (two evaluations separated by ~12 months), we followed a cohort of 59 preschoolers, ages 3.0–4.9, assessing word-in-noise perception, cognitive abilities (intelligence, short-term memory, attention), and neural responses to speech. Results reveal changes in word-in-noise perception parallel changes in processing of the fundamental frequency (F0), an acoustic cue known for playing a role central to speaker identification and auditory scene analysis. Four unique developmental trajectories (speech-in-noise perception groups) confirm this relationship, in that improvements and declines in word-in-noise perception couple with enhancements and diminishments of F0 encoding, respectively. Improvements in word-in-noise perception also pair with gains in attention. Word-in-noise perception does not relate to strength of neural harmonic representation or short-term memory. These findings reinforce previously-reported roles of F0 and attention in hearing speech in noise in older children and adults, and extend this relationship to preschool children. PMID:27864051
Do gender differences in audio-visual benefit and visual influence in audio-visual speech perception emerge with age?

Directory of Open Access Journals (Sweden)

Magnus Alm

2015-07-01

Full Text Available Gender and age have been found to affect adults’ audio-visual (AV) speech perception. However, research on adult aging focuses on adults over 60 years, who have an increasing likelihood for cognitive and sensory decline, which may confound positive effects of age-related AV-experience and its interaction with gender. Observed age and gender differences in AV speech perception may also depend on measurement sensitivity and AV task difficulty. Consequently, both AV benefit and visual influence were used to measure visual contribution for gender-balanced groups of young (20-30 years) and middle-aged adults (50-60 years), with task difficulty varied using AV syllables from different talkers in alternative auditory backgrounds. Females had better speech-reading performance than males. Whereas no gender differences in AV benefit or visual influence were observed for young adults, visually influenced responses were significantly greater for middle-aged females than middle-aged males. That speech-reading performance did not influence AV benefit may be explained by visual speech extraction and AV integration constituting independent abilities. Contrastingly, the gender difference in visually influenced responses in middle adulthood may reflect an experience-related shift in females’ general AV perceptual strategy. Although young females’ speech-reading proficiency may not readily contribute to greater visual influence, between young and middle-adulthood recurrent confirmation of the contribution of visual cues induced by speech-reading proficiency may gradually shift females’ AV perceptual strategy towards more visually dominated responses.
Some Behavioral and Neurobiological Constraints on Theories of Audiovisual Speech Integration: A Review and Suggestions for New Directions

Science.gov (United States)

Altieri, Nicholas; Pisoni, David B.; Townsend, James T.

2012-01-01

Summerfield (1987) proposed several accounts of audiovisual speech perception, a field of research that has burgeoned in recent years. The proposed accounts included the integration of discrete phonetic features, vectors describing the values of independent acoustical and optical parameters, the filter function of the vocal tract, and articulatory dynamics of the vocal tract. The latter two accounts assume that the representations of audiovisual speech perception are based on abstract gestures, while the former two assume that the representations consist of symbolic or featural information obtained from visual and auditory modalities. Recent converging evidence from several different disciplines reveals that the general framework of Summerfield’s feature-based theories should be expanded. An updated framework building upon the feature-based theories is presented. We propose a processing model arguing that auditory and visual brain circuits provide facilitatory information when the inputs are correctly timed, and that auditory and visual speech representations do not necessarily undergo translation into a common code during information processing. Future research on multisensory processing in speech perception should investigate the connections between auditory and visual brain regions, and utilize dynamic modeling tools to further understand the timing and information processing mechanisms involved in audiovisual speech integration. PMID:21968081
Behavioral Signal Processing: Deriving Human Behavioral Informatics From Speech and Language

Science.gov (United States)

Narayanan, Shrikanth; Georgiou, Panayiotis G.

2013-01-01

The expression and experience of human behavior are complex and multimodal and characterized by individual and contextual heterogeneity and variability. Speech and spoken language communication cues offer an important means for measuring and modeling human behavior. Observational research and practice across a variety of domains from commerce to healthcare rely on speech- and language-based informatics for crucial assessment and diagnostic information and for planning and tracking response to an intervention. In this paper, we describe some of the opportunities as well as emerging methodologies and applications of human behavioral signal processing (BSP) technology and algorithms for quantitatively understanding and modeling typical, atypical, and distressed human behavior with a specific focus on speech- and language-based communicative, affective, and social behavior. We describe the three important BSP components of acquiring behavioral data in an ecologically valid manner across laboratory to real-world settings, extracting and analyzing behavioral cues from measured data, and developing models offering predictive and decision-making support. We highlight both the foundational speech and language processing building blocks as well as the novel processing and modeling opportunities. Using examples drawn from specific real-world applications ranging from literacy assessment and autism diagnostics to psychotherapy for addiction and marital well being, we illustrate behavioral informatics applications of these signal processing techniques that contribute to quantifying higher level, often subjectively described, human behavior in a domain-sensitive fashion. PMID:24039277
The effectiveness of Speech-Music Therapy for Aphasia (SMTA) in five speakers with Apraxia of Speech and aphasia

NARCIS (Netherlands)

Hurkmans, Joost; Jonkers, Roel; de Bruijn, Madeleen; Boonstra, Anne M.; Hartman, Paul P.; Arendzen, Hans; Reinders - Messelink, Heelen

2015-01-01

Background: Several studies using musical elements in the treatment of neurological language and speech disorders have reported improvement of speech production. One such programme, Speech-Music Therapy for Aphasia (SMTA), integrates speech therapy and music therapy (MT) to treat the individual with
Sound frequency affects speech emotion perception: Results from congenital amusia

Directory of Open Access Journals (Sweden)

Sydney Lolli

2015-09-01

Full Text Available Congenital amusics, or tone-deaf individuals, show difficulty in perceiving and producing small pitch differences. While amusia has marked effects on music perception, its impact on speech perception is less clear. Here we test the hypothesis that individual differences in pitch perception affect judgment of emotion in speech, by applying band-pass filters to spoken statements of emotional speech. A norming study was first conducted on Mechanical Turk to ensure that the intended emotions from the Macquarie Battery for Evaluation of Prosody (MBEP) were reliably identifiable by US English speakers. The most reliably identified emotional speech samples were used in Experiment 1, in which subjects performed a psychophysical pitch discrimination task, and an emotion identification task under band-pass and unfiltered speech conditions. Results showed a significant correlation between pitch discrimination threshold and emotion identification accuracy for band-pass filtered speech, with amusics (defined here as those with a pitch discrimination threshold > 16 Hz) performing worse than controls. This relationship with pitch discrimination was not seen in unfiltered speech conditions. Given the dissociation between band-pass filtered and unfiltered speech conditions, we inferred that amusics may be compensating for poorer pitch perception by using speech cues that are filtered out in this manipulation.
Cueing listeners to attend to a target talker progressively improves word report as the duration of the cue-target interval lengthens to 2,000 ms.

Science.gov (United States)

Holmes, Emma; Kitterick, Padraig T; Summerfield, A Quentin

2018-04-25

Endogenous attention is typically studied by presenting instructive cues in advance of a target stimulus array. For endogenous visual attention, task performance improves as the duration of the cue-target interval increases up to 800 ms. Less is known about how endogenous auditory attention unfolds over time or the mechanisms by which an instructive cue presented in advance of an auditory array improves performance. The current experiment used five cue-target intervals (0, 250, 500, 1,000, and 2,000 ms) to compare four hypotheses for how preparatory attention develops over time in a multi-talker listening task. Young adults were cued to attend to a target talker who spoke in a mixture of three talkers. Visual cues indicated the target talker's spatial location or their gender. Participants directed attention to location and gender simultaneously ("objects") at all cue-target intervals. Participants were consistently faster and more accurate at reporting words spoken by the target talker when the cue-target interval was 2,000 ms than 0 ms. In addition, the latency of correct responses progressively shortened as the duration of the cue-target interval increased from 0 to 2,000 ms. These findings suggest that the mechanisms involved in preparatory auditory attention develop gradually over time, taking at least 2,000 ms to reach optimal configuration, yet providing cumulative improvements in speech intelligibility as the duration of the cue-target interval increases from 0 to 2,000 ms. These results demonstrate an improvement in performance for cue-target intervals longer than those that have been reported previously in the visual or auditory modalities.
Spoken Word Recognition of Chinese Words in Continuous Speech

Science.gov (United States)

Yip, Michael C. W.

2015-01-01

The present study examined the role that the positional probability of syllables plays in the recognition of spoken words in continuous Cantonese speech. Because some sounds occur more frequently at the beginning or ending position of Cantonese syllables than others, this kind of probabilistic information about syllables may cue the locations…
Low- and high-frequency cortical brain oscillations reflect dissociable mechanisms of concurrent speech segregation in noise.

Science.gov (United States)

Yellamsetty, Anusha; Bidelman, Gavin M

2018-04-01

Parsing simultaneous speech requires listeners use pitch-guided segregation which can be affected by the signal-to-noise ratio (SNR) in the auditory scene. The interaction of these two cues may occur at multiple levels within the cortex. The aims of the current study were to assess the correspondence between oscillatory brain rhythms and determine how listeners exploit pitch and SNR cues to successfully segregate concurrent speech. We recorded electrical brain activity while participants heard double-vowel stimuli whose fundamental frequencies (F0s) differed by zero or four semitones (STs) presented in either clean or noise-degraded (+5 dB SNR) conditions. We found that behavioral identification was more accurate for vowel mixtures with larger pitch separations but F0 benefit interacted with noise. Time-frequency analysis decomposed the EEG into different spectrotemporal frequency bands. Low-frequency (θ, β) responses were elevated when speech did not contain pitch cues (0ST > 4ST) or was noisy, suggesting a correlate of increased listening effort and/or memory demands. Contrastively, γ power increments were observed for changes in both pitch (0ST > 4ST) and SNR (clean > noise), suggesting high-frequency bands carry information related to acoustic features and the quality of speech representations. Brain-behavior associations corroborated these effects; modulations in low-frequency rhythms predicted the speed of listeners' perceptual decisions with higher bands predicting identification accuracy. Results are consistent with the notion that neural oscillations reflect both automatic (pre-perceptual) and controlled (post-perceptual) mechanisms of speech processing that are largely divisible into high- and low-frequency bands of human brain rhythms. Copyright © 2018 Elsevier B.V. All rights reserved.
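The four-semitone F0 separation mentioned above can be made concrete with the standard equal-tempered relation, in which a shift of n semitones multiplies frequency by 2^(n/12). The following is a minimal sketch only; the function names and the 100 Hz base F0 are assumptions chosen for illustration and are not taken from the study.

```python
def semitone_ratio(semitones):
    """Frequency ratio corresponding to a shift of `semitones` semitones."""
    return 2.0 ** (semitones / 12.0)


def shift_f0(f0_hz, semitones):
    """Return the F0 (in Hz) obtained by shifting `f0_hz` by `semitones`."""
    return f0_hz * semitone_ratio(semitones)


# Hypothetical base F0 of 100 Hz (an assumption, not a value from the abstract).
base_f0 = 100.0
same_pitch = shift_f0(base_f0, 0)    # 0 ST condition: both vowels share one F0
higher_pitch = shift_f0(base_f0, 4)  # 4 ST condition: roughly 126 Hz
```

Under these assumptions, a four-semitone separation corresponds to a frequency ratio of about 1.26 between the two vowels.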

Detection of Clinical Depression in Adolescents’ Speech During Family Interactions

Science.gov (United States)

Low, Lu-Shih Alex; Maddage, Namunu C.; Lech, Margaret; Sheeber, Lisa B.; Allen, Nicholas B.

2013-01-01

The properties of acoustic speech have previously been investigated as possible cues for depression in adults. However, these studies were restricted to small populations of patients and the speech recordings were made during patients’ clinical interviews or fixed-text reading sessions. Symptoms of depression often first appear during adolescence at a time when the voice is changing, in both males and females, suggesting that specific studies of these phenomena in adolescent populations are warranted. This study investigated acoustic correlates of depression in a large sample of 139 adolescents (68 clinically depressed and 71 controls). Speech recordings were made during naturalistic interactions between adolescents and their parents. Prosodic, cepstral, spectral, and glottal features, as well as features derived from the Teager energy operator (TEO), were tested within a binary classification framework. Strong gender differences in classification accuracy were observed. The TEO-based features clearly outperformed all other features and feature combinations, providing classification accuracy ranging between 81%–87% for males and 72%–79% for females. Close, but slightly less accurate, results were obtained by combining glottal features with prosodic and spectral features (67%–69% for males and 70%–75% for females). These findings indicate the importance of nonlinear mechanisms associated with the glottal flow formation as cues for clinical depression. PMID:21075715
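For readers unfamiliar with the Teager energy operator, its usual discrete-time form is Ψ[x(n)] = x(n)² − x(n−1)·x(n+1). The sketch below is a minimal pure-Python version of that operator alone; the function name and the test tone are assumptions for illustration, and it does not reproduce the TEO-based feature set evaluated in the study.

```python
import math


def teager_energy(x):
    """Discrete Teager energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The first and last samples have no two-sided neighbours, so the
    result is two samples shorter than the input.
    """
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Hypothetical test signal: a pure tone with amplitude A and digital
# frequency omega (radians per sample). Both values are illustrative.
A, omega = 0.5, 0.2
tone = [A * math.cos(omega * n) for n in range(200)]
psi = teager_energy(tone)

# For a pure tone the operator output is constant: A^2 * sin(omega)^2.
expected = (A ** 2) * math.sin(omega) ** 2
```

The constant output for a pure tone shows why the operator is often described as tracking both amplitude and frequency: the value grows with either.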
Speech Recognition with the Advanced Combination Encoder and Transient Emphasis Spectral Maxima Strategies in Nucleus 24 Recipients

Science.gov (United States)

Holden, Laura K.; Vandali, Andrew E.; Skinner, Margaret W.; Fourakis, Marios S.; Holden, Timothy A.

2005-01-01

One of the difficulties faced by cochlear implant (CI) recipients is perception of low-intensity speech cues. A. E. Vandali (2001) has developed the transient emphasis spectral maxima (TESM) strategy to amplify short-duration, low-level sounds. The aim of the present study was to determine whether speech scores would be significantly higher with…
The Influence of Direct and Indirect Speech on Mental Representations

NARCIS (Netherlands)

A. Eerland (Anita); J.A.A. Engelen (Jan A.A.); R.A. Zwaan (Rolf)

2013-01-01

Language can be viewed as a set of cues that modulate the comprehender's thought processes. It is a very subtle instrument. For example, the literature suggests that people perceive direct speech (e.g., Joanne said: 'I went out for dinner last night') as more vivid and perceptually
Frontal and temporal contributions to understanding the iconic co-speech gestures that accompany speech.

Science.gov (United States)

Dick, Anthony Steven; Mok, Eva H; Raja Beharelle, Anjali; Goldin-Meadow, Susan; Small, Steven L

2014-03-01

In everyday conversation, listeners often rely on a speaker's gestures to clarify any ambiguities in the verbal message. Using fMRI during naturalistic story comprehension, we examined which brain regions in the listener are sensitive to speakers' iconic gestures. We focused on iconic gestures that contribute information not found in the speaker's talk, compared with those that convey information redundant with the speaker's talk. We found that three regions, the left inferior frontal gyrus triangular (IFGTr) and opercular (IFGOp) portions and the left posterior middle temporal gyrus (MTGp), responded more strongly when gestures added information to nonspecific language, compared with when they conveyed the same information in more specific language; in other words, when gesture disambiguated speech as opposed to reinforced it. An increased BOLD response was not found in these regions when the nonspecific language was produced without gesture, suggesting that IFGTr, IFGOp, and MTGp are involved in integrating semantic information across gesture and speech. In addition, we found that activity in the posterior superior temporal sulcus (STSp), previously thought to be involved in gesture-speech integration, was not sensitive to the gesture-speech relation. Together, these findings clarify the neurobiology of gesture-speech integration and contribute to an emerging picture of how listeners glean meaning from gestures that accompany speech. Copyright © 2012 Wiley Periodicals, Inc.
Experience with a second language affects the use of fundamental frequency in speech segmentation

Science.gov (United States)

Broersma, Mirjam; Cho, Taehong; Kim, Sahyang; Martínez-García, Maria Teresa; Connell, Katrina

2017-01-01

This study investigates whether listeners’ experience with a second language learned later in life affects their use of fundamental frequency (F0) as a cue to word boundaries in the segmentation of an artificial language (AL), particularly when the cues to word boundaries conflict between the first language (L1) and second language (L2). F0 signals phrase-final (and thus word-final) boundaries in French but word-initial boundaries in English. Participants were functionally monolingual French listeners, functionally monolingual English listeners, bilingual L1-English L2-French listeners, and bilingual L1-French L2-English listeners. They completed the AL-segmentation task with F0 signaling word-final boundaries or without prosodic cues to word boundaries (monolingual groups only). After listening to the AL, participants completed a forced-choice word-identification task in which the foils were either non-words or part-words. The results show that the monolingual French listeners, but not the monolingual English listeners, performed better in the presence of F0 cues than in the absence of such cues. Moreover, bilingual status modulated listeners’ use of F0 cues to word-final boundaries, with bilingual French listeners performing less accurately than monolingual French listeners on both word types but with bilingual English listeners performing more accurately than monolingual English listeners on non-words. These findings not only confirm that speech segmentation is modulated by the L1, but also newly demonstrate that listeners’ experience with the L2 (French or English) affects their use of F0 cues in speech segmentation. This suggests that listeners’ use of prosodic cues to word boundaries is adaptive and non-selective, and can change as a function of language experience. PMID:28738093
Integrating speech technology to meet crew station design requirements

Science.gov (United States)

Simpson, Carol A.; Ruth, John C.; Moore, Carolyn A.

The last two years have seen improvements in speech generation and speech recognition technology that make speech I/O for crew station controls and displays viable for operational systems. These improvements include increased robustness of algorithm performance in high levels of background noise, increased vocabulary size, improved performance in the connected speech mode, and less speaker dependence. This improved capability makes possible far more sophisticated user interface design than was possible with earlier technology. Engineering, linguistic, and human factors design issues are discussed in the context of current voice I/O technology performance.
Parameter masks for close talk speech segregation using deep neural networks

Directory of Open Access Journals (Sweden)

Jiang Yi

2015-01-01

Full Text Available A deep neural network (DNN) based close-talk speech segregation algorithm is introduced. One microphone placed near the talker collects the target speech, as close talk implies, and another microphone captures the environmental noise. The time and energy differences between the two microphone signals are used as the segregation cue. A DNN estimator on each frequency channel is used to calculate the parameter masks. The parameter masks represent the target speech energy in each time-frequency (T-F) unit. Experimental results show good performance of the proposed system. The signal-to-noise ratio (SNR) improvement is 8.1 dB in a 0 dB noisy environment.
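To make the reported figure easier to interpret, the sketch below shows how an SNR in decibels and an SNR improvement are conventionally computed, using SNR = 10·log10(P_signal / P_noise). It is an illustration under stated assumptions only: the helper names and the toy sample values are hypothetical, the residual noise is scaled by hand to mimic an 8.1 dB gain, and none of it represents the DNN mask estimator described in the abstract.

```python
import math


def power(samples):
    """Mean power of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(signal, noise):
    """Signal-to-noise ratio in dB: 10 * log10(P_signal / P_noise)."""
    return 10.0 * math.log10(power(signal) / power(noise))


# Toy data (hypothetical): target and noise with equal power, i.e. 0 dB input SNR.
target = [1.0, -1.0, 1.0, -1.0]
noise = [1.0, 1.0, -1.0, -1.0]
input_snr = snr_db(target, noise)

# Assume a segregation stage attenuates the noise amplitude by this gain,
# chosen here so that the improvement equals 8.1 dB.
gain = 10.0 ** (-8.1 / 20.0)
residual_noise = [gain * n for n in noise]
output_snr = snr_db(target, residual_noise)

improvement = output_snr - input_snr
```

An 8.1 dB improvement at 0 dB input therefore corresponds to the residual noise power falling to roughly 15% of the target speech power.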
Markers of Deception in Italian Speech

Directory of Open Access Journals (Sweden)

Katelyn Spence

2012-10-01

Full Text Available Lying is a universal activity and the detection of lying a universal concern. Presently, there is great interest in determining objective measures of deception. The examination of speech, in particular, holds promise in this regard; yet, most of what we know about the relationship between speech and lying is based on the assessment of English-speaking participants. Few studies have examined indicators of deception in languages other than English. The world’s languages differ in significant ways, and cross-linguistic studies of deceptive communications are a research imperative. Here we review some of these differences amongst the world’s languages, and provide an overview of a number of recent studies demonstrating that cross-linguistic research is a worthwhile endeavour. In addition, we report the results of an empirical investigation of pitch, response latency, and speech rate as cues to deception in Italian speech. True and false opinions were elicited in an audio-taped interview. A within subjects analysis revealed no significant difference between the average pitch of the two conditions; however, speech rate was significantly slower, while response latency was longer, during deception compared with truth-telling. We explore the implications of these findings and propose directions for future research, with the aim of expanding the cross-linguistic branch of research on markers of deception.
Tackling the complexity in speech

DEFF Research Database (Denmark)

section includes four carefully selected chapters. They deal with facets of speech production, speech acoustics, and/or speech perception or recognition, place them in an integrated phonetic-phonological perspective, and relate them in more or less explicit ways to aspects of speech technology. Therefore, we hope that this volume can help speech scientists with traditional training in phonetics and phonology to keep up with the latest developments in speech technology. In the opposite direction, speech researchers starting from a technological perspective will hopefully get inspired by reading about the questions, phenomena, and communicative functions that are currently addressed in phonetics and phonology. Either way, the future of speech research lies in international, interdisciplinary collaborations, and our volume is meant to reflect and facilitate such collaborations...
What's in the input? Frequent frames in child-directed speech offer distributional cues to grammatical categories in Spanish and English.

Science.gov (United States)

Weisleder, Adriana; Waxman, Sandra R

2010-11-01

Recent analyses have revealed that child-directed speech contains distributional regularities that could, in principle, support young children's discovery of distinct grammatical categories (noun, verb, adjective). In particular, a distributional unit known as the frequent frame appears to be especially informative (Mintz, 2003). However, analyses have focused almost exclusively on the distributional information available in English. Because languages differ considerably in how the grammatical forms are marked within utterances, the scarcity of cross-linguistic evidence represents an unfortunate gap. We therefore advance the developmental evidence by analyzing the distributional information available in frequent frames across two languages (Spanish and English), across sentence positions (phrase medial and phrase final), and across grammatical forms (noun, verb, adjective). We selected six parent-child corpora from the CHILDES database (three English; three Spanish), and analyzed the input when children were aged 2;6 or younger. In each language, frequent frames did indeed offer systematic cues to grammatical category assignment. We also identify differences in the accuracy of these frames across languages, sentence positions and grammatical classes.
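A frequent frame, in the sense of Mintz (2003), is a pair of words that frequently co-occur with exactly one word between them (A _ B); the words that fill the middle slot tend to share a grammatical category. The sketch below counts such frames in a handful of invented utterances. The function name, the whitespace tokenization, and the toy corpus are all assumptions for illustration; it is not the analysis pipeline applied to the CHILDES corpora.

```python
from collections import Counter, defaultdict


def frame_counts(utterances):
    """Count A_x_B frames and record which words fill each frame's slot."""
    counts = Counter()
    fillers = defaultdict(Counter)
    for utterance in utterances:
        words = utterance.lower().split()
        # Slide a three-word window over the utterance: (a, x, b).
        for a, x, b in zip(words, words[1:], words[2:]):
            counts[(a, b)] += 1
            fillers[(a, b)][x] += 1
    return counts, fillers


# Invented toy utterances (hypothetical, not taken from CHILDES).
toy_corpus = [
    "you want it",
    "you see it",
    "you like it",
    "the dog runs",
    "the cat runs",
]
counts, fillers = frame_counts(toy_corpus)
top_frame, top_count = counts.most_common(1)[0]
slot_words = sorted(fillers[top_frame])
```

In this toy corpus the most frequent frame is "you _ it", and every word that fills its slot is a verb, which is the kind of category cue the abstract describes.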
How musical expertise shapes speech perception: evidence from auditory classification images.

Science.gov (United States)

Varnet, Léo; Wang, Tianyun; Peter, Chloe; Meunier, Fanny; Hoen, Michel

2015-09-24

It is now well established that extensive musical training percolates to higher levels of cognition, such as speech processing. However, the lack of a precise technique to investigate the specific listening strategy involved in speech comprehension has made it difficult to determine how musicians' higher performance in non-speech tasks contributes to their enhanced speech comprehension. The recently developed Auditory Classification Image approach reveals the precise time-frequency regions used by participants when performing phonemic categorizations in noise. Here we used this technique on 19 non-musicians and 19 professional musicians. We found that both groups used very similar listening strategies, but the musicians relied more heavily on the two main acoustic cues, at the first formant onset and at the onsets of the second and third formants. Additionally, they responded more consistently to stimuli. These observations provide a direct visualization of auditory plasticity resulting from extensive musical training and shed light on the level of functional transfer between auditory processing and speech perception.
Prosodic cues to word order: what level of representation?

Directory of Open Access Journals (Sweden)

Carline Bernard

2012-10-01

Full Text Available Within language, systematic correlations exist between syntactic structure and prosody. Prosodic prominence, for instance, falls on the complement and not the head of syntactic phrases, and its realization depends on the phrasal position of the prominent element. Thus, in Japanese, a functor-final language, prominence is phrase-initial and realized as increased pitch (^Tōkyō ni ‘Tokyo to’), whereas in French, English or Italian, functor-initial languages, it manifests itself as phrase-final lengthening (to Rome). Prosody is readily available in the linguistic signal even to the youngest infants. It has, therefore, been proposed that young learners might be able to exploit its correlations with syntax to bootstrap language structure. In this study, we tested this hypothesis, investigating how 8-month-old monolingual French infants processed an artificial grammar manipulating the relative position of prosodic prominence and word frequency. In Condition 1, we created a speech stream in which the two cues, prosody and frequency, were aligned, frequent words being prosodically non-prominent and infrequent ones being prominent, as is the case in natural language (functors are prosodically minimal compared to content words). In Condition 2, the two cues were misaligned, with frequent words carrying prosodic prominence, unlike in natural language. After familiarization with the aligned or the misaligned stream in a headturn preference procedure, we tested infants’ preference for test items having a frequent word initial or a frequent word final word order. We found that infants familiarized with the aligned stream showed the expected preference for the frequent word initial test items, mimicking the functor-initial word order of French. Infants in the misaligned condition showed no preference. These results suggest that infants are able to use word frequency and prosody as early cues to word order and they integrate them into a coherent
Visual Temporal Acuity Is Related to Auditory Speech Perception Abilities in Cochlear Implant Users.

Science.gov (United States)

Jahn, Kelly N; Stevenson, Ryan A; Wallace, Mark T

Despite significant improvements in speech perception abilities following cochlear implantation, many prelingually deafened cochlear implant (CI) recipients continue to rely heavily on visual information to develop speech and language. Increased reliance on visual cues for understanding spoken language could lead to the development of unique audiovisual integration and visual-only processing abilities in these individuals. Brain imaging studies have demonstrated that good CI performers, as indexed by auditory-only speech perception abilities, have different patterns of visual cortex activation in response to visual and auditory stimuli as compared with poor CI performers. However, no studies have examined whether speech perception performance is related to any type of visual processing abilities following cochlear implantation. The purpose of the present study was to provide a preliminary examination of the relationship between clinical, auditory-only speech perception tests, and visual temporal acuity in prelingually deafened adult CI users. It was hypothesized that prelingually deafened CI users, who exhibit better (i.e., more acute) visual temporal processing abilities would demonstrate better auditory-only speech perception performance than those with poorer visual temporal acuity. Ten prelingually deafened adult CI users were recruited for this study. Participants completed a visual temporal order judgment task to quantify visual temporal acuity. To assess auditory-only speech perception abilities, participants completed the consonant-nucleus-consonant word recognition test and the AzBio sentence recognition test. Results were analyzed using two-tailed partial Pearson correlations, Spearman's rho correlations, and independent samples t tests. Visual temporal acuity was significantly correlated with auditory-only word and sentence recognition abilities. In addition, proficient CI users, as assessed via auditory-only speech perception performance, demonstrated
When to Take a Gesture Seriously: On How We Use and Prioritize Communicative Cues.

Science.gov (United States)

Gunter, Thomas C; Weinbrenner, J E Douglas

2017-08-01

When people talk, their speech is often accompanied by gestures. Although it is known that co-speech gestures can influence face-to-face communication, it is currently unclear to what extent they are actively used and under which premises they are prioritized to facilitate communication. We investigated these open questions in two experiments that varied how pointing gestures disambiguate the utterances of an interlocutor. Participants, whose event-related brain responses were measured, watched a video, where an actress was interviewed about, for instance, classical literature (e.g., Goethe and Shakespeare). While responding, the actress pointed systematically to the left side to refer to, for example, Goethe, or to the right to refer to Shakespeare. Her final statement was ambiguous and combined with a pointing gesture. The P600 pattern found in Experiment 1 revealed that, when pointing was unreliable, gestures were only monitored for their cue validity and not used for reference tracking related to the ambiguity. However, when pointing was a valid cue (Experiment 2), it was used for reference tracking, as indicated by a reduced N400 for pointing. In summary, these findings suggest that a general prioritization mechanism is in use that constantly monitors and evaluates the use of communicative cues against communicative priors on the basis of accumulated error information.
Emotional speech comprehension in children and adolescents with autism spectrum disorders.

Science.gov (United States)

Le Sourn-Bissaoui, Sandrine; Aguert, Marc; Girard, Pauline; Chevreuil, Claire; Laval, Virginie

2013-01-01

We examined the understanding of emotional speech by children and adolescents with autism spectrum disorders (ASD). We predicted that they would have difficulty understanding emotional speech, not because of an emotional prosody processing impairment but because of problems drawing appropriate inferences, especially in multiple-cue environments. Twenty-six children and adolescents with ASD and 26 typically developing (TD) controls performed a computerized task featuring emotional prosody, either embedded in a discrepant context or without any context at all. They had to identify the speaker's feeling. When the prosody was the sole cue, participants with ASD performed just as well as controls, relying on this cue to infer the speaker's intention. When the prosody was embedded in a discrepant context, both ASD and TD participants exhibited a contextual bias and a negativity bias. However, ASD participants relied less on the emotional prosody than the controls when it was positive. We discuss these findings with respect to executive function and intermodal processing. After reading this article, the reader should be able to (1) describe the ASD participants' pragmatic impairments, (2) explain why ASD participants did not have an emotional prosody processing impairment, and (3) explain why ASD participants had difficulty inferring the speaker's intention from emotional prosody in a discrepant situation. Copyright © 2013 Elsevier Inc. All rights reserved.
Speech Enhancement of Mobile Devices Based on the Integration of a Dual Microphone Array and a Background Noise Elimination Algorithm.

Science.gov (United States)

Chen, Yung-Yue

2018-05-08

Mobile devices are often used in our daily lives for the purposes of speech and communication. The speech quality of mobile devices is always degraded due to the environmental noises surrounding mobile device users. Regretfully, an effective background noise reduction solution cannot easily be developed for this speech enhancement problem. Due to these depicted reasons, a methodology is systematically proposed to eliminate the effects of background noises for the speech communication of mobile devices. This methodology integrates a dual microphone array with a background noise elimination algorithm. The proposed background noise elimination algorithm includes a whitening process, a speech modelling method and an H₂ estimator. Due to the adoption of the dual microphone array, a low-cost design can be obtained for the speech enhancement of mobile devices. Practical tests have proven that this proposed method is immune to random background noises, and noiseless speech can be obtained after executing this denoise process.
Speech Enhancement of Mobile Devices Based on the Integration of a Dual Microphone Array and a Background Noise Elimination Algorithm

Directory of Open Access Journals (Sweden)

Yung-Yue Chen

2018-05-01

Full Text Available Mobile devices are often used in our daily lives for the purposes of speech and communication. The speech quality of mobile devices is always degraded due to the environmental noises surrounding mobile device users. Regretfully, an effective background noise reduction solution cannot easily be developed for this speech enhancement problem. Due to these depicted reasons, a methodology is systematically proposed to eliminate the effects of background noises for the speech communication of mobile devices. This methodology integrates a dual microphone array with a background noise elimination algorithm. The proposed background noise elimination algorithm includes a whitening process, a speech modelling method and an H2 estimator. Due to the adoption of the dual microphone array, a low-cost design can be obtained for the speech enhancement of mobile devices. Practical tests have proven that this proposed method is immune to random background noises, and noiseless speech can be obtained after executing this denoise process.
Evidence for a perception of prosodic cues in bat communication: contact call classification by Megaderma lyra.

Science.gov (United States)

Janssen, Simone; Schmidt, Sabine

2009-07-01

The perception of prosodic cues in human speech may be rooted in mechanisms common to mammals. The present study explores to what extent bats use rhythm and frequency, typically carrying prosodic information in human speech, for the classification of communication call series. Using a two-alternative, forced choice procedure, we trained Megaderma lyra to discriminate between synthetic contact call series differing in frequency, rhythm on level of calls and rhythm on level of call series, and measured the classification performance for stimuli differing in only one, or two, of the above parameters. A comparison with predictions from models based on one, combinations of two, or all, parameters revealed that the bats based their decision predominantly on frequency and in addition on rhythm on the level of call series, whereas rhythm on level of calls was not taken into account in this paradigm. Moreover, frequency and rhythm on the level of call series were evaluated independently. Our results show that parameters corresponding to prosodic cues in human languages are perceived and evaluated by bats. Thus, these necessary prerequisites for a communication via prosodic structures in mammals have evolved far before human speech.
Comparison of Gated Audiovisual Speech Identification in Elderly Hearing Aid Users and Elderly Normal-Hearing Individuals

Science.gov (United States)

Lidestam, Björn; Rönnberg, Jerker

2016-01-01

The present study compared elderly hearing aid (EHA) users (n = 20) with elderly normal-hearing (ENH) listeners (n = 20) in terms of isolation points (IPs, the shortest time required for correct identification of a speech stimulus) and accuracy of audiovisual gated speech stimuli (consonants, words, and final words in highly and less predictable sentences) presented in silence. In addition, we compared the IPs of audiovisual speech stimuli from the present study with auditory ones extracted from a previous study, to determine the impact of the addition of visual cues. Both participant groups achieved ceiling levels in terms of accuracy in the audiovisual identification of gated speech stimuli; however, the EHA group needed longer IPs for the audiovisual identification of consonants and words. The benefit of adding visual cues to auditory speech stimuli was more evident in the EHA group, as audiovisual presentation significantly shortened the IPs for consonants, words, and final words in less predictable sentences; in the ENH group, audiovisual presentation only shortened the IPs for consonants and words. In conclusion, although the audiovisual benefit was greater for EHA group, this group had inferior performance compared with the ENH group in terms of IPs when supportive semantic context was lacking. Consequently, EHA users needed the initial part of the audiovisual speech signal to be longer than did their counterparts with normal hearing to reach the same level of accuracy in the absence of a semantic context. PMID:27317667
A glimpsing account of the role of temporal fine structure information in speech recognition.

Science.gov (United States)

Apoux, Frédéric; Healy, Eric W

2013-01-01

Many behavioral studies have reported a significant decrease in intelligibility when the temporal fine structure (TFS) of a sound mixture is replaced with noise or tones (i.e., vocoder processing). This finding has led to the conclusion that TFS information is critical for speech recognition in noise. How the normal auditory system takes advantage of the original TFS, however, remains unclear. Three experiments on the role of TFS in noise are described. All three experiments measured speech recognition in various backgrounds while manipulating the envelope, TFS, or both. One experiment tested the hypothesis that vocoder processing may artificially increase the apparent importance of TFS cues. Another experiment evaluated the relative contribution of the target and masker TFS by disturbing only the TFS of the target or that of the masker. Finally, a last experiment evaluated the relative contribution of envelope and TFS information. In contrast to previous studies, however, the original envelope and TFS were both preserved - to some extent - in all conditions. Overall, the experiments indicate a limited influence of TFS and suggest that little speech information is extracted from the TFS. Concomitantly, these experiments confirm that most speech information is carried by the temporal envelope in real-world conditions. When interpreted within the framework of the glimpsing model, the results of these experiments suggest that TFS is primarily used as a grouping cue to select the time-frequency regions corresponding to the target speech signal.

Atypical audio-visual speech perception and McGurk effects in children with specific language impairment.

Science.gov (United States)

Leybaert, Jacqueline; Macchi, Lucie; Huyse, Aurélie; Champoux, François; Bayard, Clémence; Colin, Cécile; Berthommier, Frédéric

2014-01-01

Audiovisual speech perception of children with specific language impairment (SLI) and children with typical language development (TLD) was compared in two experiments using /aCa/ syllables presented in the context of a masking release paradigm. Children had to repeat syllables presented in auditory alone, visual alone (speechreading), audiovisual congruent and incongruent (McGurk) conditions. Stimuli were masked by either stationary (ST) or amplitude modulated (AM) noise. Although children with SLI were less accurate in auditory and audiovisual speech perception, they showed an auditory masking release effect similar to that of children with TLD. Children with SLI also had fewer correct responses in speechreading than children with TLD, indicating impairment in phonemic processing of visual speech information. In response to McGurk stimuli, children with TLD showed more fusions in AM noise than in ST noise, a consequence of the auditory masking release effect and of the influence of visual information. Children with SLI did not show this effect systematically, suggesting they were less influenced by visual speech. However, when the visual cues were easily identified, the profile of responses to McGurk stimuli was similar in both groups, suggesting that children with SLI do not suffer from an impairment of audiovisual integration. An analysis of percent of information transmitted revealed a deficit in the children with SLI, particularly for the place of articulation feature. Taken together, the data support the hypothesis of an intact peripheral processing of auditory speech information, coupled with a supramodal deficit of phonemic categorization in children with SLI. Clinical implications are discussed.
A randomized controlled trial on the beneficial effects of training letter-speech sound integration on reading fluency in children with dyslexia

NARCIS (Netherlands)

Fraga González, G.; Žarić, G.; Tijms, J.; Bonte, M.; Blomert, L.; van der Molen, M.W.

2015-01-01

A recent account of dyslexia assumes that a failure to develop automated letter-speech sound integration might be responsible for the observed lack of reading fluency. This study uses a pre-test-training-post-test design to evaluate the effects of a training program based on letter-speech sound
Use of "um" in the Deceptive Speech of a Convicted Murderer

Science.gov (United States)

Villar, Gina; Arciuli, Joanne; Mallard, David

2012-01-01

Previous studies have demonstrated a link between language behaviors and deception; however, questions remain about the role of specific linguistic cues, especially in real-life high-stakes lies. This study investigated use of the so-called filler, "um," in externally verifiable truthful versus deceptive speech of a convicted murderer. The data…
Speech recognition in natural background noise.

Directory of Open Access Journals (Sweden)

Julien Meyer

Full Text Available In the real world, human speech recognition nearly always involves listening in background noise. The impact of such noise on speech signals and on intelligibility performance increases with the separation of the listener from the speaker. The present behavioral experiment provides an overview of the effects of such acoustic disturbances on speech perception in conditions approaching ecologically valid contexts. We analysed the intelligibility loss in spoken word lists with increasing listener-to-speaker distance in a typical low-level natural background noise. The noise was combined with the simple spherical amplitude attenuation due to distance, basically changing the signal-to-noise ratio (SNR). Therefore, our study draws attention to some of the most basic environmental constraints that have pervaded spoken communication throughout human history. We evaluated the ability of native French participants to recognize French monosyllabic words (spoken at 65.3 dB(A), reference at 1 meter) at distances between 11 and 33 meters, which corresponded to the SNRs most revealing of the progressive effect of the selected natural noise (-8.8 dB to -18.4 dB). Our results showed that in such conditions, identity of vowels is mostly preserved, with the striking peculiarity of the absence of confusion in vowels. The results also confirmed the functional role of consonants during lexical identification. The extensive analysis of recognition scores, confusion patterns and associated acoustic cues revealed that sonorant, sibilant and burst properties were the most important parameters influencing phoneme recognition. Altogether these analyses allowed us to extract a resistance scale from consonant recognition scores. We also identified specific perceptual consonant confusion groups depending on the position in the word (onset vs. coda). Finally our data suggested that listeners may access some acoustic cues of the CV transition, opening interesting perspectives for
Speech recognition in natural background noise.

Science.gov (United States)

Meyer, Julien; Dentel, Laure; Meunier, Fanny

2013-01-01

In the real world, human speech recognition nearly always involves listening in background noise. The impact of such noise on speech signals and on intelligibility performance increases with the separation of the listener from the speaker. The present behavioral experiment provides an overview of the effects of such acoustic disturbances on speech perception in conditions approaching ecologically valid contexts. We analysed the intelligibility loss in spoken word lists with increasing listener-to-speaker distance in a typical low-level natural background noise. The noise was combined with the simple spherical amplitude attenuation due to distance, basically changing the signal-to-noise ratio (SNR). Therefore, our study draws attention to some of the most basic environmental constraints that have pervaded spoken communication throughout human history. We evaluated the ability of native French participants to recognize French monosyllabic words (spoken at 65.3 dB(A), reference at 1 meter) at distances between 11 and 33 meters, which corresponded to the SNRs most revealing of the progressive effect of the selected natural noise (-8.8 dB to -18.4 dB). Our results showed that in such conditions, identity of vowels is mostly preserved, with the striking peculiarity of the absence of confusion in vowels. The results also confirmed the functional role of consonants during lexical identification. The extensive analysis of recognition scores, confusion patterns and associated acoustic cues revealed that sonorant, sibilant and burst properties were the most important parameters influencing phoneme recognition. Altogether these analyses allowed us to extract a resistance scale from consonant recognition scores. We also identified specific perceptual consonant confusion groups depending on the position in the word (onset vs. coda). Finally our data suggested that listeners may access some acoustic cues of the CV transition, opening interesting perspectives for future studies.
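The relation between distance and SNR described in this abstract can be checked with a short calculation. The sketch below assumes ideal spherical spreading (a 20·log10 drop in level with distance) and a constant noise level. The abstract does not state the noise level, so `NOISE_DB` is inferred here from the reported -8.8 dB SNR at 11 m and should be read as an illustrative assumption, not a value from the study. The function names are hypothetical.

```python
import math

def level_at_distance(level_ref_db, distance_m, ref_distance_m=1.0):
    """Level after spherical spreading: falls by 20*log10(d / d_ref) dB."""
    return level_ref_db - 20.0 * math.log10(distance_m / ref_distance_m)

def snr_db(level_ref_db, distance_m, noise_db):
    """SNR at the listener, assuming the noise level does not vary with distance."""
    return level_at_distance(level_ref_db, distance_m) - noise_db

SPEECH_DB_AT_1M = 65.3  # dB(A) at 1 m, as given in the abstract

# Assumption: noise level inferred so that the SNR at 11 m equals the
# reported -8.8 dB; the abstract itself does not give this figure.
NOISE_DB = level_at_distance(SPEECH_DB_AT_1M, 11.0) + 8.8

for d in (11.0, 22.0, 33.0):
    print(f"{d:4.0f} m: SNR = {snr_db(SPEECH_DB_AT_1M, d, NOISE_DB):6.1f} dB")
```

Under these assumptions the SNR at 33 m comes out at roughly -18.3 dB, close to the reported -18.4 dB, which is consistent with a pure distance-attenuation account of the SNR range.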
Use of explicit memory cues following parietal lobe lesions.

Science.gov (United States)

Dobbins, Ian G; Jaeger, Antonio; Studer, Bettina; Simons, Jon S

2012-11-01

The putative role of the lateral parietal lobe in episodic memory has recently become a topic of considerable debate, owing primarily to its consistent activation for studied materials during functional magnetic resonance imaging studies of recognition. Here we examined the performance of patients with parietal lobe lesions using an explicit memory cueing task in which probabilistic cues ("Likely Old" or "Likely New"; 75% validity) preceded the majority of verbal recognition memory probes. Without cues, patients and control participants did not differ in accuracy. However, group differences emerged during the "Likely New" cue condition with controls responding more accurately than parietal patients when these cues were valid (preceding new materials) and trending towards less accuracy when these cues were invalid (preceding old materials). Both effects suggest insufficient integration of external cues into memory judgments on the part of the parietal patients whose cued performance largely resembled performance in the complete absence of cues. Comparison of the parietal patients to a patient group with frontal lobe lesions suggested the pattern was specific to parietal and adjacent area lesions. Overall, the data indicate that parietal lobe patients fail to appropriately incorporate external cues of novelty into recognition attributions. This finding supports a role for the lateral parietal lobe in the adaptive biasing of memory judgments through the integration of external cues and internal memory evidence. We outline the importance of such adaptive biasing through consideration of basic signal detection predictions regarding maximum possible accuracy with and without informative environmental cues. Copyright © 2012 Elsevier Ltd. All rights reserved.
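The "basic signal detection predictions" mentioned at the end of this abstract can be illustrated with a small calculation. The sketch below assumes the textbook equal-variance Gaussian model, with new items distributed as N(0, 1) and old items as N(d', 1), and an ideal observer who sets the criterion from the cue-implied prior. The function names and the d' value of 1.0 are hypothetical, and the abstract does not say which model variant the authors used.

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def max_accuracy(d_prime, p_old=0.5):
    """Best achievable proportion correct for an ideal observer.

    New items ~ N(0, 1), old items ~ N(d_prime, 1); p_old is the prior
    probability that the probe is old (0.5 when no cue is given).
    """
    if d_prime <= 0.0:
        # No memory evidence: the best strategy is to follow the prior.
        return max(p_old, 1.0 - p_old)
    # Optimal criterion: the midpoint, shifted by the log prior odds.
    k = d_prime / 2.0 + math.log((1.0 - p_old) / p_old) / d_prime
    hit_rate = phi(d_prime - k)      # P(respond "old" | old)
    correct_rejection = phi(k)       # P(respond "new" | new)
    return p_old * hit_rate + (1.0 - p_old) * correct_rejection

d = 1.0  # hypothetical memory strength
print("no cue:           ", round(max_accuracy(d, 0.5), 3))
print("'Likely Old' cue: ", round(max_accuracy(d, 0.75), 3))
print("'Likely New' cue: ", round(max_accuracy(d, 0.25), 3))
```

With a 75% valid cue the ideal observer's ceiling is higher than without a cue, and it is never below 0.75, because simply following the cue already achieves that level. An observer whose cued accuracy matches uncued accuracy is therefore not exploiting the cue.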
Acoustic foundations of the speech-to-song illusion.

Science.gov (United States)

Tierney, Adam; Patel, Aniruddh D; Breen, Mara

2018-06-01

In the "speech-to-song illusion," certain spoken phrases are heard as highly song-like when isolated from context and repeated. This phenomenon occurs to a greater degree for some stimuli than for others, suggesting that particular cues prompt listeners to perceive a spoken phrase as song. Here we investigated the nature of these cues across four experiments. In Experiment 1, participants were asked to rate how song-like spoken phrases were after each of eight repetitions. Initial ratings were correlated with the consistency of an underlying beat and within-syllable pitch slope, while rating change was linked to beat consistency, within-syllable pitch slope, and melodic structure. In Experiment 2, the within-syllable pitch slope of the stimuli was manipulated, and this manipulation changed the extent to which participants heard certain stimuli as more musical than others. In Experiment 3, the extent to which the pitch sequences of a phrase fit a computational model of melodic structure was altered, but this manipulation did not have a significant effect on musicality ratings. In Experiment 4, the consistency of intersyllable timing was manipulated, but this manipulation did not have an effect on the change in perceived musicality after repetition. Our methods provide a new way of studying the causal role of specific acoustic features in the speech-to-song illusion via subtle acoustic manipulations of speech, and show that listeners can rapidly (and implicitly) assess the degree to which nonmusical stimuli contain musical structure. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Cross-Linguistic Differences in Prosodic Cues to Syntactic Disambiguation in German and English

Science.gov (United States)

O'Brien, Mary Grantham; Jackson, Carrie N.; Gardner, Christine E.

2014-01-01

This study examined whether late-learning English-German second language (L2) learners and late-learning German-English L2 learners use prosodic cues to disambiguate temporarily ambiguous first language and L2 sentences during speech production. Experiments 1a and 1b showed that English-German L2 learners and German-English L2 learners used a…
The Effect of Age and Type of Noise on Speech Perception under Conditions of Changing Context and Noise Levels.

Science.gov (United States)

Taitelbaum-Swead, Riki; Fostick, Leah

2016-01-01

Everyday life includes fluctuating noise levels, resulting in continuously changing speech intelligibility. The study aims were: (1) to quantify the amount of decrease in age-related speech perception, as a result of increasing noise level, and (2) to test the effect of age on context usage at the word level (fewer contextual cues). A total of 24 young adults (age 20-30 years) and 20 older adults (age 60-75 years) were tested. Meaningful and nonsense one-syllable consonant-vowel-consonant words were presented with the background noise types of speech noise (SpN), babble noise (BN), and white noise (WN), with a signal-to-noise ratio (SNR) of 0 and -5 dB. Older adults had lower accuracy at SNR = 0, with WN being the most difficult condition for all participants. Measuring the change in speech perception when SNR decreased showed a reduction of 18.6-61.5% in intelligibility, with age effect only for BN. Both young and older adults used less phonemic context with WN, as compared to other conditions. Older adults are more affected by an increasing noise level of fluctuating informational noise as compared to steady-state noise. They also use fewer contextual cues when perceiving monosyllabic words. Further studies should take into consideration that when presenting the stimulus differently (change in noise level, fewer contextual cues), other perceptual and cognitive processes are involved. © 2016 S. Karger AG, Basel.
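To make the SNR conditions of 0 and -5 dB concrete, the sketch below shows one common way of mixing a target signal with noise at a chosen SNR: the noise is rescaled so that the ratio of mean powers matches the target. This is a generic illustration, not the authors' stimulus preparation. The sine tone and Gaussian noise are synthetic placeholders for speech and white noise, and all names are hypothetical.

```python
import math
import random

def mean_power(x):
    """Mean squared amplitude of a signal."""
    return sum(v * v for v in x) / len(x)

def mix_at_snr(speech, noise, snr_db):
    """Scale noise so that 10*log10(P_speech / P_noise) equals snr_db, then add.

    Both inputs must have the same length. Returns (mixture, scaled_noise).
    """
    target_noise_power = mean_power(speech) / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / mean_power(noise))
    scaled = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled)]
    return mixture, scaled

# Synthetic placeholders (assumptions): a 220 Hz tone stands in for speech,
# Gaussian samples stand in for white noise, both 1 s at 16 kHz.
random.seed(0)
speech = [math.sin(2.0 * math.pi * 220.0 * t / 16000.0) for t in range(16000)]
noise = [random.gauss(0.0, 1.0) for _ in range(16000)]

for target in (0.0, -5.0):
    mixture, scaled = mix_at_snr(speech, noise, target)
    achieved = 10.0 * math.log10(mean_power(speech) / mean_power(scaled))
    print(f"target {target:5.1f} dB -> achieved {achieved:5.1f} dB")
```

Moving from 0 to -5 dB multiplies the noise power by about 3.16 relative to the speech, which is the step over which the abstract reports the intelligibility reduction.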
Cognitive-linguistic effort in multidisciplinary stroke rehabilitation: Decreasing vs. increasing cues for word retrieval.

Science.gov (United States)

Choe, Yu-Kyong; Foster, Tammie; Asselin, Abigail; LeVander, Meagan; Baird, Jennifer

2017-04-01

Approximately 24% of stroke survivors experience co-occurring aphasia and hemiparesis. These individuals typically attend back-to-back therapy sessions. However, sequentially scheduled therapy may trigger physical and mental fatigue and have an adverse impact on treatment outcomes. The current study tested a hypothesis that exerting less effort during a therapy session would reduce overall fatigue and enhance functional recovery. Two stroke survivors chronically challenged by non-fluent aphasia and right hemiparesis sequentially completed verbal naming and upper-limb tasks on their home computers. The level of cognitive-linguistic effort in speech/language practice was manipulated by presenting verbal naming tasks in two conditions: Decreasing cues (i.e., most-to-least support for word retrieval), and Increasing cues (i.e., least-to-most support). The participants completed the same upper-limb exercises throughout the study periods. Both individuals showed a statistically significant advantage of decreasing cues over increasing cues in word retrieval during the practice period, but not at the end of the practice period or thereafter. The participant with moderate aphasia and hemiparesis achieved clinically meaningful gains in upper-limb functions following the decreasing cues condition, but not after the increasing cues condition. Preliminary findings from the current study suggest a positive impact of decreasing cues in the context of multidisciplinary stroke rehabilitation.
Learning Grammatical Categories from Distributional Cues: Flexible Frames for Language Acquisition

Science.gov (United States)

St. Clair, Michelle C.; Monaghan, Padraic; Christiansen, Morten H.

2010-01-01

Numerous distributional cues in the child's environment may potentially assist in language learning, but what cues are useful to the child and when are these cues utilised? We propose that the most useful source of distributional cue is a flexible frame surrounding the word, where the language learner integrates information from the preceding and…
Role of short-time acoustic temporal fine structure cues in sentence recognition for normal-hearing listeners.

Science.gov (United States)

Hou, Limin; Xu, Li

2018-02-01

Short-time processing was employed to manipulate the amplitude, bandwidth, and temporal fine structure (TFS) in sentences. Fifty-two native-English-speaking, normal-hearing listeners participated in four sentence-recognition experiments. Results showed that recovered envelope (E) played an important role in speech recognition when the bandwidth was > 1 equivalent rectangular bandwidth. Removing TFS drastically reduced sentence recognition. Preserving TFS greatly improved sentence recognition when amplitude information was available at a rate ≥ 10 Hz (i.e., time segment ≤ 100 ms). Therefore, the short-time TFS facilitates speech perception together with the recovered E and works with the coarse amplitude cues to provide useful information for speech recognition.
Current management for word finding difficulties by speech-language therapists in South African remedial schools.

Science.gov (United States)

de Rauville, Ingrid; Chetty, Sandhya; Pahl, Jenny

2006-01-01

Word finding difficulties frequently found in learners with language learning difficulties (Casby, 1992) are an integral part of Speech-Language Therapists' management role when working with learning disabled children. This study investigated current management for word finding difficulties by 70 Speech-Language Therapists in South African remedial schools. A descriptive survey design using a quantitative and qualitative approach was used. A questionnaire and follow-up focus group discussion were used to collect data. Results highlighted the use of the Renfrew Word Finding Scale (Renfrew, 1972, 1995) as the most frequently used formal assessment tool. Language sample analysis and discourse analysis were the most frequently used informal assessment procedures. Formal intervention programmes were generally not used. Phonetic, phonemic or phonological cueing were the most frequently used therapeutic strategies. The authors note strengths and raise concerns about current management for word finding difficulties in South African remedial schools, particularly in terms of bilingualism. Opportunities are highlighted regarding the development of assessment and intervention measures relevant to the diverse learning disabled population in South Africa.
Severe Multisensory Speech Integration Deficits in High-Functioning School-Aged Children with Autism Spectrum Disorder (ASD) and Their Resolution During Early Adolescence

Science.gov (United States)

Foxe, John J.; Molholm, Sophie; Del Bene, Victor A.; Frey, Hans-Peter; Russo, Natalie N.; Blanco, Daniella; Saint-Amour, Dave; Ross, Lars A.

2015-01-01

Under noisy listening conditions, visualizing a speaker's articulations substantially improves speech intelligibility. This multisensory speech integration ability is crucial to effective communication, and the appropriate development of this capacity greatly impacts a child's ability to successfully navigate educational and social settings. Research shows that multisensory integration abilities continue developing late into childhood. The primary aim here was to track the development of these abilities in children with autism, since multisensory deficits are increasingly recognized as a component of the autism spectrum disorder (ASD) phenotype. The abilities of high-functioning ASD children (n = 84) to integrate seen and heard speech were assessed cross-sectionally, while environmental noise levels were systematically manipulated, comparing them with age-matched neurotypical children (n = 142). Severe integration deficits were uncovered in ASD, which were increasingly pronounced as background noise increased. These deficits were evident in school-aged ASD children (5–12 year olds), but were fully ameliorated in ASD children entering adolescence (13–15 year olds). The severity of multisensory deficits uncovered has important implications for educators and clinicians working in ASD. We consider the observation that the multisensory speech system recovers substantially in adolescence as an indication that it is likely amenable to intervention during earlier childhood, with potentially profound implications for the development of social communication abilities in ASD children. PMID:23985136
Seeing the talker’s face supports executive processing of speech in steady state noise

OpenAIRE

Sushmit Mishra; Thomas Lunner; Stefan Stenfelt; Jerker Rönnberg; Mary Rudner

2013-01-01

Listening to speech in noise depletes cognitive resources, affecting speech processing. The present study investigated how remaining resources or cognitive spare capacity (CSC) can be deployed by young adults with normal hearing. We administered a test of CSC (CSCT, Mishra et al., 2013) along with a battery of established cognitive tests to 20 participants with normal hearing. In the CSCT, lists of two-digit numbers were presented with and without visual cues in quiet, as well as in steady-st...
Floral pathway integrator gene expression mediates gradual transmission of environmental and endogenous cues to flowering time.

Science.gov (United States)

van Dijk, Aalt D J; Molenaar, Jaap

2017-01-01

The appropriate timing of flowering is crucial for the reproductive success of plants. Hence, intricate genetic networks integrate various environmental and endogenous cues such as temperature or hormonal status. These signals integrate into a network of floral pathway integrator genes. At a quantitative level, it is currently unclear how the impact of genetic variation in signaling pathways on flowering time is mediated by floral pathway integrator genes. Here, using datasets available from literature, we connect Arabidopsis thaliana flowering time in genetic backgrounds varying in upstream signalling components with the expression levels of floral pathway integrator genes in these genetic backgrounds. Our modelling results indicate that flowering time depends in a quite linear way on expression levels of floral pathway integrator genes. This gradual, proportional response of flowering time to upstream changes enables a gradual adaptation to changing environmental factors such as temperature and light.
Floral pathway integrator gene expression mediates gradual transmission of environmental and endogenous cues to flowering time

Directory of Open Access Journals (Sweden)

Aalt D.J. van Dijk

2017-04-01

Full Text Available The appropriate timing of flowering is crucial for the reproductive success of plants. Hence, intricate genetic networks integrate various environmental and endogenous cues such as temperature or hormonal status. These signals integrate into a network of floral pathway integrator genes. At a quantitative level, it is currently unclear how the impact of genetic variation in signaling pathways on flowering time is mediated by floral pathway integrator genes. Here, using datasets available from literature, we connect Arabidopsis thaliana flowering time in genetic backgrounds varying in upstream signalling components with the expression levels of floral pathway integrator genes in these genetic backgrounds. Our modelling results indicate that flowering time depends in a quite linear way on expression levels of floral pathway integrator genes. This gradual, proportional response of flowering time to upstream changes enables a gradual adaptation to changing environmental factors such as temperature and light.
Motion Cueing Algorithm Development: Human-Centered Linear and Nonlinear Approaches

Science.gov (United States)

Houck, Jacob A. (Technical Monitor); Telban, Robert J.; Cardullo, Frank M.

2005-01-01

While the performance of flight simulator motion system hardware has advanced substantially, the development of the motion cueing algorithm, the software that transforms simulated aircraft dynamics into realizable motion commands, has not kept pace. Prior research identified viable features from two algorithms: the nonlinear "adaptive algorithm", and the "optimal algorithm" that incorporates human vestibular models. A novel approach to motion cueing, the "nonlinear algorithm", is introduced that combines features from both approaches. This algorithm is formulated by optimal control, and incorporates a new integrated perception model that includes both visual and vestibular sensation and the interaction between the stimuli. Using a time-varying control law, the matrix Riccati equation is updated in real time by a neurocomputing approach. Preliminary pilot testing resulted in the optimal algorithm incorporating a new otolith model, producing improved motion cues. The nonlinear algorithm vertical mode produced a motion cue with a time-varying washout, sustaining small cues for longer durations and washing out large cues more quickly compared to the optimal algorithm. The inclusion of the integrated perception model improved the responses to longitudinal and lateral cues. False cues observed with the NASA adaptive algorithm were absent. The neurocomputing approach was crucial in that the number of presentations of an input vector could be reduced to meet the real time requirement without degrading the quality of the motion cues.
Weighting of Acoustic Cues to a Manner Distinction by Children with and without Hearing Loss

Science.gov (United States)

Nittrouer, Susan; Lowenstein, Joanna H.

2015-01-01

Purpose: Children must develop optimal perceptual weighting strategies for processing speech in their first language. Hearing loss can interfere with that development, especially if cochlear implants are required. The three goals of this study were to measure, for children with and without hearing loss: (a) cue weighting for a manner distinction,…
Audiovisual Temporal Perception in Aging: The Role of Multisensory Integration and Age-Related Sensory Loss

Science.gov (United States)

Brooks, Cassandra J.; Chan, Yu Man; Anderson, Andrew J.; McKendrick, Allison M.

2018-01-01

Within each sensory modality, age-related deficits in temporal perception contribute to the difficulties older adults experience when performing everyday tasks. Since perceptual experience is inherently multisensory, older adults also face the added challenge of appropriately integrating or segregating the auditory and visual cues present in our dynamic environment into coherent representations of distinct objects. As such, many studies have investigated how older adults perform when integrating temporal information across audition and vision. This review covers both direct judgments about temporal information (the sound-induced flash illusion, temporal order, perceived synchrony, and temporal rate discrimination) and judgments regarding stimuli containing temporal information (the audiovisual bounce effect and speech perception). Although an age-related increase in integration has been demonstrated on a variety of tasks, research specifically investigating the ability of older adults to integrate temporal auditory and visual cues has produced disparate results. In this short review, we explore what factors could underlie these divergent findings. We conclude that both task-specific differences and age-related sensory loss play a role in the reported disparity in age-related effects on the integration of auditory and visual temporal information. PMID:29867415
Music and speech prosody: A common rhythm

Directory of Open Access Journals (Sweden)

Maija Hausen

2013-09-01

Full Text Available Disorders of music and speech perception, known as amusia and aphasia, have traditionally been regarded as dissociated deficits based on studies of brain damaged patients. This has been taken as evidence that music and speech are perceived by largely separate and independent networks in the brain. However, recent studies of congenital amusia have broadened this view by showing that the deficit is associated with problems in perceiving speech prosody, especially intonation and emotional prosody. In the present study the association between the perception of music and speech prosody was investigated with healthy Finnish adults (n = 61) using an on-line music perception test including the Scale subtest of Montreal Battery of Evaluation of Amusia (MBEA) and Off-Beat and Out-of-key tasks as well as a prosodic verbal task that measures the perception of word stress. Regression analyses showed that there was a clear association between prosody perception and music perception, especially in the domain of rhythm perception. This association was evident after controlling for music education, age, pitch perception, visuospatial perception and working memory. Pitch perception was significantly associated with music perception but not with prosody perception. The association between music perception and visuospatial perception (measured using analogous tasks) was less clear. Overall, the pattern of results indicates that there is a robust link between music and speech perception and that this link can be mediated by rhythmic cues (time and stress).
Music and speech prosody: a common rhythm

Science.gov (United States)

Hausen, Maija; Torppa, Ritva; Salmela, Viljami R.; Vainio, Martti; Särkämö, Teppo

2013-01-01

Disorders of music and speech perception, known as amusia and aphasia, have traditionally been regarded as dissociated deficits based on studies of brain damaged patients. This has been taken as evidence that music and speech are perceived by largely separate and independent networks in the brain. However, recent studies of congenital amusia have broadened this view by showing that the deficit is associated with problems in perceiving speech prosody, especially intonation and emotional prosody. In the present study the association between the perception of music and speech prosody was investigated with healthy Finnish adults (n = 61) using an on-line music perception test including the Scale subtest of Montreal Battery of Evaluation of Amusia (MBEA) and Off-Beat and Out-of-key tasks as well as a prosodic verbal task that measures the perception of word stress. Regression analyses showed that there was a clear association between prosody perception and music perception, especially in the domain of rhythm perception. This association was evident after controlling for music education, age, pitch perception, visuospatial perception, and working memory. Pitch perception was significantly associated with music perception but not with prosody perception. The association between music perception and visuospatial perception (measured using analogous tasks) was less clear. Overall, the pattern of results indicates that there is a robust link between music and speech perception and that this link can be mediated by rhythmic cues (time and stress). PMID:24032022
Identification of speech transients using variable frame rate analysis and wavelet packets.

Science.gov (United States)

Rasetshwane, Daniel M; Boston, J Robert; Li, Ching-Chung

2006-01-01

Speech transients are important cues for identifying and discriminating speech sounds. Yoo et al. and Tantibundhit et al. succeeded in identifying speech transients and, by emphasizing them, improving the intelligibility of speech in noise. However, their methods are computationally intensive and unsuitable for real-time applications. This paper presents a method to identify and emphasize speech transients that combines subband decomposition by the wavelet packet transform with variable frame rate (VFR) analysis and unvoiced consonant detection. The VFR analysis is applied to each wavelet packet to define a transitivity function that describes the extent to which the wavelet coefficients of that packet are changing. Unvoiced consonant detection is used to identify unvoiced consonant intervals and the transitivity function is amplified during these intervals. The wavelet coefficients are multiplied by the transitivity function for that packet, amplifying the coefficients localized at times when they are changing and attenuating coefficients at times when they are steady. Inverse transform of the modified wavelet packet coefficients produces a signal corresponding to speech transients similar to the transients identified by Yoo et al. and Tantibundhit et al. A preliminary implementation of the algorithm runs more efficiently than these earlier methods.
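The weighting step described in this record (multiplying each packet's coefficients by a function that is large where the coefficients are changing and small where they are steady) can be sketched as follows. This is a rough illustration only: the change measure used here, a normalized difference in short-time frame energy, is an assumption standing in for the VFR-based transitivity function, and the function names and frame length are hypothetical.

```python
def transitivity(coeffs, frame=4):
    """Hypothetical transitivity function: normalized change in frame energy."""
    # Short-time energy of each non-overlapping frame of coefficients.
    energies = [sum(c * c for c in coeffs[i:i + frame])
                for i in range(0, len(coeffs), frame)]
    # Absolute energy change relative to the previous frame (0 for the first).
    change = [0.0] + [abs(b - a) for a, b in zip(energies, energies[1:])]
    peak = max(change) or 1.0  # avoid division by zero for a steady signal
    # Expand back to one weight per coefficient.
    weights = []
    for w in change:
        weights.extend([w / peak] * frame)
    return weights[:len(coeffs)]


def emphasize_transients(coeffs, frame=4):
    """Scale coefficients by the transitivity weights of their frame."""
    return [c * w for c, w in zip(coeffs, transitivity(coeffs, frame))]


# One subband: steady, then a burst, then steady again.
subband = [1.0] * 8 + [5.0] * 4 + [1.0] * 8
out = emphasize_transients(subband)
```

In this toy subband, the steady stretches are attenuated to zero while the frames at the onset and offset of the burst are kept. A full implementation would apply such weights separately to every wavelet packet and then invert the transform.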
Instant messages vs. speech: hormones and why we still need to hear each other.

Science.gov (United States)

Seltzer, Leslie J; Prososki, Ashley R; Ziegler, Toni E; Pollak, Seth D

2012-01-01

Human speech evidently conveys an adaptive advantage, given its apparently rapid dissemination through the ancient world and global use today. As such, speech must be capable of altering human biology in a positive way, possibly through those neuroendocrine mechanisms responsible for strengthening the social bonds between individuals. Indeed, speech between trusted individuals is capable of reducing levels of salivary cortisol, often considered a biomarker of stress, and increasing levels of urinary oxytocin, a hormone involved in the formation and maintenance of positive relationships. It is not clear, however, whether it is the uniquely human grammar, syntax, content and/or choice of words that causes these physiological changes, or whether the prosodic elements of speech, which are present in the vocal cues of many other species, are responsible. In order to tease apart these elements of human communication, we examined the hormonal responses of female children who instant messaged their mothers after undergoing a stressor. We discovered that unlike children interacting with their mothers in person or over the phone, girls who instant messaged did not release oxytocin; instead, these participants showed levels of salivary cortisol as high as control subjects who did not interact with their parents at all. We conclude that the comforting sound of a familiar voice is responsible for the hormonal differences observed and, hence, that similar differences may be seen in other species using vocal cues to communicate.
Perception of co-speech gestures in aphasic patients: a visual exploration study during the observation of dyadic conversations.

Science.gov (United States)

Preisig, Basil C; Eggenberger, Noëmi; Zito, Giuseppe; Vanbellingen, Tim; Schumacher, Rahel; Hopfner, Simone; Nyffeler, Thomas; Gutbrod, Klemens; Annoni, Jean-Marie; Bohlhalter, Stephan; Müri, René M

2015-03-01

Co-speech gestures are part of nonverbal communication during conversations. They either support the verbal message or provide the interlocutor with additional information. Furthermore, as nonverbal cues, they prompt the cooperative process of turn taking. In the present study, we investigated the influence of co-speech gestures on the perception of dyadic dialogue in aphasic patients. In particular, we analysed the impact of co-speech gestures on gaze direction (towards speaker or listener) and fixation of body parts. We hypothesized that aphasic patients, who are restricted in verbal comprehension, adapt their visual exploration strategies. Sixteen aphasic patients and 23 healthy control subjects participated in the study. Visual exploration behaviour was measured by means of a contact-free infrared eye-tracker while subjects were watching videos depicting spontaneous dialogues between two individuals. Cumulative fixation duration and mean fixation duration were calculated for the factors co-speech gesture (present and absent), gaze direction (to the speaker or to the listener), and region of interest (ROI), including hands, face, and body. Both aphasic patients and healthy controls mainly fixated the speaker's face. We found a significant co-speech gesture × ROI interaction, indicating that the presence of a co-speech gesture encouraged subjects to look at the speaker. Further, there was a significant gaze direction × ROI × group interaction revealing that aphasic patients showed reduced cumulative fixation duration on the speaker's face compared to healthy controls. Co-speech gestures guide the observer's attention towards the speaker, the source of semantic input. It is discussed whether an underlying semantic processing deficit or a deficit in integrating audio-visual information may cause aphasic patients to explore the speaker's face less. Copyright © 2014 Elsevier Ltd. All rights reserved.
The role of periodicity in perceiving speech in quiet and in background noise.

Science.gov (United States)

Steinmetzger, Kurt; Rosen, Stuart

2015-12-01

The ability of normal-hearing listeners to perceive sentences in quiet and in background noise was investigated in a variety of conditions mixing the presence and absence of periodicity (i.e., voicing) in both target and masker. Experiment 1 showed that in quiet, aperiodic noise-vocoded speech and speech with a natural amount of periodicity were equally intelligible, while fully periodic speech was much harder to understand. In Experiments 2 and 3, speech reception thresholds for these targets were measured in the presence of four different maskers: speech-shaped noise, harmonic complexes with a dynamically varying F0 contour, and 10 Hz amplitude-modulated versions of both. For experiment 2, results of experiment 1 were used to identify conditions with equal intelligibility in quiet, while in experiment 3 target intelligibility in quiet was near ceiling. In the presence of a masker, periodicity in the target speech mattered little, but listeners strongly benefited from periodicity in the masker. Substantial fluctuating-masker benefits required the target speech to be almost perfectly intelligible in quiet. In summary, results suggest that the ability to exploit periodicity cues may be an even more important factor when attempting to understand speech embedded in noise than the ability to benefit from masker fluctuations.
Learning foreign sounds in an alien world: videogame training improves non-native speech categorization.

Science.gov (United States)

Lim, Sung-joo; Holt, Lori L

2011-01-01

Although speech categories are defined by multiple acoustic dimensions, some are perceptually weighted more than others and there are residual effects of native-language weightings in non-native speech perception. Recent research on nonlinguistic sound category learning suggests that the distribution characteristics of experienced sounds influence perceptual cue weights: Increasing variability across a dimension leads listeners to rely upon it less in subsequent category learning (Holt & Lotto, 2006). The present experiment investigated the implications of this among native Japanese learning English /r/-/l/ categories. Training was accomplished using a videogame paradigm that emphasizes associations among sound categories, visual information, and players' responses to videogame characters rather than overt categorization or explicit feedback. Subjects who played the game for 2.5h across 5 days exhibited improvements in /r/-/l/ perception on par with 2-4 weeks of explicit categorization training in previous research and exhibited a shift toward more native-like perceptual cue weights. Copyright © 2011 Cognitive Science Society, Inc.
Dog-directed speech: why do we use it and do dogs pay attention to it?

Science.gov (United States)

Ben-Aderet, Tobey; Gallego-Abenza, Mario; Reby, David; Mathevon, Nicolas

2017-01-11

Pet-directed speech is strikingly similar to infant-directed speech, a peculiar speaking pattern with higher pitch and slower tempo known to engage infants' attention and promote language learning. Here, we report the first investigation of potential factors modulating the use of dog-directed speech, as well as its immediate impact on dogs' behaviour. We recorded adult participants speaking in front of pictures of puppies, adult and old dogs, and analysed the quality of their speech. We then performed playback experiments to assess dogs' reaction to dog-directed speech compared with normal speech. We found that human speakers used dog-directed speech with dogs of all ages and that the acoustic structure of dog-directed speech was mostly independent of dog age, except for sound pitch which was relatively higher when communicating with puppies. Playback demonstrated that, in the absence of other non-auditory cues, puppies were highly reactive to dog-directed speech, and that the pitch was a key factor modulating their behaviour, suggesting that this specific speech register has a functional value in young dogs. Conversely, older dogs did not react differentially to dog-directed speech compared with normal speech. The fact that speakers continue to use dog-directed speech with older dogs therefore suggests that this speech pattern may mainly be a spontaneous attempt to facilitate interactions with non-verbal listeners. © 2017 The Author(s).
A configural dominant account of contextual cueing: Configural cues are stronger than colour cues.

Science.gov (United States)

Kunar, Melina A; John, Rebecca; Sweetman, Hollie

2014-01-01

Previous work has shown that reaction times to find a target in displays that have been repeated are faster than those for displays that have never been seen before. This learning effect, termed "contextual cueing" (CC), has been shown using contexts such as the configuration of the distractors in the display and the background colour. However, it is not clear how these two contexts interact to facilitate search. We investigated this here by comparing the strengths of these two cues when they appeared together. In Experiment 1, participants searched for a target that was cued by both colour and distractor configural cues, compared with when the target was only predicted by configural information. The results showed that the addition of a colour cue did not increase contextual cueing. In Experiment 2, participants searched for a target that was cued by both colour and distractor configuration compared with when the target was only cued by colour. The results showed that adding a predictive configural cue led to a stronger CC benefit. Experiments 3 and 4 tested the disruptive effects of removing either a learned colour cue or a learned configural cue and whether there was cue competition when colour and configural cues were presented together. Removing the configural cue was more disruptive to CC than removing colour, and configural learning was shown to overshadow the learning of colour cues. The data support a configural dominant account of CC, where configural cues act as the stronger cue in comparison to colour when they are presented together.
Audiovisual Speech Perception in Infancy: The Influence of Vowel Identity and Infants' Productive Abilities on Sensitivity to (Mis)Matches between Auditory and Visual Speech Cues

Science.gov (United States)

Altvater-Mackensen, Nicole; Mani, Nivedita; Grossmann, Tobias

2016-01-01

Recent studies suggest that infants' audiovisual speech perception is influenced by articulatory experience (Mugitani et al., 2008; Yeung & Werker, 2013). The current study extends these findings by testing if infants' emerging ability to produce native sounds in babbling impacts their audiovisual speech perception. We tested 44 6-month-olds…
Neural encoding of the speech envelope by children with developmental dyslexia.

Science.gov (United States)

Power, Alan J; Colling, Lincoln J; Mead, Natasha; Barnes, Lisa; Goswami, Usha

2016-09-01

Developmental dyslexia is consistently associated with difficulties in processing phonology (linguistic sound structure) across languages. One view is that dyslexia is characterised by a cognitive impairment in the "phonological representation" of word forms, which arises long before the child presents with a reading problem. Here we investigate a possible neural basis for developmental phonological impairments. We assess the neural quality of speech encoding in children with dyslexia by measuring the accuracy of low-frequency speech envelope encoding using EEG. We tested children with dyslexia and chronological age-matched (CA) and reading-level matched (RL) younger children. Participants listened to semantically-unpredictable sentences in a word report task. The sentences were noise-vocoded to increase reliance on envelope cues. Envelope reconstruction for envelopes between 0 and 10Hz showed that the children with dyslexia had significantly poorer speech encoding in the 0-2Hz band compared to both CA and RL controls. These data suggest that impaired neural encoding of low frequency speech envelopes, related to speech prosody, may underpin the phonological deficit that causes dyslexia across languages. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
Eliciting extra prominence in read-speech tasks: The effects of different text-highlighting methods on acoustic cues to perceived prominence

DEFF Research Database (Denmark)

Berger, Stephanie; Niebuhr, Oliver; Fischer, Kerstin

2018-01-01

The research initiative Innovating Speech EliCitation Techniques (INSPECT) aims to describe and quantify how recording methods, situations and materials influence speech production in lab-speech experiments. On this basis, INSPECT aims to develop methods that reliably stimulate specific patterns...... and styles of speech, like expressive or conversational speech or different types of emphatic accents. The present study investigates if and how different text highlighting methods (yellow background, bold, capital letter, italics, and underlining) make speakers reinforce the level of perceived prominence...
Top-Down Modulation of Auditory-Motor Integration during Speech Production: The Role of Working Memory.

Science.gov (United States)

Guo, Zhiqiang; Wu, Xiuqin; Li, Weifeng; Jones, Jeffery A; Yan, Nan; Sheft, Stanley; Liu, Peng; Liu, Hanjun

2017-10-25

Although working memory (WM) is considered as an emergent property of the speech perception and production systems, the role of WM in sensorimotor integration during speech processing is largely unknown. We conducted two event-related potential experiments with female and male young adults to investigate the contribution of WM to the neurobehavioural processing of altered auditory feedback during vocal production. A delayed match-to-sample task that required participants to indicate whether the pitch feedback perturbations they heard during vocalizations in test and sample sequences matched, elicited significantly larger vocal compensations, larger N1 responses in the left middle and superior temporal gyrus, and smaller P2 responses in the left middle and superior temporal gyrus, inferior parietal lobule, somatosensory cortex, right inferior frontal gyrus, and insula compared with a control task that did not require memory retention of the sequence of pitch perturbations. On the other hand, participants who underwent extensive auditory WM training produced suppressed vocal compensations that were correlated with improved auditory WM capacity, and enhanced P2 responses in the left middle frontal gyrus, inferior parietal lobule, right inferior frontal gyrus, and insula that were predicted by pretraining auditory WM capacity. These findings indicate that WM can enhance the perception of voice auditory feedback errors while inhibiting compensatory vocal behavior to prevent voice control from being excessively influenced by auditory feedback. This study provides the first evidence that auditory-motor integration for voice control can be modulated by top-down influences arising from WM, rather than modulated exclusively by bottom-up and automatic processes. SIGNIFICANCE STATEMENT One outstanding question that remains unsolved in speech motor control is how the mismatch between predicted and actual voice auditory feedback is detected and corrected. The present study
Gesture facilitates the syntactic analysis of speech

Directory of Open Access Journals (Sweden)

Henning Holle

2012-03-01

Full Text Available Recent research suggests that the brain routinely binds together information from gesture and speech. However, most of this research focused on the integration of representational gestures with the semantic content of speech. Much less is known about how other aspects of gesture, such as emphasis, influence the interpretation of the syntactic relations in a spoken message. Here, we investigated whether beat gestures alter which syntactic structure is assigned to ambiguous spoken German sentences. The P600 component of the Event Related Brain Potential indicated that the more complex syntactic structure is easier to process when the speaker emphasizes the subject of a sentence with a beat. Thus, a simple flick of the hand can change our interpretation of who has been doing what to whom in a spoken sentence. We conclude that gestures and speech are an integrated system. Unlike previous studies, which have shown that the brain effortlessly integrates semantic information from gesture and speech, our study is the first to demonstrate that this integration also occurs for syntactic information. Moreover, the effect appears to be gesture-specific and was not found for other stimuli that draw attention to certain parts of speech, including prosodic emphasis, or a moving visual stimulus with the same trajectory as the gesture. This suggests that only visual emphasis produced with a communicative intention in mind (that is, beat gestures) influences language comprehension, but not a simple visual movement lacking such an intention.
The Interaction of Temporal and Spectral Acoustic Information with Word Predictability on Speech Intelligibility

Science.gov (United States)

Shahsavarani, Somayeh Bahar

High-level, top-down information such as linguistic knowledge is a salient cortical resource that influences speech perception under most listening conditions. But are all listeners able to exploit these resources for speech facilitation to the same extent? It was found that children with cochlear implants showed different patterns of benefit from contextual information in speech perception compared with their normal-hearing peers. Previous studies have discussed the role of non-acoustic factors such as linguistic and cognitive capabilities to account for this discrepancy. Given the fact that the amount of acoustic information encoded and processed by auditory nerves of listeners with cochlear implants differs from normal-hearing listeners and even varies across individuals with cochlear implants, it is important to study the interaction of specific acoustic properties of the speech signal with contextual cues. This relationship has been mostly neglected in previous research. In this dissertation, we aimed to explore how different acoustic dimensions interact to affect listeners' abilities to combine top-down information with bottom-up information in speech perception beyond the known effects of linguistic and cognitive capacities shown previously. Specifically, the present study investigated whether there were any distinct context effects based on the resolution of spectral versus slowly-varying temporal information in perception of spectrally impoverished speech. To that end, two experiments were conducted. In both experiments, a noise-vocoding technique was adopted to generate spectrally-degraded speech to approximate acoustic cues delivered to listeners with cochlear implants. The frequency resolution was manipulated by varying the number of frequency channels. The temporal resolution was manipulated by low-pass filtering of amplitude envelope with varying low-pass cutoff frequencies. The stimuli were presented to normal-hearing native speakers of American
78 FR 49693 - Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services...

Science.gov (United States)

2013-08-15

...-Speech Services for Individuals with Hearing and Speech Disabilities, Report and Order (Order), document...] Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services; Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and Speech Disabilities...
The impact of brief restriction to articulation on children's subsequent speech production.

Science.gov (United States)

Seidl, Amanda; Brosseau-Lapré, Françoise; Goffman, Lisa

2018-02-01

This project explored whether disruption of articulation during listening impacts subsequent speech production in 4-yr-olds with and without speech sound disorder (SSD). During novel word learning, typically-developing children showed effects of articulatory disruption as revealed by larger differences between two acoustic cues to a sound contrast, but children with SSD were unaffected by articulatory disruption. Findings suggest that, when typically developing 4-yr-olds experience an articulatory disruption during a listening task, the children's subsequent production is affected. Children with SSD show less influence of articulatory experience during perception, which could be the result of impaired or attenuated ties between perception and articulation.
Individual differences in using geometric and featural cues to maintain spatial orientation: cue quantity and cue ambiguity are more important than cue type.

Science.gov (United States)

Kelly, Jonathan W; McNamara, Timothy P; Bodenheimer, Bobby; Carr, Thomas H; Rieser, John J

2009-02-01

Two experiments explored the role of environmental cues in maintaining spatial orientation (sense of self-location and direction) during locomotion. Of particular interest was the importance of geometric cues (provided by environmental surfaces) and featural cues (nongeometric properties provided by striped walls) in maintaining spatial orientation. Participants performed a spatial updating task within virtual environments containing geometric or featural cues that were ambiguous or unambiguous indicators of self-location and direction. Cue type (geometric or featural) did not affect performance, but the number and ambiguity of environmental cues did. Gender differences, interpreted as a proxy for individual differences in spatial ability and/or experience, highlight the interaction between cue quantity and ambiguity. When environmental cues were ambiguous, men stayed oriented with either one or two cues, whereas women stayed oriented only with two. When environmental cues were unambiguous, women stayed oriented with one cue.

Analysis of engagement behavior in children during dyadic interactions using prosodic cues.

Science.gov (United States)

Gupta, Rahul; Bone, Daniel; Lee, Sungbok; Narayanan, Shrikanth

2016-05-01

Child engagement is defined as the interaction of a child with his/her environment in a contextually appropriate manner. Engagement behavior in children is linked to socio-emotional and cognitive state assessment with enhanced engagement identified with improved skills. A vast majority of studies however rely solely, and often implicitly, on subjective perceptual measures of engagement. Access to automatic quantification could assist researchers/clinicians to objectively interpret engagement with respect to a target behavior or condition, and furthermore inform mechanisms for improving engagement in various settings. In this paper, we present an engagement prediction system based exclusively on vocal cues observed during structured interaction between a child and a psychologist involving several tasks. Specifically, we derive prosodic cues that capture engagement levels across the various tasks. Our experiments suggest that a child's engagement is reflected not only in the vocalizations, but also in the speech of the interacting psychologist. Moreover, we show that prosodic cues are informative of the engagement phenomena not only as characterized over the entire task (i.e., global cues), but also in short term patterns (i.e., local cues). We perform a classification experiment assigning the engagement of a child into three discrete levels achieving an unweighted average recall of 55.8% (chance is 33.3%). While the systems using global cues and local level cues are each statistically significant in predicting engagement, we obtain the best results after fusing these two components. We perform further analysis of the cues at local and global levels to achieve insights linking specific prosodic patterns to the engagement phenomenon. We observe that while the performance of our model varies with task setting and interacting psychologist, there exist universal prosodic patterns reflective of engagement.
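
The unweighted average recall (UAR) quoted in the abstract above is the mean of the per-class recalls, which is why chance for three classes is 33.3% regardless of how unevenly the classes are populated. A minimal sketch of the metric follows; the function name and the toy labels are illustrative assumptions and do not come from the study.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; each class counts equally, whatever its size."""
    total = defaultdict(int)    # number of true examples per class
    correct = defaultdict(int)  # number of correctly predicted examples per class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical, imbalanced three-level engagement labels (not data from the paper).
y_true = ["low"] * 2 + ["mid"] * 6 + ["high"] * 2
# A degenerate classifier that always answers "mid".
y_pred = ["mid"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
uar = unweighted_average_recall(y_true, y_pred)
print(accuracy, uar)
```

In this toy case plain accuracy is 0.6 while UAR stays at the three-class chance level of one third, which is the reason UAR is preferred when class sizes differ.
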
Visual-Haptic Integration: Cue Weights are Varied Appropriately, to Account for Changes in Haptic Reliability Introduced by Using a Tool

Directory of Open Access Journals (Sweden)

Chie Takahashi

2011-10-01

Full Text Available Tools such as pliers systematically change the relationship between an object's size and the hand opening required to grasp it. Previous work suggests the brain takes this into account, integrating visual and haptic size information that refers to the same object, independent of the similarity of the ‘raw’ visual and haptic signals (Takahashi et al., VSS 2009). Variations in tool geometry also affect the reliability (precision) of haptic size estimates, however, because they alter the change in hand opening caused by a given change in object size. Here, we examine whether the brain appropriately adjusts the weights given to visual and haptic size signals when tool geometry changes. We first estimated each cue's reliability by measuring size-discrimination thresholds in vision-alone and haptics-alone conditions. We varied haptic reliability using tools with different object-size:hand-opening ratios (1:1, 0.7:1, and 1.4:1). We then measured the weights given to vision and haptics with each tool, using a cue-conflict paradigm. The weight given to haptics varied with tool type in a manner that was well predicted by the single-cue reliabilities (MLE model; Ernst and Banks, 2002). This suggests that the process of visual-haptic integration appropriately accounts for variations in haptic reliability introduced by different tool geometries.
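
The MLE model named in the abstract above predicts cue weights from single-cue reliabilities, where reliability is the inverse variance of each single-cue estimate. The sketch below shows that standard rule under the assumption that each cue's noise is independent and Gaussian; the function names and the example standard deviations are hypothetical and are not values from the study.

```python
def mle_weights(sigma_vision, sigma_haptic):
    """Return (w_vision, w_haptic), each proportional to 1 / sigma**2."""
    r_vision = 1.0 / sigma_vision ** 2   # reliability of the visual estimate
    r_haptic = 1.0 / sigma_haptic ** 2   # reliability of the haptic estimate
    w_vision = r_vision / (r_vision + r_haptic)
    return w_vision, 1.0 - w_vision


def combined_sigma(sigma_vision, sigma_haptic):
    """Predicted standard deviation of the optimally combined estimate."""
    var_v, var_h = sigma_vision ** 2, sigma_haptic ** 2
    return (var_v * var_h / (var_v + var_h)) ** 0.5


# Hypothetical single-cue noise levels: haptics twice as noisy as vision.
w_v, w_h = mle_weights(sigma_vision=1.0, sigma_haptic=2.0)
print(w_v, w_h)                   # vision gets the larger weight
print(combined_sigma(1.0, 2.0))   # lower than either single-cue sigma
```

Under this rule, a tool that makes haptic estimates noisier lowers the haptic weight, which matches the direction of the effect the abstract reports.
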
The mechanism of speech processing in congenital amusia: evidence from Mandarin speakers.

Directory of Open Access Journals (Sweden)

Fang Liu

Full Text Available Congenital amusia is a neuro-developmental disorder of pitch perception that causes severe problems with music processing but only subtle difficulties in speech processing. This study investigated speech processing in a group of Mandarin speakers with congenital amusia. Thirteen Mandarin amusics and thirteen matched controls participated in a set of tone and intonation perception tasks and two pitch threshold tasks. Compared with controls, amusics showed impaired performance on word discrimination in natural speech and their gliding tone analogs. They also performed worse than controls on discriminating gliding tone sequences derived from statements and questions, and showed elevated thresholds for pitch change detection and pitch direction discrimination. However, they performed as well as controls on word identification, and on statement-question identification and discrimination in natural speech. Overall, tasks that involved multiple acoustic cues to communicative meaning were not impacted by amusia. Only when the tasks relied mainly on pitch sensitivity did amusics show impaired performance compared to controls. These findings help explain why amusia only affects speech processing in subtle ways. Further studies on a larger sample of Mandarin amusics and on amusics of other language backgrounds are needed to consolidate these results.
The mechanism of speech processing in congenital amusia: evidence from Mandarin speakers.

Science.gov (United States)

Liu, Fang; Jiang, Cunmei; Thompson, William Forde; Xu, Yi; Yang, Yufang; Stewart, Lauren

2012-01-01

Congenital amusia is a neuro-developmental disorder of pitch perception that causes severe problems with music processing but only subtle difficulties in speech processing. This study investigated speech processing in a group of Mandarin speakers with congenital amusia. Thirteen Mandarin amusics and thirteen matched controls participated in a set of tone and intonation perception tasks and two pitch threshold tasks. Compared with controls, amusics showed impaired performance on word discrimination in natural speech and their gliding tone analogs. They also performed worse than controls on discriminating gliding tone sequences derived from statements and questions, and showed elevated thresholds for pitch change detection and pitch direction discrimination. However, they performed as well as controls on word identification, and on statement-question identification and discrimination in natural speech. Overall, tasks that involved multiple acoustic cues to communicative meaning were not impacted by amusia. Only when the tasks relied mainly on pitch sensitivity did amusics show impaired performance compared to controls. These findings help explain why amusia only affects speech processing in subtle ways. Further studies on a larger sample of Mandarin amusics and on amusics of other language backgrounds are needed to consolidate these results.
The Production of Emotional Prosody in Varying Degrees of Severity of Apraxia of Speech.

Science.gov (United States)

Van Putten, Steffany M.; Walker, Judy P.

2003-01-01

A study examined the abilities of three adults with varying degrees of apraxia of speech (AOS) to produce emotional prosody. Acoustic analyses of the subjects' productions revealed that unlike the control subject, the subjects with AOS did not produce differences in duration and amplitude cues to convey different emotions. (Contains references.)…
Integration of polarization and chromatic cues in the insect sky compass.

Science.gov (United States)

el Jundi, Basil; Pfeiffer, Keram; Heinze, Stanley; Homberg, Uwe

2014-06-01

Animals relying on a celestial compass for spatial orientation may use the position of the sun, the chromatic or intensity gradient of the sky, the polarization pattern of the sky, or a combination of these cues as compass signals. Behavioral experiments in bees and ants, indeed, showed that direct sunlight and sky polarization play a role in sky compass orientation, but the relative importance of these cues is species-specific. Intracellular recordings from polarization-sensitive interneurons in the desert locust and monarch butterfly suggest that inputs from different eye regions, including polarized-light input through the dorsal rim area of the eye and chromatic/intensity gradient input from the main eye, are combined at the level of the medulla to create a robust compass signal. Conflicting input from the polarization and chromatic/intensity channel, resulting from eccentric receptive fields, is eliminated at the level of the anterior optic tubercle and central complex through internal compensation for changing solar elevations, which requires input from a circadian clock. Across several species, the central complex likely serves as an internal sky compass, combining E-vector information with other celestial cues. Descending neurons, likewise, respond both to zenithal polarization and to unpolarized cues in an azimuth-dependent way.
The effect of filtered speech feedback on the frequency of stuttering

Science.gov (United States)

Rami, Manish Krishnakant

2000-10-01

whispered speech conditions all decreased the frequency of stuttering while the approximate glottal source did not. It is suggested that articulatory events, chiefly the encoded speech output of the vocal tract origin, afford effective cues and induce fluent speech in people who stutter.
Benefits to Speech Perception in Noise From the Binaural Integration of Electric and Acoustic Signals in Simulated Unilateral Deafness.

Science.gov (United States)

Ma, Ning; Morris, Saffron; Kitterick, Pádraig Thomas

2016-01-01

This study used vocoder simulations with normal-hearing (NH) listeners to (1) measure their ability to integrate speech information from an NH ear and a simulated cochlear implant (CI), and (2) investigate whether binaural integration is disrupted by a mismatch in the delivery of spectral information between the ears arising from a misalignment in the mapping of frequency to place. Eight NH volunteers participated in the study and listened to sentences embedded in background noise via headphones. Stimuli presented to the left ear were unprocessed. Stimuli presented to the right ear (referred to as the CI-simulation ear) were processed using an eight-channel noise vocoder with one of the three processing strategies. An Ideal strategy simulated a frequency-to-place map across all channels that matched the delivery of spectral information between the ears. A Realistic strategy created a misalignment in the mapping of frequency to place in the CI-simulation ear where the size of the mismatch between the ears varied across channels. Finally, a Shifted strategy imposed a similar degree of misalignment in all channels, resulting in consistent mismatch between the ears across frequency. The ability to report key words in sentences was assessed under monaural and binaural listening conditions and at signal to noise ratios (SNRs) established by estimating speech-reception thresholds in each ear alone. The SNRs ensured that the monaural performance of the left ear never exceeded that of the CI-simulation ear. The advantages of binaural integration were calculated by comparing binaural performance with monaural performance using the CI-simulation ear alone. Thus, these advantages reflected the additional use of the experimentally constrained left ear and were not attributable to better-ear listening. Binaural performance was as accurate as, or more accurate than, monaural performance with the CI-simulation ear alone. When both ears supported a similar level of monaural
Perceived gender in clear and conversational speech

Science.gov (United States)

Booz, Jaime A.

Although many studies have examined acoustic and sociolinguistic differences between male and female speech, the relationship between talker speaking style and perceived gender has not yet been explored. The present study attempts to determine whether clear speech, a style adopted by talkers who perceive some barrier to effective communication, shifts perceptions of femininity for male and female talkers. Much of our understanding of gender perception in voice and speech is based on sustained vowels or single words, eliminating temporal, prosodic, and articulatory cues available in more naturalistic, connected speech. Thus, clear and conversational sentence stimuli, selected from the 41 talkers of the Ferguson Clear Speech Database (Ferguson, 2004), were presented to 17 normal-hearing listeners, aged 18 to 30. They rated the talkers' gender using a visual analog scale with "masculine" and "feminine" endpoints. This response method was chosen to account for within-category shifts of gender perception by allowing nonbinary responses. Mixed-effects regression analysis of listener responses revealed a small but significant effect of speaking style, and this effect was larger for male talkers than female talkers. Because of the high degree of talker variability observed for talker gender, acoustic analyses of these sentences were undertaken to determine the relationship between acoustic changes in clear and conversational speech and perceived femininity. Results of these analyses showed that mean fundamental frequency (fo) and fo standard deviation were significantly correlated with perceived gender for both male and female talkers, and vowel space was significantly correlated only for male talkers. Speaking rate and breathiness measures (CPPS) were not significantly related for either group. Outcomes of this study indicate that adopting a clear speaking style is correlated with increases in perceived femininity. Although the increase was small, some changes associated
Evidence for cue-independent spatial representation in the human auditory cortex during active listening.

Science.gov (United States)

Higgins, Nathan C; McLaughlin, Susan A; Rinne, Teemu; Stecker, G Christopher

2017-09-05

Few auditory functions are as important or as universal as the capacity for auditory spatial awareness (e.g., sound localization). That ability relies on sensitivity to acoustical cues, particularly interaural time and level differences (ITD and ILD), that correlate with sound-source locations. Under nonspatial listening conditions, cortical sensitivity to ITD and ILD takes the form of broad contralaterally dominated response functions. It is unknown, however, whether that sensitivity reflects representations of the specific physical cues or a higher-order representation of auditory space (i.e., integrated cue processing), nor is it known whether responses to spatial cues are modulated by active spatial listening. To investigate, sensitivity to parametrically varied ITD or ILD cues was measured using fMRI during spatial and nonspatial listening tasks. Task type varied across blocks where targets were presented in one of three dimensions: auditory location, pitch, or visual brightness. Task effects were localized primarily to lateral posterior superior temporal gyrus (pSTG) and modulated binaural-cue response functions differently in the two hemispheres. Active spatial listening (location tasks) enhanced both contralateral and ipsilateral responses in the right hemisphere but maintained or enhanced contralateral dominance in the left hemisphere. Two observations suggest integrated processing of ITD and ILD. First, overlapping regions in medial pSTG exhibited significant sensitivity to both cues. Second, successful classification of multivoxel patterns was observed for both cue types and, critically, for cross-cue classification. Together, these results suggest a higher-order representation of auditory space in the human auditory cortex that at least partly integrates the specific underlying cues.
An oscillopathic approach to developmental dyslexia: From genes to speech processing.

Science.gov (United States)

Jiménez-Bravo, Miguel; Marrero, Victoria; Benítez-Burraco, Antonio

2017-06-30

Developmental dyslexia is a heterogeneous condition entailing problems with reading and spelling. Several genes have been linked to or associated with the disease, many of which contribute to the development and function of brain areas important for auditory and phonological processing. Nonetheless, a clear link between genes, the brain, and the symptoms of dyslexia is still pending. The goal of this paper is to help bridge this gap. With this aim, we have focused on how the dyslexic brain fails to process speech sounds and reading cues. We have adopted an oscillatory perspective, according to which dyslexia may result from a deficient integration of different brain rhythms during reading/spelling tasks. Moreover, we show that some candidate genes for this condition are related to brain rhythms. This fresh approach is expected to provide a better understanding of the aetiology and the clinical presentation of developmental dyslexia, but also to achieve an earlier and more accurate diagnosis of the disease. Copyright © 2017 Elsevier B.V. All rights reserved.
Neural pathways for visual speech perception

Directory of Open Access Journals (Sweden)

Lynne E Bernstein

2014-12-01

Full Text Available This paper examines the questions, what levels of speech can be perceived visually, and how is visual speech represented by the brain? Review of the literature leads to the conclusions that every level of psycholinguistic speech structure (i.e., phonetic features, phonemes, syllables, words, and prosody) can be perceived visually, although individuals differ in their abilities to do so; and that there are visual modality-specific representations of speech qua speech in higher-level vision brain areas. That is, the visual system represents the modal patterns of visual speech. The suggestion that the auditory speech pathway receives and represents visual speech is examined in light of neuroimaging evidence on the auditory speech pathways. We outline the generally agreed-upon organization of the visual ventral and dorsal pathways and examine several types of visual processing that might be related to speech through those pathways, specifically, face and body, orthography, and sign language processing. In this context, we examine the visual speech processing literature, which reveals widespread diverse patterns of activity in posterior temporal cortices in response to visual speech stimuli. We outline a model of the visual and auditory speech pathways and make several suggestions: (1) The visual perception of speech relies on visual pathway representations of speech qua speech. (2) A proposed site of these representations, the temporal visual speech area (TVSA), has been demonstrated in posterior temporal cortex, ventral and posterior to multisensory posterior superior temporal sulcus (pSTS). (3) Given that visual speech has dynamic and configural features, its representations in feedforward visual pathways are expected to integrate these features, possibly in TVSA.
Speech-to-Speech Relay Service

Science.gov (United States)

Consumer Guide: Speech-to-Speech Relay Service. Speech-to-Speech (STS) is one form of Telecommunications Relay Service (TRS). TRS is a service that allows persons with hearing and speech disabilities ...
A Randomized Controlled Trial on The Beneficial Effects of Training Letter-Speech Sound Integration on Reading Fluency in Children with Dyslexia.

Directory of Open Access Journals (Sweden)

Gorka Fraga González

Full Text Available A recent account of dyslexia assumes that a failure to develop automated letter-speech sound integration might be responsible for the observed lack of reading fluency. This study uses a pre-test-training-post-test design to evaluate the effects of a training program based on letter-speech sound associations with a special focus on gains in reading fluency. A sample of 44 children with dyslexia and 23 typical readers, aged 8 to 9, was recruited. Children with dyslexia were randomly allocated to either the training program group (n = 23) or a waiting-list control group (n = 21). The training intensively focused on letter-speech sound mapping and consisted of 34 individual sessions of 45 minutes over a five-month period. The children with dyslexia showed substantial reading gains for the main word reading and spelling measures after training, improving at a faster rate than typical readers and waiting-list controls. The results are interpreted within the conceptual framework assuming a multisensory integration deficit as the most proximal cause of dysfluent reading in dyslexia. ISRCTN register: ISRCTN12783279.
A frequency bin-wise nonlinear masking algorithm in convolutive mixtures for speech segregation.

Science.gov (United States)

Chi, Tai-Shih; Huang, Ching-Wen; Chou, Wen-Sheng

2012-05-01

A frequency bin-wise nonlinear masking algorithm is proposed in the spectrogram domain for speech segregation in convolutive mixtures. The contributive weight from each speech source to a time-frequency unit of the mixture spectrogram is estimated by a nonlinear function based on location cues. For each sound source, a non-binary mask is formed from the estimated weights and is multiplied with the mixture spectrogram to extract the sound. Head-related transfer functions (HRTFs) are used to simulate convolutive sound mixtures perceived by listeners. Simulation results show that the proposed method outperforms convolutive independent component analysis and degenerate unmixing estimation technique methods in almost all test conditions.
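
The masking step described in the abstract above can be pictured with a small sketch. The abstract does not give the nonlinear weighting function, so the per-source weights below are simply hypothetical inputs, and normalizing them to sum to one at each time-frequency unit is an assumption made here for illustration, not the authors' stated procedure.

```python
def soft_masks(weights):
    """weights[s][t][f] >= 0 is the assumed contribution of source s to unit (t, f).

    Returns non-binary masks of the same shape that sum to 1 at every unit.
    """
    n_src, n_t, n_f = len(weights), len(weights[0]), len(weights[0][0])
    masks = [[[0.0] * n_f for _ in range(n_t)] for _ in range(n_src)]
    for t in range(n_t):
        for f in range(n_f):
            total = sum(weights[s][t][f] for s in range(n_src))
            for s in range(n_src):
                # Split evenly if no source claims the unit (avoids division by zero).
                masks[s][t][f] = weights[s][t][f] / total if total > 0 else 1.0 / n_src
    return masks


def apply_mask(mask, mixture):
    """Element-wise product of a mask with the mixture magnitude spectrogram."""
    return [[m * x for m, x in zip(m_row, x_row)] for m_row, x_row in zip(mask, mixture)]


# Toy example: one time frame, two frequency bins, two sources (all values made up).
mixture = [[4.0, 2.0]]
weights = [[[3.0, 1.0]], [[1.0, 1.0]]]
masks = soft_masks(weights)
source_0 = apply_mask(masks[0], mixture)
source_1 = apply_mask(masks[1], mixture)
print(source_0, source_1)
```

Because the masks are fractional rather than 0/1, energy in a time-frequency unit shared by two talkers is divided between them instead of being assigned wholly to one.
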
The effect of viewing speech on auditory speech processing is different in the left and right hemispheres.

Science.gov (United States)

Davis, Chris; Kislyuk, Daniel; Kim, Jeesun; Sams, Mikko

2008-11-25

We used whole-head magnetoencephalography (MEG) to record changes in neuromagnetic N100m responses generated in the left and right auditory cortex as a function of the match between visual and auditory speech signals. Stimuli were auditory-only (AO) and auditory-visual (AV) presentations of /pi/, /ti/ and /vi/. Three types of intensity matched auditory stimuli were used: intact speech (Normal), frequency band filtered speech (Band) and speech-shaped white noise (Noise). The behavioural task was to detect the /vi/ syllables which comprised 12% of stimuli. N100m responses were measured to averaged /pi/ and /ti/ stimuli. Behavioural data showed that identification of the stimuli was faster and more accurate for Normal than for Band stimuli, and for Band than for Noise stimuli. Reaction times were faster for AV than AO stimuli. MEG data showed that in the left hemisphere, N100m to both AO and AV stimuli was largest for the Normal, smaller for Band and smallest for Noise stimuli. In the right hemisphere, Normal and Band AO stimuli elicited N100m responses of quite similar amplitudes, but N100m amplitude to Noise was about half of that. There was a reduction in N100m for the AV compared to the AO conditions. The size of this reduction for each stimulus type was the same in the left hemisphere but graded in the right (being largest to the Normal, smaller to the Band and smallest to the Noise stimuli). The N100m decrease for the Normal stimuli was significantly larger in the right than in the left hemisphere. We suggest that the effect of processing visual speech seen in the right hemisphere likely reflects suppression of the auditory response based on AV cues for place of articulation.
How hearing aids, background noise, and visual cues influence objective listening effort.

Science.gov (United States)

Picou, Erin M; Ricketts, Todd A; Hornsby, Benjamin W Y

2013-09-01

The purpose of this article was to evaluate factors that influence the listening effort experienced when processing speech for people with hearing loss. Specifically, the change in listening effort resulting from introducing hearing aids, visual cues, and background noise was evaluated. An additional exploratory aim was to investigate the possible relationships between the magnitude of listening effort change and individual listeners' working memory capacity, verbal processing speed, or lipreading skill. Twenty-seven participants with bilateral sensorineural hearing loss were fitted with linear behind-the-ear hearing aids and tested using a dual-task paradigm designed to evaluate listening effort. The primary task was monosyllable word recognition and the secondary task was a visual reaction time task. The test conditions varied by hearing aids (unaided, aided), visual cues (auditory-only, auditory-visual), and background noise (present, absent). For all participants, the signal to noise ratio was set individually so that speech recognition performance in noise was approximately 60% in both the auditory-only and auditory-visual conditions. In addition to measures of listening effort, working memory capacity, verbal processing speed, and lipreading ability were measured using the Automated Operational Span Task, a Lexical Decision Task, and the Revised Shortened Utley Lipreading Test, respectively. In general, the effects measured using the objective measure of listening effort were small (~10 msec). Results indicated that background noise increased listening effort, and hearing aids reduced listening effort, while visual cues did not influence listening effort. With regard to the individual variables, verbal processing speed was negatively correlated with hearing aid benefit for listening effort; faster processors were less likely to derive benefit. Working memory capacity, verbal processing speed, and lipreading ability were related to benefit from visual cues. No
How much does language proficiency by non-native listeners influence speech audiometric tests in noise?

Science.gov (United States)

Warzybok, Anna; Brand, Thomas; Wagener, Kirsten C; Kollmeier, Birger

2015-01-01

The current study investigates the extent to which the linguistic complexity of three commonly employed speech recognition tests and second language proficiency influence speech recognition thresholds (SRTs) in noise in non-native listeners. SRTs were measured for non-natives and natives using three German speech recognition tests: the digit triplet test (DTT), the Oldenburg sentence test (OLSA), and the Göttingen sentence test (GÖSA). Sixty-four non-native and eight native listeners participated. Non-natives can show native-like SRTs in noise only for the linguistically easy speech material (DTT). Furthermore, the limitation of phonemic-acoustical cues in digit triplets affects speech recognition to the same extent in non-natives and natives. For more complex and less familiar speech materials, non-natives, ranging from basic to advanced proficiency in German, require on average 3-dB better signal-to-noise ratio for the OLSA and 6-dB for the GÖSA to obtain 50% speech recognition compared to native listeners. In clinical audiology, SRT measurements with a closed-set speech test (i.e. DTT for screening or OLSA test for clinical purposes) should be used with non-native listeners rather than open-set speech tests (such as the GÖSA or HINT), especially if a closed-set version in the patient's own native language is available.
Don't speak too fast! Processing of fast rate speech in children with specific language impairment.

Directory of Open Access Journals (Sweden)

Hélène Guiraud

Full Text Available Perception of speech rhythm requires the auditory system to track temporal envelope fluctuations, which carry syllabic and stress information. Reduced sensitivity to rhythmic acoustic cues has been evidenced in children with Specific Language Impairment (SLI), impeding syllabic parsing and speech decoding. Our study investigated whether these children experience specific difficulties processing fast rate speech as compared with typically developing (TD) children. Sixteen French children with SLI (8-13 years old) with mainly expressive phonological disorders and with preserved comprehension and 16 age-matched TD children performed a judgment task on sentences produced 1) at normal rate, 2) at fast rate or 3) time-compressed. Sensitivity index (d') to semantically incongruent sentence-final words was measured. Overall children with SLI perform significantly worse than TD children. Importantly, as revealed by the significant Group × Speech Rate interaction, children with SLI find it more challenging than TD children to process both naturally or artificially accelerated speech. The two groups do not significantly differ in normal rate speech processing. In agreement with rhythm-processing deficits in atypical language development, our results suggest that children with SLI face difficulties adjusting to rapid speech rate. These findings are interpreted in light of temporal sampling and prosodic phrasing frameworks and of oscillatory mechanisms underlying speech perception.
Don't speak too fast! Processing of fast rate speech in children with specific language impairment.

Science.gov (United States)

Guiraud, Hélène; Bedoin, Nathalie; Krifi-Papoz, Sonia; Herbillon, Vania; Caillot-Bascoul, Aurélia; Gonzalez-Monge, Sibylle; Boulenger, Véronique

2018-01-01

Perception of speech rhythm requires the auditory system to track temporal envelope fluctuations, which carry syllabic and stress information. Reduced sensitivity to rhythmic acoustic cues has been evidenced in children with Specific Language Impairment (SLI), impeding syllabic parsing and speech decoding. Our study investigated whether these children experience specific difficulties processing fast rate speech as compared with typically developing (TD) children. Sixteen French children with SLI (8-13 years old) with mainly expressive phonological disorders and with preserved comprehension and 16 age-matched TD children performed a judgment task on sentences produced 1) at normal rate, 2) at fast rate or 3) time-compressed. Sensitivity index (d') to semantically incongruent sentence-final words was measured. Overall children with SLI perform significantly worse than TD children. Importantly, as revealed by the significant Group × Speech Rate interaction, children with SLI find it more challenging than TD children to process both naturally or artificially accelerated speech. The two groups do not significantly differ in normal rate speech processing. In agreement with rhythm-processing deficits in atypical language development, our results suggest that children with SLI face difficulties adjusting to rapid speech rate. These findings are interpreted in light of temporal sampling and prosodic phrasing frameworks and of oscillatory mechanisms underlying speech perception.
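
The sensitivity index d' mentioned in the abstract above is conventionally computed as the difference between the z-transformed hit rate and false-alarm rate. The sketch below shows that standard signal-detection formula; the log-linear correction for extreme rates and the example counts are assumptions for illustration and are not taken from the study.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false-alarm rate), with a log-linear correction.

    Adding 0.5 to each count keeps both rates strictly between 0 and 1,
    so the inverse normal CDF stays finite (an assumed, commonly used fix).
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts for detecting incongruent sentence-final words.
print(d_prime(hits=20, misses=5, false_alarms=5, correct_rejections=20))
# A listener responding at chance has equal hit and false-alarm rates, so d' is 0.
print(d_prime(hits=10, misses=10, false_alarms=10, correct_rejections=10))
```

Because d' separates sensitivity from response bias, a child who simply answers "incongruent" more often does not obtain a higher score.
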

Children's Responses to Computer-Synthesized Speech in Educational Media: Gender Consistency and Gender Similarity Effects

Science.gov (United States)

Lee, Kwan Min; Liao, Katharine; Ryu, Seoungho

2007-01-01

This study examines children's social responses to gender cues in synthesized speech in a computer-based instruction setting. Eighty 5th-grade elementary school children were randomly assigned to one of the conditions in a full-factorial 2 (participant gender) x 2 (voice gender) x 2 (content gender) experiment. Results show that children apply…
Neuronal basis of speech comprehension.

Science.gov (United States)

Specht, Karsten

2014-01-01

Verbal communication does not rely only on the simple perception of auditory signals. It is rather a parallel and integrative processing of linguistic and non-linguistic information, involving temporal and frontal areas in particular. This review describes the inherent complexity of auditory speech comprehension from a functional-neuroanatomical perspective. The review is divided into two parts. In the first part, structural and functional asymmetry of language relevant structures will be discussed. The second part of the review will discuss recent neuroimaging studies, which coherently demonstrate that speech comprehension processes rely on a hierarchical network involving the temporal, parietal, and frontal lobes. Further, the results support the dual-stream model for speech comprehension, with a dorsal stream for auditory-motor integration, and a ventral stream for extracting meaning but also the processing of sentences and narratives. Specific patterns of functional asymmetry between the left and right hemisphere can also be demonstrated. The review article concludes with a discussion on interactions between the dorsal and ventral streams, particularly the involvement of motor related areas in speech perception processes, and outlines some remaining unresolved issues. This article is part of a Special Issue entitled Human Auditory Neuroimaging. Copyright © 2013 Elsevier B.V. All rights reserved.
The Auditory-Visual Speech Benefit on Working Memory in Older Adults with Hearing Impairment

Directory of Open Access Journals (Sweden)

Jana B. Frtusova

2016-04-01

Full Text Available This study examined the effect of auditory-visual (AV) speech stimuli on working memory in hearing-impaired participants (HIP) in comparison to age- and education-matched normal elderly controls (NEC). Participants completed a working memory n-back task (0- to 2-back) in which sequences of digits were presented in visual-only (i.e., speech-reading), auditory-only (A-only), and AV conditions. Auditory event-related potentials (ERP) were collected to assess the relationship between perceptual and working memory processing. The behavioural results showed that both groups were faster in the AV condition in comparison to the unisensory conditions. The ERP data showed perceptual facilitation in the AV condition, in the form of reduced amplitudes and latencies of the auditory N1 and/or P1 components, in the HIP group. Furthermore, a working memory ERP component, the P3, peaked earlier for both groups in the AV condition compared to the A-only condition. In general, the HIP group showed a more robust AV benefit; however, the NECs showed a dose-response relationship between perceptual facilitation and working memory improvement, especially for facilitation of processing speed. Two measures, reaction time and P3 amplitude, suggested that the presence of visual speech cues may have helped the HIP to counteract the demanding auditory processing, to the level that no group differences were evident during the AV modality despite lower performance during the A-only condition. Overall, this study provides support for the theory of an integrated perceptual-cognitive system. The practical significance of these findings is also discussed.
Cue reactivity towards shopping cues in female participants.

Science.gov (United States)

Starcke, Katrin; Schlereth, Berenike; Domass, Debora; Schöler, Tobias; Brand, Matthias

2013-03-01

Background and aims: It is currently under debate whether pathological buying can be considered as a behavioural addiction. Addictions have often been investigated with cue-reactivity paradigms to assess subjective, physiological and neural craving reactions. The current study aims at testing whether cue reactivity towards shopping cues is related to pathological buying tendencies. Methods: A sample of 66 non-clinical female participants rated shopping-related pictures concerning valence, arousal, and subjective craving. In a subgroup of 26 participants, electrodermal reactions towards those pictures were additionally assessed. Furthermore, all participants were screened concerning pathological buying tendencies and baseline craving for shopping. Results: Results indicate a relationship between the subjective ratings of the shopping cues and pathological buying tendencies, even if baseline craving for shopping was controlled for. Electrodermal reactions were partly related to the subjective ratings of the cues. Conclusions: Cue reactivity may be a potential correlate of pathological buying tendencies. Thus, pathological buying may be accompanied by craving reactions towards shopping cues. Results support the assumption that pathological buying can be considered as a behavioural addiction. From a methodological point of view, results support the view that the cue-reactivity paradigm is suited for the investigation of craving reactions in pathological buying and future studies should implement this paradigm in clinical samples.
Gated audiovisual speech identification in silence vs. noise: effects on time and accuracy

Science.gov (United States)

Moradi, Shahram; Lidestam, Björn; Rönnberg, Jerker

2013-01-01

This study investigated the degree to which audiovisual presentation (compared to auditory-only presentation) affected isolation point (IPs, the amount of time required for the correct identification of speech stimuli using a gating paradigm) in silence and noise conditions. The study expanded on the findings of Moradi et al. (under revision), using the same stimuli, but presented in an audiovisual instead of an auditory-only manner. The results showed that noise impeded the identification of consonants and words (i.e., delayed IPs and lowered accuracy), but not the identification of final words in sentences. In comparison with the previous study by Moradi et al., it can be concluded that the provision of visual cues expedited IPs and increased the accuracy of speech stimuli identification in both silence and noise. The implication of the results is discussed in terms of models for speech understanding. PMID:23801980
Multisensory speech perception without the left superior temporal sulcus.

Science.gov (United States)

Baum, Sarah H; Martin, Randi C; Hamilton, A Cris; Beauchamp, Michael S

2012-09-01

Converging evidence suggests that the left superior temporal sulcus (STS) is a critical site for multisensory integration of auditory and visual information during speech perception. We report a patient, SJ, who suffered a stroke that damaged the left temporo-parietal area, resulting in mild anomic aphasia. Structural MRI showed complete destruction of the left middle and posterior STS, as well as damage to adjacent areas in the temporal and parietal lobes. Surprisingly, SJ demonstrated preserved multisensory integration measured with two independent tests. First, she perceived the McGurk effect, an illusion that requires integration of auditory and visual speech. Second, her perception of morphed audiovisual speech with ambiguous auditory or visual information was significantly influenced by the opposing modality. To understand the neural basis for this preserved multisensory integration, blood-oxygen level dependent functional magnetic resonance imaging (BOLD fMRI) was used to examine brain responses to audiovisual speech in SJ and 23 healthy age-matched controls. In controls, bilateral STS activity was observed. In SJ, no activity was observed in the damaged left STS, but in the right STS more cortex was active in SJ than in any of the normal controls. Further, the amplitude of the BOLD response in the right STS to McGurk stimuli was significantly greater in SJ than in controls. The simplest explanation of these results is a reorganization of SJ's cortical language networks such that the right STS now subserves multisensory integration of speech. Copyright © 2012 Elsevier Inc. All rights reserved.
A magnetorheological haptic cue accelerator for manual transmission vehicles

International Nuclear Information System (INIS)

Han, Young-Min; Noh, Kyung-Wook; Choi, Seung-Bok; Lee, Yang-Sub

2010-01-01

This paper proposes a new haptic cue function for manual transmission vehicles to achieve optimal gear shifting. This function is implemented on the accelerator pedal by utilizing a magnetorheological (MR) brake mechanism. By combining the haptic cue function with the accelerator pedal, the proposed haptic cue device can transmit the optimal moment of gear shifting for manual transmission to a driver without requiring the driver's visual attention. As a first step to achieve this goal, an MR fluid-based haptic device is devised to enable rotary motion of the accelerator pedal. Taking into account spatial limitations, the design parameters are optimally determined using finite element analysis to maximize the relative control torque. The proposed haptic cue device is then manufactured and its field-dependent torque and time response are experimentally evaluated. Then the manufactured MR haptic cue device is integrated with the accelerator pedal. A simple virtual vehicle emulating the operation of the engine of a passenger vehicle is constructed and put into communication with the haptic cue device. A feed-forward torque control algorithm for the haptic cue is formulated and control performances are experimentally evaluated and presented in the time domain.
The Relationship Between Spectral Modulation Detection and Speech Recognition: Adult Versus Pediatric Cochlear Implant Recipients.

Science.gov (United States)

Gifford, René H; Noble, Jack H; Camarata, Stephen M; Sunderhaus, Linsey W; Dwyer, Robert T; Dawant, Benoit M; Dietrich, Mary S; Labadie, Robert F

2018-01-01

Adult cochlear implant (CI) recipients demonstrate a reliable relationship between spectral modulation detection and speech understanding. Prior studies documenting this relationship have focused on postlingually deafened adult CI recipients, leaving an open question regarding the relationship between spectral resolution and speech understanding for adults and children with prelingual onset of deafness. Here, we report CI performance on the measures of speech recognition and spectral modulation detection for 578 CI recipients including 477 postlingual adults, 65 prelingual adults, and 36 prelingual pediatric CI users. The results demonstrated a significant correlation between spectral modulation detection and various measures of speech understanding for 542 adult CI recipients. For 36 pediatric CI recipients, however, there was no significant correlation between spectral modulation detection and speech understanding in quiet or in noise, nor was spectral modulation detection significantly correlated with listener age or age at implantation. These findings suggest that pediatric CI recipients might not depend upon spectral resolution for speech understanding in the same manner as adult CI recipients. It is possible that pediatric CI users are making use of different cues, such as those contained within the temporal envelope, to achieve high levels of speech understanding. Further investigation of the relationship between spectral and temporal resolution and speech recognition is warranted to describe the underlying mechanisms driving peripheral auditory processing in pediatric CI users.
Speech understanding in noise with integrated in-ear and muff-style hearing protection systems

Directory of Open Access Journals (Sweden)

Sharon M Abel

2011-01-01

Full Text Available Integrated hearing protection systems are designed to enhance free field and radio communications during military operations while protecting against the damaging effects of high-level noise exposure. A study was conducted to compare the effect of increasing the radio volume on the intelligibility of speech over the radios of two candidate systems, in-ear and muff-style, in 85-dBA speech babble noise presented free field. Twenty normal-hearing, English-fluent subjects, half male and half female, were tested in same gender pairs. Alternating as talker and listener, their task was to discriminate consonant-vowel-consonant syllables that contrasted either the initial or final consonant. Percent correct consonant discrimination increased with increases in the radio volume. At the highest volume, subjects achieved 79% with the in-ear device but only 69% with the muff-style device, averaged across the gender of listener/talker pairs and consonant position. Although there was no main effect of gender, female listener/talkers showed a 10% advantage for the final consonant and male listener/talkers showed a 1% advantage for the initial consonant. These results indicate that normal hearing users can achieve reasonably high radio communication scores with integrated in-ear hearing protection in moderately high-level noise that provides both energetic and informational masking. The adequacy of the range of available radio volumes for users with hearing loss has yet to be determined.
On the use of the distortion-sensitivity approach in examining the role of linguistic abilities in speech understanding in noise.

Science.gov (United States)

Goverts, S Theo; Huysmans, Elke; Kramer, Sophia E; de Groot, Annette M B; Houtgast, Tammo

2011-12-01

Researchers have used the distortion-sensitivity approach in the psychoacoustical domain to investigate the role of auditory processing abilities in speech perception in noise (van Schijndel, Houtgast, & Festen, 2001; Goverts & Houtgast, 2010). In this study, the authors examined the potential applicability of the distortion-sensitivity approach for investigating the role of linguistic abilities in speech understanding in noise. The authors applied the distortion-sensitivity approach by measuring the processing of visually presented masked text in a condition with manipulated syntactic, lexical, and semantic cues and while using the Text Reception Threshold (George et al., 2007; Kramer, Zekveld, & Houtgast, 2009; Zekveld, George, Kramer, Goverts, & Houtgast, 2007) method. Two groups that differed in linguistic abilities were studied: 13 native and 10 non-native speakers of Dutch, all typically hearing university students. As expected, the non-native subjects showed substantially reduced performance. The results of the distortion-sensitivity approach yielded differentiated results on the use of specific linguistic cues in the 2 groups. The results show the potential value of the distortion-sensitivity approach in studying the role of linguistic abilities in speech understanding in noise of individuals with hearing impairment.
Prosody production networks are modulated by sensory cues and social context.

Science.gov (United States)

Klasen, Martin; von Marschall, Clara; Isman, Güldehen; Zvyagintsev, Mikhail; Gur, Ruben C; Mathiak, Klaus

2018-03-05

The neurobiology of emotional prosody production is not well investigated. In particular, the effects of cues and social context are not known. The present study sought to differentiate cued from free emotion generation and the effect of social feedback from a human listener. Online speech filtering enabled fMRI during prosodic communication in 30 participants. Emotional vocalizations were a) free, b) auditorily cued, c) visually cued, or d) with interactive feedback. In addition to distributed language networks, cued emotions increased activity in auditory and - in case of visual stimuli - visual cortex. Responses were larger in pSTG at the right hemisphere and the ventral striatum when participants were listened to and received feedback from the experimenter. Sensory, language, and reward networks contributed to prosody production and were modulated by cues and social context. The right pSTG is a central hub for communication in social interactions - in particular for interpersonal evaluation of vocal emotions.
Influences of selective adaptation on perception of audiovisual speech

Science.gov (United States)

Dias, James W.; Cook, Theresa C.; Rosenblum, Lawrence D.

2016-01-01

Research suggests that selective adaptation in speech is a low-level process dependent on sensory-specific information shared between the adaptor and test stimuli. However, previous research has only examined how adaptors shift perception of unimodal test stimuli, either auditory or visual. In the current series of experiments, we investigated whether adaptation to cross-sensory phonetic information can influence perception of integrated audio-visual phonetic information. We examined how selective adaptation to audio and visual adaptors shifts perception of speech along an audiovisual test continuum. This test continuum consisted of nine audio-/ba/-visual-/va/ stimuli, ranging in visual clarity of the mouth. When the mouth was clearly visible, perceivers “heard” the audio-visual stimulus as an integrated “va” percept 93.7% of the time (e.g., McGurk & MacDonald, 1976). As visibility of the mouth became less clear across the nine-item continuum, the audio-visual “va” percept weakened, resulting in a continuum ranging in audio-visual percepts from /va/ to /ba/. Perception of the test stimuli was tested before and after adaptation. Changes in audiovisual speech perception were observed following adaptation to visual-/va/ and audiovisual-/va/, but not following adaptation to auditory-/va/, auditory-/ba/, or visual-/ba/. Adaptation modulates perception of integrated audio-visual speech by modulating the processing of sensory-specific information. The results suggest that auditory and visual speech information are not completely integrated at the level of selective adaptation. PMID:27041781
The left dorsolateral prefrontal cortex and caudate pathway: New evidence for cue-induced craving of smokers.

Science.gov (United States)

Yuan, Kai; Yu, Dahua; Bi, Yanzhi; Wang, Ruonan; Li, Min; Zhang, Yajuan; Dong, Minghao; Zhai, Jinquan; Li, Yangding; Lu, Xiaoqi; Tian, Jie

2017-09-01

Although activation of the prefrontal cortex (PFC) and the striatum has been found in smoking cue-induced craving tasks, whether and how the functional interactions and white matter integrity between these brain regions contribute to craving processing during smoking cue exposure remains unknown. Twenty-five young male smokers and 26 age- and gender-matched nonsmokers participated in the smoking cue-reactivity task. Craving-related brain activation was extracted and psychophysiological interaction (PPI) analysis was used to specify the PFC-efferent pathways that contributed to smoking cue-induced craving. Diffusion tensor imaging (DTI) and probabilistic tractography were used to explore whether the fiber connectivity strength facilitated functional coupling of the circuit with the smoking cue-induced craving. The PPI analysis revealed negative functional coupling of the left dorsolateral prefrontal cortex (DLPFC) and the caudate during the smoking cue-induced craving task, which positively correlated with the craving score. Neither significant activation nor functional connectivity in the smoking cue exposure task was detected in nonsmokers. DTI analyses revealed that fiber tract integrity negatively correlated with functional coupling in the DLPFC-caudate pathway and with activation of the caudate induced by smoking cues in smokers. Moreover, the relationship between the fiber connectivity integrity of the left DLPFC-caudate pathway and smoking cue-induced caudate activation can be fully mediated by the functional coupling strength of this circuit in smokers. The present study highlights the left DLPFC-caudate pathway in smoking cue-induced craving in smokers, which may reflect top-down prefrontal modulation of striatal reward processing in smoking cue-induced craving. Hum Brain Mapp 38:4644-4656, 2017. © 2017 Wiley Periodicals, Inc.
Near-optimal integration of facial form and motion.

Science.gov (United States)

Dobs, Katharina; Ma, Wei Ji; Reddy, Leila

2017-09-08

Human perception consists of the continuous integration of sensory cues pertaining to the same object. While it has been fairly well shown that humans use an optimal strategy when integrating low-level cues proportional to their relative reliability, the integration processes underlying high-level perception are much less understood. Here we investigate cue integration in a complex high-level perceptual system, the human face processing system. We tested cue integration of facial form and motion in an identity categorization task and found that an optimal model could successfully predict subjects' identity choices. Our results suggest that optimal cue integration may be implemented across different levels of the visual processing hierarchy.
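The "optimal strategy" mentioned in this abstract is usually formalized as reliability-weighted averaging, in which each cue is weighted by its inverse variance. The sketch below illustrates that textbook model under the assumption of independent Gaussian cue noise. It is not the specific model fitted by the authors, and the function name and example values are hypothetical.

```python
def combine_cues(estimates, variances):
    """Reliability-weighted average of independent Gaussian cue estimates.

    estimates: one estimate per cue (e.g. identity evidence from facial
               form and from facial motion)
    variances: the noise variance of each cue (must be positive)
    Returns (combined_estimate, combined_variance, weights).
    """
    if not estimates or len(estimates) != len(variances):
        raise ValueError("need one positive variance per estimate")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")

    # Reliability is defined as inverse variance.
    reliabilities = [1.0 / v for v in variances]
    total = sum(reliabilities)

    # Each cue's weight is its share of the total reliability.
    weights = [r / total for r in reliabilities]
    combined = sum(w * x for w, x in zip(weights, estimates))

    # The combined variance is never larger than the smallest single-cue variance.
    combined_variance = 1.0 / total
    return combined, combined_variance, weights


# Illustrative values only: a reliable cue (variance 1) and a noisier one (variance 4).
estimate, variance, weights = combine_cues([10.0, 20.0], [1.0, 4.0])
print(estimate, variance, weights)
```

In this model the more reliable cue dominates the combined estimate, and the combined variance falls below that of either cue alone, which is the signature typically tested in cue-integration experiments.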
Salience of Tactile Cues: An Examination of Tactor Actuator and Tactile Cue Characteristics

Science.gov (United States)

2015-08-01

Similarly, tactile alerts can help manage and focus attention in a complex high-tempo multitasked environment. Figure 1, while simple, can serve to...tactile cueing on concurrent performance of military and robotics tasks in a simulated multitasking environment. Ergonomics. 2008;51(8):1137–1152...2007;78(3):338. Moorhead IR, Holmes S, Furnell S. Understanding multisensory integration for pilot spatial orientation. Farnborough (UK): QinetiQ
Visual form Cues, Biological Motions, Auditory Cues, and Even Olfactory Cues Interact to Affect Visual Sex Discriminations

OpenAIRE

Rick Van Der Zwan; Anna Brooks; Duncan Blair; Coralia Machatch; Graeme Hacker

2011-01-01

Johnson and Tassinary (2005) proposed that visually perceived sex is signalled by structural or form cues. They suggested also that biological motion cues signal sex, but do so indirectly. We previously have shown that auditory cues can mediate visual sex perceptions (van der Zwan et al., 2009). Here we demonstrate that structural cues to body shape are alone sufficient for visual sex discriminations but that biological motion cues alone are not. Interestingly, biological motions can resolve ...
Inner Speech: Development, Cognitive Functions, Phenomenology, and Neurobiology

Science.gov (United States)

2015-01-01

Inner speech—also known as covert speech or verbal thinking—has been implicated in theories of cognitive development, speech monitoring, executive function, and psychopathology. Despite a growing body of knowledge on its phenomenology, development, and function, approaches to the scientific study of inner speech have remained diffuse and largely unintegrated. This review examines prominent theoretical approaches to inner speech and methodological challenges in its study, before reviewing current evidence on inner speech in children and adults from both typical and atypical populations. We conclude by considering prospects for an integrated cognitive science of inner speech, and present a multicomponent model of the phenomenon informed by developmental, cognitive, and psycholinguistic considerations. Despite its variability among individuals and across the life span, inner speech appears to perform significant functions in human cognition, which in some cases reflect its developmental origins and its sharing of resources with other cognitive processes. PMID:26011789
Effect of speech-intrinsic variations on human and automatic recognition of spoken phonemes.

Science.gov (United States)

Meyer, Bernd T; Brand, Thomas; Kollmeier, Birger

2011-01-01

The aim of this study is to quantify the gap between the recognition performance of human listeners and an automatic speech recognition (ASR) system with special focus on intrinsic variations of speech, such as speaking rate and effort, altered pitch, and the presence of dialect and accent. Second, it is investigated if the most common ASR features contain all information required to recognize speech in noisy environments by using resynthesized ASR features in listening experiments. For the phoneme recognition task, the ASR system achieved the human performance level only when the signal-to-noise ratio (SNR) was increased by 15 dB, which is an estimate for the human-machine gap in terms of the SNR. The major part of this gap is attributed to the feature extraction stage, since human listeners achieve comparable recognition scores when the SNR difference between unaltered and resynthesized utterances is 10 dB. Intrinsic variabilities result in strong increases of error rates, both in human speech recognition (HSR) and ASR (with a relative increase of up to 120%). An analysis of phoneme duration and recognition rates indicates that human listeners are better able to identify temporal cues than the machine at low SNRs, which suggests incorporating information about the temporal dynamics of speech into ASR systems.
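To put the reported gaps in perspective, a decibel difference in SNR can be converted to a linear power ratio with the standard relation ratio = 10^(dB/10). The short sketch below applies that conversion to the 15 dB and 10 dB figures quoted in the abstract. The function name is hypothetical and the calculation is only an illustration, not part of the study's method.

```python
def db_to_power_ratio(db):
    """Convert a level difference in decibels to a linear power ratio."""
    return 10.0 ** (db / 10.0)


# Human-machine gap and feature-extraction gap as quoted in the abstract.
for gap_db in (15.0, 10.0):
    print(f"{gap_db:.0f} dB corresponds to a power ratio of {db_to_power_ratio(gap_db):.1f}")
```

Under this conversion, a 15 dB gap means the ASR system needed the speech-to-noise power ratio to be roughly 32 times higher than human listeners did to reach the same phoneme recognition score.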
Are precues effective in proactively controlling taboo interference during speech production?

Science.gov (United States)

White, Katherine K; Abrams, Lise; Hsi, Lisa R; Watkins, Emily C

2018-02-07

This research investigated whether precues engage proactive control to reduce emotional interference during speech production. A picture-word interference task required participants to name target pictures accompanied by taboo, negative, or neutral distractors. Proactive control was manipulated by presenting precues that signalled the type of distractor that would appear on the next trial. Experiment 1 included one block of trials with precues and one without, whereas Experiment 2 mixed precued and uncued trials. Consistent with previous research, picture naming was slowed in both experiments when distractors were taboo or negative compared to neutral, with the greatest slowing effect when distractors were taboo. Evidence that precues engaged proactive control to reduce interference from taboo (but not negative) distractors was found in Experiment 1. In contrast, mixing precued trials in Experiment 2 resulted in no taboo cueing benefit. These results suggest that item-level proactive control can be engaged under certain conditions to reduce taboo interference during speech production, findings that help to refine a role for cognitive control of distraction during speech production.
The development of co-speech gesture and its semantic integration with speech in 6- to 12-year-old children with autism spectrum disorders.

Science.gov (United States)

So, Wing-Chee; Wong, Miranda Kit-Yi; Lui, Ming; Yip, Virginia

2015-11-01

Previous work leaves open the question of whether children with autism spectrum disorders aged 6-12 years have delay in producing gestures compared to their typically developing peers. This study examined gestural production among school-aged children in a naturalistic context and how their gestures are semantically related to the accompanying speech. Delay in gestural production was found in children with autism spectrum disorders through their middle to late childhood. Compared to their typically developing counterparts, children with autism spectrum disorders gestured less often and used fewer types of gestures, in particular markers, which carry culture-specific meaning. Typically developing children's gestural production was related to language and cognitive skills, but among children with autism spectrum disorders, gestural production was more strongly related to the severity of socio-communicative impairment. Gesture impairment also included the failure to integrate speech with gesture: in particular, supplementary gestures are absent in children with autism spectrum disorders. The findings extend our understanding of gestural production in school-aged children with autism spectrum disorders during spontaneous interaction. The results can help guide new therapies for gestural production for children with autism spectrum disorders in middle and late childhood. © The Author(s) 2014.

Adapting Speech Recognition in Augmented Reality for Mobile Devices in Outdoor Environments

OpenAIRE

Pascoal, Rui; Ribeiro, Ricardo; Batista, Fernando; de Almeida, Ana

2017-01-01

This paper describes the process of integrating automatic speech recognition (ASR) into a mobile application and explores the benefits and challenges of integrating speech with augmented reality (AR) in outdoor environments. Augmented reality allows end-users to interact with the information displayed and perform tasks, while increasing the users' perception of the real world by adding virtual information to it. Speech is the most natural way of communication: it allows hands-free inte...
LSVT LOUD and LSVT BIG: Behavioral Treatment Programs for Speech and Body Movement in Parkinson Disease

Directory of Open Access Journals (Sweden)

Cynthia Fox

2012-01-01

Full Text Available Recent advances in neuroscience have suggested that exercise-based behavioral treatments may improve function and possibly slow progression of motor symptoms in individuals with Parkinson disease (PD). The LSVT (Lee Silverman Voice Treatment) Programs for individuals with PD have been developed and researched over the past 20 years, beginning with a focus on the speech motor system (LSVT LOUD), and more recently have been extended to address limb motor systems (LSVT BIG). The unique aspects of the LSVT Programs include the combination of (a) an exclusive target on increasing amplitude (loudness in the speech motor system; bigger movements in the limb motor system), (b) a focus on sensory recalibration to help patients recognize that movements with increased amplitude are within normal limits, even if they feel “too loud” or “too big,” and (c) training self-cueing and attention to action to facilitate long-term maintenance of treatment outcomes. In addition, the intensive mode of delivery is consistent with principles that drive activity-dependent neuroplasticity and motor learning. The purpose of this paper is to provide an integrative discussion of the LSVT Programs including the rationale for their fundamentals, a summary of efficacy data, and a discussion of limitations and future directions for research.
Global cue inconsistency diminishes learning of cue validity

Directory of Open Access Journals (Sweden)

Tony Wang

2016-11-01

Full Text Available We present a novel two-stage probabilistic learning task that examines the participants’ ability to learn and utilize valid cues across several levels of probabilistic feedback. In the first stage, participants sample from one of three cues that gives predictive information about the outcome of the second stage. Participants are rewarded for correct prediction of the outcome in stage two. Only one of the three cues gives valid predictive information and thus participants can maximise their reward by learning to sample from the valid cue. The validity of this predictive information, however, is reinforced across several levels of probabilistic feedback. A second manipulation involved changing the consistency of the predictive information in stage one and the outcome in stage two. The results show that participants, with higher probabilistic feedback, learned to utilise the valid cue. In inconsistent task conditions, however, participants were significantly less successful in utilising higher validity cues. We interpret this result as implying that learning in probabilistic categorization is based on developing a representation of the task that allows for goal-directed action.
Studies of Speech Disorders in Schizophrenia. History and State-of-the-art

Directory of Open Access Journals (Sweden)

Shedovskiy E. F.

2015-08-01

Full Text Available The article reviews studies of speech disorders in schizophrenia. The authors trace the historical course of these studies and characterize their main areas: the psychopathological perspective proper (speech disorders as psychopathological symptoms, their description and taxonomy) and the psychological (isolated neurons) and pathopsychological perspectives. Some modern foreign works covering a variety of approaches to the study of speech disorders in endogenous mental disorders are analyzed separately. Disorders and peculiarities of speech are among the most striking manifestations of schizophrenia, along with impaired thinking (Savitskaya A. V., Mikirtumov B. E.). For all the variety of symptoms, speech disorders in schizophrenia can be classified and organized. The few clinical psychological studies of speech activity in schizophrenia include work on the generation of standard speech utterances, features of the verbal associative process, and speed parameters of speech utterances. Special attention is given to integrated research within biological psychiatry and genetics. It is shown that, over a history of more than half a century, the distinctive character of speech pathology in schizophrenia has received some coverage in the psychiatric and psychological literature and continues to generate interest within the modern integrated multidisciplinary approach.
Assessing the role of spectral and intensity cues in spectral ripple detection and discrimination in cochlear-implant users.

Science.gov (United States)

Anderson, Elizabeth S; Oxenham, Andrew J; Nelson, Peggy B; Nelson, David A

2012-12-01

Measures of spectral ripple resolution have become widely used psychophysical tools for assessing spectral resolution in cochlear-implant (CI) listeners. The objective of this study was to compare spectral ripple discrimination and detection in the same group of CI listeners. Ripple detection thresholds were measured over a range of ripple frequencies and were compared to spectral ripple discrimination thresholds previously obtained from the same CI listeners. The data showed that performance on the two measures was correlated, but that individual subjects' thresholds (at a constant spectral modulation depth) for the two tasks were not equivalent. In addition, spectral ripple detection was often found to be possible at higher rates than expected based on the available spectral cues, making it likely that temporal-envelope cues played a role at higher ripple rates. Finally, spectral ripple detection thresholds were compared to previously obtained speech-perception measures. Results confirmed earlier reports of a robust relationship between detection of widely spaced ripples and measures of speech recognition. In contrast, intensity difference limens for broadband noise did not correlate with spectral ripple detection measures, suggesting a dissociation between the ability to detect small changes in intensity across frequency and across time.
Unvoiced Speech Recognition Using Tissue-Conductive Acoustic Sensor

Directory of Open Access Journals (Sweden)

Heracleous Panikos

2007-01-01

Full Text Available We present the use of stethoscope and silicon NAM (nonaudible murmur) microphones in automatic speech recognition. NAM microphones are special acoustic sensors, which are attached behind the talker's ear and can capture not only normal (audible) speech, but also very quietly uttered speech (nonaudible murmur). As a result, NAM microphones can be applied in automatic speech recognition systems when privacy is desired in human-machine communication. Moreover, NAM microphones show robustness against noise and they might be used in special systems (speech recognition, speech transform, etc.) for sound-impaired people. Using adaptation techniques and a small amount of training data, we achieved for a 20 k dictation task a word accuracy for nonaudible murmur recognition in a clean environment. In this paper, we also investigate nonaudible murmur recognition in noisy environments and the effect of the Lombard reflex on nonaudible murmur recognition. We also propose three methods to integrate audible speech and nonaudible murmur recognition using a stethoscope NAM microphone with very promising results.
Speech reception with different bilateral directional processing schemes: Influence of binaural hearing, audiometric asymmetry, and acoustic scenario.

Science.gov (United States)

Neher, Tobias; Wagener, Kirsten C; Latzel, Matthias

2017-09-01

Hearing aid (HA) users can differ markedly in their benefit from directional processing (or beamforming) algorithms. The current study therefore investigated candidacy for different bilateral directional processing schemes. Groups of elderly listeners with symmetric (N = 20) or asymmetric (N = 19) hearing thresholds for frequencies below 2 kHz, a large spread in the binaural intelligibility level difference (BILD), and no difference in age, overall degree of hearing loss, or performance on a measure of selective attention took part. Aided speech reception was measured using virtual acoustics together with a simulation of a linked pair of completely occluding behind-the-ear HAs. Five processing schemes and three acoustic scenarios were used. The processing schemes differed in the tradeoff between signal-to-noise ratio (SNR) improvement and binaural cue preservation. The acoustic scenarios consisted of a frontal target talker presented against two speech maskers from ±60° azimuth or spatially diffuse cafeteria noise. For both groups, a significant interaction between BILD, processing scheme, and acoustic scenario was found. This interaction implied that, in situations with lateral speech maskers, HA users with BILDs larger than about 2 dB profited more from preserved low-frequency binaural cues than from greater SNR improvement, whereas for smaller BILDs the opposite was true. Audiometric asymmetry reduced the influence of binaural hearing. In spatially diffuse noise, the maximal SNR improvement was generally beneficial. N0Sπ detection performance at 500 Hz predicted the benefit from low-frequency binaural cues. Together, these findings provide a basis for adapting bilateral directional processing to individual and situational influences. Further research is needed to investigate their generalizability to more realistic HA conditions (e.g., with low-frequency vent-transmitted sound). Copyright © 2017 Elsevier B.V. All rights reserved.
Integration of visual and non-visual self-motion cues during voluntary head movements in the human brain.

Science.gov (United States)

Schindler, Andreas; Bartels, Andreas

2018-05-15

Our phenomenological experience of the stable world is maintained by continuous integration of visual self-motion with extra-retinal signals. However, due to conventional constraints of fMRI acquisition in humans, neural responses to visuo-vestibular integration have only been studied using artificial stimuli, in the absence of voluntary head-motion. We here circumvented these limitations and let participants move their heads during scanning. The slow dynamics of the BOLD signal allowed us to acquire neural signal related to head motion after the observer's head was stabilized by inflatable aircushions. Visual stimuli were presented on head-fixed display goggles and updated in real time as a function of head-motion that was tracked using an external camera. Two conditions simulated forward translation of the participant. During physical head rotation, the congruent condition simulated a stable world, whereas the incongruent condition added arbitrary lateral motion. Importantly, both conditions were precisely matched in visual properties and head-rotation. By comparing congruent with incongruent conditions we found evidence consistent with the multi-modal integration of visual cues with head motion into a coherent "stable world" percept in the parietal operculum and in an anterior part of parieto-insular cortex (aPIC). In the visual motion network, human regions MST, a dorsal part of VIP, the cingulate sulcus visual area (CSv) and a region in precuneus (Pc) showed differential responses to the same contrast. The results demonstrate for the first time neural multimodal interactions between precisely matched congruent versus incongruent visual and non-visual cues during physical head-movement in the human brain. The methodological approach opens the path to a new class of fMRI studies with unprecedented temporal and spatial control over visuo-vestibular stimulation. Copyright © 2018 Elsevier Inc. All rights reserved.
The level of audiovisual print-speech integration deficits in dyslexia.

Science.gov (United States)

Kronschnabel, Jens; Brem, Silvia; Maurer, Urs; Brandeis, Daniel

2014-09-01

The classical phonological deficit account of dyslexia is increasingly linked to impairments in grapho-phonological conversion, and to dysfunctions in superior temporal regions associated with audiovisual integration. The present study investigates mechanisms of audiovisual integration in typical and impaired readers at the critical developmental stage of adolescence. Congruent and incongruent audiovisual as well as unimodal (visual only and auditory only) material was presented. Audiovisual presentations were single letters and three-letter (consonant-vowel-consonant) stimuli accompanied by matching or mismatching speech sounds. Three-letter stimuli exhibited fast phonetic transitions as in real-life language processing and reading. Congruency effects, i.e. different brain responses to congruent and incongruent stimuli were taken as an indicator of audiovisual integration at a phonetic level (grapho-phonological conversion). Comparisons of unimodal and audiovisual stimuli revealed basic, more sensory aspects of audiovisual integration. By means of these two criteria of audiovisual integration, the generalizability of audiovisual deficits in dyslexia was tested. Moreover, it was expected that the more naturalistic three-letter stimuli are superior to single letters in revealing group differences. Electrophysiological and hemodynamic (EEG and fMRI) data were acquired simultaneously in a simple target detection task. Applying the same statistical models to event-related EEG potentials and fMRI responses allowed comparing the effects detected by the two techniques at a descriptive level. Group differences in congruency effects (congruent against incongruent) were observed in regions involved in grapho-phonological processing, including the left inferior frontal and angular gyri and the inferotemporal cortex. Importantly, such differences also emerged in superior temporal key regions. Three-letter stimuli revealed stronger group differences than single letters. No
Integration of reward signalling and appetite regulating peptide systems in the control of food-cue responses.

Science.gov (United States)

Reichelt, A C; Westbrook, R F; Morris, M J

2015-11-01

Understanding the neurobiological substrates that encode learning about food-associated cues and how those signals are modulated is of great clinical importance especially in light of the worldwide obesity problem. Inappropriate or maladaptive responses to food-associated cues can promote over-consumption, leading to excessive energy intake and weight gain. Chronic exposure to foods rich in fat and sugar alters the reinforcing value of foods and weakens inhibitory neural control, triggering learned, but maladaptive, associations between environmental cues and food rewards. Thus, responses to food-associated cues can promote cravings and food-seeking by activating mesocorticolimbic dopamine neurocircuitry, and exert physiological effects including salivation. These responses may be analogous to the cravings experienced by abstaining drug addicts that can trigger relapse into drug self-administration. Preventing cue-triggered eating may therefore reduce the over-consumption seen in obesity and binge-eating disorder. In this review we discuss recent research examining how cues associated with palatable foods can promote reward-based feeding behaviours and the potential involvement of appetite-regulating peptides including leptin, ghrelin, orexin and melanin concentrating hormone. These peptide signals interface with mesolimbic dopaminergic regions including the ventral tegmental area to modulate reactivity to cues associated with palatable foods. Thus, a novel target for anti-obesity therapeutics is to reduce non-homeostatic, reward driven eating behaviour, which can be triggered by environmental cues associated with highly palatable, fat and sugar rich foods. © 2015 The British Pharmacological Society.
The Auditory-Visual Speech Benefit on Working Memory in Older Adults with Hearing Impairment.

Science.gov (United States)

Frtusova, Jana B; Phillips, Natalie A

2016-01-01

This study examined the effect of auditory-visual (AV) speech stimuli on working memory in older adults with poorer-hearing (PH) in comparison to age- and education-matched older adults with better hearing (BH). Participants completed a working memory n-back task (0- to 2-back) in which sequences of digits were presented in visual-only (i.e., speech-reading), auditory-only (A-only), and AV conditions. Auditory event-related potentials (ERP) were collected to assess the relationship between perceptual and working memory processing. The behavioral results showed that both groups were faster in the AV condition in comparison to the unisensory conditions. The ERP data showed perceptual facilitation in the AV condition, in the form of reduced amplitudes and latencies of the auditory N1 and/or P1 components, in the PH group. Furthermore, a working memory ERP component, the P3, peaked earlier for both groups in the AV condition compared to the A-only condition. In general, the PH group showed a more robust AV benefit; however, the BH group showed a dose-response relationship between perceptual facilitation and working memory improvement, especially for facilitation of processing speed. Two measures, reaction time and P3 amplitude, suggested that the presence of visual speech cues may have helped the PH group to counteract the demanding auditory processing, to the level that no group differences were evident during the AV modality despite lower performance during the A-only condition. Overall, this study provides support for the theory of an integrated perceptual-cognitive system. The practical significance of these findings is also discussed.
ANALYSIS OF MULTIMODAL FUSION TECHNIQUES FOR AUDIO-VISUAL SPEECH RECOGNITION

Directory of Open Access Journals (Sweden)

D.V. Ivanko

2016-05-01

Full Text Available The paper presents an analytical review covering the latest achievements in the field of audio-visual (AV) fusion (integration) of multimodal information. We discuss the main challenges and report on approaches to address them. One of the most important tasks of AV integration is to understand how the modalities interact and influence each other. The paper addresses this problem in the context of AV speech processing and speech recognition. In the first part of the review we set out the basic principles of AV speech recognition and give a classification of audio and visual features of speech. Special attention is paid to the systematization of the existing techniques and AV data fusion methods. In the second part we provide a consolidated list of tasks and applications that use AV fusion, based on our analysis of the research area. We also indicate the methods, techniques, and audio and video features used. We propose a classification of AV integration and discuss the advantages and disadvantages of different approaches. We draw conclusions and offer our assessment of the future of the field of AV fusion. In further research we plan to implement a system for audio-visual recognition of continuous Russian speech using advanced methods of multimodal fusion.
SUSTAINABILITY IN THE BOWELS OF SPEECHES

Directory of Open Access Journals (Sweden)

Jadir Mauro Galvao

2012-10-01

Full Text Available The theme of sustainability has not yet managed to become an integral part of the theoretical medley that informs our most everyday actions, yet it often visits our thoughts and permeates many of our speeches. The big event of 2012, the Rio+20 meeting, gathered glances from all corners of the planet around this burning theme, but we still move forward timidly. Although it is not very clear to us what the term sustainability encloses, it does not sound entirely strange. We associate it with things like ecology, the planet, waste emitted by factory smokestacks, deforestation, recycling and global warming, but our goal in this article is less to clarify the term conceptually and more to observe how it appears in the speeches of that conference. When the competent authorities talk about sustainability, what are they referring to? We intend to investigate, in the lines and between the lines of these speeches, any assumptions associated with the term. To that end we analyze the speech of the People's Summit, the opening speech of President Dilma and the emblematic speech of the President of Uruguay, José Pepe Mujica.
Non-fluent speech following stroke is caused by impaired efference copy.

Science.gov (United States)

Feenaughty, Lynda; Basilakos, Alexandra; Bonilha, Leonardo; den Ouden, Dirk-Bart; Rorden, Chris; Stark, Brielle; Fridriksson, Julius

2017-09-01

Efference copy is a cognitive mechanism argued to be critical for initiating and monitoring speech; however, the extent to which breakdown of efference copy mechanisms impacts speech production is unclear. This study examined the best mechanistic predictors of non-fluent speech among 88 stroke survivors. Objective speech fluency measures were subjected to a principal component analysis (PCA). The primary PCA factor was then entered into a multiple stepwise linear regression analysis as the dependent variable, with a set of independent mechanistic variables. Participants' ability to mimic audio-visual speech ("speech entrainment response") was the best independent predictor of non-fluent speech. We suggest that this "speech entrainment" factor reflects integrity of internal monitoring (i.e., efference copy) of speech production, which affects speech initiation and maintenance. Results support models of normal speech production and suggest that therapy focused on speech initiation and maintenance may improve speech fluency for individuals with chronic non-fluent aphasia post stroke.
Counterconditioning reduces cue-induced craving and actual cue-elicited consumption.

Science.gov (United States)

Van Gucht, Dinska; Baeyens, Frank; Vansteenwegen, Debora; Hermans, Dirk; Beckers, Tom

2010-10-01

Cue-induced craving is not easily reduced by an extinction or exposure procedure and may constitute an important route toward relapse in addictive behavior after treatment. In the present study, we investigated the effectiveness of counterconditioning as an alternative procedure to reduce cue-induced craving, in a nonclinical population. We found that a cue, initially paired with chocolate consumption, did not cease to elicit craving for chocolate after extinction (repeated presentation of the cue without chocolate consumption), but did so after counterconditioning (repeated pairing of the cue with consumption of a highly disliked liquid, Polysorbate 20). This effect persisted after 1 week. Counterconditioning moreover was more effective than extinction in disrupting reported expectancy to get to eat chocolate, and also appeared to be more effective in reducing actual cue-elicited chocolate consumption. These results suggest that counterconditioning may be more promising than cue exposure for the prevention of relapse in addictive behavior. (PsycINFO Database Record (c) 2010 APA, all rights reserved).
Particularities of the (not so) emotional speech in European Portuguese: Acted and spontaneous data analysis

Directory of Open Access Journals (Sweden)

Ana Margarida Belém Nunes

2017-12-01

Full Text Available The present article is a symbiosis of two previous studies made by the author on European Portuguese emotional speech. It is known that nonverbal vocal expressions, such as laughter, vocalizations and screams, are an important source of emotional cues in social contexts (Lima et al., 2013). In social contexts we also get information about others' emotional states from facial and corporal expressions, touch and voice cues (Lima et al., 2013; Cowie et al., 2003). Nevertheless, most of the existing research on emotion is based on simulated emotions that are induced in the laboratory and/or produced by professional actors. This study explores how much, and in which voice-related parameters, spontaneous and acted speech diverge. It also helps to obtain data on emotional speech and to describe the expression of emotions, by voice alone, for the first time for European Portuguese. Analyses are mainly focused on parameters that are generally accepted as more directly related to voice quality, such as F0, jitter, shimmer and HNR (Lima et al., 2013; Tiovanen et al., 2006; Drioli et al., 2003). Given the scarcity of studies on voice quality in European Portuguese, it is important to highlight that this work presents original corpora specifically created for the presented research: a small corpus of spontaneous emotional speech, with the Feeltrace system providing the necessary annotation and interpretation of emotions, and a second corpus of acted emotions produced by a professional actor. It is particularly important to highlight the finding that European Portuguese presents some specificities in the values obtained for neutral expression, sadness and joy that do not occur in other languages.
Neural Entrainment to Speech Modulates Speech Intelligibility

NARCIS (Netherlands)

Riecke, Lars; Formisano, Elia; Sorger, Bettina; Baskent, Deniz; Gaudrain, Etienne

2018-01-01

Speech is crucial for communication in everyday life. Speech-brain entrainment, the alignment of neural activity to the slow temporal fluctuations (envelope) of acoustic speech input, is a ubiquitous element of current theories of speech processing. Associations between speech-brain entrainment and
Effects of Early Bilingual Experience with a Tone and a Non-Tone Language on Speech-Music Integration.

Directory of Open Access Journals (Sweden)

Salomi S Asaridou

Full Text Available We investigated music and language processing in a group of early bilinguals who spoke a tone language and a non-tone language (Cantonese and Dutch). We assessed online speech-music processing interactions, that is, interactions that occur when speech and music are processed simultaneously in songs, with a speeded classification task. In this task, participants judged sung pseudowords either musically (based on the direction of the musical interval) or phonologically (based on the identity of the sung vowel). We also assessed longer-term effects of linguistic experience on musical ability, that is, the influence of extensive prior experience with language when processing music. These effects were assessed with a task in which participants had to learn to identify musical intervals and with four pitch-perception tasks. Our hypothesis was that due to their experience in two different languages using lexical versus intonational tone, the early Cantonese-Dutch bilinguals would outperform the Dutch control participants. In online processing, the Cantonese-Dutch bilinguals processed speech and music more holistically than controls. This effect seems to be driven by experience with a tone language, in which integration of segmental and pitch information is fundamental. Regarding longer-term effects of linguistic experience, we found no evidence for a bilingual advantage in either the music-interval learning task or the pitch-perception tasks. Together, these results suggest that being a Cantonese-Dutch bilingual does not have any measurable longer-term effects on pitch and music processing, but does have consequences for how speech and music are processed jointly.
Retrieval of bilingual autobiographical memories: effects of cue language and cue imageability.

Science.gov (United States)

Mortensen, Linda; Berntsen, Dorthe; Bohn, Ocke-Schwen

2015-01-01

An important issue in theories of bilingual autobiographical memory is whether linguistically encoded memories are represented in language-specific stores or in a common language-independent store. Previous research has found that autobiographical memory retrieval is facilitated when the language of the cue is the same as the language of encoding, consistent with language-specific memory stores. The present study examined whether this language congruency effect is influenced by cue imageability. Danish-English bilinguals retrieved autobiographical memories in response to Danish and English high- or low-imageability cues. Retrieval latencies were shorter to Danish than English cues and shorter to high- than low-imageability cues. Importantly, the cue language effect was stronger for low- than high-imageability cues. To examine the relationship between cue language and the language of internal retrieval, participants identified the language in which the memories were internally retrieved. More memories were retrieved when the cue language was the same as the internal language than when the cue was in the other language, and more memories were identified as being internally retrieved in Danish than English, regardless of the cue language. These results provide further evidence for language congruency effects in bilingual memory and suggest that this effect is influenced by cue imageability.
Neural entrainment to rhythmically-presented auditory, visual and audio-visual speech in children

Directory of Open Access Journals (Sweden)

Alan James Power

2012-07-01

Full Text Available Auditory cortical oscillations have been proposed to play an important role in speech perception. It is suggested that the brain may take temporal ‘samples’ of information from the speech stream at different rates, phase-resetting ongoing oscillations so that they are aligned with similar frequency bands in the input (‘phase locking’). Information from these frequency bands is then bound together for speech perception. To date, there are no explorations of neural phase-locking and entrainment to speech input in children. However, it is clear from studies of language acquisition that infants use both visual speech information and auditory speech information in learning. In order to study neural entrainment to speech in typically-developing children, we use a rhythmic entrainment paradigm (underlying 2 Hz or delta rate) based on repetition of the syllable ba, presented in either the auditory modality alone, the visual modality alone, or as auditory-visual speech (via a talking head). To ensure attention to the task, children aged 13 years were asked to press a button as fast as possible when the ba stimulus violated the rhythm for each stream type. Rhythmic violation depended on delaying the occurrence of a ba in the isochronous stream. Neural entrainment was demonstrated for all stream types, and individual differences in standardized measures of language processing were related to auditory entrainment at the theta rate. Further, there was significant modulation of the preferred phase of auditory entrainment in the theta band when visual speech cues were present, indicating cross-modal phase resetting. The rhythmic entrainment paradigm developed here offers a method for exploring individual differences in oscillatory phase locking during development. In particular, a method for assessing neural entrainment and cross-modal phase resetting would be useful for exploring developmental learning difficulties thought to involve temporal sampling

Study of wavelet packet energy entropy for emotion classification in speech and glottal signals

Science.gov (United States)

He, Ling; Lech, Margaret; Zhang, Jing; Ren, Xiaomei; Deng, Lihua

2013-07-01

Automatic speech emotion recognition has important applications in human-machine communication. The majority of current research in this area is focused on finding optimal feature parameters. In recent studies, several glottal features were examined as potential cues for emotion differentiation. In this study, a new type of feature parameter is proposed, which calculates energy entropy on values within selected Wavelet Packet frequency bands. The modeling and classification tasks are conducted using the classical GMM algorithm. The experiments use two data sets: the Speech Under Simulated Emotion (SUSE) data set annotated with three different emotions (angry, neutral and soft) and the Berlin Emotional Speech (BES) database annotated with seven different emotions (angry, bored, disgust, fear, happy, sad and neutral). The average classification accuracy achieved for the SUSE data (74%-76%) is significantly higher than the accuracy achieved for the BES data (51%-54%). In both cases, the accuracy was significantly higher than the respective random guessing levels (33% for SUSE and 14.3% for BES).
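As an illustration of the kind of feature described in the abstract above, the following is a minimal sketch of an energy entropy computed over wavelet packet bands. It rests on several assumptions that are not taken from the paper: a Haar wavelet, a full packet tree, and the Shannon entropy (in bits) of the normalized band energies. The authors' wavelet, band selection, and exact entropy definition may differ, and the function names are hypothetical.

```python
import math


def haar_split(x):
    """One Haar analysis step: return (approximation, detail) bands.

    Samples are processed in pairs; a trailing unpaired sample is dropped.
    """
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def wavelet_packet_bands(x, levels):
    """Full wavelet packet tree: split every band at every level.

    Returns 2**levels bands. The signal length is assumed to be a
    multiple of 2**levels so that no samples are dropped.
    """
    bands = [list(x)]
    for _ in range(levels):
        next_bands = []
        for band in bands:
            approx, detail = haar_split(band)
            next_bands.extend([approx, detail])
        bands = next_bands
    return bands


def energy_entropy(x, levels):
    """Shannon entropy (bits) of the normalized band energies."""
    bands = wavelet_packet_bands(x, levels)
    energies = [sum(v * v for v in band) for band in bands]
    total = sum(energies)
    if total == 0.0:
        return 0.0  # silent frame: define the entropy as zero
    probs = [e / total for e in energies]
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


# An impulse spreads its energy evenly over the four level-2 bands,
# while a constant signal concentrates it in a single band.
impulse_entropy = energy_entropy([1.0, 0.0, 0.0, 0.0], levels=2)
constant_entropy = energy_entropy([1.0] * 8, levels=2)
```

In this sketch a higher value means the energy is spread more evenly across bands. The random guessing levels quoted in the abstract correspond to 1/3 (about 33%) for the three SUSE classes and 1/7 (about 14.3%) for the seven BES classes.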
Interaction and Representational Integration: Evidence from Speech Errors

Science.gov (United States)

Goldrick, Matthew; Baker, H. Ross; Murphy, Amanda; Baese-Berk, Melissa

2011-01-01

We examine the mechanisms that support interaction between lexical, phonological and phonetic processes during language production. Studies of the phonetics of speech errors have provided evidence that partially activated lexical and phonological representations influence phonetic processing. We examine how these interactive effects are modulated…
New tests of the distal speech rate effect: Examining cross-linguistic generalization

Directory of Open Access Journals (Sweden)

Laura Dilley

2013-12-01

Full Text Available Recent findings [Dilley and Pitt, 2010. Psych. Science. 21, 1664-1670] have shown that manipulating context speech rate in English can cause entire syllables to disappear or appear perceptually. The current studies tested two rate-based explanations of this phenomenon while attempting to replicate and extend these findings to another language, Russian. In Experiment 1, native Russian speakers listened to Russian sentences which had been subjected to rate manipulations and performed a lexical report task. Experiment 2 investigated speech rate effects in cross-language speech perception; non-native speakers of Russian of both high and low proficiency were tested on the same Russian sentences as in Experiment 1. They decided between two lexical interpretations of a critical portion of the sentence, where one choice contained more phonological material than the other (e.g., /stərʌ'na/ side vs. /strʌ'na/ country). In both experiments, with native and non-native speakers of Russian, context speech rate and the relative duration of the critical sentence portion were found to influence the amount of phonological material perceived. The results support the generalized rate normalization hypothesis, according to which the content perceived in a spectrally ambiguous stretch of speech depends on the duration of that content relative to the surrounding speech, while showing that the findings of Dilley and Pitt (2010) extend to a variety of morphosyntactic contexts and a new language, Russian. Findings indicate that relative timing cues across an utterance can be critical to accurate lexical perception by both native and non-native speakers.
Modern Tools in Patient-Centred Speech Therapy for Romanian Language

Directory of Open Access Journals (Sweden)

Mirela Danubianu

2016-03-01

Full Text Available The most common way to communicate with those around us is speech. Suffering from a speech disorder can have negative social effects, from leaving individuals with low confidence and morale to problems with social interaction and the ability to live independently as adults. Speech therapy intervention is a complex process with particular objectives, such as discovering and identifying the speech disorder and directing the therapy toward correction, recovery, compensation, adaptation and social integration of patients. Computer-based speech therapy systems are a real help for therapists by creating a special learning environment. The Romanian language is a phonetic one, with special linguistic particularities. This paper aims to present a few computer-based speech therapy systems developed for the treatment of various speech disorders specific to the Romanian language.
Detecting sarcasm from paralinguistic cues: anatomic and cognitive correlates in neurodegenerative disease.

Science.gov (United States)

Rankin, Katherine P; Salazar, Andrea; Gorno-Tempini, Maria Luisa; Sollberger, Marc; Wilson, Stephen M; Pavlic, Danijela; Stanley, Christine M; Glenn, Shenly; Weiner, Michael W; Miller, Bruce L

2009-10-01

While sarcasm can be conveyed solely through contextual cues such as counterfactual or echoic statements, face-to-face sarcastic speech may be characterized by specific paralinguistic features that alert the listener to interpret the utterance as ironic or critical, even in the absence of contextual information. We investigated the neuroanatomy underlying failure to understand sarcasm from dynamic vocal and facial paralinguistic cues. Ninety subjects (20 frontotemporal dementia, 11 semantic dementia [SemD], 4 progressive non-fluent aphasia, 27 Alzheimer's disease, 6 corticobasal degeneration, 9 progressive supranuclear palsy, 13 healthy older controls) were tested using the Social Inference - Minimal subtest of The Awareness of Social Inference Test (TASIT). Subjects watched brief videos depicting sincere or sarcastic communication and answered yes-no questions about the speaker's intended meaning. All groups interpreted Sincere (SIN) items normally, and only the SemD group was impaired on the Simple Sarcasm (SSR) condition. Patients failing the SSR performed more poorly on dynamic emotion recognition tasks and had more neuropsychiatric disturbances, but had better verbal and visuospatial working memory than patients who comprehended sarcasm. Voxel-based morphometry analysis of SSR scores in SPM5 demonstrated that poorer sarcasm comprehension was predicted by smaller volume in bilateral posterior parahippocampi (PHc), temporal poles, and R medial frontal pole (pFWE<0.05). This study provides lesion data suggesting that the PHc may be involved in recognizing a paralinguistic speech profile as abnormal, leading to interpretive processing by the temporal poles and right medial frontal pole that identifies the social context as sarcastic, and recognizes the speaker's paradoxical intentions.
A dominance hierarchy of auditory spatial cues in barn owls.

Directory of Open Access Journals (Sweden)

Ilana B Witten

2010-04-01

Full Text Available Barn owls integrate spatial information across frequency channels to localize sounds in space. We presented barn owls with synchronous sounds that contained different bands of frequencies (3-5 kHz and 7-9 kHz) from different locations in space. When the owls were confronted with the conflicting localization cues from two synchronous sounds of equal level, their orienting responses were dominated by one of the sounds: they oriented toward the location of the low frequency sound when the sources were separated in azimuth; in contrast, they oriented toward the location of the high frequency sound when the sources were separated in elevation. We identified neural correlates of this behavioral effect in the optic tectum (OT, superior colliculus in mammals), which contains a map of auditory space and is involved in generating orienting movements to sounds. We found that low frequency cues dominate the representation of sound azimuth in the OT space map, whereas high frequency cues dominate the representation of sound elevation. We argue that the dominance hierarchy of localization cues reflects several factors: (1) the relative amplitude of the sound providing the cue, (2) the resolution with which the auditory system measures the value of a cue, and (3) the spatial ambiguity in interpreting the cue. These same factors may contribute to the relative weighting of sound localization cues in other species, including humans.
Sensory modality of smoking cues modulates neural cue reactivity.

Science.gov (United States)

Yalachkov, Yavor; Kaiser, Jochen; Görres, Andreas; Seehaus, Arne; Naumer, Marcus J

2013-01-01

Behavioral experiments have demonstrated that the sensory modality of presentation modulates drug cue reactivity. The present study on nicotine addiction tested whether neural responses to smoking cues are modulated by the sensory modality of stimulus presentation. We measured brain activation using functional magnetic resonance imaging (fMRI) in 15 smokers and 15 nonsmokers while they viewed images of smoking paraphernalia and control objects and while they touched the same objects without seeing them. Haptically presented, smoking-related stimuli induced more pronounced neural cue reactivity than visual cues in the left dorsal striatum in smokers compared to nonsmokers. The severity of nicotine dependence correlated positively with the preference for haptically explored smoking cues in the left inferior parietal lobule/somatosensory cortex, right fusiform gyrus/inferior temporal cortex/cerebellum, hippocampus/parahippocampal gyrus, posterior cingulate cortex, and supplementary motor area. These observations are in line with the hypothesized role of the dorsal striatum for the expression of drug habits and the well-established concept of drug-related automatized schemata, since haptic perception is more closely linked to the corresponding object-specific action pattern than visual perception. Moreover, our findings demonstrate that with the growing severity of nicotine dependence, brain regions involved in object perception, memory, self-processing, and motor control exhibit an increasing preference for haptic over visual smoking cues. This difference was not found for control stimuli. Considering the sensory modality of the presented cues could serve to develop more reliable fMRI-specific biomarkers, more ecologically valid experimental designs, and more effective cue-exposure therapies of addiction.
A treat for the eyes. An eye-tracking study on children's attention to unhealthy and healthy food cues in media content.

Science.gov (United States)

Spielvogel, Ines; Matthes, Jörg; Naderer, Brigitte; Karsay, Kathrin

2018-06-01

Based on cue reactivity theory, food cues embedded in media content can lead to physiological and psychological responses in children. Research suggests that unhealthy food cues are represented more extensively and interactively in children's media environments than healthy ones. However, it is not clear to this date whether children react differently to unhealthy compared to healthy food cues. In an experimental study with 56 children (55.4% girls; M age = 8.00, SD = 1.58), we used eye-tracking to determine children's attention to unhealthy and healthy food cues embedded in a narrative cartoon movie. Besides varying the food type (i.e., healthy vs. unhealthy), we also manipulated the integration levels of food cues with characters (i.e., level of food integration; no interaction vs. handling vs. consumption), and we assessed children's individual susceptibility factors by measuring the impact of their hunger level. Our results indicated that unhealthy food cues attract children's visual attention to a larger extent than healthy cues. However, their initial visual interest did not differ between unhealthy and healthy food cues. Furthermore, an increase in the level of food integration led to an increase in visual attention. Our findings showed no moderating impact of hunger. We conclude that especially unhealthy food cues with an interactive connection trigger cue reactivity in children. Copyright © 2018 Elsevier Ltd. All rights reserved.
Unvoiced Speech Recognition Using Tissue-Conductive Acoustic Sensor

Directory of Open Access Journals (Sweden)

Hiroshi Saruwatari

2007-01-01

Full Text Available We present the use of stethoscope and silicon NAM (nonaudible murmur) microphones in automatic speech recognition. NAM microphones are special acoustic sensors, which are attached behind the talker's ear and can capture not only normal (audible) speech, but also very quietly uttered speech (nonaudible murmur). As a result, NAM microphones can be applied in automatic speech recognition systems when privacy is desired in human-machine communication. Moreover, NAM microphones show robustness against noise and they might be used in special systems (speech recognition, speech transform, etc.) for sound-impaired people. Using adaptation techniques and a small amount of training data, we achieved for a 20 k dictation task a 93.9% word accuracy for nonaudible murmur recognition in a clean environment. In this paper, we also investigate nonaudible murmur recognition in noisy environments and the effect of the Lombard reflex on nonaudible murmur recognition. We also propose three methods to integrate audible speech and nonaudible murmur recognition using a stethoscope NAM microphone with very promising results.
A Maximum Likelihood Estimation of Vocal-Tract-Related Filter Characteristics for Single Channel Speech Separation

Directory of Open Access Journals (Sweden)

Dansereau Richard M

2007-01-01

Full Text Available We present a new technique for separating two speech signals from a single recording. The proposed method bridges the gap between underdetermined blind source separation techniques and those techniques that model the human auditory system, that is, computational auditory scene analysis (CASA). For this purpose, we decompose the speech signal into the excitation signal and the vocal-tract-related filter and then estimate the components from the mixed speech using a hybrid model. We first express the probability density function (PDF) of the mixed speech's log spectral vectors in terms of the PDFs of the underlying speech signal's vocal-tract-related filters. Then, the mean vectors of PDFs of the vocal-tract-related filters are obtained using a maximum likelihood estimator given the mixed signal. Finally, the estimated vocal-tract-related filters along with the extracted fundamental frequencies are used to reconstruct estimates of the individual speech signals. The proposed technique effectively adds vocal-tract-related filter characteristics as a new cue to CASA models using a new grouping technique based on an underdetermined blind source separation. We compare our model with both an underdetermined blind source separation and a CASA method. The experimental results show that our model outperforms both techniques in terms of SNR improvement and the percentage of crosstalk suppression.
A Maximum Likelihood Estimation of Vocal-Tract-Related Filter Characteristics for Single Channel Speech Separation

Directory of Open Access Journals (Sweden)

Mohammad H. Radfar

2006-11-01

Full Text Available We present a new technique for separating two speech signals from a single recording. The proposed method bridges the gap between underdetermined blind source separation techniques and those techniques that model the human auditory system, that is, computational auditory scene analysis (CASA). For this purpose, we decompose the speech signal into the excitation signal and the vocal-tract-related filter and then estimate the components from the mixed speech using a hybrid model. We first express the probability density function (PDF) of the mixed speech's log spectral vectors in terms of the PDFs of the underlying speech signal's vocal-tract-related filters. Then, the mean vectors of PDFs of the vocal-tract-related filters are obtained using a maximum likelihood estimator given the mixed signal. Finally, the estimated vocal-tract-related filters along with the extracted fundamental frequencies are used to reconstruct estimates of the individual speech signals. The proposed technique effectively adds vocal-tract-related filter characteristics as a new cue to CASA models using a new grouping technique based on an underdetermined blind source separation. We compare our model with both an underdetermined blind source separation and a CASA method. The experimental results show that our model outperforms both techniques in terms of SNR improvement and the percentage of crosstalk suppression.
Processing changes when listening to foreign-accented speech

Directory of Open Access Journals (Sweden)

Carlos Romero-Rivas

2015-03-01

Full Text Available This study investigates the mechanisms responsible for fast changes in processing foreign-accented speech. Event Related brain Potentials (ERPs) were obtained while native speakers of Spanish listened to native and foreign-accented speakers of Spanish. We observed a less positive P200 component for foreign-accented speech relative to native speech comprehension. This suggests that the extraction of spectral information and other important acoustic features was hampered during foreign-accented speech comprehension. However, the amplitude of the N400 component for foreign-accented speech comprehension decreased across the experiment, suggesting the use of a higher level, lexical mechanism. Furthermore, during native speech comprehension, semantic violations in the critical words elicited an N400 effect followed by a late positivity. During foreign-accented speech comprehension, semantic violations only elicited an N400 effect. Overall, our results suggest that, despite a lack of improvement in phonetic discrimination, native listeners experience changes at lexical-semantic levels of processing after brief exposure to foreign-accented speech. Moreover, these results suggest that lexical access, semantic integration and linguistic re-analysis processes are permeable to external factors, such as the accent of the speaker.
THE SELF-CORRECTION OF ENGLISH SPEECH ERRORS IN SECOND LANGUAGE LEARNING

Directory of Open Access Journals (Sweden)

Ketut Santi Indriani

2015-05-01

Full Text Available The process of second language (L2) learning is strongly influenced by the factors of error reconstruction that occur when the language is learned. Errors will inevitably appear in the learning process. However, errors can be used as a step to accelerate the process of understanding the language. Self-correction (with or without cues) is one example. In speaking, self-correction is done immediately after the error appears. This study aims to find (i) what speech errors L2 speakers are able to identify, (ii) of the errors identified, what speech errors L2 speakers are able to self-correct, and (iii) whether the self-correction of speech errors is able to immediately improve L2 learning. Based on the data analysis, it was found that the majority of identified errors are related to nouns (plurality), subject-verb agreement, grammatical structure, and pronunciation. L2 speakers tend to correct errors properly. Of the 78% of speech errors that were identified, as many as 66% could be self-corrected accurately by the L2 speakers. Based on the analysis, it was also found that self-correction is able to improve L2 learning ability directly. This is evidenced by the absence of repetition of the same error after the error had been corrected.
Speech networks at rest and in action: interactions between functional brain networks controlling speech production

Science.gov (United States)

Fuertinger, Stefan

2015-01-01

Speech production is one of the most complex human behaviors. Although brain activation during speaking has been well investigated, our understanding of interactions between the brain regions and neural networks remains scarce. We combined seed-based interregional correlation analysis with graph theoretical analysis of functional MRI data during the resting state and sentence production in healthy subjects to investigate the interface and topology of functional networks originating from the key brain regions controlling speech, i.e., the laryngeal/orofacial motor cortex, inferior frontal and superior temporal gyri, supplementary motor area, cingulate cortex, putamen, and thalamus. During both resting and speaking, the interactions between these networks were bilaterally distributed and centered on the sensorimotor brain regions. However, speech production preferentially recruited the inferior parietal lobule (IPL) and cerebellum into the large-scale network, suggesting the importance of these regions in facilitation of the transition from the resting state to speaking. Furthermore, the cerebellum (lobule VI) was the most prominent region showing functional influences on speech-network integration and segregation. Although networks were bilaterally distributed, interregional connectivity during speaking was stronger in the left vs. right hemisphere, which may have underlined a more homogeneous overlap between the examined networks in the left hemisphere. Among these, the laryngeal motor cortex (LMC) established a core network that fully overlapped with all other speech-related networks, determining the extent of network interactions. Our data demonstrate complex interactions of large-scale brain networks controlling speech production and point to the critical role of the LMC, IPL, and cerebellum in the formation of speech production network. PMID:25673742
Prosodic differences between declaratives and interrogatives in infant-directed speech.

Science.gov (United States)

Geffen, Susan; Mintz, Toben H

2017-07-01

In many languages, declaratives and interrogatives differ in word order properties, and in syntactic organization more broadly. Thus, in order to learn the distinct syntactic properties of the two sentence types, learners must first be able to distinguish them using non-syntactic information. Prosodic information is often assumed to be a useful basis for this type of discrimination, although no systematic studies of the prosodic cues available to infants have been reported. Analysis of maternal speech in three Standard American English-speaking mother-infant dyads found that polar interrogatives differed from declaratives on the patterning of pitch and duration on the final two syllables, but wh-questions did not. Thus, while prosody is unlikely to aid discrimination of declaratives from wh-questions, infant-directed speech provides prosodic information that infants could use to distinguish declaratives and polar interrogatives. We discuss how learners could leverage this information to identify all question forms, in the context of syntax acquisition.
Contribution of auditory working memory to speech understanding in mandarin-speaking cochlear implant users.

Science.gov (United States)

Tao, Duoduo; Deng, Rui; Jiang, Ye; Galvin, John J; Fu, Qian-Jie; Chen, Bing

2014-01-01

of voice pitch cues (albeit poorly coded by the CI) did not influence the relationship between working memory and speech perception.
Perception of audio-visual speech synchrony in Spanish-speaking children with and without specific language impairment.

Science.gov (United States)

Pons, Ferran; Andreu, Llorenç; Sanz-Torrent, Monica; Buil-Legaz, Lucía; Lewkowicz, David J

2013-06-01

Speech perception involves the integration of auditory and visual articulatory information, and thus requires the perception of temporal synchrony between this information. There is evidence that children with specific language impairment (SLI) have difficulty with auditory speech perception but it is not known if this is also true for the integration of auditory and visual speech. Twenty Spanish-speaking children with SLI, twenty typically developing age-matched Spanish-speaking children, and twenty Spanish-speaking children matched for MLU-w participated in an eye-tracking study to investigate the perception of audiovisual speech synchrony. Results revealed that children with typical language development perceived an audiovisual asynchrony of 666 ms regardless of whether the auditory or visual speech attribute led the other one. Children with SLI only detected the 666 ms asynchrony when the auditory component preceded the visual component. None of the groups perceived an audiovisual asynchrony of 366 ms. These results suggest that the difficulty of speech processing by children with SLI would also involve difficulties in integrating auditory and visual aspects of speech perception.
Are computers effective lie detectors? A meta-analysis of linguistic cues to deception.

Science.gov (United States)

Hauch, Valerie; Blandón-Gitlin, Iris; Masip, Jaume; Sporer, Siegfried L

2015-11-01

This meta-analysis investigates linguistic cues to deception and whether these cues can be detected with computer programs. We integrated operational definitions for 79 cues from 44 studies where software had been used to identify linguistic deception cues. These cues were allocated to six research questions. As expected, the meta-analyses demonstrated that, relative to truth-tellers, liars experienced greater cognitive load, expressed more negative emotions, distanced themselves more from events, expressed fewer sensory-perceptual words, and referred less often to cognitive processes. However, liars were not more uncertain than truth-tellers. These effects were moderated by event type, involvement, emotional valence, intensity of interaction, motivation, and other moderators. Although the overall effect size was small, theory-driven predictions for certain cues received support. These findings not only further our knowledge about the usefulness of linguistic cues to detect deception with computers in applied settings but also elucidate the relationship between language and deception. © 2014 by the Society for Personality and Social Psychology, Inc.
A Binaural Grouping Model for Predicting Speech Intelligibility in Multitalker Environments

Directory of Open Access Journals (Sweden)

Jing Mi

2016-09-01

Full Text Available Spatially separating speech maskers from target speech often leads to a large intelligibility improvement. Modeling this phenomenon has long been of interest to binaural-hearing researchers for uncovering brain mechanisms and for improving signal-processing algorithms in hearing-assistive devices. Much of the previous binaural modeling work focused on the unmasking enabled by binaural cues at the periphery, and little quantitative modeling has been directed toward the grouping or source-separation benefits of binaural processing. In this article, we propose a binaural model that focuses on grouping, specifically on the selection of time-frequency units that are dominated by signals from the direction of the target. The proposed model uses Equalization-Cancellation (EC) processing with a binary decision rule to estimate a time-frequency binary mask. EC processing is carried out to cancel the target signal and the energy change between the EC input and output is used as a feature that reflects target dominance in each time-frequency unit. The processing in the proposed model requires little computational resources and is straightforward to implement. In combination with the Coherence-based Speech Intelligibility Index, the model is applied to predict the speech intelligibility data measured by Marrone et al. The predicted speech reception threshold matches the pattern of the measured data well, even though the predicted intelligibility improvements relative to the colocated condition are larger than some of the measured data, which may reflect the lack of internal noise in this initial version of the model.
A Binaural Grouping Model for Predicting Speech Intelligibility in Multitalker Environments.

Science.gov (United States)

Mi, Jing; Colburn, H Steven

2016-10-03

Spatially separating speech maskers from target speech often leads to a large intelligibility improvement. Modeling this phenomenon has long been of interest to binaural-hearing researchers for uncovering brain mechanisms and for improving signal-processing algorithms in hearing-assistive devices. Much of the previous binaural modeling work focused on the unmasking enabled by binaural cues at the periphery, and little quantitative modeling has been directed toward the grouping or source-separation benefits of binaural processing. In this article, we propose a binaural model that focuses on grouping, specifically on the selection of time-frequency units that are dominated by signals from the direction of the target. The proposed model uses Equalization-Cancellation (EC) processing with a binary decision rule to estimate a time-frequency binary mask. EC processing is carried out to cancel the target signal and the energy change between the EC input and output is used as a feature that reflects target dominance in each time-frequency unit. The processing in the proposed model requires little computational resources and is straightforward to implement. In combination with the Coherence-based Speech Intelligibility Index, the model is applied to predict the speech intelligibility data measured by Marrone et al. The predicted speech reception threshold matches the pattern of the measured data well, even though the predicted intelligibility improvements relative to the colocated condition are larger than some of the measured data, which may reflect the lack of internal noise in this initial version of the model. © The Author(s) 2016.
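The mask-estimation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a frontal target, so the equalization step reduces to the identity and cancellation is a plain left-minus-right subtraction, and the function name `ec_binary_mask` and the 3 dB threshold are hypothetical choices made only for illustration.

```python
import numpy as np

def ec_binary_mask(left_tf, right_tf, threshold_db=3.0, eps=1e-12):
    """Toy Equalization-Cancellation mask (illustrative assumptions only).

    left_tf, right_tf: complex time-frequency arrays for the two ears,
    assumed to be already equalized for the target direction.
    Returns a boolean array that is True where the unit is treated as
    target-dominated.
    """
    left_tf = np.asarray(left_tf, dtype=complex)
    right_tf = np.asarray(right_tf, dtype=complex)
    # Energy at the EC input: sum over both ear signals.
    energy_in = np.abs(left_tf) ** 2 + np.abs(right_tf) ** 2
    # Cancellation: a target that is identical at both ears is removed.
    energy_out = np.abs(left_tf - right_tf) ** 2
    # Energy drop caused by cancellation, in dB (eps avoids division by zero).
    drop_db = 10.0 * np.log10((energy_in + eps) / (energy_out + eps))
    # Binary decision rule: a large drop means the target dominated the unit.
    return drop_db > threshold_db

# Two units: the first is identical at both ears (target-like),
# the second is phase-inverted between the ears (masker-like).
left = np.array([[1 + 0j, 1 + 0j]])
right = np.array([[1 + 0j, -1 + 0j]])
mask = ec_binary_mask(left, right)
```

In this toy input the first unit is cancelled almost completely and is kept by the mask, while the second unit gains energy under cancellation and is rejected. The published model would additionally need the filterbank, the equalization parameters for non-frontal targets, and its own decision threshold, none of which are specified in the abstract.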

Five-Year-Olds’ and Adults’ Use of Paralinguistic Cues to Overcome Referential Uncertainty

Directory of Open Access Journals (Sweden)

Justine M. Thacker

2018-02-01

Full Text Available An eye-tracking methodology was used to explore adults’ and children’s use of two utterance-based cues to overcome referential uncertainty in real time. Participants were first introduced to two characters with distinct color preferences. These characters then produced fluent (“Look! Look at the blicket.”) or disfluent (“Look! Look at thee, uh, blicket.”) instructions referring to novel objects in a display containing both talker-preferred and talker-dispreferred colored items. Adults (Expt 1, n = 24) directed a greater proportion of looks to talker-preferred objects during the initial portion of the utterance (“Look! Look at…”), reflecting the use of indexical cues for talker identity. However, they immediately reduced consideration of an object bearing the talker’s preferred color when the talker was disfluent, suggesting they infer disfluency would be more likely as a talker describes dispreferred objects. Like adults, 5-year-olds (Expt 2, n = 27) directed more attention to talker-preferred objects during the initial portion of the utterance. Children’s initial predictions, however, were not modulated when disfluency was encountered. Together, these results demonstrate that adults, but not 5-year-olds, can act on information from two talker-produced cues within an utterance, talker preference and speech disfluencies, to establish reference.
Combining symbolic cues with sensory input and prior experience in an iterative Bayesian framework

Directory of Open Access Journals (Sweden)

Frederike Hermi Petzschner

2012-08-01

Full Text Available Perception and action are the result of an integration of various sources of information, such as current sensory input, prior experience, or the context in which a stimulus occurs. Often, the interpretation is not trivial and hence needs to be learned from the co-occurrence of stimuli. Yet, how do we combine such diverse information to guide our action? Here we use a distance production-reproduction task to investigate the influence of auxiliary, symbolic cues, sensory input, and prior experience on human performance under three different conditions that vary in the information provided. Our results indicate that subjects can (1) learn the mapping of a verbal, symbolic cue onto the stimulus dimension and (2) integrate symbolic information and prior experience into their estimate of displacements. The behavioral results are explained by two distinct generative models that represent different structural approaches of how a Bayesian observer would combine prior experience, sensory input, and symbolic cue information into a single estimate of displacement. The first model interprets the symbolic cue in the context of categorization, assuming that it reflects information about a distinct underlying stimulus range (categorical model). The second model applies a multi-modal integration approach and treats the symbolic cue as additional sensory input to the system, which is combined with the current sensory measurement and the subjects’ prior experience (cue-combination model). Notably, both models account equally well for the observed behavior despite their different structural assumptions. The present work thus provides evidence that humans can interpret abstract symbolic information and combine it with other types of information such as sensory input and prior experience. The similar explanatory power of the two models further suggests that issues such as categorization and cue-combination could be explained by alternative probabilistic approaches.
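The cue-combination idea in this abstract, in which a symbolic cue is treated as one more noisy source alongside the sensory measurement and the prior, can be sketched with standard precision-weighted Gaussian fusion. This is a generic textbook illustration and not the iterative model from the paper; the class name `Gaussian`, the function `combine`, and all numeric values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Gaussian:
    mean: float
    var: float

def combine(*sources: Gaussian) -> Gaussian:
    """Fuse independent Gaussian estimates by weighting each with its precision."""
    precision = sum(1.0 / s.var for s in sources)
    mean = sum(s.mean / s.var for s in sources) / precision
    return Gaussian(mean, 1.0 / precision)

# Hypothetical values for a distance estimate (arbitrary units).
prior = Gaussian(mean=10.0, var=4.0)     # prior experience
sensory = Gaussian(mean=14.0, var=2.0)   # current sensory measurement
symbolic = Gaussian(mean=16.0, var=4.0)  # symbolic cue treated as extra input

estimate = combine(prior, sensory, symbolic)
```

With these numbers the fused estimate has mean 13.5 and variance 1.0: it is pulled toward the most reliable source, and its variance is smaller than that of any single source, which is the qualitative signature of cue combination.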
Inner Speech's Relationship With Overt Speech in Poststroke Aphasia.

Science.gov (United States)

Stark, Brielle C; Geva, Sharon; Warburton, Elizabeth A

2017-09-18

Relatively preserved inner speech alongside poor overt speech has been documented in some persons with aphasia (PWA), but the relationship of overt speech with inner speech is still largely unclear, as few studies have directly investigated these factors. The present study investigates the relationship of relatively preserved inner speech in aphasia with selected measures of language and cognition. Thirty-eight persons with chronic aphasia (27 men, 11 women; average age 64.53 ± 13.29 years, time since stroke 8-111 months) were classified as having relatively preserved inner and overt speech (n = 21), relatively preserved inner speech with poor overt speech (n = 8), or not classified due to insufficient measurements of inner and/or overt speech (n = 9). Inner speech scores (by group) were correlated with selected measures of language and cognition from the Comprehensive Aphasia Test (Swinburn, Porter, & Al, 2004). The group with poor overt speech showed a significant relationship of inner speech with overt naming (r = .95, p speech and language and cognition factors were not significant for the group with relatively good overt speech. As in previous research, we show that relatively preserved inner speech is found alongside otherwise severe production deficits in PWA. PWA with poor overt speech may rely more on preserved inner speech for overt picture naming (perhaps due to shared resources with verbal working memory) and for written picture description (perhaps due to reliance on inner speech due to perceived task difficulty). Assessments of inner speech may be useful as a standard component of aphasia screening, and therapy focused on improving and using inner speech may prove clinically worthwhile. https://doi.org/10.23641/asha.5303542.
Effects of congruent and incongruent visual cues on speech perception and brain activity in cochlear implant users.

Science.gov (United States)

Song, Jae-Jin; Lee, Hyo-Jeong; Kang, Hyejin; Lee, Dong Soo; Chang, Sun O; Oh, Seung Ha

2015-03-01

While deafness-induced plasticity has been investigated in the visual and auditory domains, not much is known about language processing in audiovisual multimodal environments for patients with restored hearing via cochlear implant (CI) devices. Here, we examined the effect of agreeing or conflicting visual inputs on auditory processing in deaf patients equipped with degraded artificial hearing. Ten post-lingually deafened CI users with good performance, along with matched control subjects, underwent H₂¹⁵O-positron emission tomography scans while carrying out a behavioral task requiring the extraction of speech information from unimodal auditory stimuli, bimodal audiovisual congruent stimuli, and incongruent stimuli. Regardless of congruency, the control subjects demonstrated activation of the auditory and visual sensory cortices, as well as the superior temporal sulcus, the classical multisensory integration area, indicating a bottom-up multisensory processing strategy. Compared to CI users, the control subjects exhibited activation of the right ventral premotor-supramarginal pathway. In contrast, CI users activated primarily the visual cortices more in the congruent audiovisual condition than in the null condition. In addition, compared to controls, CI users displayed an activation focus in the right amygdala for congruent audiovisual stimuli. The most notable difference between the two groups was an activation focus in the left inferior frontal gyrus in CI users confronted with incongruent audiovisual stimuli, suggesting top-down cognitive modulation for audiovisual conflict. Correlation analysis revealed that good speech performance was positively correlated with right amygdala activity for the congruent condition, but negatively correlated with bilateral visual cortices regardless of congruency. Taken together these results suggest that for multimodal inputs, cochlear implant users are more vision-reliant when processing congruent stimuli and are disturbed
Does computer-synthesized speech manifest personality? Experimental tests of recognition, similarity-attraction, and consistency-attraction.

Science.gov (United States)

Nass, C; Lee, K M

2001-09-01

Would people exhibit similarity-attraction and consistency-attraction toward unambiguously computer-generated speech even when personality is clearly not relevant? In Experiment 1, participants (extrovert or introvert) heard a synthesized voice (extrovert or introvert) on a book-buying Web site. Participants accurately recognized personality cues in text to speech and showed similarity-attraction in their evaluation of the computer voice, the book reviews, and the reviewer. Experiment 2, in a Web auction context, added personality of the text to the previous design. The results replicated Experiment 1 and demonstrated consistency (voice and text personality)-attraction. To maximize liking and trust, designers should set parameters, for example, words per minute or frequency range, that create a personality that is consistent with the user and the content being presented.
Automatic Speech Recognition from Neural Signals: A Focused Review

Directory of Open Access Journals (Sweden)

Christian Herff

2016-09-01

Full Text Available Speech interfaces have become widely accepted and are nowadays integrated in various real-life applications and devices. They have become a part of our daily life. However, speech interfaces presume the ability to produce intelligible speech, which might be impossible due to loud environments, the risk of bothering bystanders, or an inability to produce speech (i.e., patients suffering from locked-in syndrome). For these reasons it would be highly desirable not to speak but simply to imagine saying words or sentences. Interfaces based on imagined speech would enable fast and natural communication without the need for audible speech and would give a voice to otherwise mute people. This focused review analyzes the potential of different brain imaging techniques to recognize speech from neural signals by applying Automatic Speech Recognition (ASR) technology. We argue that modalities based on metabolic processes, such as functional Near Infrared Spectroscopy and functional Magnetic Resonance Imaging, are less suited for Automatic Speech Recognition from neural signals due to low temporal resolution but are very useful for the investigation of the underlying neural mechanisms involved in speech processes. In contrast, electrophysiologic activity is fast enough to capture speech processes and is therefore better suited for ASR. Our experimental results indicate the potential of these signals for speech recognition from neural data with a focus on invasively measured brain activity (electrocorticography). As a first example of Automatic Speech Recognition techniques applied to neural signals, we discuss the Brain-to-text system.
Musician advantage for speech-on-speech perception

NARCIS (Netherlands)

Başkent, Deniz; Gaudrain, Etienne

Evidence for transfer of musical training to better perception of speech in noise has been mixed. Unlike speech-in-noise, speech-on-speech perception utilizes many of the skills that musical training improves, such as better pitch perception and stream segregation, as well as use of higher-level
Reduced neural integration of letters and speech sounds in dyslexic children scales with individual differences in reading fluency.

Directory of Open Access Journals (Sweden)

Gojko Žarić

Full Text Available The acquisition of letter-speech sound associations is one of the basic requirements for fluent reading acquisition and its failure may contribute to reading difficulties in developmental dyslexia. Here we investigated event-related potential (ERP) measures of letter-speech sound integration in 9-year-old typical and dyslexic readers and specifically test their relation to individual differences in reading fluency. We employed an audiovisual oddball paradigm in typical readers (n = 20), dysfluent (n = 18) and severely dysfluent (n = 18) dyslexic children. In one auditory and two audiovisual conditions the Dutch spoken vowels /a/ and /o/ were presented as standard and deviant stimuli. In audiovisual blocks, the letter 'a' was presented either simultaneously (AV0), or 200 ms before (AV200) vowel sound onset. Across the three children groups, vowel deviancy in auditory blocks elicited comparable mismatch negativity (MMN) and late negativity (LN) responses. In typical readers, both audiovisual conditions (AV0 and AV200) led to enhanced MMN and LN amplitudes. In both dyslexic groups, the audiovisual LN effects were mildly reduced. Most interestingly, individual differences in reading fluency were correlated with MMN latency in the AV0 condition. A further analysis revealed that this effect was driven by a short-lived MMN effect encompassing only the N1 window in severely dysfluent dyslexics versus a longer MMN effect encompassing both the N1 and P2 windows in the other two groups. Our results confirm and extend previous findings in dyslexic children by demonstrating a deficient pattern of letter-speech sound integration depending on the level of reading dysfluency. These findings underscore the importance of considering individual differences across the entire spectrum of reading skills in addition to group differences between typical and dyslexic readers.
Effects of Age and Working Memory Capacity on Speech Recognition Performance in Noise Among Listeners With Normal Hearing.

Science.gov (United States)

Gordon-Salant, Sandra; Cole, Stacey Samuels

2016-01-01

This study aimed to determine if younger and older listeners with normal hearing who differ on working memory span perform differently on speech recognition tests in noise. Older adults typically exhibit poorer speech recognition scores in noise than younger adults, which is attributed primarily to poorer hearing sensitivity and more limited working memory capacity in older than younger adults. Previous studies typically tested older listeners with poorer hearing sensitivity and shorter working memory spans than younger listeners, making it difficult to discern the importance of working memory capacity on speech recognition. This investigation controlled for hearing sensitivity and compared speech recognition performance in noise by younger and older listeners who were subdivided into high and low working memory groups. Performance patterns were compared for different speech materials to assess whether or not the effect of working memory capacity varies with the demands of the specific speech test. The authors hypothesized that (1) normal-hearing listeners with low working memory span would exhibit poorer speech recognition performance in noise than those with high working memory span; (2) older listeners with normal hearing would show poorer speech recognition scores than younger listeners with normal hearing, when the two age groups were matched for working memory span; and (3) an interaction between age and working memory would be observed for speech materials that provide contextual cues. Twenty-eight older (61 to 75 years) and 25 younger (18 to 25 years) normal-hearing listeners were assigned to groups based on age and working memory status. Northwestern University Auditory Test No. 6 words and Institute of Electrical and Electronics Engineers sentences were presented in noise using an adaptive procedure to measure the signal-to-noise ratio corresponding to 50% correct performance. Cognitive ability was evaluated with two tests of working memory (Listening
Speech networks at rest and in action: interactions between functional brain networks controlling speech production.

Science.gov (United States)

Simonyan, Kristina; Fuertinger, Stefan

2015-04-01

Speech production is one of the most complex human behaviors. Although brain activation during speaking has been well investigated, our understanding of interactions between the brain regions and neural networks remains scarce. We combined seed-based interregional correlation analysis with graph theoretical analysis of functional MRI data during the resting state and sentence production in healthy subjects to investigate the interface and topology of functional networks originating from the key brain regions controlling speech, i.e., the laryngeal/orofacial motor cortex, inferior frontal and superior temporal gyri, supplementary motor area, cingulate cortex, putamen, and thalamus. During both resting and speaking, the interactions between these networks were bilaterally distributed and centered on the sensorimotor brain regions. However, speech production preferentially recruited the inferior parietal lobule (IPL) and cerebellum into the large-scale network, suggesting the importance of these regions in facilitation of the transition from the resting state to speaking. Furthermore, the cerebellum (lobule VI) was the most prominent region showing functional influences on speech-network integration and segregation. Although networks were bilaterally distributed, interregional connectivity during speaking was stronger in the left vs. right hemisphere, which may have underlined a more homogeneous overlap between the examined networks in the left hemisphere. Among these, the laryngeal motor cortex (LMC) established a core network that fully overlapped with all other speech-related networks, determining the extent of network interactions. Our data demonstrate complex interactions of large-scale brain networks controlling speech production and point to the critical role of the LMC, IPL, and cerebellum in the formation of speech production network. Copyright © 2015 the American Physiological Society.
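The combination of interregional correlation and graph analysis described in this abstract can be illustrated with a toy sketch. It is not the authors' pipeline: the region labels, the time series values, and the 0.5 threshold are all hypothetical, and real fMRI analyses involve preprocessing and statistics that are omitted here. The sketch only shows the general idea of linking regions whose signals correlate strongly and then counting each region's connections.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def build_network(series, threshold=0.5):
    """Link two regions when the absolute correlation exceeds the threshold.

    The threshold is an illustrative assumption, not a value from the study.
    """
    names = list(series)
    edges = {name: set() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(series[a], series[b])) > threshold:
                edges[a].add(b)
                edges[b].add(a)
    return edges


def degree(edges):
    """Number of connections per region (a simple graph-theoretical measure)."""
    return {name: len(neighbours) for name, neighbours in edges.items()}


# Hypothetical toy signals; the labels are placeholders, not measured data.
toy_series = {
    "LMC": [1, 2, 3, 4, 5],
    "IPL": [2, 4, 6, 8, 10],
    "cerebellum": [5, 4, 3, 2, 1],
    "control": [1, -1, 1, -1, 1],
}

network = build_network(toy_series)
degrees = degree(network)
```

In this toy data the three placeholder speech regions are mutually connected, while the unrelated control signal stays isolated.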
Commencement Speech as a Hybrid Polydiscursive Practice

Directory of Open Access Journals (Sweden)

Светлана Викторовна Иванова

2017-12-01

Full Text Available Discourse and media communication researchers note that popular discursive and communicative practices tend toward hybridization and convergence. Discourse, understood as language in use, is flexible. Consequently, one and the same text can represent several types of discourse. A vivid example of this tendency is the American commencement speech (also called commencement address or graduation speech). A commencement speech is a speech addressed to university graduates which, in keeping with the modern trend, is delivered by outstanding media personalities (politicians, athletes, actors, etc.). The objective of this study is to define the specificity of the realization of polydiscursive practices within commencement speech. The research involves discursive, contextual, stylistic and definitive analyses. Methodologically, the study is based on discourse analysis theory; in particular, the notion of a discursive practice as a verbalized social practice forms the conceptual basis of the research. This research draws upon a hundred commencement speeches delivered by prominent representatives of American society from the 1980s to the present. In brief, commencement speech belongs to the institutional discourse that public speech embodies. The institutional parameters of commencement speech are well represented in speeches delivered by people in power, such as American and university presidents. Nevertheless, as the results of the research indicate, the institutional character of commencement speech is not its only feature. Conceptual information analysis makes it possible to assign commencement speech to didactic discourse, as it is aimed at teaching university graduates how to deal with the challenges life is rich in. Discursive practices of personal discourse are also actively integrated into the commencement speech discourse. Moreover, existential discursive practices also find their way into the discourse under study. Commencement
The effect of F0 contour on the intelligibility of speech in the presence of interfering sounds for Mandarin Chinese.

Science.gov (United States)

Chen, Jing; Yang, Hongying; Wu, Xihong; Moore, Brian C J

2018-02-01

In Mandarin Chinese, the fundamental frequency (F0) contour defines lexical "Tones" that differ in meaning despite being phonetically identical. Flattening the F0 contour impairs the intelligibility of Mandarin Chinese in background sounds. This might occur because the flattening introduces misleading lexical information. To avoid this effect, two types of speech were used: single-Tone speech contained Tones 1 and 0 only, which have a flat F0 contour; multi-Tone speech contained all Tones and had a varying F0 contour. The intelligibility of speech in steady noise was slightly better for single-Tone speech than for multi-Tone speech. The intelligibility of speech in a two-talker masker, with the difference in mean F0 between the target and masker matched across conditions, was worse for the multi-Tone target in the multi-Tone masker than for any other combination of target and masker, probably because informational masking was maximal for this combination. The introduction of a perceived spatial separation between the target and masker, via the precedence effect, led to better performance for all target-masker combinations, especially the multi-Tone target in the multi-Tone masker. In summary, a flat F0 contour does not reduce the intelligibility of Mandarin Chinese when the introduction of misleading lexical cues is avoided.
Phonological Awareness Intervention for Children with Childhood Apraxia of Speech

Science.gov (United States)

Moriarty, Brigid C.; Gillon, Gail T.

2006-01-01

Aims: To investigate the effectiveness of an integrated phonological awareness intervention to improve the speech production, phonological awareness and printed word decoding skills for three children with childhood apraxia of speech (CAS) aged 7;3, 6;3 and 6;10. The three children presented with severely delayed phonological awareness skills…
Relationship between perceptual learning in speech and statistical learning in younger and older adults

Directory of Open Access Journals (Sweden)

Thordis Marisa Neger

2014-09-01

Full Text Available Within a few sentences, listeners learn to understand severely degraded speech such as noise-vocoded speech. However, individuals vary in the amount of such perceptual learning and it is unclear what underlies these differences. The present study investigates whether perceptual learning in speech relates to statistical learning, as sensitivity to probabilistic information may aid identification of relevant cues in novel speech input. If statistical learning and perceptual learning (partly) draw on the same general mechanisms, then statistical learning in a non-auditory modality using non-linguistic sequences should predict adaptation to degraded speech. In the present study, 73 older adults (aged over 60 years) and 60 younger adults (aged between 18 and 30 years) performed a visual artificial grammar learning task and were presented with sixty meaningful noise-vocoded sentences in an auditory recall task. Within age groups, sentence recognition performance over exposure was analyzed as a function of statistical learning performance, and other variables that may predict learning (i.e., hearing, vocabulary, attention switching control, working memory and processing speed). Younger and older adults showed similar amounts of perceptual learning, but only younger adults showed significant statistical learning. In older adults, improvement in understanding noise-vocoded speech was constrained by age. In younger adults, amount of adaptation was associated with lexical knowledge and with statistical learning ability. Thus, individual differences in general cognitive abilities explain listeners' variability in adapting to noise-vocoded speech. Results suggest that perceptual and statistical learning share mechanisms of implicit regularity detection, but that the ability to detect statistical regularities is impaired in older adults if visual sequences are presented quickly.
Conspecific and Heterospecific Cues Override Resource Quality to Influence Offspring Production

Science.gov (United States)

Miller, Christine W.; Fletcher, Robert J.; Gillespie, Stephanie R.

2013-01-01

Animals live in an uncertain world. To reduce uncertainty, animals use cues that can encode diverse information regarding habitat quality, including both non-social and social cues. While it is increasingly appreciated that the sources of potential information are vast, our understanding of how individuals integrate different types of cues to guide decision-making remains limited. We experimentally manipulated both resource quality (presence/absence of cactus fruit) and social cues (conspecific juveniles, heterospecific juveniles, no juveniles) for a cactus-feeding insect, Narnia femorata (Hemiptera: Coreidae), to ask how individuals responded to resource quality in the presence or absence of social cues. Cactus with fruit is a high-quality environment for juvenile development, and indeed we found that females laid 56% more eggs when cactus fruit was present versus when it was absent. However, when conspecific or heterospecific juveniles were present, the effects of resource quality on egg numbers vanished. Overall, N. femorata laid approximately twice as many eggs in the presence of heterospecifics than alone or in the presence of conspecifics. Our results suggest that the presence of both conspecific and heterospecific social cues can disrupt responses of individuals to environmental gradients in resource quality. PMID:23861984
Prediction and imitation in speech

Directory of Open Access Journals (Sweden)

Chiara Gambi

2013-06-01

Full Text Available It has been suggested that intra- and inter-speaker variability in speech are correlated. Interlocutors have been shown to converge on various phonetic dimensions. In addition, speakers imitate the phonetic properties of voices they are exposed to in shadowing, repetition, and even passive listening tasks. We review three theoretical accounts of speech imitation and convergence phenomena: (i) the Episodic Theory (ET) of speech perception and production (Goldinger, 1998); (ii) the Motor Theory (MT) of speech perception (Liberman and Whalen, 2000; Galantucci et al., 2006); (iii) Communication Accommodation Theory (CAT; Giles et al., 1991; Giles and Coupland, 1991). We argue that no account is able to explain all the available evidence. In particular, there is a need to integrate low-level, mechanistic accounts (like ET and MT) and higher-level accounts (like CAT). We propose that this is possible within the framework of an integrated theory of production and comprehension (Pickering & Garrod, in press). Similarly to both ET and MT, this theory assumes parity between production and perception. Uniquely, however, it posits that listeners simulate speakers’ utterances by computing forward-model predictions at many different levels, which are then compared to the incoming phonetic input. In our account phonetic imitation can be achieved via the same mechanism that is responsible for sensorimotor adaptation; i.e. the correction of prediction errors. In addition, the model assumes that the degree to which sensory prediction errors lead to motor adjustments is context-dependent. The notion of context subsumes both the preceding linguistic input and non-linguistic attributes of the situation (e.g., the speaker’s and listener’s social identities, their conversational roles, the listener’s intention to imitate).
Speech endpoint detection with non-language speech sounds for generic speech processing applications

Science.gov (United States)

McClain, Matthew; Romanowski, Brian

2009-05-01

Non-language speech sounds (NLSS) are sounds produced by humans that do not carry linguistic information. Examples of these sounds are coughs, clicks, breaths, and filled pauses such as "uh" and "um" in English. NLSS are prominent in conversational speech, but can be a significant source of errors in speech processing applications. Traditionally, these sounds are ignored by speech endpoint detection algorithms, where speech regions are identified in the audio signal prior to processing. The ability to filter NLSS as a pre-processing step can significantly enhance the performance of many speech processing applications, such as speaker identification, language identification, and automatic speech recognition. In order to be used in all such applications, NLSS detection must be performed without the use of language models that provide knowledge of the phonology and lexical structure of speech. This is especially relevant to situations where the languages used in the audio are not known a priori. We present the results of preliminary experiments using data from American and British English speakers, in which segments of audio are classified as language speech sounds (LSS) or NLSS using a set of acoustic features designed for language-agnostic NLSS detection and a hidden-Markov model (HMM) to model speech generation. The results of these experiments indicate that the features and model used are capable of detecting certain types of NLSS, such as breaths and clicks, while detection of other types of NLSS such as filled pauses will require future research.
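As a rough illustration of how an HMM can label audio segments as LSS or NLSS, the sketch below runs Viterbi decoding over a two-state model. The abstract does not give the acoustic features or the model topology, so every detail here is an assumption: each frame is reduced to a coarse symbol ("voiced" or "noisy"), and the start, transition, and emission probabilities are invented for illustration only.

```python
import math

STATES = ("LSS", "NLSS")

# Hypothetical parameters for illustration; none are taken from the paper.
START = {"LSS": 0.6, "NLSS": 0.4}
TRANS = {
    "LSS": {"LSS": 0.9, "NLSS": 0.1},
    "NLSS": {"LSS": 0.2, "NLSS": 0.8},
}
# Each frame is summarised by one coarse symbol (an assumed stand-in
# for the language-agnostic acoustic features mentioned in the abstract).
EMIT = {
    "LSS": {"voiced": 0.8, "noisy": 0.2},
    "NLSS": {"voiced": 0.3, "noisy": 0.7},
}


def viterbi(frames):
    """Return the most likely LSS/NLSS label sequence for the frame symbols."""
    if not frames:
        return []
    # Work in log space to avoid underflow on long sequences.
    score = {
        s: math.log(START[s]) + math.log(EMIT[s][frames[0]]) for s in STATES
    }
    backpointers = []
    for obs in frames[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Best previous state for arriving in state s.
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (
                score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][obs])
            )
            pointers[s] = prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


labels = viterbi(["voiced", "voiced", "noisy", "noisy", "noisy", "voiced"])
```

The transition probabilities favour staying in the same state, which keeps the decoded labels from flipping on every frame; a real detector would estimate all of these parameters from labelled audio.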
The effect of combined sensory and semantic components on audio-visual speech perception in older adults

Directory of Open Access Journals (Sweden)

Corrina Maguinness

2011-12-01

Full Text Available Previous studies have found that perception in older people benefits from multisensory over uni-sensory information. As normal speech recognition is affected by both the auditory input and the visual lip-movements of the speaker, we investigated the efficiency of audio and visual integration in an older population by manipulating the relative reliability of the auditory and visual information in speech. We also investigated the role of the semantic context of the sentence to assess whether audio-visual integration is affected by top-down semantic processing. We presented participants with audio-visual sentences in which the visual component was either blurred or not blurred. We found that there was a greater cost in recall performance for semantically meaningless speech in the audio-visual blur compared to the audio-visual no blur condition and this effect was specific to the older group. Our findings have implications for understanding how aging affects efficient multisensory integration for the perception of speech and suggest that multisensory inputs may benefit speech perception in older adults when the semantic content of the speech is unpredictable.
Hearing and seeing meaning in speech and gesture: insights from brain and behaviour.

Science.gov (United States)

Özyürek, Aslı

2014-09-19

As we speak, we use not only the arbitrary form-meaning mappings of the speech channel but also motivated form-meaning correspondences, i.e. iconic gestures that accompany speech (e.g. inverted V-shaped hand wiggling across gesture space to demonstrate walking). This article reviews what we know about processing of semantic information from speech and iconic gestures in spoken languages during comprehension of such composite utterances. Several studies have shown that comprehension of iconic gestures involves brain activations known to be involved in semantic processing of speech: i.e. modulation of the electrophysiological recording component N400, which is sensitive to the ease of semantic integration of a word to previous context, and recruitment of the left-lateralized frontal-posterior temporal network (left inferior frontal gyrus (IFG), medial temporal gyrus (MTG) and superior temporal gyrus/sulcus (STG/S)). Furthermore, we integrate the information coming from both channels recruiting brain areas such as left IFG, posterior superior temporal sulcus (STS)/MTG and even motor cortex. Finally, this integration is flexible: the temporal synchrony between the iconic gesture and the speech segment, as well as the perceived communicative intent of the speaker, modulate the integration process. Whether these findings are special to gestures or are shared with actions or other visual accompaniments to speech (e.g. lips) or other visual symbols such as pictures are discussed, as well as the implications for a multimodal view of language. © 2014 The Author(s) Published by the Royal Society. All rights reserved.
Post-cueing deficits with maintained cueing benefits in patients with Parkinson's disease dementia

Directory of Open Access Journals (Sweden)

Susanne Gräber

2014-11-01

Full Text Available In Parkinson’s disease (PD) internal cueing mechanisms are impaired, leading to symptoms such as hypokinesia. However, external cues can improve movement execution by using cortical resources. These cortical processes can be affected by cognitive decline in dementia. It is still unclear how dementia in PD influences external cueing. We investigated a group of 25 PD patients with dementia (PDD) and 25 non-demented PD patients (PDnD) matched by age, sex and disease duration in a simple reaction time (SRT) task using an additional acoustic cue. PDD patients benefited from the additional cue in similar magnitude as did PDnD patients. However, withdrawal of the cue led to a significantly increased reaction time in the PDD group compared to the PDnD patients. Our results indicate that even PDD patients can benefit from strategies using external cue presentation but the process of cognitive worsening can reduce the effect when cues are withdrawn.

Cue-reactors: individual differences in cue-induced craving after food or smoking abstinence.

Directory of Open Access Journals (Sweden)

Stephen V Mahler

Full Text Available BACKGROUND: Pavlovian conditioning plays a critical role in both drug addiction and binge eating. Recent animal research suggests that certain individuals are highly sensitive to conditioned cues, whether they signal food or drugs. Are certain humans also more reactive to both food and drug cues? METHODS: We examined cue-induced craving for both cigarettes and food, in the same individuals (n = 15 adult smokers). Subjects viewed smoking-related or food-related images after abstaining from either smoking or eating. RESULTS: Certain individuals reported strong cue-induced craving after both smoking and food cues. That is, subjects who reported strong cue-induced craving for cigarettes also rated stronger cue-induced food craving. CONCLUSIONS: In humans, like in nonhumans, there may be a "cue-reactive" phenotype, consisting of individuals who are highly sensitive to conditioned stimuli. This finding extends recent reports from nonhuman studies. Further understanding this subgroup of smokers may allow clinicians to individually tailor therapies for smoking cessation.
Cue-reactors: individual differences in cue-induced craving after food or smoking abstinence.

Science.gov (United States)

Mahler, Stephen V; de Wit, Harriet

2010-11-10

Pavlovian conditioning plays a critical role in both drug addiction and binge eating. Recent animal research suggests that certain individuals are highly sensitive to conditioned cues, whether they signal food or drugs. Are certain humans also more reactive to both food and drug cues? We examined cue-induced craving for both cigarettes and food, in the same individuals (n = 15 adult smokers). Subjects viewed smoking-related or food-related images after abstaining from either smoking or eating. Certain individuals reported strong cue-induced craving after both smoking and food cues. That is, subjects who reported strong cue-induced craving for cigarettes also rated stronger cue-induced food craving. In humans, like in nonhumans, there may be a "cue-reactive" phenotype, consisting of individuals who are highly sensitive to conditioned stimuli. This finding extends recent reports from nonhuman studies. Further understanding this subgroup of smokers may allow clinicians to individually tailor therapies for smoking cessation.
Experience-Dependency of Reliance on Local Visual and Idiothetic Cues for Spatial Representations Created in the Absence of Distal Information

Directory of Open Access Journals (Sweden)

Fabian Draht

2017-06-01

Full Text Available Spatial encoding in the hippocampus is based on a range of different input sources. To generate spatial representations, reliable sensory cues from the external environment are integrated with idiothetic cues, derived from self-movement, that enable path integration and directional perception. In this study, we examined to what extent idiothetic cues significantly contribute to spatial representations and navigation: we recorded place cells while rodents navigated towards two visually identical chambers in 180° orientation via two different paths in darkness and in the absence of reliable auditory or olfactory cues. Our goal was to generate a conflict between local visual and direction-specific information, and then to assess which strategy was prioritized in different learning phases. We observed that, in the absence of distal cues, place fields are initially controlled by local visual cues that override idiothetic cues, but that with multiple exposures to the paradigm, spaced at intervals of days, idiothetic cues become increasingly implemented in generating an accurate spatial representation. Taken together, these data support that, in the absence of distal cues, local visual cues are prioritized in the generation of context-specific spatial representations through place cells, whereby idiothetic cues are deemed unreliable. With cumulative exposures to the environments, the animal learns to attend to subtle idiothetic cues to resolve the conflict between visual and direction-specific information.
Experience-Dependency of Reliance on Local Visual and Idiothetic Cues for Spatial Representations Created in the Absence of Distal Information.

Science.gov (United States)

Draht, Fabian; Zhang, Sijie; Rayan, Abdelrahman; Schönfeld, Fabian; Wiskott, Laurenz; Manahan-Vaughan, Denise

2017-01-01

Spatial encoding in the hippocampus is based on a range of different input sources. To generate spatial representations, reliable sensory cues from the external environment are integrated with idiothetic cues, derived from self-movement, that enable path integration and directional perception. In this study, we examined to what extent idiothetic cues significantly contribute to spatial representations and navigation: we recorded place cells while rodents navigated towards two visually identical chambers in 180° orientation via two different paths in darkness and in the absence of reliable auditory or olfactory cues. Our goal was to generate a conflict between local visual and direction-specific information, and then to assess which strategy was prioritized in different learning phases. We observed that, in the absence of distal cues, place fields are initially controlled by local visual cues that override idiothetic cues, but that with multiple exposures to the paradigm, spaced at intervals of days, idiothetic cues become increasingly implemented in generating an accurate spatial representation. Taken together, these data support that, in the absence of distal cues, local visual cues are prioritized in the generation of context-specific spatial representations through place cells, whereby idiothetic cues are deemed unreliable. With cumulative exposures to the environments, the animal learns to attend to subtle idiothetic cues to resolve the conflict between visual and direction-specific information.
The Functional Connectome of Speech Control.

Directory of Open Access Journals (Sweden)

Stefan Fuertinger

2015-07-01

Full Text Available In the past few years, several studies have been directed to understanding the complexity of functional interactions between different brain regions during various human behaviors. Among these, neuroimaging research installed the notion that speech and language require an orchestration of brain regions for comprehension, planning, and integration of a heard sound with a spoken word. However, these studies have been largely limited to mapping the neural correlates of separate speech elements and examining distinct cortical or subcortical circuits involved in different aspects of speech control. As a result, the complexity of the brain network machinery controlling speech and language remained largely unknown. Using graph theoretical analysis of functional MRI (fMRI) data in healthy subjects, we quantified the large-scale speech network topology by constructing functional brain networks of increasing hierarchy from the resting state to motor output of meaningless syllables to complex production of real-life speech as well as compared to non-speech-related sequential finger tapping and pure tone discrimination networks. We identified a segregated network of highly connected local neural communities (hubs) in the primary sensorimotor and parietal regions, which formed a commonly shared core hub network across the examined conditions, with the left area 4p playing an important role in speech network organization. These sensorimotor core hubs exhibited features of flexible hubs based on their participation in several functional domains across different networks and ability to adaptively switch long-range functional connectivity depending on task content, resulting in a distinct community structure of each examined network. Specifically, compared to other tasks, speech production was characterized by the formation of six distinct neural communities with specialized recruitment of the prefrontal cortex, insula, putamen, and thalamus, which collectively
Twisting Tongues to Test for Conflict-Monitoring in Speech Production

Directory of Open Access Journals (Sweden)

Daniel Acheson

2014-04-01

Full Text Available A number of recent studies have hypothesized that monitoring in speech production may occur via domain-general mechanisms responsible for the detection of response conflict. Outside of language, two ERP components have consistently been elicited in conflict-inducing tasks (e.g., the flanker task): the stimulus-locked N2 on correct trials, and the response-locked error-related negativity (ERN). The present investigation used these electrophysiological markers to test whether a common response conflict monitor is responsible for monitoring in speech and non-speech tasks. EEG was recorded while participants performed a tongue twister (TT) task and a manual version of the flanker task. In the TT task, people rapidly read sequences of four nonwords arranged in TT and non-TT patterns three times. In the flanker task, people responded with a left/right button press to a center-facing arrow, and conflict was manipulated by the congruency of the flanking arrows. Behavioral results showed typical effects of both tasks, with increased error rates and slower speech onset times for TT relative to non-TT trials and for incongruent relative to congruent flanker trials. In the flanker task, stimulus-locked EEG analyses replicated previous results, with a larger N2 for incongruent relative to congruent trials, and a response-locked ERN. In the TT task, stimulus-locked analyses revealed broad, frontally-distributed differences beginning around 50 ms and lasting until just before speech initiation, with TT trials more negative than non-TT trials; response-locked analyses revealed an ERN. Correlation across these measures showed some correlations within a task, but little evidence of systematic cross-task correlation. Although the present results do not speak against conflict signals from the production system serving as cues to self-monitoring, they are not consistent with signatures of response conflict being mediated by a single, domain-general conflict monitor.
Common region wins the competition between extrinsic grouping cues: Evidence from a task without explicit attention to grouping.

Science.gov (United States)

Montoro, Pedro R; Villalba-García, Cristina; Luna, Dolores; Hinojosa, José A

2017-12-01

The competition between perceptual grouping factors is a relatively ignored topic, especially in the case of extrinsic grouping cues (e.g., common region or connectedness). Recent studies have examined the integration of extrinsic cues using tasks that induce selective attention to groups based on different grouping cues. However, this procedure could generate alternative strategies for task performance that are unrelated to the perceptual grouping operations. In the current work, we used an indirect task, i.e., a repetition discrimination task, without explicit attention to grouping cues to further examine the rules that govern dominance between competing extrinsic grouping factors. This procedure allowed us to obtain an unbiased measure of the competition between common region and connectedness cues acting within the same display. The results corroborate previous data showing that grouping by common region dominated the perceived organization of the display, even though the phenomenological strength of the grouping cues was equated for each participant by means of a preliminary scaling task. Our results highlight the relevance of using indirect tasks as an essential tool for the systematic study of the integration of extrinsic grouping cues.
Interaction and representational integration: Evidence from speech errors

OpenAIRE

Goldrick, Matthew; Baker, H. Ross; Murphy, Amanda; Baese-Berk, Melissa

2011-01-01

We examine the mechanisms that support interaction between lexical, phonological and phonetic processes during language production. Studies of the phonetics of speech errors have provided evidence that partially activated lexical and phonological representations influence phonetic processing. We examine how these interactive effects are modulated by lexical frequency. Previous research has demonstrated that during lexical access, the processing of high frequency words is facilitated; in contr...
Sensorimotor Adaptation Following Exposure to Ambiguous Inertial Motion Cues

Science.gov (United States)

Wood, S. J.; Clement, G. R.; Rupert, A. H.; Reschke, M. F.; Harm, D. L.; Guedry, F. E.

2007-01-01

The central nervous system must resolve the ambiguity of inertial motion sensory cues in order to derive accurate spatial orientation awareness. Adaptive changes in how inertial cues from the otolith system are integrated with other sensory information lead to perceptual and postural disturbances upon return to Earth's gravity. The primary goals of this ground-based research investigation are to explore physiological mechanisms and operational implications of tilt-translation disturbances during and following re-entry, and to evaluate a tactile prosthesis as a countermeasure for improving control of whole-body orientation during tilt and translation motion.
Ordinal models of audiovisual speech perception

DEFF Research Database (Denmark)

Andersen, Tobias

2011-01-01

Audiovisual information is integrated in speech perception. One manifestation of this is the McGurk illusion in which watching the articulating face alters the auditory phonetic percept. Understanding this phenomenon fully requires a computational model with predictive power. Here, we describe...
Hybrid methodological approach to context-dependent speech recognition

Directory of Open Access Journals (Sweden)

Dragiša Mišković

2017-01-01

Full Text Available Although the importance of contextual information in speech recognition has been acknowledged for a long time now, it has remained clearly underutilized even in state-of-the-art speech recognition systems. This article introduces a novel, methodologically hybrid approach to the research question of context-dependent speech recognition in human–machine interaction. To the extent that it is hybrid, the approach integrates aspects of both statistical and representational paradigms. We extend the standard statistical pattern-matching approach with a cognitively inspired and analytically tractable model with explanatory power. This methodological extension allows for accounting for contextual information which is otherwise unavailable in speech recognition systems, and using it to improve post-processing of recognition hypotheses. The article introduces an algorithm for evaluation of recognition hypotheses, illustrates it for concrete interaction domains, and discusses its implementation within two prototype conversational agents.
Music and Speech Perception in Children Using Sung Speech.

Science.gov (United States)

Nie, Yingjiu; Galvin, John J; Morikawa, Michael; André, Victoria; Wheeler, Harley; Fu, Qian-Jie

2018-01-01

This study examined music and speech perception in normal-hearing children with some or no musical training. Thirty children (mean age = 11.3 years), 15 with and 15 without formal music training participated in the study. Music perception was measured using a melodic contour identification (MCI) task; stimuli were a piano sample or sung speech with a fixed timbre (same word for each note) or a mixed timbre (different words for each note). Speech perception was measured in quiet and in steady noise using a matrix-styled sentence recognition task; stimuli were naturally intonated speech or sung speech with a fixed pitch (same note for each word) or a mixed pitch (different notes for each word). Significant musician advantages were observed for MCI and speech in noise but not for speech in quiet. MCI performance was significantly poorer with the mixed timbre stimuli. Speech performance in noise was significantly poorer with the fixed or mixed pitch stimuli than with spoken speech. Across all subjects, age at testing and MCI performance were significantly correlated with speech performance in noise. MCI and speech performance in quiet was significantly poorer for children than for adults from a related study using the same stimuli and tasks; speech performance in noise was significantly poorer for young than for older children. Long-term music training appeared to benefit melodic pitch perception and speech understanding in noise in these pediatric listeners.
Retrieval-induced forgetting and interference between cues:Training a cue-outcome association attenuates retrieval by alternative cues

OpenAIRE

Ortega-Castro, Nerea; Vadillo Nistal, Miguel

2013-01-01

Some researchers have attempted to determine whether situations in which a single cue is paired with several outcomes (A-B, A-C interference or interference between outcomes) involve the same learning and retrieval mechanisms as situations in which several cues are paired with a single outcome (A-B, C-B interference or interference between cues). Interestingly, current research on a related effect, which is known as retrieval-induced forgetting, can illuminate this debate. Most retrieval-indu...
The influence of spectral characteristics of early reflections on speech intelligibility

DEFF Research Database (Denmark)

Arweiler, Iris; Buchholz, Jörg

2011-01-01

The auditory system takes advantage of early reflections (ERs) in a room by integrating them with the direct sound (DS) and thereby increasing the effective speech level. In the present paper the benefit from realistic ERs on speech intelligibility in diffuse speech-shaped noise was investigated...... ascribed to their altered spectrum compared to the DS and to the filtering by the torso, head, and pinna. No binaural processing other than a binaural summation effect could be observed....
Incorporating Speech Recognition into a Natural User Interface

Science.gov (United States)

Chapa, Nicholas

2017-01-01

The Augmented/Virtual Reality (AVR) Lab has been working to study the applicability of recent virtual and augmented reality hardware and software to KSC operations. This includes the Oculus Rift, HTC Vive, Microsoft HoloLens, and Unity game engine. My project in this lab is to integrate voice recognition and voice commands into an easy-to-modify system that can be added to an existing portion of a Natural User Interface (NUI). A NUI is an intuitive and simple-to-use interface incorporating visual, touch, and speech recognition. The inclusion of speech recognition capability will allow users to perform actions or make inquiries using only their voice. The simplicity of needing only to speak to control an on-screen object or enact some digital action means that any user can quickly become accustomed to using this system. Multiple programs were tested for use in a speech command and recognition system. Sphinx4 translates speech to text using a Hidden Markov Model (HMM) based Language Model, an Acoustic Model, and a word Dictionary running on Java. PocketSphinx had similar functionality to Sphinx4 but instead ran on C. However, neither of these programs was ideal, as building a Java or C wrapper slowed performance. The most suitable speech recognition system tested was the Unity Engine Grammar Recognizer. A Context Free Grammar (CFG) structure is written in an XML file to specify the structure of phrases and words that will be recognized by the Unity Grammar Recognizer. Using Speech Recognition Grammar Specification (SRGS) 1.0 makes modifying the recognized combinations of words and phrases very simple and quick to do. With SRGS 1.0, semantic information can also be added to the XML file, which allows for even more control over how spoken words and phrases are interpreted by Unity. Additionally, using a CFG with SRGS 1.0 produces Finite State Machine (FSM) functionality, limiting the potential for incorrectly heard words or phrases. The purpose of my project was to
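The effect of restricting recognition to a fixed grammar can be illustrated with a minimal sketch. This is plain Python, not SRGS XML and not the Unity Grammar Recognizer API; the slot layout in `GRAMMAR` and the function names `allowed_phrases` and `recognize` are hypothetical and chosen only for illustration. It assumes a transcript is already available as text and shows how a finite set of accepted phrases rejects anything outside the grammar.

```python
from itertools import product

# Hypothetical command grammar: each inner list holds the words allowed
# in that position of the phrase (an assumption, not the abstract's grammar).
GRAMMAR = [
    ["open", "close"],
    ["the"],
    ["hatch", "panel", "valve"],
]


def allowed_phrases(grammar):
    """Enumerate every phrase the slot grammar accepts."""
    return {" ".join(words) for words in product(*grammar)}


def recognize(transcript, grammar=GRAMMAR):
    """Return the normalized phrase if the grammar accepts it, else None."""
    # Lowercase and collapse whitespace so matching ignores case and spacing.
    phrase = " ".join(transcript.lower().split())
    return phrase if phrase in allowed_phrases(grammar) else None
```

Because the accepted set is finite, an utterance such as "open the door" is rejected outright rather than mapped to a near match, which is the kind of constraint the abstract attributes to the grammar-based approach.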
Interdependent processing and encoding of speech and concurrent background noise.

Science.gov (United States)

Cooper, Angela; Brouwer, Susanne; Bradlow, Ann R

2015-05-01

Speech processing can often take place in adverse listening conditions that involve the mixing of speech and background noise. In this study, we investigated processing dependencies between background noise and indexical speech features, using a speeded classification paradigm (Garner, 1974; Exp. 1), and whether background noise is encoded and represented in memory for spoken words in a continuous recognition memory paradigm (Exp. 2). Whether or not the noise spectrally overlapped with the speech signal was also manipulated. The results of Experiment 1 indicated that background noise and indexical features of speech (gender, talker identity) cannot be completely segregated during processing, even when the two auditory streams are spectrally nonoverlapping. Perceptual interference was asymmetric, whereby irrelevant indexical feature variation in the speech signal slowed noise classification to a greater extent than irrelevant noise variation slowed speech classification. This asymmetry may stem from the fact that speech features have greater functional relevance to listeners, and are thus more difficult to selectively ignore than background noise. Experiment 2 revealed that a recognition cost for words embedded in different types of background noise on the first and second occurrences only emerged when the noise and the speech signal were spectrally overlapping. Together, these data suggest integral processing of speech and background noise, modulated by the level of processing and the spectral separation of the speech and noise.
Apraxia of Speech

Science.gov (United States)

Apraxia of speech (AOS), also known as acquired ...
Audiovisual Integration in Children Listening to Spectrally Degraded Speech

Science.gov (United States)

Maidment, David W.; Kang, Hi Jee; Stewart, Hannah J.; Amitay, Sygal

2015-01-01

Purpose: The study explored whether visual information improves speech identification in typically developing children with normal hearing when the auditory signal is spectrally degraded. Method: Children (n = 69) and adults (n = 15) were presented with noise-vocoded sentences from the Children's Co-ordinate Response Measure (Rosen, 2011) in…
Cue-induced craving in patients with cocaine use disorder predicts cognitive control deficits toward cocaine cues.

Science.gov (United States)

DiGirolamo, Gregory J; Smelson, David; Guevremont, Nathan

2015-08-01

Cue-induced craving is a clinically important aspect of cocaine addiction influencing ongoing use and sobriety. However, little is known about the relationship between cue-induced craving and cognitive control toward cocaine cues. While studies suggest that cocaine users have an attentional bias toward cocaine cues, the present study extends this research by testing if cocaine use disorder patients (CDPs) can control their eye movements toward cocaine cues and whether their response varied by cue-induced craving intensity. Thirty CDPs underwent a cue exposure procedure to dichotomize them into high and low craving groups followed by a modified antisaccade task in which subjects were asked to control their eye movements toward either a cocaine or neutral drug cue by looking away from the suddenly presented cue. The relationship between breakdowns in cognitive control (as measured by eye errors) and cue-induced craving (changes in self-reported craving following cocaine cue exposure) was investigated. CDPs overall made significantly more errors toward cocaine cues compared to neutral cues, with higher cravers making significantly more errors than lower cravers even though they did not differ significantly in addiction severity, impulsivity, anxiety, or depression levels. Cue-induced craving was the only specific and significant predictor of subsequent errors toward cocaine cues. Cue-induced craving directly and specifically relates to breakdowns of cognitive control toward cocaine cues in CDPs, with higher cravers being more susceptible. Hence, it may be useful identifying high cravers and target treatment toward curbing craving to decrease the likelihood of a subsequent breakdown in control. Copyright © 2015 Elsevier Ltd. All rights reserved.
Diminutives facilitate word segmentation in natural speech: cross-linguistic evidence.

Science.gov (United States)

Kempe, Vera; Brooks, Patricia J; Gillis, Steven; Samson, Graham

2007-06-01

Final-syllable invariance is characteristic of diminutives (e.g., doggie), which are a pervasive feature of the child-directed speech registers of many languages. Invariance in word endings has been shown to facilitate word segmentation (Kempe, Brooks, & Gillis, 2005) in an incidental-learning paradigm in which synthesized Dutch pseudonouns were used. To broaden the cross-linguistic evidence for this invariance effect and to increase its ecological validity, adult English speakers (n=276) were exposed to naturally spoken Dutch or Russian pseudonouns presented in sentence contexts. A forced choice test was given to assess target recognition, with foils comprising unfamiliar syllable combinations in Experiments 1 and 2 and syllable combinations straddling word boundaries in Experiment 3. A control group (n=210) received the recognition test with no prior exposure to targets. Recognition performance improved with increasing final-syllable rhyme invariance, with larger increases for the experimental group. This confirms that word ending invariance is a valid segmentation cue in artificial, as well as naturalistic, speech and that diminutives may aid segmentation in a number of languages.

Spatial attention triggered by unimodal, crossmodal, and bimodal exogenous cues: a comparison of reflexive orienting mechanisms

NARCIS (Netherlands)

Santangelo, Valerio; van der Lubbe, Robert Henricus Johannes; Belardinelli, Marta Olivetti; Postma, Albert

The aim of this study was to establish whether spatial attention triggered by bimodal exogenous cues acts differently as compared to unimodal and crossmodal exogenous cues due to crossmodal integration. In order to investigate this issue, we examined cuing effects in discrimination tasks and
Binaural noise reduction via cue-preserving MMSE filter and adaptive-blocking-based noise PSD estimation

Science.gov (United States)

Azarpour, Masoumeh; Enzner, Gerald

2017-12-01

Binaural noise reduction, with applications for instance in hearing aids, has been a very significant challenge. This task relates to the optimal utilization of the available microphone signals for the estimation of the ambient noise characteristics and for the optimal filtering algorithm to separate the desired speech from the noise. The additional requirements of low computational complexity and low latency further complicate the design. A particular challenge results from the desired reconstruction of binaural speech input with spatial cue preservation. The latter essentially diminishes the utility of multiple-input/single-output filter-and-sum techniques such as beamforming. In this paper, we propose a comprehensive and effective signal processing configuration with which most of the aforementioned criteria can be met suitably. This relates especially to the requirement of efficient online adaptive processing for noise estimation and optimal filtering while preserving the binaural cues. Regarding noise estimation, we consider three different architectures: interaural (ITF), cross-relation (CR), and principal-component (PCA) target blocking. An objective comparison with two other noise PSD estimation algorithms demonstrates the superiority of the blocking-based noise estimators, especially the CR-based and ITF-based blocking architectures. Moreover, we present a new noise reduction filter based on minimum mean-square error (MMSE), which belongs to the class of common gain filters, hence being rigorous in terms of spatial cue preservation but also efficient and competitive for the acoustic noise reduction task. A formal real-time subjective listening test procedure is also developed in this paper. The proposed listening test enables a real-time assessment of the proposed computationally efficient noise reduction algorithms in a realistic acoustic environment, e.g., considering time-varying room impulse responses and the Lombard effect. The listening test outcome
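The cue-preservation property of a common gain filter can be checked with a short sketch. It assumes the left and right ear signals are given as complex short-time spectrum bins and that one real, positive gain per bin is applied to both ears. The gain values used in the usage note are arbitrary placeholders, not the MMSE gains derived in the paper, and the function names are hypothetical.

```python
import cmath


def apply_common_gain(left, right, gain):
    """Scale the left and right spectrum bins by the same real gain per bin."""
    out_left = [g * l for g, l in zip(gain, left)]
    out_right = [g * r for g, r in zip(gain, right)]
    return out_left, out_right


def interaural_cues(left, right):
    """Per-bin interaural level ratio and interaural phase difference."""
    return [(abs(l) / abs(r), cmath.phase(l / r)) for l, r in zip(left, right)]
```

For example, with `left = [1 + 1j, 0.5 - 0.2j]`, `right = [0.8 + 0.3j, 0.1 + 0.4j]` and `gain = [0.5, 0.2]`, the cues computed before and after filtering agree up to rounding, because the shared gain cancels in both the magnitude ratio and the phase difference.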
Music Training Can Improve Music and Speech Perception in Pediatric Mandarin-Speaking Cochlear Implant Users.

Science.gov (United States)

Cheng, Xiaoting; Liu, Yangwenyi; Shu, Yilai; Tao, Duo-Duo; Wang, Bing; Yuan, Yasheng; Galvin, John J; Fu, Qian-Jie; Chen, Bing

2018-01-01

Due to limited spectral resolution, cochlear implants (CIs) do not convey pitch information very well. Pitch cues are important for perception of music and tonal language; it is possible that music training may improve performance in both listening tasks. In this study, we investigated music training outcomes in terms of perception of music, lexical tones, and sentences in 22 young (4.8 to 9.3 years old), prelingually deaf Mandarin-speaking CI users. Music perception was measured using a melodic contour identification (MCI) task. Speech perception was measured for lexical tones and sentences presented in quiet. Subjects received 8 weeks of MCI training using pitch ranges not used for testing. Music and speech perception were measured at 2, 4, and 8 weeks after training was begun; follow-up measures were made 4 weeks after training was stopped. Mean baseline performance was 33.2%, 76.9%, and 45.8% correct for MCI, lexical tone recognition, and sentence recognition, respectively. After 8 weeks of MCI training, mean performance significantly improved by 22.9, 14.4, and 14.5 percentage points for MCI, lexical tone recognition, and sentence recognition, respectively ( p music and speech performance. The results suggest that music training can significantly improve pediatric Mandarin-speaking CI users' music and speech perception.
The ability of left- and right-hemisphere damaged individuals to produce prosodic cues to disambiguate Korean idiomatic sentences

Directory of Open Access Journals (Sweden)

Seung-Yun Yang

2014-05-01

Three speech language pathologists with training in phonetics participated as raters for vocal qualities. Nasality was a significantly salient vocal quality of idiomatic utterances. Conclusion: The findings support that (1) LHD negatively affected the production of durational cues and RHD negatively affected the production of fundamental frequency cues in idiomatic-literal contrasts; (2) healthy listeners successfully identified idiomatic and literal versions of ambiguous sentences produced by healthy speakers but not by RHD speakers; (3) productions in brain-damaged participants approximated HC’s measures in the repetition tasks, but not in the elicitation tasks; (4) nasal voice quality was judged to be associated with idiomatic utterances in all groups of participants. Findings agree with previous studies indicating HC’s abilities to discriminate literal versus idiomatic meanings in ditropically ambiguous idioms, as well as deficient processing of pitch production and impaired pragmatic ability in RHD.
Common neural substrates support speech and non-speech vocal tract gestures.

Science.gov (United States)

Chang, Soo-Eun; Kenney, Mary Kay; Loucks, Torrey M J; Poletto, Christopher J; Ludlow, Christy L

2009-08-01

The issue of whether speech is supported by the same neural substrates as non-speech vocal tract gestures has been contentious. In this fMRI study we tested whether producing non-speech vocal tract gestures in humans shares the same functional neuroanatomy as non-sense speech syllables. Production of non-speech vocal tract gestures, devoid of phonological content but similar to speech in that they had familiar acoustic and somatosensory targets, was compared to the production of speech syllables without meaning. Brain activation related to overt production was captured with BOLD fMRI using a sparse sampling design for both conditions. Speech and non-speech were compared using voxel-wise whole brain analyses, and ROI analyses focused on frontal and temporoparietal structures previously reported to support speech production. Results showed substantial activation overlap between speech and non-speech function in regions. Although non-speech gesture production showed greater extent and amplitude of activation in the regions examined, both speech and non-speech showed comparable left laterality in activation for both target perception and production. These findings posit a more general role of the previously proposed "auditory dorsal stream" in the left hemisphere--to support the production of vocal tract gestures that are not limited to speech processing.
Cue-induced craving among inhalant users: Development and preliminary validation of a visual cue paradigm.

Science.gov (United States)

Jain, Shobhit; Dhawan, Anju; Kumaran, S Senthil; Pattanayak, Raman Deep; Jain, Raka

2017-12-01

Cue-induced craving is known to be associated with a higher risk of relapse, wherein drug-specific cues become conditioned stimuli, eliciting conditioned responses. Cue-reactivity paradigms are important tools to study psychological responses and functional neuroimaging changes. However, to date, there has been no specific study or validated paradigm for inhalant cue-induced craving research. The study aimed to develop and validate a visual cue stimulus for inhalant cue-associated craving. The first step (picture selection) involved screening and careful selection of 30 cue- and 30 neutral-pictures based on their relevance for naturalistic settings. In the second step (time optimization), a random selection of ten cue-pictures each was presented for 4s, 6s, and 8s to seven adolescent male inhalant users, and pre-post craving response was compared using a Visual Analogue Scale (VAS) for each picture and presentation time. In the third step (validation), craving responses for each of the 30 cue- and 30 neutral-pictures were analysed among 20 adolescent inhalant users. Findings revealed a significant difference in before and after craving response for the cue-pictures, but not neutral-pictures. Using ROC-curve, pictures were arranged in order of craving intensity. Finally, the 20 best cue- and 20 neutral-pictures were used for the development of a 480s visual cue paradigm. This is the first study to systematically develop an inhalant cue picture paradigm which can be used as a tool to examine cue-induced craving in neurobiological studies. Further research, including further validation in larger studies and diverse samples, is required. Copyright © 2017 Elsevier B.V. All rights reserved.
Speech Compression

Directory of Open Access Journals (Sweden)

Jerry D. Gibson

2016-06-01

Full Text Available Speech compression is a key technology underlying digital cellular communications, VoIP, voicemail, and voice response systems. We trace the evolution of speech coding based on the linear prediction model, highlight the key milestones in speech coding, and outline the structures of the most important speech coding standards. Current challenges, future research directions, fundamental limits on performance, and the critical open problem of speech coding for emergency first responders are all discussed.
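The linear prediction model mentioned above can be sketched with the textbook autocorrelation method and the Levinson-Durbin recursion. This is a generic illustration, not the procedure of any particular coding standard covered by the article; the function names and the convention that a sample is predicted as a weighted sum of the previous `order` samples are assumptions made for this sketch.

```python
def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite sequence (no windowing)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for predictor coefficients a[1..order].

    Prediction convention: x_hat[n] = sum over k of a[k] * x[n - k].
    Returns the coefficient list and the final prediction error energy.
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err
```

As a sanity check, a decaying exponential `x[n] = 0.9 ** n` is predicted almost exactly from its previous sample, so a second-order fit should return a first coefficient close to 0.9 and a second coefficient close to zero.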
Audiovisual Discrimination between Laughter and Speech

NARCIS (Netherlands)

Petridis, Stavros; Pantic, Maja

Past research on automatic laughter detection has focused mainly on audio-based detection. Here we present an audiovisual approach to distinguishing laughter from speech and we show that integrating the information from audio and video leads to an improved reliability of audiovisual approach in
Speech Recognition for the iCub Platform

Directory of Open Access Journals (Sweden)

Bertrand Higy

2018-02-01

Full Text Available This paper describes open source software (available at https://github.com/robotology/natural-speech) to build automatic speech recognition (ASR) systems and run them within the YARP platform. The toolkit is designed (i) to allow non-ASR experts to easily create their own ASR system and run it on iCub and (ii) to build deep learning-based models specifically addressing the main challenges an ASR system faces in the context of verbal human–iCub interactions. The toolkit mostly consists of Python, C++ code and shell scripts integrated in YARP. As an additional contribution, a second codebase (written in Matlab) is provided for more expert ASR users who want to experiment with bio-inspired and developmental learning-inspired ASR systems. Specifically, we provide code for two distinct kinds of speech recognition: “articulatory” and “unsupervised” speech recognition. The first is largely inspired by influential neurobiological theories of speech perception which assume speech perception to be mediated by brain motor cortex activities. Our articulatory systems have been shown to outperform strong deep learning-based baselines. The second type of recognition systems, the “unsupervised” systems, do not use any supervised information (contrary to most ASR systems, including our articulatory systems). To some extent, they mimic an infant who has to discover the basic speech units of a language by herself. In addition, we provide resources consisting of pre-trained deep learning models for ASR, and a 2.5-h speech dataset of spoken commands, the VoCub dataset, which can be used to adapt an ASR system to the typical acoustic environments in which iCub operates.
Assessing spoken word recognition in children who are deaf or hard of hearing: A translational approach

OpenAIRE

Kirk, Karen Iler; Prusick, Lindsay; French, Brian; Gotch, Chad; Eisenberg, Laurie S.; Young, Nancy

2012-01-01

Under natural conditions, listeners use both auditory and visual speech cues to extract meaning from speech signals containing many sources of variability. However, traditional clinical tests of spoken word recognition routinely employ isolated words or sentences produced by a single talker in an auditory-only presentation format. The more central cognitive processes used during multimodal integration, perceptual normalization and lexical discrimination that may contribute to individual varia...
Intensive treatment with ultrasound visual feedback for speech sound errors in childhood apraxia

Directory of Open Access Journals (Sweden)

Jonathan L Preston

2016-08-01

Full Text Available Ultrasound imaging is an adjunct to traditional speech therapy that has been shown to be beneficial in the remediation of speech sound errors. Ultrasound biofeedback can be utilized during therapy to provide clients additional knowledge about their tongue shapes when attempting to produce sounds that are in error. The additional feedback may assist children with childhood apraxia of speech in stabilizing motor patterns, thereby facilitating more consistent and accurate productions of sounds and syllables. However, due to its specialized nature, ultrasound visual feedback is a technology that is not widely available to clients. Short-term intensive treatment programs are one option that can be utilized to expand access to ultrasound biofeedback. Schema-based motor learning theory suggests that short-term intensive treatment programs (massed practice) may assist children in acquiring more accurate motor patterns. In this case series, three participants ages 10-14 diagnosed with childhood apraxia of speech attended 16 hours of speech therapy over a two-week period to address residual speech sound errors. Two participants had distortions on rhotic sounds, while the third participant demonstrated lateralization of sibilant sounds. During therapy, cues were provided to assist participants in obtaining a tongue shape that facilitated a correct production of the erred sound. Additional practice without ultrasound was also included. Results suggested that all participants showed signs of acquisition of sounds in error. Generalization and retention results were mixed. One participant showed generalization and retention of sounds that were treated; one showed generalization but limited retention; and the third showed no evidence of generalization or retention. Individual characteristics that may facilitate generalization are discussed. Short-term intensive treatment programs using ultrasound biofeedback may result in the acquisition of more accurate motor
Temporal and speech processing skills in normal hearing individuals exposed to occupational noise.

Science.gov (United States)

Kumar, U Ajith; Ameenudin, Syed; Sangamanatha, A V

2012-01-01

Prolonged exposure to high levels of occupational noise can cause damage to hair cells in the cochlea and result in permanent noise-induced cochlear hearing loss. Consequences of cochlear hearing loss on speech perception and psychophysical abilities have been well documented. The primary goal of this research was to explore temporal processing and speech perception skills in individuals who are exposed to occupational noise of more than 80 dBA and have not yet incurred clinically significant threshold shifts. The contribution of temporal processing skills to speech perception in adverse listening situations was also evaluated. A total of 118 participants took part in this research. Participants comprised three groups of train drivers in the age range of 30-40 (n = 13), 41-50 (n = 9), and 51-60 (n = 6) years and their non-noise-exposed counterparts (n = 30 in each age group). Participants of all the groups including the train drivers had hearing sensitivity within 25 dB HL in the octave frequencies between 250 and 8 kHz. Temporal processing was evaluated using gap detection, modulation detection, and duration pattern tests. Speech recognition was tested in the presence of multi-talker babble at -5 dB SNR. Differences between experimental and control groups were analyzed using ANOVA and independent sample t-tests. Results showed a trend of reduced temporal processing skills in individuals with noise exposure. These deficits were observed despite normal peripheral hearing sensitivity. Speech recognition scores in the presence of noise were also significantly poorer in the noise-exposed group. Furthermore, poor temporal processing skills partially accounted for the speech recognition difficulties exhibited by the noise-exposed individuals. These results suggest that noise can cause significant distortions in the processing of suprathreshold temporal cues which may add to difficulties in hearing in adverse listening conditions.
Temporal and speech processing skills in normal hearing individuals exposed to occupational noise

Directory of Open Access Journals (Sweden)

U Ajith Kumar

2012-01-01

Prolonged exposure to high levels of occupational noise can cause damage to hair cells in the cochlea and result in permanent noise-induced cochlear hearing loss. Consequences of cochlear hearing loss on speech perception and psychophysical abilities have been well documented. The primary goal of this research was to explore temporal processing and speech perception skills in individuals who are exposed to occupational noise of more than 80 dBA and have not yet incurred clinically significant threshold shifts. The contribution of temporal processing skills to speech perception in adverse listening situations was also evaluated. A total of 118 participants took part in this research. Participants comprised three groups of train drivers in the age ranges of 30-40 (n = 13), 41-50 (n = 9), and 51-60 (n = 6) years and their non-noise-exposed counterparts (n = 30 in each age group). Participants of all the groups, including the train drivers, had hearing sensitivity within 25 dB HL in the octave frequencies between 250 Hz and 8 kHz. Temporal processing was evaluated using gap detection, modulation detection, and duration pattern tests. Speech recognition was tested in the presence of multi-talker babble at -5 dB SNR. Differences between experimental and control groups were analyzed using ANOVA and independent sample t-tests. Results showed a trend of reduced temporal processing skills in individuals with noise exposure. These deficits were observed despite normal peripheral hearing sensitivity. Speech recognition scores in the presence of noise were also significantly poorer in the noise-exposed group. Furthermore, poor temporal processing skills partially accounted for the speech recognition difficulties exhibited by the noise-exposed individuals. These results suggest that noise can cause significant distortions in the processing of suprathreshold temporal cues, which may add to difficulties in hearing in adverse listening conditions.
Visual Input Enhances Selective Speech Envelope Tracking in Auditory Cortex at a ‘Cocktail Party’

Science.gov (United States)

Golumbic, Elana Zion; Cogan, Gregory B.; Schroeder, Charles E.; Poeppel, David

2013-01-01

Our ability to selectively attend to one auditory signal amidst competing input streams, epitomized by the ‘Cocktail Party’ problem, continues to stimulate research from various approaches. How this demanding perceptual feat is achieved from a neural systems perspective remains unclear and controversial. It is well established that neural responses to attended stimuli are enhanced compared to responses to ignored ones, but responses to ignored stimuli are nonetheless highly significant, leading to interference in performance. We investigated whether congruent visual input of an attended speaker enhances cortical selectivity in auditory cortex, leading to diminished representation of ignored stimuli. We recorded magnetoencephalographic (MEG) signals from human participants as they attended to segments of natural continuous speech. Using two complementary methods of quantifying the neural response to speech, we found that viewing a speaker’s face enhances the capacity of auditory cortex to track the temporal speech envelope of that speaker. This mechanism was most effective in a ‘Cocktail Party’ setting, promoting preferential tracking of the attended speaker, whereas without visual input no significant attentional modulation was observed. These neurophysiological results underscore the importance of visual input in resolving perceptual ambiguity in a noisy environment. Since visual cues in speech precede the associated auditory signals, they likely serve a predictive role in facilitating auditory processing of speech, perhaps by directing attentional resources to appropriate points in time when to-be-attended acoustic input is expected to arrive. PMID:23345218
Spectrotemporal Modulation Detection and Speech Perception by Cochlear Implant Users.

Science.gov (United States)

Won, Jong Ho; Moon, Il Joon; Jin, Sunhwa; Park, Heesung; Woo, Jihwan; Cho, Yang-Sun; Chung, Won-Ho; Hong, Sung Hwa

2015-01-01

Spectrotemporal modulation (STM) detection performance was examined for cochlear implant (CI) users. The test involved discriminating between an unmodulated steady noise and a modulated stimulus. The modulated stimulus presents frequency modulation patterns that change in frequency over time. In order to examine STM detection performance for different modulation conditions, two different temporal modulation rates (5 and 10 Hz) and three different spectral modulation densities (0.5, 1.0, and 2.0 cycles/octave) were employed, producing a total of 6 different STM stimulus conditions. In order to explore how electric hearing constrains STM sensitivity for CI users differently from acoustic hearing, normal-hearing (NH) and hearing-impaired (HI) listeners were also tested on the same tasks. STM detection performance was best in NH subjects, followed by HI subjects. On average, CI subjects showed the poorest performance, but some CI subjects showed high levels of STM detection performance that were comparable to acoustic hearing. Significant correlations were found between STM detection performance and speech identification performance in quiet and in noise. In order to understand the relative contribution of spectral and temporal modulation cues to speech perception abilities for CI users, spectral and temporal modulation detection was performed separately and related to STM detection and speech perception performance. The results suggest that slow spectral modulation rather than slow temporal modulation may be important for determining speech perception capabilities for CI users. Lastly, test-retest reliability for STM detection was good with no learning. The present study demonstrates that STM detection may be a useful tool to evaluate the ability of CI sound processing strategies to deliver clinically pertinent acoustic modulation information.
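The count of STM stimulus conditions follows from crossing the temporal rates with the spectral densities. The sketch below only enumerates that grid; the rate and density values come from the abstract, while the record layout and variable names are hypothetical.

```python
from itertools import product

# Values taken from the abstract above.
temporal_rates_hz = (5, 10)
spectral_densities_cyc_per_oct = (0.5, 1.0, 2.0)

# Hypothetical record layout for one STM stimulus condition.
conditions = [
    {"rate_hz": rate, "density_cyc_per_oct": density}
    for rate, density in product(temporal_rates_hz, spectral_densities_cyc_per_oct)
]

for condition in conditions:
    print(condition)
print(len(conditions), "conditions")
```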
The Influence of Cue Reliability and Cue Representation on Spatial Reorientation in Young Children

Science.gov (United States)

Lyons, Ian M.; Huttenlocher, Janellen; Ratliff, Kristin R.

2014-01-01

Previous studies of children's reorientation have focused on cue representation (e.g., whether cues are geometric) as a predictor of performance but have not addressed cue reliability (the regularity of the relation between a given cue and an outcome) as a predictor of performance. Here we address both factors within the same series of…
Speech misperception: speaking and seeing interfere differently with hearing.

Directory of Open Access Journals (Sweden)

Takemi Mochida

Speech perception is thought to be linked to speech motor production. This linkage is considered to mediate multimodal aspects of speech perception, such as audio-visual and audio-tactile integration. However, direct coupling between articulatory movement and auditory perception has been little studied. The present study reveals a clear dissociation between the effects of a listener's own speech action and the effects of viewing another's speech movements on the perception of auditory phonemes. We assessed the intelligibility of the syllables [pa], [ta], and [ka] when listeners silently and simultaneously articulated syllables that were congruent/incongruent with the syllables they heard. The intelligibility was compared with a condition where the listeners simultaneously watched another's mouth producing congruent/incongruent syllables, but did not articulate. The intelligibility of [ta] and [ka] was degraded by articulating [ka] and [ta] respectively, which are associated with the same primary articulator (tongue) as the heard syllables. But they were not affected by articulating [pa], which is associated with a different primary articulator (lips) from the heard syllables. In contrast, the intelligibility of [ta] and [ka] was degraded by watching the production of [pa]. These results indicate that the articulatory-induced distortion of speech perception occurs in an articulator-specific manner while visually induced distortion does not. The articulator-specific nature of the auditory-motor interaction in speech perception suggests that speech motor processing directly contributes to our ability to hear speech.
Motor Training: Comparison of Visual and Auditory Coded Proprioceptive Cues

Directory of Open Access Journals (Sweden)

Philip Jepson

2012-05-01

Self-perception of body posture and movement is achieved through multi-sensory integration, particularly the utilisation of vision, and proprioceptive information derived from muscles and joints. Disruption to these processes can occur following a neurological accident, such as stroke, leading to sensory and physical impairment. Rehabilitation can be helped through use of augmented visual and auditory biofeedback to stimulate neuro-plasticity, but the effective design and application of feedback, particularly in the auditory domain, is non-trivial. Simple auditory feedback was tested by comparing the stepping accuracy of normal subjects when given a visual spatial target (step length) and an auditory temporal target (step duration). A baseline measurement of step length and duration was taken using optical motion capture. Subjects (n=20) took 20 ‘training’ steps (baseline ±25%) using either an auditory target (950 Hz tone, bell-shaped gain envelope) or visual target (spot marked on the floor) and were then asked to replicate the target step (length or duration corresponding to training) with all feedback removed. Mean percentage error was 11.5% (SD ± 7.0%) for visual cues and 12.9% (SD ± 11.8%) for auditory cues. Visual cues elicit a high degree of accuracy both in training and follow-up un-cued tasks; despite the novelty of the auditory cues present for subjects, the mean accuracy of subjects approached that for visual cues, and initial results suggest a limited amount of practice using auditory cues can improve performance.
THE ONTOGENESIS OF SPEECH DEVELOPMENT

Directory of Open Access Journals (Sweden)

T. E. Braudo

2017-01-01

The purpose of this article is to acquaint specialists working with children who have developmental disorders with age-related norms for speech development. Many well-known linguists and psychologists have studied speech ontogenesis (logogenesis). Speech is a higher mental function, which integrates many functional systems. Speech development in infants during the first months after birth is ensured by innate hearing and the emerging ability to fix the gaze on the face of an adult. Innate emotional reactions also develop during this period, turning into nonverbal forms of communication. At about 6 months a baby starts to pronounce some syllables; at 7–9 months, the baby repeats various sound combinations pronounced by adults. At 10–11 months a baby begins to react to words addressed to him/her. The first words usually appear at the age of 1 year; this is the start of the stage of active speech development. At this time it is acceptable if a child confuses or rearranges sounds, distorts or misses them. By the age of 1.5 years a child begins to understand abstract explanations from adults. Significant vocabulary enlargement occurs between 2 and 3 years; grammatical structures of the language are formed during this period (a child starts to use phrases and sentences). Preschool age (3–7 y. o.) is characterized by incorrect, but steadily improving, pronunciation of sounds and phonemic perception. The vocabulary increases; abstract speech and retelling are formed. Children over 7 y. o. continue to improve grammar, writing and reading skills. The described stages may not have strict age boundaries, as they depend not only on environment, but also on the child’s mental constitution, heredity and character.
Distributed acoustic cues for caller identity in macaque vocalization.

Science.gov (United States)

Fukushima, Makoto; Doyle, Alex M; Mullarkey, Matthew P; Mishkin, Mortimer; Averbeck, Bruno B

2015-12-01

Individual primates can be identified by the sound of their voice. Macaques have demonstrated an ability to discern conspecific identity from a harmonically structured 'coo' call. Voice recognition presumably requires the integrated perception of multiple acoustic features. However, it is unclear how this is achieved, given considerable variability across utterances. Specifically, the extent to which information about caller identity is distributed across multiple features remains elusive. We examined these issues by recording and analysing a large sample of calls from eight macaques. Single acoustic features, including fundamental frequency, duration and Wiener entropy, were informative but unreliable for the statistical classification of caller identity. A combination of multiple features, however, allowed for highly accurate caller identification. A regularized classifier that learned to identify callers from the modulation power spectrum of calls found that specific regions of spectral-temporal modulation were informative for caller identification. These ranges are related to acoustic features such as the call's fundamental frequency and FM sweep direction. We further found that the low-frequency spectrotemporal modulation component contained an indexical cue of the caller body size. Thus, cues for caller identity are distributed across identifiable spectrotemporal components corresponding to laryngeal and supralaryngeal components of vocalizations, and the integration of those cues can enable highly reliable caller identification. Our results demonstrate a clear acoustic basis by which individual macaque vocalizations can be recognized.

Cortical activity patterns predict robust speech discrimination ability in noise

Science.gov (United States)

Shetake, Jai A.; Wolf, Jordan T.; Cheung, Ryan J.; Engineer, Crystal T.; Ram, Satyananda K.; Kilgard, Michael P.

2012-01-01

The neural mechanisms that support speech discrimination in noisy conditions are poorly understood. In quiet conditions, spike timing information appears to be used in the discrimination of speech sounds. In this study, we evaluated the hypothesis that spike timing is also used to distinguish between speech sounds in noisy conditions that significantly degrade neural responses to speech sounds. We tested speech sound discrimination in rats and recorded primary auditory cortex (A1) responses to speech sounds in background noise of different intensities and spectral compositions. Our behavioral results indicate that rats, like humans, are able to accurately discriminate consonant sounds even in the presence of background noise that is as loud as the speech signal. Our neural recordings confirm that speech sounds evoke degraded but detectable responses in noise. Finally, we developed a novel neural classifier that mimics behavioral discrimination. The classifier discriminates between speech sounds by comparing the A1 spatiotemporal activity patterns evoked on single trials with the average spatiotemporal patterns evoked by known sounds. Unlike classifiers in most previous studies, this classifier is not provided with the stimulus onset time. Neural activity analyzed with the use of relative spike timing was well correlated with behavioral speech discrimination in quiet and in noise. Spike timing information integrated over longer intervals was required to accurately predict rat behavioral speech discrimination in noisy conditions. The similarity of neural and behavioral discrimination of speech in noise suggests that humans and rats may employ similar brain mechanisms to solve this problem. PMID:22098331
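The classifier idea in this abstract (comparing a single-trial spatiotemporal pattern with average patterns for known sounds, without being given the stimulus onset) can be sketched as a sliding nearest-template match. This is a generic illustration and not the authors' implementation: the Euclidean distance metric, the function names, the labels, and the toy spike counts are all assumptions.

```python
from math import sqrt


def distance(a, b):
    """Euclidean distance between two equally sized site-by-time patterns."""
    return sqrt(sum((x - y) ** 2
                    for row_a, row_b in zip(a, b)
                    for x, y in zip(row_a, row_b)))


def window(trial, start, width):
    """Cut `width` time bins, starting at `start`, from every recording site."""
    return [row[start:start + width] for row in trial]


def classify(trial, templates):
    """Return the label whose template best matches any window of the trial.

    Sliding each template over the whole trial means the stimulus
    onset time does not have to be supplied.
    """
    n_bins = len(trial[0])
    best_label, best_dist = None, float("inf")
    for label, template in templates.items():
        width = len(template[0])
        for start in range(n_bins - width + 1):
            d = distance(window(trial, start, width), template)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label


# Hypothetical average patterns: 2 recording sites x 3 time bins of spike counts.
templates = {
    "dad": [[3, 1, 0], [0, 2, 1]],
    "tad": [[0, 1, 3], [2, 0, 0]],
}
# Single trial with the "tad" pattern starting two bins into the recording.
trial = [[0, 0, 0, 1, 3, 0], [1, 0, 2, 0, 0, 0]]
print(classify(trial, templates))
```

Because every candidate onset is tried, the price of not knowing the onset is extra computation and more opportunities for a spurious match when the background activity is noisy.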
Speech Production and Speech Discrimination by Hearing-Impaired Children.

Science.gov (United States)

Novelli-Olmstead, Tina; Ling, Daniel

1984-01-01

Seven hearing impaired children (five to seven years old) assigned to the Speakers group made highly significant gains in speech production and auditory discrimination of speech, while Listeners made only slight speech production gains and no gains in auditory discrimination. Combined speech and auditory training was more effective than auditory…
The Effective Use of Symbols in Teaching Word Recognition to Children with Severe Learning Difficulties: A Comparison of Word Alone, Integrated Picture Cueing and the Handle Technique.

Science.gov (United States)

Sheehy, Kieron

2002-01-01

A comparison is made between a new technique (the Handle Technique), Integrated Picture Cueing, and a Word Alone Method. Results show using a new combination of teaching strategies enabled logographic symbols to be used effectively in teaching word recognition to 12 children with severe learning difficulties. (Contains references.) (Author/CR)
Stuttering Frequency, Speech Rate, Speech Naturalness, and Speech Effort During the Production of Voluntary Stuttering.

Science.gov (United States)

Davidow, Jason H; Grossman, Heather L; Edge, Robin L

2018-05-01

Voluntary stuttering techniques involve persons who stutter purposefully interjecting disfluencies into their speech. Little research has been conducted on the impact of these techniques on the speech pattern of persons who stutter. The present study examined whether changes in the frequency of voluntary stuttering accompanied changes in stuttering frequency, articulation rate, speech naturalness, and speech effort. In total, 12 persons who stutter aged 16-34 years participated. Participants read four 300-syllable passages during a control condition, and three voluntary stuttering conditions that involved attempting to produce purposeful, tension-free repetitions of initial sounds or syllables of a word for two or more repetitions (i.e., bouncing). The three voluntary stuttering conditions included bouncing on 5%, 10%, and 15% of syllables read. Friedman tests and follow-up Wilcoxon signed ranks tests were conducted for the statistical analyses. Stuttering frequency, articulation rate, and speech naturalness were significantly different between the voluntary stuttering conditions. Speech effort did not differ between the voluntary stuttering conditions. Stuttering frequency was significantly lower during the three voluntary stuttering conditions compared to the control condition, and speech effort was significantly lower during two of the three voluntary stuttering conditions compared to the control condition. Due to changes in articulation rate across the voluntary stuttering conditions, it is difficult to conclude, as has been suggested previously, that voluntary stuttering is the reason for stuttering reductions found when using voluntary stuttering techniques. Additionally, future investigations should examine different types of voluntary stuttering over an extended period of time to determine their impact on stuttering frequency, speech rate, speech naturalness, and speech effort.
Comprehension of synthetic speech and digitized natural speech by adults with aphasia.

Science.gov (United States)

Hux, Karen; Knollman-Porter, Kelly; Brown, Jessica; Wallace, Sarah E

2017-09-01

Using text-to-speech technology to provide simultaneous written and auditory content presentation may help compensate for chronic reading challenges if people with aphasia can understand synthetic speech output; however, inherent auditory comprehension challenges experienced by people with aphasia may make understanding synthetic speech difficult. This study's purpose was to compare the preferences and auditory comprehension accuracy of people with aphasia when listening to sentences generated with digitized natural speech, Alex synthetic speech (i.e., Macintosh platform), or David synthetic speech (i.e., Windows platform). The methodology required each of 20 participants with aphasia to select one of four images corresponding in meaning to each of 60 sentences comprising three stimulus sets. Results revealed significantly better accuracy given digitized natural speech than either synthetic speech option; however, individual participant performance analyses revealed three patterns: (a) comparable accuracy regardless of speech condition for 30% of participants, (b) comparable accuracy between digitized natural speech and one, but not both, synthetic speech option for 45% of participants, and (c) greater accuracy with digitized natural speech than with either synthetic speech option for remaining participants. Ranking and Likert-scale rating data revealed a preference for digitized natural speech and David synthetic speech over Alex synthetic speech. Results suggest many individuals with aphasia can comprehend synthetic speech options available on popular operating systems. Further examination of synthetic speech use to support reading comprehension through text-to-speech technology is thus warranted. Copyright © 2017 Elsevier Inc. All rights reserved.
Common neural substrates support speech and non-speech vocal tract gestures

OpenAIRE

Chang, Soo-Eun; Kenney, Mary Kay; Loucks, Torrey M.J.; Poletto, Christopher J.; Ludlow, Christy L.

2009-01-01

The issue of whether speech is supported by the same neural substrates as non-speech vocal-tract gestures has been contentious. In this fMRI study we tested whether producing non-speech vocal tract gestures in humans shares the same functional neuroanatomy as non-sense speech syllables. Production of non-speech vocal tract gestures, devoid of phonological content but similar to speech in that they had familiar acoustic and somatosensory targets, were compared to the production of speech sylla...
Fusion of audio and visual cues for laughter detection

NARCIS (Netherlands)

Petridis, Stavros; Pantic, Maja

Past research on automatic laughter detection has focused mainly on audio-based detection. Here we present an audio-visual approach to distinguishing laughter from speech and we show that integrating the information from audio and video channels leads to improved performance over single-modal
Habituation of adult sea lamprey repeatedly exposed to damage-released alarm and predator cues

Science.gov (United States)

Imre, Istvan; Di Rocco, Richard T.; Brown, Grant E.; Johnson, Nicholas

2016-01-01

Predation is an unforgiving selective pressure affecting the life history, morphology and behaviour of prey organisms. Selection should favour organisms that have the ability to correctly assess the information content of alarm cues. This study investigated whether adult sea lamprey Petromyzon marinus habituate to conspecific damage-released alarm cues (fresh and decayed sea lamprey extract), a heterospecific damage-released alarm cue (white sucker Catostomus commersonii extract), predator cues (Northern water snake Nerodia sipedon washing, human saliva and 2-phenylethylamine hydrochloride (PEA HCl)) and a conspecific damage-released alarm cue and predator cue combination (fresh sea lamprey extract and human saliva) after they were pre-exposed 4 times or 8 times, respectively, to a given stimulus the previous night. Consistent with our prediction, adult sea lamprey maintained an avoidance response to conspecific damage-released alarm cues (fresh and decayed sea lamprey extract), a predator cue presented at high relative concentration (PEA HCl) and a conspecific damage-released alarm cue and predator cue combination (fresh sea lamprey extract plus human saliva), irrespective of previous exposure level. As expected, adult sea lamprey habituated to a sympatric heterospecific damage-released alarm cue (white sucker extract) and a predator cue presented at lower relative concentration (human saliva). Adult sea lamprey did not show any avoidance of the Northern water snake washing and the Amazon sailfin catfish extract (heterospecific control). This study suggests that conspecific damage-released alarm cues and PEA HCl present the best options as natural repellents in an integrated management program aimed at controlling the abundance of sea lamprey in the Laurentian Great Lakes.
Introductory speeches

International Nuclear Information System (INIS)

2001-01-01

This CD is a multimedia presentation of the safety upgrading programme of the Bohunice V1 NPP. This chapter consists of an introductory commentary and 4 introductory speeches (video records): (1) Introductory speech of Vincent Pillar, Board chairman and director general of Slovak electric, Plc. (SE); (2) Introductory speech of Stefan Schmidt, director of SE - Bohunice Nuclear power plants; (3) Introductory speech of Jan Korec, Board chairman and director general of VUJE Trnava, Inc. - Engineering, Design and Research Organisation, Trnava; (4) Introductory speech of Dietrich Kuschel, Senior vice-president of FRAMATOME ANP Project and Engineering
Predicting speech intelligibility in conditions with nonlinearly processed noisy speech

DEFF Research Database (Denmark)

Jørgensen, Søren; Dau, Torsten

2013-01-01

The speech-based envelope power spectrum model (sEPSM; [1]) was proposed in order to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII). The sEPSM applies the signal-to-noise ratio in the envelope domain (SNRenv), which was demonstrated...... to successfully predict speech intelligibility in conditions with nonlinearly processed noisy speech, such as processing with spectral subtraction. Moreover, a multiresolution version (mr-sEPSM) was demonstrated to account for speech intelligibility in various conditions with stationary and fluctuating...
Exploring Australian speech-language pathologists' use and perceptions of non-speech oral motor exercises.

Science.gov (United States)

Rumbach, Anna F; Rose, Tanya A; Cheah, Mynn

2018-01-29

To explore Australian speech-language pathologists' use of non-speech oral motor exercises, and rationales for using/not using non-speech oral motor exercises in clinical practice. A total of 124 speech-language pathologists practising in Australia, working with paediatric and/or adult clients with speech sound difficulties, completed an online survey. The majority of speech-language pathologists reported that they did not use non-speech oral motor exercises when working with paediatric or adult clients with speech sound difficulties. However, more than half of the speech-language pathologists working with adult clients who have dysarthria reported using non-speech oral motor exercises with this population. The most frequently reported rationale for using non-speech oral motor exercises in speech sound difficulty management was to improve awareness/placement of articulators. The majority of speech-language pathologists agreed there is no clear clinical or research evidence base to support non-speech oral motor exercise use with clients who have speech sound difficulties. This study provides an overview of Australian speech-language pathologists' reported use and perceptions of non-speech oral motor exercises' applicability and efficacy in treating paediatric and adult clients who have speech sound difficulties. The research findings provide speech-language pathologists with insight into how and why non-speech oral motor exercises are currently used, and adds to the knowledge base regarding Australian speech-language pathology practice of non-speech oral motor exercises in the treatment of speech sound difficulties. Implications for Rehabilitation Non-speech oral motor exercises refer to oral motor activities which do not involve speech, but involve the manipulation or stimulation of oral structures including the lips, tongue, jaw, and soft palate. Non-speech oral motor exercises are intended to improve the function (e.g., movement, strength) of oral structures. The
Perception of the multisensory coherence of fluent audiovisual speech in infancy: its emergence and the role of experience.

Science.gov (United States)

Lewkowicz, David J; Minar, Nicholas J; Tift, Amy H; Brandon, Melissa

2015-02-01

To investigate the developmental emergence of the perception of the multisensory coherence of native and non-native audiovisual fluent speech, we tested 4-, 8- to 10-, and 12- to 14-month-old English-learning infants. Infants first viewed two identical female faces articulating two different monologues in silence and then in the presence of an audible monologue that matched the visible articulations of one of the faces. Neither the 4-month-old nor 8- to 10-month-old infants exhibited audiovisual matching in that they did not look longer at the matching monologue. In contrast, the 12- to 14-month-old infants exhibited matching and, consistent with the emergence of perceptual expertise for the native language, perceived the multisensory coherence of native-language monologues earlier in the test trials than that of non-native language monologues. Moreover, the matching of native audible and visible speech streams observed in the 12- to 14-month-olds did not depend on audiovisual synchrony, whereas the matching of non-native audible and visible speech streams did depend on synchrony. Overall, the current findings indicate that the perception of the multisensory coherence of fluent audiovisual speech emerges late in infancy, that audiovisual synchrony cues are more important in the perception of the multisensory coherence of non-native speech than that of native audiovisual speech, and that the emergence of this skill most likely is affected by perceptual narrowing. Copyright © 2014 Elsevier Inc. All rights reserved.
Decoding Speech With Integrated Hybrid Signals Recorded From the Human Ventral Motor Cortex

Directory of Open Access Journals (Sweden)

Kenji Ibayashi

2018-04-01

Restoration of speech communication for locked-in patients by means of brain computer interfaces (BCIs) is currently an important area of active research. Among the neural signals obtained from intracranial recordings, single/multi-unit activity (SUA/MUA), local field potential (LFP), and electrocorticography (ECoG) are good candidates for an input signal for BCIs. However, the question of which signal or which combination of the three signal modalities is best suited for decoding speech production remains unverified. In order to record SUA, LFP, and ECoG simultaneously from a highly localized area of human ventral sensorimotor cortex (vSMC), we fabricated an electrode measuring 7 by 13 mm containing sparsely arranged microneedle and conventional macro contacts. We determined which signal modality is the most capable of decoding speech production, and tested if the combination of these signals could improve the decoding accuracy of spoken phonemes. Feature vectors were constructed from spike frequency obtained from SUAs and event-related spectral perturbation derived from ECoG and LFP signals, then input to the decoder. The results showed that the decoding accuracy for five spoken vowels was highest when features from multiple signals were combined and optimized for each subject, and reached 59% when averaged across all six subjects. This result suggests that multi-scale signals convey complementary information for speech articulation. The current study demonstrated that simultaneous recording of multi-scale neuronal activities could raise decoding accuracy even though the recording area is limited to a small portion of cortex, which is advantageous for future implementation of speech-assisting BCIs.
[Improving speech comprehension using a new cochlear implant speech processor].

Science.gov (United States)

Müller-Deile, J; Kortmann, T; Hoppe, U; Hessel, H; Morsnowski, A

2009-06-01

The aim of this multicenter clinical field study was to assess the benefits of the new Freedom 24 sound processor for cochlear implant (CI) users implanted with the Nucleus 24 cochlear implant system. The study included 48 postlingually profoundly deaf experienced CI users who demonstrated speech comprehension performance with their current speech processor on the Oldenburg sentence test (OLSA) in quiet conditions of at least 80% correct scores and who were able to perform adaptive speech threshold testing using the OLSA in noisy conditions. Following baseline measures of speech comprehension performance with their current speech processor, subjects were upgraded to the Freedom 24 speech processor. After a take-home trial period of at least 2 weeks, subject performance was evaluated by measuring the speech reception threshold with the Freiburg multisyllabic word test and speech intelligibility with the Freiburg monosyllabic word test at 50 dB and 70 dB in the sound field. The results demonstrated highly significant benefits for speech comprehension with the new speech processor. Significant benefits for speech comprehension were also demonstrated with the new speech processor when tested in competing background noise. In contrast, use of the Abbreviated Profile of Hearing Aid Benefit (APHAB) did not prove to be a suitably sensitive assessment tool for comparative subjective self-assessment of hearing benefits with each processor. Use of the preprocessing algorithm known as adaptive dynamic range optimization (ADRO) in the Freedom 24 led to additional improvements over the standard upgrade map for speech comprehension in quiet and showed equivalent performance in noise. Through use of the preprocessing beam-forming algorithm BEAM, subjects demonstrated a highly significant improved signal-to-noise ratio for speech comprehension thresholds (i.e., signal-to-noise ratio for 50% speech comprehension scores) when tested with an adaptive procedure using the Oldenburg
Speech coding

Energy Technology Data Exchange (ETDEWEB)

Ravishankar, C., Hughes Network Systems, Germantown, MD

1998-05-08

Speech is the predominant means of communication between human beings and, since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage of the speech signal getting corrupted by noise, cross-talk and distortion. Long-haul transmissions, which use repeaters to compensate for the loss in signal strength on transmission links, also increase the associated noise and distortion. On the other hand, digital transmission is relatively immune to noise, cross-talk and distortion, primarily because of the capability to faithfully regenerate the digital signal at each repeater purely based on a binary decision. Hence end-to-end performance of the digital link essentially becomes independent of the length and operating frequency bands of the link. Hence, from a transmission point of view, digital transmission has been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service provision point of view as well. Modern requirements have introduced the need for robust, flexible and secure services that can carry a multitude of signal types (such as voice, data and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term Speech Coding is often used to refer to techniques that represent or code speech signals either directly as a waveform or as a set of parameters by analyzing the speech signal. In either case, the codes are transmitted to the distant end where speech is reconstructed or synthesized using the received set of codes. A more generic term that is applicable to these techniques and is often used interchangeably with speech coding is the term voice coding. This term is more generic in the sense that the
Experienced speech-language pathologists' responses to ethical dilemmas: an integrated approach to ethical reasoning.

Science.gov (United States)

Kenny, Belinda; Lincoln, Michelle; Balandin, Susan

2010-05-01

To investigate the approaches of experienced speech-language pathologists (SLPs) to ethical reasoning and the processes they use to resolve ethical dilemmas. Ten experienced SLPs participated in in-depth interviews. A narrative approach was used to guide participants' descriptions of how they resolved ethical dilemmas. Individual narrative transcriptions were analyzed by using the participant's words to develop an ethical story that described and interpreted their responses to dilemmas. Key concepts from individual stories were then coded into group themes to reflect participants' reasoning processes. Five major themes reflected participants' approaches to ethical reasoning: (a) focusing on the well-being of the client, (b) fulfilling professional roles and responsibilities, (c) attending to professional relationships, (d) managing resources, and (e) integrating personal and professional values. SLPs demonstrated a range of ethical reasoning processes: applying bioethical principles, casuistry, and narrative reasoning when managing ethical dilemmas in the workplace. The results indicate that experienced SLPs adopted an integrated approach to ethical reasoning. They supported clients' rights to make health care choices. Bioethical principles, casuistry, and narrative reasoning provided useful frameworks for facilitating health professionals' application of codes of ethics to complex professional practice issues.
Heads First: Visual Aftereffects Reveal Hierarchical Integration of Cues to Social Attention.

Directory of Open Access Journals (Sweden)

Sarah Cooney

Full Text Available Determining where another person is attending is an important skill for social interaction that relies on various visual cues, including the turning direction of the head and body. This study reports a novel high-level visual aftereffect that addresses the important question of how these sources of information are combined in gauging social attention. We show that adapting to images of heads turned 25° to the right or left produces a perceptual bias in judging the turning direction of subsequently presented bodies. In contrast, little to no change in the judgment of head orientation occurs after adapting to extremely oriented bodies. The unidirectional nature of the aftereffect suggests that cues from the human body signaling social attention are combined in a hierarchical fashion and is consistent with evidence from single-cell recording studies in nonhuman primates showing that information about head orientation can override information about body posture when both are visible.
PROMPT: articulatietherapie vanuit tactiel-kinesthetische input

NARCIS (Netherlands)

Drs M.F. Raaijmakers; Drs Sj. van der Meulen

2005-01-01

PROMPT is a tactile-kinesthetic approach for assessment and treatment of speech production disorders. PROMPT uses tactile-kinesthetic cues to facilitate motor speech behaviors. Therapy is structured from basic motor speech patterns with much tactile-kinesthetic cueing, towards complex motor speech
The influence of spectral and spatial characteristics of early reflections on speech intelligibility

DEFF Research Database (Denmark)

Arweiler, Iris; Buchholz, Jörg; Dau, Torsten

The auditory system employs different strategies to facilitate speech intelligibility in complex listening conditions. One of them is the integration of early reflections (ER’s) with the direct sound (DS) to increase the effective speech level. So far the underlying mechanisms of ER processing have...... of listeners that speech intelligibility improved with added ER energy, but less than with added DS energy. An efficiency factor was introduced to quantify this effect. The difference in speech intelligibility could be mainly ascribed to the differences in the spectrum between the speech signals....... binaural). The direction-dependency could be explained by the spectral changes introduced by the pinna, head, and torso. The results will be important with regard to the influence of signal processing strategies in modern hearing aids on speech intelligibility, because they might alter the spectral...
Effects of self-relevant cues and cue valence on autobiographical memory specificity in dysphoria.

Science.gov (United States)

Matsumoto, Noboru; Mochizuki, Satoshi

2017-04-01

Reduced autobiographical memory specificity (rAMS) is a characteristic memory bias observed in depression. To corroborate the capture hypothesis in the CaRFAX (capture and rumination, functional avoidance, executive capacity and control) model, we investigated the effects of self-relevant cues and cue valence on rAMS using an adapted Autobiographical Memory Test conducted with a nonclinical population. Hierarchical linear modelling indicated that the main effects of depression and self-relevant cues elicited rAMS. Moreover, the three-way interaction among valence, self-relevance, and depression scores was significant. A simple slope test revealed that dysphoric participants experienced rAMS in response to highly self-relevant positive cues and low self-relevant negative cues. These results partially supported the capture hypothesis in nonclinical dysphoria. It is important to consider cue valence in future studies examining the capture hypothesis.

The analysis of speech acts patterns in two Egyptian inaugural speeches

Directory of Open Access Journals (Sweden)

Imad Hayif Sameer

2017-09-01

Full Text Available The theory of speech acts, which clarifies what people do when they speak, is not about individual words or sentences that form the basic elements of human communication, but rather about particular speech acts that are performed when uttering words. A speech act is the attempt at doing something purely by speaking. Many things can be done by speaking. Speech acts are studied under what is called speech act theory, and belong to the domain of pragmatics. In this paper, two Egyptian inaugural speeches from El-Sadat and El-Sisi, belonging to different periods, were analyzed to find out whether there were differences within this genre in the same culture or not. The study showed that there was a very small difference between these two speeches, which were analyzed according to Searle’s theory of speech acts. In El-Sadat’s speech, commissives came to occupy the first place. Meanwhile, in El-Sisi’s speech, assertives occupied the first place. Within the speeches of one culture, we can find that the differences depended on the circumstances that surrounded the elections of the Presidents at the time. Speech acts were tools they used to convey what they wanted and to obtain support from their audiences.
The relationship between level of autistic traits and local bias in the context of the McGurk effect.

Directory of Open Access Journals (Sweden)

Yuta Ujiie

2015-06-01

Full Text Available The McGurk effect is a well-known illustration that demonstrates the influence of visual information on hearing in the context of speech perception. Some studies have reported that individuals with autism spectrum disorder (ASD) display abnormal processing of audio-visual speech integration, while other studies showed contradictory results. Based on the dimensional model of ASD, we administered two analog studies to examine the link between level of autistic traits, as assessed by the Autism Spectrum Quotient (AQ), and the McGurk effect among a sample of university students. In the first experiment, we found that autistic traits correlated negatively with fused (McGurk) responses. Then, we manipulated presentation types of visual stimuli to examine whether the local bias toward visual speech cues modulated individual differences in the McGurk effect. The presentation included four types of visual images, comprising no image, mouth only, mouth and eyes, and full face. The results revealed that global facial information facilitates the influence of visual speech cues on McGurk stimuli. Moreover, individual differences between groups with low and high levels of autistic traits appeared when the full-face visual speech cue with an incongruent voice condition was presented. These results suggest that individual differences in the McGurk effect might be due to a weak ability to process global facial information in individuals with high levels of autistic traits.
Speech Problems

Science.gov (United States)

... a person's ability to speak clearly. Some Common Speech and Language Disorders: Stuttering is a problem that ...
Alternative Speech Communication System for Persons with Severe Speech Disorders

Science.gov (United States)

Selouani, Sid-Ahmed; Sidi Yakoub, Mohammed; O'Shaughnessy, Douglas

2009-12-01

Assistive speech-enabled systems are proposed to help both French and English speaking persons with various speech disorders. The proposed assistive systems use automatic speech recognition (ASR) and speech synthesis in order to enhance the quality of communication. These systems aim at improving the intelligibility of pathologic speech making it as natural as possible and close to the original voice of the speaker. The resynthesized utterances use new basic units, a new concatenating algorithm and a grafting technique to correct the poorly pronounced phonemes. The ASR responses are uttered by the new speech synthesis system in order to convey an intelligible message to listeners. Experiments involving four American speakers with severe dysarthria and two Acadian French speakers with sound substitution disorders (SSDs) are carried out to demonstrate the efficiency of the proposed methods. An improvement of the Perceptual Evaluation of the Speech Quality (PESQ) value of 5% and more than 20% is achieved by the speech synthesis systems that deal with SSD and dysarthria, respectively.
Consistency between verbal and non-verbal affective cues: a clue to speaker credibility.

Science.gov (United States)

Gillis, Randall L; Nilsen, Elizabeth S

2017-06-01

Listeners are exposed to inconsistencies in communication; for example, when speakers' words (i.e. verbal) are discrepant with their demonstrated emotions (i.e. non-verbal). Such inconsistencies introduce ambiguity, which may render a speaker to be a less credible source of information. Two experiments examined whether children make credibility discriminations based on the consistency of speakers' affect cues. In Experiment 1, school-age children (7- to 8-year-olds) preferred to solicit information from consistent speakers (e.g. those who provided a negative statement with negative affect), over novel speakers, to a greater extent than they preferred to solicit information from inconsistent speakers (e.g. those who provided a negative statement with positive affect) over novel speakers. Preschoolers (4- to 5-year-olds) did not demonstrate this preference. Experiment 2 showed that school-age children's ratings of speakers were influenced by speakers' affect consistency when the attribute being judged was related to information acquisition (speakers' believability, "weird" speech), but not general characteristics (speakers' friendliness, likeability). Together, findings suggest that school-age children are sensitive to, and use, the congruency of affect cues to determine whether individuals are credible sources of information.
A Danish open-set speech corpus for competing-speech studies

DEFF Research Database (Denmark)

Nielsen, Jens Bo; Dau, Torsten; Neher, Tobias

2014-01-01

Studies investigating speech-on-speech masking effects commonly use closed-set speech materials such as the coordinate response measure [Bolia et al. (2000). J. Acoust. Soc. Am. 107, 1065-1066]. However, these studies typically result in very low (i.e., negative) speech recognition thresholds (SRTs......) when the competing speech signals are spatially separated. To achieve higher SRTs that correspond more closely to natural communication situations, an open-set, low-context, multi-talker speech corpus was developed. Three sets of 268 unique Danish sentences were created, and each set was recorded...... with one of three professional female talkers. The intelligibility of each sentence in the presence of speech-shaped noise was measured. For each talker, 200 approximately equally intelligible sentences were then selected and systematically distributed into 10 test lists. Test list homogeneity was assessed...
Speech entrainment enables patients with Broca’s aphasia to produce fluent speech

Science.gov (United States)

Hubbard, H. Isabel; Hudspeth, Sarah Grace; Holland, Audrey L.; Bonilha, Leonardo; Fromm, Davida; Rorden, Chris

2012-01-01

A distinguishing feature of Broca’s aphasia is non-fluent halting speech typically involving one to three words per utterance. Yet, despite such profound impairments, some patients can mimic audio-visual speech stimuli enabling them to produce fluent speech in real time. We call this effect ‘speech entrainment’ and reveal its neural mechanism as well as explore its usefulness as a treatment for speech production in Broca’s aphasia. In Experiment 1, 13 patients with Broca’s aphasia were tested in three conditions: (i) speech entrainment with audio-visual feedback where they attempted to mimic a speaker whose mouth was seen on an iPod screen; (ii) speech entrainment with audio-only feedback where patients mimicked heard speech; and (iii) spontaneous speech where patients spoke freely about assigned topics. The patients produced a greater variety of words using audio-visual feedback compared with audio-only feedback and spontaneous speech. No difference was found between audio-only feedback and spontaneous speech. In Experiment 2, 10 of the 13 patients included in Experiment 1 and 20 control subjects underwent functional magnetic resonance imaging to determine the neural mechanism that supports speech entrainment. Group results with patients and controls revealed greater bilateral cortical activation for speech produced during speech entrainment compared with spontaneous speech at the junction of the anterior insula and Brodmann area 47, in Brodmann area 37, and unilaterally in the left middle temporal gyrus and the dorsal portion of Broca’s area. Probabilistic white matter tracts constructed for these regions in the normal subjects revealed a structural network connected via the corpus callosum and ventral fibres through the extreme capsule. Unilateral areas were connected via the arcuate fasciculus. In Experiment 3, all patients included in Experiment 1 participated in a 6-week treatment phase using speech entrainment to improve speech production
Social Robotics in Therapy of Apraxia of Speech

Directory of Open Access Journals (Sweden)

José Carlos Castillo

2018-01-01

Full Text Available Apraxia of speech is a motor speech disorder in which messages from the brain to the mouth are disrupted, resulting in an inability to move the lips or tongue to the right place to pronounce sounds correctly. Current therapies for this condition involve a therapist who conducts the exercises in one-on-one sessions. Our aim is to work in the line of robotic therapies in which a robot is able to perform a therapy session partially or autonomously, endowing a social robot with the ability to assist therapists in apraxia of speech rehabilitation exercises. Therefore, we integrate computer vision and machine learning techniques to detect the mouth pose of the user and, on top of that, our social robot autonomously performs the different steps of the therapy using multimodal interaction.
Satisficing in split-second decision making is characterized by strategic cue discounting.

Science.gov (United States)

Oh, Hanna; Beck, Jeffrey M; Zhu, Pingping; Sommer, Marc A; Ferrari, Silvia; Egner, Tobias

2016-12-01

Much of our real-life decision making is bounded by uncertain information, limitations in cognitive resources, and a lack of time to allocate to the decision process. It is thought that humans overcome these limitations through satisficing, fast but "good-enough" heuristic decision making that prioritizes some sources of information (cues) while ignoring others. However, the decision-making strategies we adopt under uncertainty and time pressure, for example during emergencies that demand split-second choices, are presently unknown. To characterize these decision strategies quantitatively, the present study examined how people solve a novel multicue probabilistic classification task under varying time pressure, by tracking shifts in decision strategies using variational Bayesian inference. We found that under low time pressure, participants correctly weighted and integrated all available cues to arrive at near-optimal decisions. With increasingly demanding, subsecond time pressures, however, participants systematically discounted a subset of the cue information by dropping the least informative cue(s) from their decision making process. Thus, the human cognitive apparatus copes with uncertainty and severe time pressure by adopting a "drop-the-worst" cue decision making strategy that minimizes cognitive time and effort investment while preserving the consideration of the most diagnostic cue information, thus maintaining "good-enough" accuracy. This advance in our understanding of satisficing strategies could form the basis of predicting human choices in high time pressure scenarios. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
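The contrast between integrating all cues and a "drop-the-worst" strategy can be illustrated with a minimal sketch. This is not the variational Bayesian analysis used in the study; it assumes, purely for illustration, binary cues that each vote for one of two categories, hypothetical cue validities, and a log-odds weighting scheme. The function names and the numbers are made up.

```python
import math


def log_odds(p):
    """Weight of a cue that points to the correct category with probability p."""
    return math.log(p / (1.0 - p))


def decide(cue_votes, cue_validities, n_dropped=0):
    """Classify as +1 or -1 by summing validity-weighted cue votes.

    cue_votes: list of +1/-1 votes, one per cue.
    cue_validities: hypothetical probabilities (> 0.5) that each cue is right.
    n_dropped: how many of the least informative cues to ignore
               (0 = integrate everything, >0 = "drop-the-worst").
    """
    weights = [log_odds(p) for p in cue_validities]
    # Rank cues from most to least informative, then keep the top ones.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    kept = order[: len(order) - n_dropped]
    evidence = sum(cue_votes[i] * weights[i] for i in kept)
    return 1 if evidence >= 0 else -1


votes = [1, -1, -1, -1]
validities = [0.8, 0.7, 0.7, 0.7]  # illustrative values, not from the study
full_choice = decide(votes, validities)                   # all four cues
fast_choice = decide(votes, validities, n_dropped=2)      # two weakest cues ignored
```

With these made-up numbers the two strategies disagree, because three weaker cues together outweigh the strongest one. When the strongest cue dominates, both strategies give the same answer, which is the sense in which discounting can remain "good enough".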
Robust speech dereverberation with a neural network-based post-filter that exploits multi-conditional training of binaural cues

DEFF Research Database (Denmark)

May, Tobias

2018-01-01

-frequency (T-F) units. A multi-conditional training (MCT) procedure was used to simulate the uncertainties of short-term binaural cues in response to room reverberation by mixing the direct part of head related impulse responses (HRIRs) with diffuse noise. Despite being trained with only anechoic HRIRs...
New human-centered linear and nonlinear motion cueing algorithms for control of simulator motion systems

Science.gov (United States)

Telban, Robert J.

While the performance of flight simulator motion system hardware has advanced substantially, the development of the motion cueing algorithm, the software that transforms simulated aircraft dynamics into realizable motion commands, has not kept pace. To address this, new human-centered motion cueing algorithms were developed. A revised "optimal algorithm" uses time-invariant filters developed by optimal control, incorporating human vestibular system models. The "nonlinear algorithm" is a novel approach that is also formulated by optimal control, but can also be updated in real time. It incorporates a new integrated visual-vestibular perception model that includes both visual and vestibular sensation and the interaction between the stimuli. A time-varying control law requires the matrix Riccati equation to be solved in real time by a neurocomputing approach. Preliminary pilot testing resulted in the optimal algorithm incorporating a new otolith model, producing improved motion cues. The nonlinear algorithm vertical mode produced a motion cue with a time-varying washout, sustaining small cues for longer durations and washing out large cues more quickly compared to the optimal algorithm. The inclusion of the integrated perception model improved the responses to longitudinal and lateral cues. False cues observed with the NASA adaptive algorithm were absent. As a result of unsatisfactory sensation, an augmented turbulence cue was added to the vertical mode for both the optimal and nonlinear algorithms. The relative effectiveness of the algorithms, in simulating aircraft maneuvers, was assessed with an eleven-subject piloted performance test conducted on the NASA Langley Visual Motion Simulator (VMS). Two methods, the quasi-objective NASA Task Load Index (TLX), and power spectral density analysis of pilot control, were used to assess pilot workload. TLX analysis reveals, in most cases, less workload and variation among pilots with the nonlinear algorithm. Control input
A distributed approach to speech resource collection

CSIR Research Space (South Africa)

Molapo, R

2013-12-01

Full Text Available The authors describe the integration of several tools to enable the end-to-end development of an Automatic Speech Recognition system in a typical under-resourced language. The authors analyse the data acquired by each of the tools and develop an ASR...
GALLAUDET'S NEW HEARING AND SPEECH CENTER.

Science.gov (United States)

FRISINA, D. ROBERT

THIS REPORT DESCRIBES THE DESIGN OF A NEW SPEECH AND HEARING CENTER AND ITS INTEGRATION INTO THE OVERALL ARCHITECTURAL SCHEME OF THE CAMPUS. THE CIRCULAR SHAPE WAS SELECTED TO COMPLEMENT THE SURROUNDING STRUCTURES AND COMPENSATE FOR DIFFERENCES IN SITE, WHILE PROVIDING THE ACOUSTICAL ADVANTAGES OF NON-PARALLEL WALLS, AND FACILITATING TRAFFIC FLOW.…
The attention-getting capacity of whines and child-directed speech.

Science.gov (United States)

Chang, Rosemarie Sokol; Thompson, Nicholas S

2010-06-03

The current study tested the ability of whines and child-directed speech to attract the attention of listeners involved in a story repetition task. Twenty non-parents and 17 parents were presented with two dull stories, each playing to a separate ear, and asked to repeat one of the stories verbatim. The story that participants were instructed to ignore was interrupted occasionally with the reader whining and using child-directed speech. While repeating the passage, participants were monitored for Galvanic skin response, heart rate, and blood pressure. Based on 4 measures, participants tuned in more to whining, and to a lesser extent child-directed speech, than neutral speech segments that served as a control. Participants, regardless of gender or parental status, made more mistakes when presented with the whine or child-directed speech, they recalled hearing those vocalizations, they recognized more words from the whining segment than the neutral control segment, and they exhibited higher Galvanic skin response during the presence of whines and child-directed speech than neutral speech segments. Whines and child-directed speech appear to be integral members of a suite of vocalizations designed to get the attention of attachment partners by playing to an auditory sensitivity among humans. Whines in particular may serve the function of eliciting care at a time when caregivers switch from primarily mothers to greater care from other caregivers.
The Attention-Getting Capacity of Whines and Child-Directed Speech

Directory of Open Access Journals (Sweden)

Rosemarie Sokol Chang

2010-04-01

Full Text Available The current study tested the ability of whines and child-directed speech to attract the attention of listeners involved in a story repetition task. Twenty non-parents and 17 parents were presented with two dull stories, each playing to a separate ear, and asked to repeat one of the stories verbatim. The story that participants were instructed to ignore was interrupted occasionally with the reader whining and using child-directed speech. While repeating the passage, participants were monitored for Galvanic skin response, heart rate, and blood pressure. Based on 4 measures, participants tuned in more to whining, and to a lesser extent child-directed speech, than neutral speech segments that served as a control. Participants, regardless of gender or parental status, made more mistakes when presented with the whine or child-directed speech, they recalled hearing those vocalizations, they recognized more words from the whining segment than the neutral control segment, and they exhibited higher Galvanic skin response during the presence of whines and child-directed speech than neutral speech segments. Whines and child-directed speech appear to be integral members of a suite of vocalizations designed to get the attention of attachment partners by playing to an auditory sensitivity among humans. Whines in particular may serve the function of eliciting care at a time when caregivers switch from primarily mothers to greater care from other caregivers.
Modeling Sluggishness in Binaural Unmasking of Speech for Maskers With Time-Varying Interaural Phase Differences.

Science.gov (United States)

Hauth, Christopher F; Brand, Thomas

2018-01-01

In studies investigating binaural processing in human listeners, relatively long and task-dependent time constants of a binaural window ranging from 10 ms to 250 ms have been observed. Such time constants are often thought to reflect "binaural sluggishness." In this study, the effect of binaural sluggishness on binaural unmasking of speech in stationary speech-shaped noise is investigated in 10 listeners with normal hearing. In order to design a masking signal with temporally varying binaural cues, the interaural phase difference of the noise was modulated sinusoidally with frequencies ranging from 0.25 Hz to 64 Hz. The lowest, that is the best, speech reception thresholds (SRTs) were observed for the lowest modulation frequency. SRTs increased with increasing modulation frequency up to 4 Hz. For higher modulation frequencies, SRTs remained constant in the range of 1 dB to 1.5 dB below the SRT determined in the diotic situation. The outcome of the experiment was simulated using a short-term binaural speech intelligibility model, which combines an equalization-cancellation (EC) model with the speech intelligibility index. This model segments the incoming signal into 23.2-ms time frames in order to predict release from masking in modulated noises. In order to predict the results from this study, the model required a further time constant applied to the EC mechanism representing binaural sluggishness. The best agreement with perceptual data was achieved using a temporal window of 200 ms in the EC mechanism.
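A rough intuition for why a long binaural window removes the benefit of fast interaural modulations can be sketched as follows. This is not the equalization-cancellation model described in the abstract; it assumes, for illustration only, that the window is rectangular and that it simply averages a sinusoidal modulation, in which case the surviving modulation amplitude is |sin(pi f T) / (pi f T)| for modulation frequency f and window length T. The function name is hypothetical.

```python
import math


def residual_modulation(mod_freq_hz, window_s=0.2):
    """Fraction of a sinusoidal modulation that survives averaging over a
    rectangular window of length window_s (assumed window shape)."""
    x = math.pi * mod_freq_hz * window_s
    if x == 0.0:
        return 1.0
    return abs(math.sin(x) / x)


# Modulation frequencies spanning the range used in the experiment.
for f in (0.25, 1.0, 4.0, 16.0, 64.0):
    print(f"{f:6.2f} Hz -> {residual_modulation(f):.3f}")
```

Under this simplification a 200 ms window passes a 0.25 Hz modulation almost unchanged and strongly attenuates modulations of a few hertz and above, which is qualitatively in line with thresholds worsening up to about 4 Hz and then levelling off.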
Multimodal Speech Capture System for Speech Rehabilitation and Learning.

Science.gov (United States)

Sebkhi, Nordine; Desai, Dhyey; Islam, Mohammad; Lu, Jun; Wilson, Kimberly; Ghovanloo, Maysam

2017-11-01

Speech-language pathologists (SLPs) are trained to correct articulation of people diagnosed with motor speech disorders by analyzing articulators' motion and assessing speech outcome while patients speak. To assist SLPs in this task, we are presenting the multimodal speech capture system (MSCS) that records and displays kinematics of key speech articulators, the tongue and lips, along with voice, using unobtrusive methods. The collected speech modalities (tongue motion, lip gestures, and voice) are visualized not only in real time to provide patients with instant feedback but also offline to allow SLPs to perform post-analysis of articulators' motion, particularly the tongue, with its prominent but hardly visible role in articulation. We describe the MSCS hardware and software components, and demonstrate its basic visualization capabilities by a healthy individual repeating the words "Hello World." A proof-of-concept prototype has been successfully developed for this purpose, and will be used in future clinical studies to evaluate its potential impact on accelerating speech rehabilitation by enabling patients to speak naturally. Pattern matching algorithms to be applied to the collected data can provide patients with quantitative and objective feedback on their speech performance, unlike current methods that are mostly subjective, and may vary from one SLP to another.
Speech Motor Control in Fluent and Dysfluent Speech Production of an Individual with Apraxia of Speech and Broca's Aphasia

Science.gov (United States)

van Lieshout, Pascal H. H. M.; Bose, Arpita; Square, Paula A.; Steele, Catriona M.

2007-01-01

Apraxia of speech (AOS) is typically described as a motor-speech disorder with clinically well-defined symptoms, but without a clear understanding of the underlying problems in motor control. A number of studies have compared the speech of subjects with AOS to the fluent speech of controls, but only a few have included speech movement data and if…
Multiple reward-cue contingencies favor expectancy over uncertainty in shaping the reward-cue attentional salience.

Science.gov (United States)

De Tommaso, Matteo; Mastropasqua, Tommaso; Turatto, Massimo

2018-01-25

Reward-predicting cues attract attention because of their motivational value. A debated question regards the conditions under which the cue's attentional salience is governed more by reward expectancy than by reward uncertainty. To help shed light on this relevant issue, here, we manipulated expectancy and uncertainty using three levels of reward-cue contingency, so that, for example, a high level of reward expectancy (p = .8) was compared with the highest level of reward uncertainty (p = .5). In Experiment 1, the best reward-cue during conditioning was preferentially attended in a subsequent visual search task. This result was replicated in Experiment 2, in which the cues were matched in terms of response history. In Experiment 3, we implemented a hybrid procedure consisting of two phases: an omission contingency procedure during conditioning, followed by a visual search task as in the previous experiments. Crucially, during both phases, the reward-cues were never task relevant. Results confirmed that, when multiple reward-cue contingencies are explored by a human observer, expectancy is the major factor controlling both the attentional and the oculomotor salience of the reward-cue.
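The way expectancy and uncertainty come apart at the two contingencies named in the abstract can be made concrete with a small sketch. It assumes, as an illustration rather than as the authors' definition, that expectancy is the reward probability p itself and that uncertainty is the Shannon entropy of a reward delivered with probability p. The function name is hypothetical.

```python
import math


def reward_uncertainty(p):
    """Shannon entropy, in bits, of a reward delivered with probability p
    (an assumed operationalisation of reward uncertainty)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


# The two contingencies mentioned in the abstract.
for p in (0.8, 0.5):
    print(f"p = {p}: expectancy = {p}, uncertainty = {reward_uncertainty(p):.3f} bits")
```

On this reading the p = .8 cue has the higher expectancy while the p = .5 cue has the higher uncertainty, so the two accounts predict different cues to be the most salient.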
Targeting extinction and reconsolidation mechanisms to combat the impact of drug cues on addiction.

Science.gov (United States)

Taylor, Jane R; Olausson, Peter; Quinn, Jennifer J; Torregrossa, Mary M

2009-01-01

Drug addiction is a progressive and compulsive disorder, where recurrent craving and relapse to drug-seeking occur even after long periods of abstinence. A major contributing factor to relapse is drug-associated cues. Here we review behavioral and pharmacological studies outlining novel methods of effective and persistent reductions in cue-induced relapse behavior in animal models. We focus on extinction and reconsolidation of cue-drug associations as the memory processes that are the most likely targets for interventions. Extinction involves the formation of new inhibitory memories rather than memory erasure; thus, it should be possible to facilitate the extinction of cue-drug memories to reduce relapse. We propose that context-dependency of extinction might be altered by mnemonic agents, thereby enhancing the efficacy of cue-exposure therapy as a treatment strategy. In contrast, interfering with memory reconsolidation processes can disrupt the integrity or strength of specific cue-drug memories. Reconsolidation is argued to be a distinct process that occurs over a brief time period after memory is reactivated/retrieved, when the memory becomes labile and vulnerable to disruption. Reconsolidation is thought to be an independent, perhaps opposing, process to extinction, and disruption of reconsolidation has recently been shown to directly affect subsequent cue-drug memory retrieval in an animal model of relapse. We hypothesize that a combined approach aimed at both enhancing the consolidation of cue-drug extinction and interfering with the reconsolidation of cue-drug memories will have a greater potential for persistently inhibiting cue-induced relapse than either treatment alone.

Can you hear me yet? An intracranial investigation of speech and non-speech audiovisual interactions in human cortex.

Science.gov (United States)

Rhone, Ariane E; Nourski, Kirill V; Oya, Hiroyuki; Kawasaki, Hiroto; Howard, Matthew A; McMurray, Bob

In everyday conversation, viewing a talker's face can provide information about the timing and content of an upcoming speech signal, resulting in improved intelligibility. Using electrocorticography, we tested whether human auditory cortex in Heschl's gyrus (HG) and on superior temporal gyrus (STG) and motor cortex on precentral gyrus (PreC) were responsive to visual/gestural information prior to the onset of sound and whether early stages of auditory processing were sensitive to the visual content (speech syllable versus non-speech motion). Event-related band power (ERBP) in the high gamma band was content-specific prior to acoustic onset on STG and PreC, and ERBP in the beta band differed in all three areas. Following sound onset, we found no evidence for content-specificity in HG, evidence for visual specificity in PreC, and specificity for both modalities in STG. These results support models of audio-visual processing in which sensory information is integrated in non-primary cortical areas.
A Framework for Speech Enhancement with Ad Hoc Microphone Arrays

DEFF Research Database (Denmark)

Tavakoli, Vincent Mohammad; Jensen, Jesper Rindom; Christensen, Mads Græsbøll

2016-01-01

Speech enhancement is vital for improved listening practices. Ad hoc microphone arrays are promising assets for this purpose. Most well-established enhancement techniques with conventional arrays can be adapted into ad hoc scenarios. Despite recent efforts to introduce various ad hoc speech...... enhancement apparatus, a common framework for integration of conventional methods into this new scheme is still missing. This paper establishes such an abstraction based on inter and intra sub-array speech coherencies. Along with measures for signal quality at the input of sub-arrays, a measure of coherency...... is proposed both for sub-array selection in local enhancement approaches, and also for selecting a proper global reference when more than one sub-array is used. Proposed methods within this framework are evaluated with regard to quantitative and qualitative measures, including array gains, the speech...
Electrophysiological evidence for differences between fusion and combination illusions in audiovisual speech perception

DEFF Research Database (Denmark)

Baart, Martijn; Lindborg, Alma Cornelia; Andersen, Tobias S

2017-01-01

Incongruent audiovisual speech stimuli can lead to perceptual illusions such as fusions or combinations. Here, we investigated the underlying audiovisual integration process by measuring ERPs. We observed that visual speech-induced suppression of P2 amplitude (which is generally taken as a measure...... of audiovisual integration) for fusions was comparable to suppression obtained with fully congruent stimuli, whereas P2 suppression for combinations was larger. We argue that these effects arise because the phonetic incongruency is solved differently for both types of stimuli.
The habenula governs the attribution of incentive salience to reward predictive cues

Science.gov (United States)

Danna, Carey L.; Shepard, Paul D.; Elmer, Greg I.

2013-01-01

The attribution of incentive salience to reward associated cues is critical for motivation and the pursuit of rewards. Disruptions in the integrity of the neural systems controlling these processes can lead to avolition and anhedonia, symptoms that cross the diagnostic boundaries of many neuropsychiatric illnesses. Here, we consider whether the habenula (Hb), a region recently demonstrated to encode negatively valenced events, also modulates the attribution of incentive salience to a neutral cue predicting a food reward. The Pavlovian autoshaping paradigm was used in the rat as an investigative tool to dissociate Pavlovian learning processes imparting strictly predictive value from learning that attributes incentive motivational value. Electrolytic lesions of the fasciculus retroflexus (fr), the sole pathway through which descending Hb efferents are conveyed, significantly increased incentive salience as measured by conditioned approaches to a cue light predictive of reward. Conversely, generation of a fictive Hb signal via fr stimulation during CS+ presentation significantly decreased the incentive salience of the predictive cue. Neither manipulation altered the reward predictive value of the cue as measured by conditioned approach to the food. Our results provide new evidence supporting a significant role for the Hb in governing the attribution of incentive motivational salience to reward predictive cues and further imply that pathological changes in Hb activity could contribute to the aberrant pursuit of debilitating goals or avolition and depression-like symptoms. PMID:24368898
The habenula governs the attribution of incentive salience to reward predictive cues.

Directory of Open Access Journals (Sweden)

Carey L. Danna

2013-12-01

Full Text Available The attribution of incentive salience to reward associated cues is critical for motivation and the pursuit of rewards. Disruptions in the integrity of the neural systems controlling these processes can lead to avolition and anhedonia, symptoms that cross the diagnostic boundaries of many neuropsychiatric illnesses. Here, we consider whether the habenula (Hb), a region recently demonstrated to encode negatively valenced events, also modulates the attribution of incentive salience to a neutral cue predicting a food reward. The Pavlovian autoshaping paradigm was used in the rat as an investigative tool to dissociate Pavlovian learning processes imparting strictly predictive value from learning that attributes incentive motivational value. Electrolytic lesions of the fasciculus retroflexus (fr), the sole pathway through which descending Hb efferents are conveyed, significantly increased incentive salience as measured by conditioned approaches to a cue light predictive of reward. Conversely, generation of a fictive Hb signal via fr stimulation during CS+ presentation significantly decreased the incentive salience of the predictive cue. Neither manipulation altered the reward predictive value of the cue as measured by conditioned approach to the food. Our results provide new evidence supporting a significant role for the Hb in governing the attribution of incentive motivational salience to reward predictive cues and further imply that pathological changes in Hb activity could contribute to the aberrant pursuit of debilitating goals or avolition and depression-like symptoms.
Orthography and Modality Influence Speech Production in Adults and Children.

Science.gov (United States)

Saletta, Meredith; Goffman, Lisa; Hogan, Tiffany P

2016-12-01

The acquisition of literacy skills influences the perception and production of spoken language. We examined if orthography influences implicit processing in speech production in child readers and in adult readers with low and high reading proficiency. Children (n = 17), adults with typical reading skills (n = 17), and adults demonstrating low reading proficiency (n = 18) repeated or read aloud nonwords varying in orthographic transparency. Analyses of implicit linguistic processing (segmental accuracy and speech movement stability) were conducted. The accuracy and articulatory stability of productions of the nonwords were assessed before and after repetition or reading. Segmental accuracy results indicate that all 3 groups demonstrated greater learning when they were able to read, rather than just hear, the nonwords. Speech movement results indicate that, for adults with poor reading skills, exposure to the nonwords in a transparent spelling reduces the articulatory variability of speech production. Reading skill was correlated with speech movement stability in the groups of adults. In children and adults, orthography interacts with speech production; all participants integrate orthography into their lexical representations. Adults with poor reading skills do not use the same reading or speaking strategies as children with typical reading skills.
Perception of the Multisensory Coherence of Fluent Audiovisual Speech in Infancy: Its Emergence & the Role of Experience

Science.gov (United States)

Lewkowicz, David J.; Minar, Nicholas J.; Tift, Amy H.; Brandon, Melissa

2014-01-01

To investigate the developmental emergence of the ability to perceive the multisensory coherence of native and non-native audiovisual fluent speech, we tested 4-, 8–10, and 12–14 month-old English-learning infants. Infants first viewed two identical female faces articulating two different monologues in silence and then in the presence of an audible monologue that matched the visible articulations of one of the faces. Neither the 4-month-old nor the 8–10 month-old infants exhibited audio-visual matching in that neither group exhibited greater looking at the matching monologue. In contrast, the 12–14 month-old infants exhibited matching and, consistent with the emergence of perceptual expertise for the native language, they perceived the multisensory coherence of native-language monologues earlier in the test trials than of non-native language monologues. Moreover, the matching of native audible and visible speech streams observed in the 12–14 month olds did not depend on audio-visual synchrony whereas the matching of non-native audible and visible speech streams did depend on synchrony. Overall, the current findings indicate that the perception of the multisensory coherence of fluent audiovisual speech emerges late in infancy, that audio-visual synchrony cues are more important in the perception of the multisensory coherence of non-native than native audiovisual speech, and that the emergence of this skill most likely is affected by perceptual narrowing. PMID:25462038
Binaural enhancement for bilateral cochlear implant users.

Science.gov (United States)

Brown, Christopher A

2014-01-01

Bilateral cochlear implant (BCI) users receive limited binaural cues and, thus, show little improvement to speech intelligibility from spatial cues. The feasibility of a method for enhancing the binaural cues available to BCI users is investigated. This involved extending interaural differences of levels, which typically are restricted to high frequencies, into the low-frequency region. Speech intelligibility was measured in BCI users listening over headphones and with direct stimulation, with a target talker presented to one side of the head in the presence of a masker talker on the other side. Spatial separation was achieved by applying either naturally occurring binaural cues or enhanced cues. In this listening configuration, BCI patients showed greater speech intelligibility with the enhanced binaural cues than with naturally occurring binaural cues. In some situations, it is possible for BCI users to achieve greater speech intelligibility when binaural cues are enhanced by applying interaural differences of levels in the low-frequency region.
Cross-modal cueing in audiovisual spatial attention

DEFF Research Database (Denmark)

Blurton, Steven Paul; Greenlee, Mark W.; Gondan, Matthias

2015-01-01

effects have been reported for endogenous visual cues while exogenous cues seem to be mostly ineffective. In three experiments, we investigated cueing effects on the processing of audiovisual signals. In Experiment 1 we used endogenous cues to investigate their effect on the detection of auditory, visual......, and audiovisual targets presented with onset asynchrony. Consistent cueing effects were found in all target conditions. In Experiment 2 we used exogenous cues and found cueing effects only for visual target detection, but not auditory target detection. In Experiment 3 we used predictive exogenous cues to examine...
Effect of pictorial depth cues, binocular disparity cues and motion parallax depth cues on lightness perception in three-dimensional virtual scenes.

Directory of Open Access Journals (Sweden)

Michiteru Kitazaki

2008-09-01

Full Text Available Surface lightness perception is affected by scene interpretation. There is some experimental evidence that perceived lightness under bi-ocular viewing conditions is different from perceived lightness in actual scenes but there are also reports that viewing conditions have little or no effect on perceived color. We investigated how mixes of depth cues affect perception of lightness in three-dimensional rendered scenes containing strong gradients of illumination in depth. Observers viewed a virtual room (4 m width x 5 m height x 17.5 m depth) with checkerboard walls and floor. In four conditions, the room was presented with or without binocular disparity (BD) depth cues and with or without motion parallax (MP) depth cues. In all conditions, observers were asked to adjust the luminance of a comparison surface to match the lightness of test surfaces placed at seven different depths (8.5-17.5 m) in the scene. We estimated lightness versus depth profiles in all four depth cue conditions. Even when observers had only pictorial depth cues (no MP, no BD), they partially but significantly discounted the illumination gradient in judging lightness. Adding either MP or BD led to significantly greater discounting and both cues together produced the greatest discounting. The effects of MP and BD were approximately additive. BD had greater influence at near distances than far. These results suggest that surface lightness perception is modulated by three-dimensional perception/interpretation using pictorial, binocular-disparity, and motion-parallax cues additively. We propose a two-stage (2D and 3D) processing model for lightness perception.
Contributions of speech-language therapy to the integration of individuals with Down syndrome in the workplace.

Science.gov (United States)

Barbosa, Talita Maria Monteiro Farias; Lima, Ivonaldo Leidson Barbosa; Alves, Giorvan Ânderson Dos Santos; Delgado, Isabelle Cahino

2018-03-01

To analyze the contributions of speech-language therapy in the integration of young individuals with Down syndrome (DS) into the workplace, with reference to their professionalization. A questionnaire was distributed to eight undergraduate students (tutors) who participated in a project with individuals with DS, five mothers of individuals with DS, and five employees from the institution in which the present study was conducted. The questionnaire assessed the communication, memory, behavior, social interaction, autonomy and independence of the participants with DS, called "trainees". The trainees were employed in one of five routine work sectors at the university that conducted the present study. The data collected in this descriptive and cross-sectional study were analyzed quantitatively and qualitatively. The Research Ethics Committee of the affiliated institute approved the project. Mothers and tutors rated the trainees' language skills as "good". However, their ratings differed from those of the participating employees. After the trainees with DS were placed in a work environment, significant changes were observed in their communication and autonomy. There was no improvement in the trainees' independence, but after training noticeable changes were observed in their social behavior and autonomy. Speech-language therapy during vocational training led to positive changes in the social behavior of individuals with DS, as evidenced by an increase in their autonomy and communication.
The development of multisensory speech perception continues into the late childhood years.

Science.gov (United States)

Ross, Lars A; Molholm, Sophie; Blanco, Daniella; Gomez-Ramirez, Manuel; Saint-Amour, Dave; Foxe, John J

2011-06-01

Observing a speaker's articulations substantially improves the intelligibility of spoken speech, especially under noisy listening conditions. This multisensory integration of speech inputs is crucial to effective communication. Appropriate development of this ability has major implications for children in classroom and social settings, and deficits in it have been linked to a number of neurodevelopmental disorders, especially autism. It is clear from structural imaging studies that there is a prolonged maturational course within regions of the perisylvian cortex that persists into late childhood, and these regions have been firmly established as being crucial to speech and language functions. Given this protracted maturational timeframe, we reasoned that multisensory speech processing might well show a similarly protracted developmental course. Previous work in adults has shown that audiovisual enhancement in word recognition is most apparent within a restricted range of signal-to-noise ratios (SNRs). Here, we investigated when these properties emerge during childhood by testing multisensory speech recognition abilities in typically developing children aged between 5 and 14 years, and comparing them with those of adults. By parametrically varying SNRs, we found that children benefited significantly less from observing visual articulations, displaying considerably less audiovisual enhancement. The findings suggest that improvement in the ability to recognize speech-in-noise and in audiovisual integration during speech perception continues quite late into the childhood years. The implication is that a considerable amount of multisensory learning remains to be achieved during the later schooling years, and that explicit efforts to accommodate this learning may well be warranted. European Journal of Neuroscience © 2011 Federation of European Neuroscience Societies and Blackwell Publishing Ltd. No claim to original US government works.
Neurophysiological evidence for the interplay of speech segmentation and word-referent mapping during novel word learning.

Science.gov (United States)

François, Clément; Cunillera, Toni; Garcia, Enara; Laine, Matti; Rodriguez-Fornells, Antoni

2017-04-01

Learning a new language requires the identification of word units from continuous speech (the speech segmentation problem) and mapping them onto conceptual representation (the word to world mapping problem). Recent behavioral studies have revealed that the statistical properties found within and across modalities can serve as cues for both processes. However, segmentation and mapping have been largely studied separately, and thus it remains unclear whether both processes can be accomplished at the same time and if they share common neurophysiological features. To address this question, we recorded EEG of 20 adult participants during both an audio alone speech segmentation task and an audiovisual word-to-picture association task. The participants were tested for both the implicit detection of online mismatches (structural auditory and visual semantic violations) as well as for the explicit recognition of words and word-to-picture associations. The ERP results from the learning phase revealed a delayed learning-related fronto-central negativity (FN400) in the audiovisual condition compared to the audio alone condition. Interestingly, while online structural auditory violations elicited clear MMN/N200 components in the audio alone condition, visual-semantic violations induced meaning-related N400 modulations in the audiovisual condition. The present results support the idea that speech segmentation and meaning mapping can take place in parallel and act in synergy to enhance novel word learning. Copyright © 2016 Elsevier Ltd. All rights reserved.
Auditory-visual speech integration by prelinguistic infants: perception of an emergent consonant in the McGurk effect.

Science.gov (United States)

Burnham, Denis; Dodd, Barbara

2004-12-01

The McGurk effect, in which auditory [ba] dubbed onto [ga] lip movements is perceived as "da" or "tha," was employed in a real-time task to investigate auditory-visual speech perception in prelingual infants. Experiments 1A and 1B established the validity of real-time dubbing for producing the effect. In Experiment 2, 4 1/2-month-olds were tested in a habituation-test paradigm, in which an auditory-visual stimulus was presented contingent upon visual fixation of a live face. The experimental group was habituated to a McGurk stimulus (auditory [ba] visual [ga]), and the control group to matching auditory-visual [ba]. Each group was then presented with three auditory-only test trials, [ba], [da], and [ða] (as in then). Visual-fixation durations in test trials showed that the experimental group treated the emergent percept in the McGurk effect, [da] or [ða], as familiar (even though they had not heard these sounds previously) and [ba] as novel. For control group infants [da] and [ða] were no more familiar than [ba]. These results are consistent with infants' perception of the McGurk effect, and support the conclusion that prelinguistic infants integrate auditory and visual speech information. Copyright 2004 Wiley Periodicals, Inc.
Preconditioning of Spatial and Auditory Cues: Roles of the Hippocampus, Frontal Cortex, and Cue-Directed Attention

Directory of Open Access Journals (Sweden)

Andrew C. Talk

2016-12-01

Full Text Available Loss of function of the hippocampus or frontal cortex is associated with reduced performance on memory tasks, in which subjects are incidentally exposed to cues at specific places in the environment and are subsequently asked to recollect the location at which the cue was experienced. Here, we examined the roles of the rodent hippocampus and frontal cortex in cue-directed attention during encoding of memory for the location of a single incidentally experienced cue. During a spatial sensory preconditioning task, rats explored an elevated platform while an auditory cue was incidentally presented at one corner. The opposite corner acted as an unpaired control location. The rats demonstrated recollection of location by avoiding the paired corner after the auditory cue was in turn paired with shock. Damage to either the dorsal hippocampus or the frontal cortex impaired this memory ability. However, we also found that hippocampal lesions enhanced attention directed towards the cue during the encoding phase, while frontal cortical lesions reduced cue-directed attention. These results suggest that the deficit in spatial sensory preconditioning caused by frontal cortical damage may be mediated by inattention to the location of cues during the latent encoding phase, while deficits following hippocampal damage must be related to other mechanisms such as generation of neural plasticity.
Preconditioning of Spatial and Auditory Cues: Roles of the Hippocampus, Frontal Cortex, and Cue-Directed Attention

Science.gov (United States)

Talk, Andrew C.; Grasby, Katrina L.; Rawson, Tim; Ebejer, Jane L.

2016-01-01

Loss of function of the hippocampus or frontal cortex is associated with reduced performance on memory tasks, in which subjects are incidentally exposed to cues at specific places in the environment and are subsequently asked to recollect the location at which the cue was experienced. Here, we examined the roles of the rodent hippocampus and frontal cortex in cue-directed attention during encoding of memory for the location of a single incidentally experienced cue. During a spatial sensory preconditioning task, rats explored an elevated platform while an auditory cue was incidentally presented at one corner. The opposite corner acted as an unpaired control location. The rats demonstrated recollection of location by avoiding the paired corner after the auditory cue was in turn paired with shock. Damage to either the dorsal hippocampus or the frontal cortex impaired this memory ability. However, we also found that hippocampal lesions enhanced attention directed towards the cue during the encoding phase, while frontal cortical lesions reduced cue-directed attention. These results suggest that the deficit in spatial sensory preconditioning caused by frontal cortical damage may be mediated by inattention to the location of cues during the latent encoding phase, while deficits following hippocampal damage must be related to other mechanisms such as generation of neural plasticity. PMID:27999366
Enhancement of speech signals - with a focus on voiced speech models

DEFF Research Database (Denmark)

Nørholm, Sidsel Marie

This thesis deals with speech enhancement, i.e., noise reduction in speech signals. This has applications in, e.g., hearing aids and teleconference systems. We consider a signal-driven approach to speech enhancement where a model of the speech is assumed and filters are generated based...... on this model. The basic model used in this thesis is the harmonic model which is a commonly used model for describing the voiced part of the speech signal. We show that it can be beneficial to extend the model to take inharmonicities or the non-stationarity of speech into account. Extending the model...
Grasp cueing and joint attention.

Science.gov (United States)

Tschentscher, Nadja; Fischer, Martin H

2008-10-01

We studied how two different hand posture cues affect joint attention in normal observers. Visual targets appeared over lateralized objects, with different delays after centrally presented hand postures. Attention was cued by either hand direction or the congruency between hand aperture and object size. Participants pressed a button when they detected a target. Direction cues alone facilitated target detection following short delays but aperture cues alone were ineffective. In contrast, when hand postures combined direction and aperture cues, aperture congruency effects without directional congruency effects emerged and persisted, but only for power grips. These results suggest that parallel parameter specification makes joint attention mechanisms exquisitely sensitive to the timing and content of contextual cues.
Blind speech separation system for humanoid robot with FastICA for audio filtering and separation

Science.gov (United States)

Budiharto, Widodo; Santoso Gunawan, Alexander Agung

2016-07-01

Nowadays, there are many developments in building intelligent humanoid robots, mainly in order to handle voice and image. In this research, we propose a blind speech separation system using FastICA for audio filtering and separation that can be used in education or entertainment. Our main problem is to separate multiple speech sources and to filter out irrelevant noise. After the speech separation step, the results will be integrated with our previous speech and face recognition system, which is based on a Bioloid GP robot with a Raspberry Pi 2 as controller. The experimental results show that the accuracy of our blind speech separation system is about 88% in command and query recognition cases.
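The idea of blind source separation with FastICA can be sketched on a toy two-microphone problem. This is a minimal illustration under stated assumptions, not the authors' system: the sources are synthetic stand-ins (a sine tone and uniform noise rather than speech), the mixing gains are hypothetical, and the function names `whiten` and `fastica_2ch` are invented for this example. It uses the standard one-unit FastICA fixed-point update with a tanh nonlinearity on whitened data.

```python
import math
import random

def center(xs):
    m = sum(xs) / len(xs)
    return [x - m for x in xs]

def whiten(x1, x2):
    """Rotate and scale two centered channels so their covariance is the identity."""
    n = len(x1)
    c11 = sum(a * a for a in x1) / n
    c22 = sum(b * b for b in x2) / n
    c12 = sum(a * b for a, b in zip(x1, x2)) / n
    theta = 0.5 * math.atan2(2.0 * c12, c11 - c22)  # principal-axis angle
    c, s = math.cos(theta), math.sin(theta)
    p1 = [c * a + s * b for a, b in zip(x1, x2)]
    p2 = [-s * a + c * b for a, b in zip(x1, x2)]
    sd1 = math.sqrt(sum(v * v for v in p1) / n)
    sd2 = math.sqrt(sum(v * v for v in p2) / n)
    return [v / sd1 for v in p1], [v / sd2 for v in p2]

def fastica_2ch(z1, z2, max_iter=200, tol=1e-10, seed=0):
    """One-unit FastICA (tanh nonlinearity) on whitened two-channel data."""
    angle = random.Random(seed).uniform(0.0, math.pi)
    w1, w2 = math.cos(angle), math.sin(angle)
    n = len(z1)
    for _ in range(max_iter):
        g = [math.tanh(w1 * a + w2 * b) for a, b in zip(z1, z2)]
        g_prime_mean = sum(1.0 - t * t for t in g) / n
        # Fixed-point update: w <- E[z g(w.z)] - E[g'(w.z)] w, then normalize.
        n1 = sum(a * t for a, t in zip(z1, g)) / n - g_prime_mean * w1
        n2 = sum(b * t for b, t in zip(z2, g)) / n - g_prime_mean * w2
        norm = math.hypot(n1, n2)
        n1, n2 = n1 / norm, n2 / norm
        converged = abs(abs(n1 * w1 + n2 * w2) - 1.0) < tol
        w1, w2 = n1, n2
        if converged:
            break
    # After whitening, the second component lies along the orthogonal direction.
    est1 = [w1 * a + w2 * b for a, b in zip(z1, z2)]
    est2 = [-w2 * a + w1 * b for a, b in zip(z1, z2)]
    return est1, est2

def abs_corr(a, b):
    a, b = center(a), center(b)
    num = sum(x * y for x, y in zip(a, b))
    return abs(num / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b)))

# Synthetic sources and hypothetical mixing gains (assumptions for illustration).
n = 2000
rng = random.Random(1)
tone = [math.sin(2.0 * math.pi * 5.0 * i / n) for i in range(n)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(n)]
mic1 = [1.0 * s + 0.8 * v for s, v in zip(tone, noise)]
mic2 = [0.5 * s + 1.0 * v for s, v in zip(tone, noise)]

z1, z2 = whiten(center(mic1), center(mic2))
est1, est2 = fastica_2ch(z1, z2)

# ICA recovers sources only up to order and sign, so compare by |correlation|.
match_tone = max(abs_corr(est1, tone), abs_corr(est2, tone))
match_noise = max(abs_corr(est1, noise), abs_corr(est2, noise))
print(f"tone match: {match_tone:.3f}, noise match: {match_noise:.3f}")
```

Because independent component analysis recovers sources only up to permutation and sign, a downstream recognizer would still need to decide which output channel carries the target talker.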
A Double Dissociation between Anterior and Posterior Superior Temporal Gyrus for Processing Audiovisual Speech Demonstrated by Electrocorticography.

Science.gov (United States)

Ozker, Muge; Schepers, Inga M; Magnotti, John F; Yoshor, Daniel; Beauchamp, Michael S

2017-06-01

Human speech can be comprehended using only auditory information from the talker's voice. However, comprehension is improved if the talker's face is visible, especially if the auditory information is degraded as occurs in noisy environments or with hearing loss. We explored the neural substrates of audiovisual speech perception using electrocorticography, direct recording of neural activity using electrodes implanted on the cortical surface. We observed a double dissociation in the responses to audiovisual speech with clear and noisy auditory component within the superior temporal gyrus (STG), a region long known to be important for speech perception. Anterior STG showed greater neural activity to audiovisual speech with clear auditory component, whereas posterior STG showed similar or greater neural activity to audiovisual speech in which the speech was replaced with speech-like noise. A distinct border between the two response patterns was observed, demarcated by a landmark corresponding to the posterior margin of Heschl's gyrus. To further investigate the computational roles of both regions, we considered Bayesian models of multisensory integration, which predict that combining the independent sources of information available from different modalities should reduce variability in the neural responses. We tested this prediction by measuring the variability of the neural responses to single audiovisual words. Posterior STG showed smaller variability than anterior STG during presentation of audiovisual speech with noisy auditory component. Taken together, these results suggest that posterior STG but not anterior STG is important for multisensory integration of noisy auditory and visual speech.
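The prediction that combining independent cues reduces variability follows from the standard reliability-weighted (inverse-variance) model of cue combination. The sketch below assumes two independent Gaussian estimates, one auditory and one visual; the function name `fuse` and all numerical values are hypothetical, and this is the textbook model rather than the specific analysis applied to the neural data.

```python
def fuse(mu_a, var_a, mu_v, var_v):
    """Inverse-variance fusion of two independent Gaussian cue estimates."""
    precision = 1.0 / var_a + 1.0 / var_v
    w_a = (1.0 / var_a) / precision          # weight on the auditory cue
    mu = w_a * mu_a + (1.0 - w_a) * mu_v     # fused estimate
    var = 1.0 / precision                    # fused variance
    return mu, var, w_a

# Hypothetical variances on an arbitrary scale: visual cue fixed at 4.0,
# auditory cue either clear (1.0) or noisy (9.0).
clear_mu, clear_var, clear_w = fuse(0.0, 1.0, 0.0, 4.0)
noisy_mu, noisy_var, noisy_w = fuse(0.0, 9.0, 0.0, 4.0)

print(f"clear audio: fused variance {clear_var:.3f}, auditory weight {clear_w:.2f}")
print(f"noisy audio: fused variance {noisy_var:.3f}, auditory weight {noisy_w:.2f}")
```

In this model the fused variance is always below the smaller of the two single-cue variances, and the weight shifts toward the visual cue as the auditory cue becomes noisier.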

An experimental Dutch keyboard-to-speech system for the speech impaired

NARCIS (Netherlands)

Deliege, R.J.H.

1989-01-01

An experimental Dutch keyboard-to-speech system has been developed to explore the possibilities and limitations of Dutch speech synthesis in a communication aid for the speech impaired. The system uses diphones and a formant synthesizer chip for speech synthesis. Input to the system is in
Impact of DCS-facilitated cue exposure therapy on brain activation to cocaine cues in cocaine dependence.

Science.gov (United States)

Prisciandaro, James J; Myrick, Hugh; Henderson, Scott; McRae-Clark, Aimee L; Santa Ana, Elizabeth J; Saladin, Michael E; Brady, Kathleen T

2013-09-01

The development of addiction is marked by a pathological associative learning process that imbues incentive salience to stimuli associated with drug use. Recent efforts to treat addiction have targeted this learning process using cue exposure therapy augmented with d-cycloserine (DCS), a glutamatergic agent hypothesized to enhance extinction learning. To better understand the impact of DCS-facilitated extinction on neural reactivity to drug cues, the present study reports fMRI findings from a randomized, double-blind, placebo-controlled trial of DCS-facilitated cue exposure for cocaine dependence. Twenty-five participants completed two MRI sessions (before and after intervention), with a cocaine-cue reactivity fMRI task. The intervention consisted of 50mg of DCS or placebo, combined with two sessions of cocaine cue exposure and skills training. Participants demonstrated cocaine cue activation in a variety of brain regions at baseline. From the pre- to post-study scan, participants experienced decreased activation to cues in a number of regions (e.g., accumbens, caudate, frontal poles). Unexpectedly, placebo participants experienced decreases in activation to cues in the left angular and middle temporal gyri and the lateral occipital cortex, while DCS participants did not. Three trials of DCS-facilitated cue exposure therapy for cocaine dependence have found that DCS either increases or does not significantly impact response to cocaine cues. The present study adds to this literature by demonstrating that DCS may prevent extinction to cocaine cues in temporal and occipital brain regions. Although consistent with past research, results from the present study should be considered preliminary until replicated in larger samples. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Speech Function and Speech Role in Carl Fredricksen's Dialogue on Up Movie

OpenAIRE

Rehana, Ridha; Silitonga, Sortha

2013-01-01

One aim of this article is to show, through a concrete example, how speech function and speech role are used in a movie. The illustrative example is taken from the dialogue of the movie Up. Central to the analysis is the form of the dialogue in Up that contains speech functions and speech roles, i.e., statement, offer, question, command, giving, and demanding. 269 dialogues were interpreted by actor, and it was found that the use of speech function and speech role.
Experimental comparison between speech transmission index, rapid speech transmission index, and speech intelligibility index.

Science.gov (United States)

Larm, Petra; Hongisto, Valtteri

2006-02-01

During the acoustical design of, e.g., auditoria or open-plan offices, it is important to know how speech can be perceived in various parts of the room. Different objective methods have been developed to measure and predict speech intelligibility, and these have been extensively used in various spaces. In this study, two such methods were compared, the speech transmission index (STI) and the speech intelligibility index (SII). Also the simplification of the STI, the room acoustics speech transmission index (RASTI), was considered. These quantities are all based on determining an apparent speech-to-noise ratio on selected frequency bands and summing them using a specific weighting. For comparison, some data were needed on the possible differences of these methods resulting from the calculation scheme and also measuring equipment. Their prediction accuracy was also of interest. Measurements were made in a laboratory having adjustable noise level and absorption, and in a real auditorium. It was found that the measurement equipment, especially the selection of the loudspeaker, can greatly affect the accuracy of the results. The prediction accuracy of the RASTI was found acceptable, if the input values for the prediction are accurately known, even though the studied space was not ideally diffuse.
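The weighting scheme mentioned in the abstract above can be illustrated with a minimal Python sketch. It assumes the commonly described STI-style mapping, in which a modulation transfer value m gives an apparent speech-to-noise ratio, the ratio is clipped to ±15 dB, and the result is scaled to the range 0 to 1 before a weighted sum over bands. The function names, the band values, and the equal weights are hypothetical and are not taken from the study or from any standard.

```python
import math

def apparent_snr_db(m):
    """Apparent speech-to-noise ratio in dB from a modulation transfer value 0 < m < 1."""
    return 10.0 * math.log10(m / (1.0 - m))

def transmission_index(snr_db, limit_db=15.0):
    """Clip the apparent SNR to +/- limit_db and map it linearly onto 0..1."""
    clipped = max(-limit_db, min(limit_db, snr_db))
    return (clipped + limit_db) / (2.0 * limit_db)

def weighted_index(band_snrs_db, band_weights):
    """Weighted sum of per-band transmission indices (weights normalised to sum to 1)."""
    total = sum(band_weights)
    return sum(w / total * transmission_index(s)
               for s, w in zip(band_snrs_db, band_weights))

# Hypothetical per-band apparent SNRs (dB) and equal band weights (an assumption).
band_snrs = [12.0, 6.0, 0.0, -6.0]
weights = [1.0, 1.0, 1.0, 1.0]
index = weighted_index(band_snrs, weights)
print(round(index, 3))
```

In practice the choice of bands, modulation frequencies, and weights is what distinguishes the individual indices, so this sketch should be read only as the shared skeleton.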
Food and conspecific chemical cues modify visual behavior of zebrafish, Danio rerio.

Science.gov (United States)

Stephenson, Jessica F; Partridge, Julian C; Whitlock, Kathleen E

2012-06-01

Animals use the different qualities of olfactory and visual sensory information to make decisions. Ethological and electrophysiological evidence suggests that there is cross-modal priming between these sensory systems in fish. We present the first experimental study showing that ecologically relevant chemical mixtures alter visual behavior, using adult male and female zebrafish, Danio rerio. Neutral-density filters were used to attenuate the light reaching the tank to an initial light intensity of 2.3×10¹⁶ photons/s/m². Fish were exposed to food cue and to alarm cue. The light intensity was then increased by the removal of one layer of filter (nominal absorbance 0.3) every minute until, after 10 minutes, the light level was 15.5×10¹⁶ photons/s/m². Adult male and female zebrafish responded to a moving visual stimulus at lower light levels if they had been first exposed to food cue, or to conspecific alarm cue. These results suggest the need for more integrative studies of sensory biology.
Integrating mechanisms of visual guidance in naturalistic language production.

Science.gov (United States)

Coco, Moreno I; Keller, Frank

2015-05-01

Situated language production requires the integration of visual attention and linguistic processing. Previous work has not conclusively disentangled the role of perceptual scene information and structural sentence information in guiding visual attention. In this paper, we present an eye-tracking study that demonstrates that three types of guidance, perceptual, conceptual, and structural, interact to control visual attention. In a cued language production experiment, we manipulate perceptual (scene clutter) and conceptual guidance (cue animacy) and measure structural guidance (syntactic complexity of the utterance). Analysis of the time course of language production, before and during speech, reveals that all three forms of guidance affect the complexity of visual responses, quantified in terms of the entropy of attentional landscapes and the turbulence of scan patterns, especially during speech. We find that perceptual and conceptual guidance mediate the distribution of attention in the scene, whereas structural guidance closely relates to scan pattern complexity. Furthermore, the eye-voice span of the cued object and its perceptual competitor are similar; its latency mediated by both perceptual and structural guidance. These results rule out a strict interpretation of structural guidance as the single dominant form of visual guidance in situated language production. Rather, the phase of the task and the associated demands of cross-modal cognitive processing determine the mechanisms that guide attention.
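As a rough illustration of what an entropy measure over an attentional landscape can look like, the sketch below computes the Shannon entropy of a grid of fixation counts treated as a probability map. This is an assumption about the general idea only; the grid, the counts, and the function name are hypothetical, and the study's actual computation may differ.

```python
import math

def attention_entropy(fixation_counts):
    """Shannon entropy (bits) of a 2-D grid of fixation counts, normalised to probabilities."""
    flat = [c for row in fixation_counts for c in row]
    total = sum(flat)
    if total == 0:
        raise ValueError("no fixations recorded")
    # Cells with zero fixations contribute nothing to the entropy.
    return -sum((c / total) * math.log2(c / total) for c in flat if c > 0)

# Hypothetical 2x2 grids: all fixations on one cell versus an even spread.
focused = [[0, 0], [0, 8]]
spread = [[2, 2], [2, 2]]
print(attention_entropy(focused), attention_entropy(spread))
```

Lower values correspond to attention concentrated on a few regions, higher values to attention distributed across the scene.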
Using Pitch, Amplitude Modulation, and Spatial Cues for Separation of Harmonic Instruments from Stereo Music Recordings

Directory of Open Access Journals (Sweden)

Bryan Pardo

2007-01-01

Recent work in blind source separation applied to anechoic mixtures of speech allows for improved reconstruction of sources that rarely overlap in a time-frequency representation. While the assumption that speech mixtures do not overlap significantly in time-frequency is reasonable, music mixtures rarely meet this constraint, requiring new approaches. We introduce a method that uses spatial cues from anechoic, stereo music recordings and assumptions regarding the structure of musical source signals to effectively separate mixtures of tonal music. We discuss existing techniques to create partial source signal estimates from regions of the mixture where source signals do not overlap significantly. We use these partial signals within a new demixing framework, in which we estimate harmonic masks for each source, allowing the determination of the number of active sources in important time-frequency frames of the mixture. We then propose a method for distributing energy from time-frequency frames of the mixture to multiple source signals. This allows dealing with mixtures that contain time-frequency frames in which multiple harmonic sources are active without requiring knowledge of source characteristics.
Robust Speech/Non-Speech Classification in Heterogeneous Multimedia Content

NARCIS (Netherlands)

Huijbregts, M.A.H.; de Jong, Franciska M.G.

In this paper we present a speech/non-speech classification method that allows high quality classification without the need to know in advance what kinds of audible non-speech events are present in an audio recording and that does not require a single parameter to be tuned on in-domain data. Because
Audio-Visual Speech Recognition Using MPEG-4 Compliant Visual Features

Directory of Open Access Journals (Sweden)

Petar S. Aleksic

2002-11-01

We describe an audio-visual automatic continuous speech recognition system, which significantly improves speech recognition performance over a wide range of acoustic noise levels, as well as under clean audio conditions. The system utilizes facial animation parameters (FAPs) supported by the MPEG-4 standard for the visual representation of speech. We also describe a robust and automatic algorithm we have developed to extract FAPs from visual data, which does not require hand labeling or extensive training procedures. Principal component analysis (PCA) was performed on the FAPs in order to decrease the dimensionality of the visual feature vectors, and the derived projection weights were used as visual features in the audio-visual automatic speech recognition (ASR) experiments. Both single-stream and multistream hidden Markov models (HMMs) were used to model the ASR system, integrate audio and visual information, and perform relatively large vocabulary (approximately 1000 words) speech recognition experiments. The experiments performed use clean audio data and audio data corrupted by stationary white Gaussian noise at various SNRs. The proposed system reduces the word error rate (WER) by 20% to 23% relative to audio-only speech recognition WERs at various SNRs (0–30 dB) with additive white Gaussian noise, and by 19% relative to audio-only speech recognition WER under clean audio conditions.
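The relative reductions quoted above are ratios, not percentage-point differences. A minimal Python sketch of that arithmetic follows; the WER values are hypothetical and chosen only to show the calculation, not taken from the paper.

```python
def relative_wer_reduction(wer_baseline, wer_system):
    """Fraction by which wer_system improves on wer_baseline (both as fractions, e.g. 0.40)."""
    if wer_baseline <= 0:
        raise ValueError("baseline WER must be positive")
    return (wer_baseline - wer_system) / wer_baseline

# Hypothetical example: audio-only WER of 40% versus audio-visual WER of 32%.
reduction = relative_wer_reduction(0.40, 0.32)
print(f"{reduction:.0%} relative reduction")
```

Here an 8-point absolute drop corresponds to a 20% relative reduction, which is the sense in which the abstract reports its figures.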
An Eye Tracking Comparison of External Pointing Cues and Internal Continuous Cues in Learning with Complex Animations

Science.gov (United States)

Boucheix, Jean-Michel; Lowe, Richard K.

2010-01-01

Two experiments used eye tracking to investigate a novel cueing approach for directing learner attention to low salience, high relevance aspects of a complex animation. In the first experiment, comprehension of a piano mechanism animation containing spreading-colour cues was compared with comprehension obtained with arrow cues or no cues. Eye…
Relative Weighting of Semantic and Syntactic Cues in Native and Non-Native Listeners' Recognition of English Sentences.

Science.gov (United States)

Shi, Lu-Feng; Koenig, Laura L

2016-01-01

Non-native listeners do not recognize English sentences as effectively as native listeners, especially in noise. It is not entirely clear to what extent such group differences arise from differences in relative weight of semantic versus syntactic cues. This study quantified the use and weighting of these contextual cues via Boothroyd and Nittrouer's j and k factors. The j represents the probability of recognizing sentences with or without context, whereas the k represents the degree to which context improves recognition performance. Four groups of 13 normal-hearing young adult listeners participated. One group consisted of native English monolingual (EMN) listeners, whereas the other three consisted of non-native listeners contrasting in their language dominance and first language: English-dominant Russian-English, Russian-dominant Russian-English, and Spanish-dominant Spanish-English bilinguals. All listeners were presented three sets of four-word sentences: high-predictability sentences included both semantic and syntactic cues, low-predictability sentences included syntactic cues only, and zero-predictability sentences included neither semantic nor syntactic cues. Sentences were presented at 65 dB SPL binaurally in the presence of speech-spectrum noise at +3 dB SNR. Listeners orally repeated each sentence and recognition was calculated for individual words as well as the sentence as a whole. Comparable j values across groups for high-predictability, low-predictability, and zero-predictability sentences suggested that all listeners, native and non-native, utilized contextual cues to recognize English sentences. Analysis of the k factor indicated that non-native listeners took advantage of syntax as effectively as EMN listeners. However, only English-dominant bilinguals utilized semantics to the same extent as EMN listeners; semantics did not provide a significant benefit for the two non-English-dominant groups. When combined, semantics and syntax benefitted EMN
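For readers unfamiliar with the j and k factors, the sketch below uses the formulation usually attributed to Boothroyd and Nittrouer: the probability of recognizing a whole equals the probability of recognizing a part raised to the power j, and the error probability with context equals the error probability without context raised to the power k. The function names and input probabilities are hypothetical, and the study's exact estimation procedure is not given in the abstract.

```python
import math

def j_factor(p_whole, p_part):
    """Solve p_whole = p_part ** j for j (effective number of independent parts)."""
    return math.log(p_whole) / math.log(p_part)

def k_factor(p_context, p_no_context):
    """Solve (1 - p_context) = (1 - p_no_context) ** k for k (benefit from context)."""
    return math.log(1.0 - p_context) / math.log(1.0 - p_no_context)

# Hypothetical scores: words recognized at 0.8, four-word sentences at 0.8 ** 4.
j = j_factor(0.8 ** 4, 0.8)
# Hypothetical scores: 0.50 correct without context, 0.75 correct with context.
k = k_factor(0.75, 0.50)
print(round(j, 3), round(k, 3))
```

Under this formulation, j close to the number of words indicates that words are recognized independently, while k greater than 1 indicates that context reduces errors.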
Hearing and seeing meaning in noise: Alpha, beta, and gamma oscillations predict gestural enhancement of degraded speech comprehension.

Science.gov (United States)

Drijvers, Linda; Özyürek, Asli; Jensen, Ole

2018-05-01

During face-to-face communication, listeners integrate speech with gestures. The semantic information conveyed by iconic gestures (e.g., a drinking gesture) can aid speech comprehension in adverse listening conditions. In this magnetoencephalography (MEG) study, we investigated the spatiotemporal neural oscillatory activity associated with gestural enhancement of degraded speech comprehension. Participants watched videos of an actress uttering clear or degraded speech, accompanied by a gesture or not and completed a cued-recall task after watching every video. When gestures semantically disambiguated degraded speech comprehension, an alpha and beta power suppression and a gamma power increase revealed engagement and active processing in the hand-area of the motor cortex, the extended language network (LIFG/pSTS/STG/MTG), medial temporal lobe, and occipital regions. These observed low- and high-frequency oscillatory modulations in these areas support general unification, integration and lexical access processes during online language comprehension, and simulation of and increased visual attention to manual gestures over time. All individual oscillatory power modulations associated with gestural enhancement of degraded speech comprehension predicted a listener's correct disambiguation of the degraded verb after watching the videos. Our results thus go beyond the previously proposed role of oscillatory dynamics in unimodal degraded speech comprehension and provide first evidence for the role of low- and high-frequency oscillations in predicting the integration of auditory and visual information at a semantic level. © 2018 The Authors Human Brain Mapping Published by Wiley Periodicals, Inc.
New Methods for Prosodic Transcription: Capturing Variability as a Source of Information

Directory of Open Access Journals (Sweden)

Jennifer Cole

2016-06-01

RPT has, it has the potential to provide a level of detail that will be useful in modelling systematic context-governed variation in the implementation of prosodic categories, with applications in automatic speech synthesis and recognition, as well as modelling human speech production and perception. We discuss how RPT and cue specification, particularly when combined, can improve the efficiency and reliability of prosodic transcription and how they can be integrated with expert phonological transcription.
Intelligibility of speech of children with speech and sound disorders

OpenAIRE

Ivetac, Tina

2014-01-01

The purpose of this study is to examine speech intelligibility of children with primary speech and sound disorders aged 3 to 6 years in everyday life. The research problem is based on the degree to which parents or guardians, immediate family members (sister, brother, grandparents), extended family members (aunt, uncle, cousin), child's friends, other acquaintances, child's teachers and strangers understand the speech of children with speech sound disorders. We examined whether the level ...
Plant responsiveness to root-root communication of stress cues.

Science.gov (United States)

Falik, Omer; Mordoch, Yonat; Ben-Natan, Daniel; Vanunu, Miriam; Goldstein, Oron; Novoplansky, Ariel

2012-07-01

Phenotypic plasticity is based on the organism's ability to perceive, integrate and respond to multiple signals and cues informative of environmental opportunities and perils. A growing body of evidence demonstrates that plants are able to adapt to imminent threats by perceiving cues emitted from their damaged neighbours. Here, the hypothesis was tested that unstressed plants are able to perceive and respond to stress cues emitted from their drought- and osmotically stressed neighbours and to induce stress responses in additional unstressed plants. Split-root Pisum sativum, Cynodon dactylon, Digitaria sanguinalis and Stenotaphrum secundatum plants were subjected to osmotic stress or drought while sharing one of their rooting volumes with an unstressed neighbour, which in turn shared its other rooting volume with additional unstressed neighbours. Following the kinetics of stomatal aperture allowed testing for stress responses in both the stressed plants and their unstressed neighbours. In both P. sativum plants and the three wild clonal grasses, infliction of osmotic stress or drought caused stomatal closure in both the stressed plants and in their unstressed neighbours. While both continuous osmotic stress and drought induced prolonged stomatal closure and limited acclimation in stressed plants, their unstressed neighbours habituated to the stress cues and opened their stomata 3-24 h after the beginning of stress induction. The results demonstrate a novel type of plant communication, by which plants might be able to increase their readiness to probable future osmotic and drought stresses. Further work is underway to decipher the identity and mode of operation of the involved communication vectors and to assess the potential ecological costs and benefits of emitting and perceiving drought and osmotic stress cues under various ecological scenarios.
The Influence of Surface and Deep Cues on Primary and Secondary School Students' Assessment of Relevance in Web Menus

Science.gov (United States)

Rouet, Jean-Francois; Ros, Christine; Goumi, Antonine; Macedo-Rouet, Monica; Dinet, Jerome

2011-01-01

Two experiments investigated primary and secondary school students' Web menu selection strategies using simulated Web search tasks. It was hypothesized that students' selections of websites depend on their perception and integration of multiple relevance cues. More specifically, students should be able to disentangle superficial cues (e.g.,…
Introspective responses to cues and motivation to reduce cigarette smoking influence state and behavioral responses to cue exposure.

Science.gov (United States)

Veilleux, Jennifer C; Skinner, Kayla D

2016-09-01

In the current study, we aimed to extend smoking cue-reactivity research by evaluating delay discounting as an outcome of cigarette cue exposure. We also separated introspection in response to cues (e.g., self-reporting craving and affect) from cue exposure alone, to determine if introspection changes behavioral responses to cigarette cues. Finally, we included measures of quit motivation and resistance to smoking to assess motivational influences on cue exposure. Smokers were invited to participate in an online cue-reactivity study. Participants were randomly assigned to view smoking images or neutral images, and were randomized to respond to cues with either craving and affect questions (i.e., introspection) or filler questions. Following cue exposure, participants completed a delay discounting task and then reported state affect, craving, and resistance to smoking, as well as an assessment of quit motivation. We found that after controlling for trait impulsivity, participants who introspected on craving and affect showed higher delay discounting, irrespective of cue type, but we found no effect of response condition on subsequent craving (i.e., craving reactivity). We also found that motivation to quit interacted with experimental conditions to predict state craving and state resistance to smoking. Although asking about craving during cue exposure did not increase later craving, it resulted in greater discounting of delayed rewards. Overall, our findings suggest the need to further assess the implications of introspection and motivation on behavioral outcomes of cue exposure. Copyright © 2016 Elsevier Ltd. All rights reserved.
Effects of cue-exposure treatment on neural cue reactivity in alcohol dependence: a randomized trial.

Science.gov (United States)

Vollstädt-Klein, Sabine; Loeber, Sabine; Kirsch, Martina; Bach, Patrick; Richter, Anne; Bühler, Mira; von der Goltz, Christoph; Hermann, Derik; Mann, Karl; Kiefer, Falk

2011-06-01

In alcohol-dependent patients, alcohol-associated cues elicit brain activation in mesocorticolimbic networks involved in relapse mechanisms. Cue-exposure based extinction training (CET) has been shown to be efficacious in the treatment of alcoholism; however, it has remained unexplored whether CET mediates its therapeutic effects via changes of activity in mesolimbic networks in response to alcohol cues. In this study, we assessed CET treatment effects on cue-induced responses using functional magnetic resonance imaging (fMRI). In a randomized controlled trial, abstinent alcohol-dependent patients were randomly assigned to a CET group (n = 15) or a control group (n = 15). All patients underwent an extended detoxification treatment comprising medically supervised detoxification, health education, and supportive therapy. The CET patients additionally received nine CET sessions over 3 weeks, exposing the patient to his/her preferred alcoholic beverage. Cue-induced fMRI activation to alcohol cues was measured at pretreatment and posttreatment. Compared with pretreatment, fMRI cue-reactivity reduction was greater in the CET relative to the control group, especially in the anterior cingulate gyrus and the insula, as well as limbic and frontal regions. Before treatment, increased cue-induced fMRI activation was found in limbic and reward-related brain regions and in visual areas. After treatment, the CET group showed less activation than the control group in the left ventral striatum. The study provides first evidence that an exposure-based psychotherapeutic intervention in the treatment of alcoholism impacts on brain areas relevant for addiction memory and attentional focus to alcohol-associated cues and affects mesocorticolimbic reward pathways suggested to be pathophysiologically involved in addiction. Copyright © 2011 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.
Global Repetition Influences Contextual Cueing

Science.gov (United States)

Zang, Xuelian; Zinchenko, Artyom; Jia, Lina; Li, Hong

2018-01-01

Our visual system has a striking ability to improve visual search based on the learning of repeated ambient regularities, an effect named contextual cueing. Whereas most of the previous studies investigated contextual cueing effect with the same number of repeated and non-repeated search displays per block, the current study focused on whether a global repetition frequency formed by different presentation ratios between the repeated and non-repeated configurations influence contextual cueing effect. Specifically, the number of repeated and non-repeated displays presented in each block was manipulated: 12:12, 20:4, 4:20, and 4:4 in Experiments 1–4, respectively. The results revealed a significant contextual cueing effect when the global repetition frequency is high (≥1:1 ratio) in Experiments 1, 2, and 4, given that processing of repeated displays was expedited relative to non-repeated displays. Nevertheless, the contextual cueing effect reduced to a non-significant level when the repetition frequency reduced to 4:20 in Experiment 3. These results suggested that the presentation frequency of repeated relative to the non-repeated displays could influence the strength of contextual cueing. In other words, global repetition statistics could be a crucial factor to mediate contextual cueing effect. PMID:29636716
The integration of prosodic speech in high functioning autism: a preliminary FMRI study.

Directory of Open Access Journals (Sweden)

Isabelle Hesling

2010-07-01

Autism is a neurodevelopmental disorder characterized by a specific triad of symptoms such as abnormalities in social interaction, abnormalities in communication and restricted activities and interests. While verbal autistic subjects may present a correct mastery of the formal aspects of speech, they have difficulties in prosody (music of speech), leading to communication disorders. Few behavioural studies have revealed a prosodic impairment in children with autism, and among the few fMRI studies aiming at assessing the neural network involved in language, none has specifically studied prosodic speech. The aim of the present study was to characterize specific prosodic components such as linguistic prosody (intonation, rhythm and emphasis) and emotional prosody and to correlate them with the neural network underlying them. We used a behavioural test (Profiling Elements of the Prosodic System, PEPS) and fMRI to characterize prosodic deficits and investigate the neural network underlying prosodic processing. Results revealed the existence of a link between perceptive and productive prosodic deficits for some prosodic components (rhythm, emphasis and affect) in HFA and also revealed that the neural network involved in prosodic speech perception exhibits abnormal activation in the left SMG as compared to controls (activation positively correlated with intonation and emphasis) and an absence of deactivation patterns in regions involved in the default mode. These prosodic impairments could not only result from activation patterns abnormalities but also from an inability to adequately use the strategy of the default network inhibition, both mechanisms that have to be considered for decreasing task performance in High Functioning Autism.
