Face-to-face communication involves both hearing and seeing speech. Heard and seen speech inputs interact during audiovisual speech perception. Specifically, seeing the speaker's mouth and lip movements improves identification of acoustic speech stimuli, especially in noisy conditions. In addition, visual speech may even change the auditory percept. This occurs when mismatching auditory speech is dubbed onto visual articulation. Research on the brain mechanisms of audiovisual perception a...
Peelle, Jonathan E; Sommers, Mitchell S
During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing the precision of prediction. Electrophysiological studies demonstrate that oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to acoustic information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration
Audiovisual information is integrated in speech perception. One manifestation of this is the McGurk illusion in which watching the articulating face alters the auditory phonetic percept. Understanding this phenomenon fully requires a computational model with predictive power. Here, we describe … ordinal models that can account for the McGurk illusion. We compare this type of model to the Fuzzy Logical Model of Perception (FLMP) in which the response categories are not ordered. While the FLMP generally fits the data better than the ordinal model, it also employs more free parameters in complex … experiments when the number of response categories is high, as it is for speech perception in general. Testing the predictive power of the models using a form of cross-validation, we found that ordinal models perform better than the FLMP. Based on these findings we suggest that ordinal models generally have...
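The FLMP decision rule can be made concrete with a small sketch. It assumes the textbook form of the model: auditory and visual support for each response category are multiplied and then normalized across the unordered categories (the relative goodness rule). The support values below are hypothetical, chosen only to mimic an auditory /ba/ paired with a visual /ga/; they are not fitted to any data set.

```python
def flmp_probabilities(auditory_support, visual_support):
    """Response probabilities under the FLMP relative goodness rule.

    auditory_support[k], visual_support[k]: hypothetical degrees of support (0..1)
    that the auditory and visual signals lend to response category k.
    """
    if len(auditory_support) != len(visual_support):
        raise ValueError("support vectors must have the same length")
    # Multiplicative integration of the auditory and visual evidence per category
    goodness = [a * v for a, v in zip(auditory_support, visual_support)]
    total = sum(goodness)
    if total == 0:
        raise ValueError("at least one category needs nonzero support")
    # Relative goodness rule: normalize across the (unordered) response categories
    return [g / total for g in goodness]


categories = ["ba", "da", "ga"]
# Hypothetical McGurk-style stimulus: auditory /ba/ dubbed onto visual /ga/
a = [0.8, 0.4, 0.1]
v = [0.05, 0.6, 0.7]
probs = flmp_probabilities(a, v)
for c, p in zip(categories, probs):
    print(f"{c}: {p:.3f}")
```

With these made-up values the fused category /da/ receives most of the probability mass. Note that every category needs its own auditory and visual support value, so the number of free parameters grows with the number of response categories, which is the complexity concern raised in the abstract.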
Eskelund, Kasper; Dau, Torsten
Speech perception integrates signals from ear and eye. This is witnessed by a wide range of audiovisual integration effects, such as ventriloquism and the McGurk illusion. Some behavioral evidence suggests that audiovisual integration of specific aspects is special for speech perception. However, our … mismatch negativity response (MMN). MMN has the property of being evoked when an acoustic stimulus deviates from a learned pattern of stimuli. In three experimental studies, this effect is utilized to track when a coinciding visual signal alters auditory speech perception. Visual speech emanates from the … auditory speech percept? In two experiments, which both combine behavioral and neurophysiological measures, we attempt to uncover the relation between the perception of faces and audiovisual integration. Behavioral findings suggest a strong effect of face perception, whereas the MMN results are less …
Andersen, Tobias; Tiippana, K.; Laarni, J.; Kojo, I.; Sams, M.
Auditory and visual information is integrated when perceiving speech, as evidenced by the McGurk effect in which viewing an incongruent talking face categorically alters auditory speech perception. Audiovisual integration in speech perception has long been considered automatic and pre-attentive, but recent reports have challenged this view. Here we study the effect of visual spatial attention on the McGurk effect. By presenting a movie of two faces symmetrically displaced to each side of a central fixation point and dubbed with a single auditory speech track, we were able to discern the influences from each of the faces and from the voice on the auditory speech percept. We found that directing visual spatial attention towards a face increased the influence of that face on auditory perception. However, the influence of the voice on auditory perception did not change, suggesting that audiovisual …
Barkhuysen, Pashiera; Krahmer, Emiel; Swerts, Marc
In this article we report on two experiments about the perception of audiovisual cues to emotional speech. The article addresses two questions: (1) how do visual cues from a speaker's face to emotion relate to auditory cues, and (2) what is the recognition speed for various facial cues to emotion? Both experiments reported below are based on tests…
Eskelund, Kasper; Andersen, Tobias
Speech perception is audiovisual as evidenced by the McGurk effect in which watching incongruent articulatory mouth movements can change the phonetic auditory speech percept. This type of audiovisual integration may be specific to speech or be applied to all stimuli in general. To investigate this … audiovisual integration specific to speech perception. However, the results of Tuomainen et al. might have been influenced by another effect. When observers were naïve, they had little motivation to look at the face. When informed, they knew that the face was relevant for the task and this could increase … noise were measured for naïve and informed participants. We found that the threshold for detecting speech in audiovisual stimuli was lower than for auditory-only stimuli. But there was no detection advantage for observers informed of the speech nature of the auditory signal. This may indicate that …
Gentilucci, Maurizio; Cattaneo, Luigi
Two experiments aimed to determine whether features of both the visual and acoustical inputs are always merged into the perceived representation of speech and whether this audiovisual integration is based on either cross-modal binding functions or on imitation. In a McGurk paradigm, observers were required to repeat aloud a string of phonemes uttered by an actor (acoustical presentation of phonemic string) whose mouth, in contrast, mimicked pronunciation of a different string (visual presentation). In a control experiment participants read the same printed strings of letters. This condition aimed to analyze the pattern of voice and the lip kinematics controlling for imitation. In the control experiment and in the congruent audiovisual presentation, i.e. when the articulation mouth gestures were congruent with the emission of the string of phones, the voice spectrum and the lip kinematics varied according to the pronounced strings of phonemes. In the McGurk paradigm the participants were unaware of the incongruence between visual and acoustical stimuli. The acoustical analysis of the participants' spoken responses showed three distinct patterns: the fusion of the two stimuli (the McGurk effect), repetition of the acoustically presented string of phonemes, and, less frequently, of the string of phonemes corresponding to the mouth gestures mimicked by the actor. However, the analysis of the latter two responses showed that the formant 2 of the participants' voice spectra always differed from the value recorded in the congruent audiovisual presentation. It approached the value of the formant 2 of the string of phonemes presented in the other modality, which was apparently ignored. The lip kinematics of the participants repeating the string of phonemes acoustically presented were influenced by the observation of the lip movements mimicked by the actor, but only when pronouncing a labial consonant. The data are discussed in favor of the hypothesis that features of both
To take a step towards real-life-like experimental setups, we simultaneously recorded magnetoencephalographic (MEG) signals and subjects' gaze direction during audiovisual speech perception. The stimuli were utterances of /apa/ dubbed onto two side-by-side female faces articulating /apa/ (congruent) and /aka/ (incongruent) in synchrony, repeated once every 3 s. Subjects (N = 10) were free to decide which face they viewed, and responses were averaged into two categories according to gaze direction. The right-hemisphere 100-ms response to the onset of the second vowel (N100m′) was a fifth smaller to incongruent than to congruent stimuli. The results demonstrate the feasibility of realistic viewing conditions with gaze-based averaging of MEG signals.
Meronen, Auli; Tiippana, Kaisa; Westerholm, Jari; Ahonen, Timo
Purpose: The effect of the signal-to-noise ratio (SNR) on the perception of audiovisual speech in children with and without developmental language disorder (DLD) was investigated by varying the noise level and the sound intensity of acoustic speech. The main hypotheses were that the McGurk effect (in which incongruent visual speech alters the…
Buchan, Julie; Paré, Martin; Yurick, Micheal; Munhall, Kevin
In natural conversation, visual and auditory information about speech not only provide linguistic information but also provide information about the identity and the emotional state of the speaker. Thus, listeners must process a wide range of information in parallel to understand the full meaning in a message. In this series of studies, we examined how different types of visual information conveyed by a speaker's face are processed by measuring the gaze patterns exhibited by subjects watching audiovisual recordings of spoken sentences. In three experiments, subjects were asked to judge the emotion and the identity of the speaker, and to report the words that they heard under different auditory conditions. As in previous studies, eye and mouth regions dominated the distribution of the gaze fixations. It was hypothesized that the eyes would attract more fixations for more social judgment tasks, rather than tasks which rely more on verbal comprehension. Our results support this hypothesis. In addition, the location of gaze on the face did not influence the accuracy of the perception of speech in noise.
Seeing articulatory movements influences perception of auditory speech. This is often reflected in a shortened latency of auditory event-related potentials (ERPs) generated in the auditory cortex. The present study addressed whether this early neural correlate of audiovisual interaction is modulated by attention. We recorded ERPs in 15 subjects while they were presented with auditory, visual and audiovisual spoken syllables. Audiovisual stimuli consisted of incongruent auditory and visual components known to elicit a McGurk effect, i.e. a visually driven alteration in the auditory speech percept. In a Dual task condition, participants were asked to identify spoken syllables whilst monitoring a rapid visual stream of pictures for targets, i.e., they had to divide their attention. In a Single task condition, participants identified the syllables without any other tasks, i.e., they were asked to ignore the pictures and focus their attention fully on the spoken syllables. The McGurk effect was weaker in the Dual task than in the Single task condition, indicating an effect of attentional load on audiovisual speech perception. Early auditory ERP components, N1 and P2, peaked earlier to audiovisual stimuli than to auditory stimuli when attention was fully focused on syllables, indicating neurophysiological audiovisual interaction. This latency decrement was reduced when attention was loaded, suggesting that attention influences early neural processing of audiovisual speech. We conclude that reduced attention weakens the interaction between vision and audition in speech.
Knowland, Victoria C. P.; Mercure, Evelyne; Karmiloff-Smith, Annette; Dick, Fred; Thomas, Michael S. C.
Being able to see a talking face confers a considerable advantage for speech perception in adulthood. However, behavioural data currently suggest that children fail to make full use of these available visual speech cues until age 8 or 9. This is particularly surprising given the potential utility of multiple informational cues during language…
Eskelund, Kasper; Andersen, Tobias
… observers might only have been motivated to look at the face when informed, and audio and video thus seemed related. Since Tuomainen et al. did not control for this, the influence of motivation is unknown. The current experiment repeated the original methods while controlling eye movements. Four observers … observers did look near the mouth. We conclude that eye movements did not influence the results of Tuomainen et al. and that their results thus can be taken as evidence of a speech-specific mode of audiovisual integration underlying the McGurk illusion.
Altvater-Mackensen, Nicole; Mani, Nivedita; Grossmann, Tobias
Recent studies suggest that infants' audiovisual speech perception is influenced by articulatory experience (Mugitani et al., 2008; Yeung & Werker, 2013). The current study extends these findings by testing if infants' emerging ability to produce native sounds in babbling impacts their audiovisual speech perception. We tested 44 6-month-olds…
Venezia, Jonathan H; Thurman, Steven M; Matchin, William; George, Sahara E; Hickok, Gregory
Recent influential models of audiovisual speech perception suggest that visual speech aids perception by generating predictions about the identity of upcoming speech sounds. These models place stock in the assumption that visual speech leads auditory speech in time. However, it is unclear whether and to what extent temporally-leading visual speech information contributes to perception. Previous studies exploring audiovisual-speech timing have relied upon psychophysical procedures that require artificial manipulation of cross-modal alignment or stimulus duration. We introduce a classification procedure that tracks perceptually relevant visual speech information in time without requiring such manipulations. Participants were shown videos of a McGurk syllable (auditory /apa/ + visual /aka/ = perceptual /ata/) and asked to perform phoneme identification (/apa/ yes-no). The mouth region of the visual stimulus was overlaid with a dynamic transparency mask that obscured visual speech in some frames but not others randomly across trials. Variability in participants' responses (~35 % identification of /apa/ compared to ~5 % in the absence of the masker) served as the basis for classification analysis. The outcome was a high resolution spatiotemporal map of perceptually relevant visual features. We produced these maps for McGurk stimuli at different audiovisual temporal offsets (natural timing, 50-ms visual lead, and 100-ms visual lead). Briefly, temporally-leading (~130 ms) visual information did influence auditory perception. Moreover, several visual features influenced perception of a single speech sound, with the relative influence of each feature depending on both its temporal relation to the auditory signal and its informational content. PMID:26669309
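The logic of the classification procedure can be illustrated with a simplified reverse-correlation sketch. It assumes a purely temporal, binary version of the masking (each video frame is either visible or obscured on a given trial) and a simulated observer whose fused response depends only on hypothetical "critical" frames 3 to 5; the study itself used transparency masks over the mouth region and produced spatiotemporal maps, so this is an illustration of the idea, not the authors' analysis pipeline.

```python
import random


def classification_image(masks, responses):
    """Per-frame classification weights via reverse correlation.

    masks: per-trial visibility vectors (1 = mouth visible in that frame, 0 = obscured).
    responses: booleans, True if the observer reported /apa/ on that trial.
    Returns mean visibility on non-/apa/ trials minus mean visibility on /apa/ trials,
    so frames whose visibility pushes perception away from auditory /apa/ score positive.
    """
    n_frames = len(masks[0])
    yes = [m for m, r in zip(masks, responses) if r]
    no = [m for m, r in zip(masks, responses) if not r]
    if not yes or not no:
        raise ValueError("need trials with both response types")

    def mean(group, f):
        return sum(m[f] for m in group) / len(group)

    return [mean(no, f) - mean(yes, f) for f in range(n_frames)]


# Hypothetical simulated observer: only frames 3-5 carry the visual /aka/ cue.
rng = random.Random(0)
N_FRAMES, N_TRIALS = 10, 2000
critical = {3, 4, 5}
masks, responses = [], []
for _ in range(N_TRIALS):
    mask = [rng.random() < 0.5 for _ in range(N_FRAMES)]  # random frame visibility
    visible_cue = sum(mask[f] for f in critical)
    # The more critical frames are visible, the less likely an /apa/ report
    p_apa = 0.9 - 0.25 * visible_cue
    masks.append([int(x) for x in mask])
    responses.append(rng.random() < p_apa)

weights = classification_image(masks, responses)
print([round(w, 2) for w in weights])
```

Only the critical frames should come out with clearly positive weights; the remaining frames hover around zero, which is how the random masking lets perceptually relevant moments be located without manipulating cross-modal timing.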
Speech perception is facilitated by seeing the articulatory mouth movements of the talker. This is due to perceptual audiovisual integration, which also causes the McGurk–MacDonald illusion, and for which a comprehensive computational account is still lacking. Decades of research have largely … focused on the fuzzy logical model of perception (FLMP), which provides excellent fits to experimental observations but also has been criticized for being too flexible, post hoc and difficult to interpret. The current study introduces the early maximum likelihood estimation (MLE) model of audiovisual … integration to speech perception along with three model variations. In early MLE, integration is based on a continuous internal representation before categorization, which can make the model more parsimonious by imposing constraints that reflect experimental designs. The study also shows that cross …
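The idea of integrating on a continuous representation before categorization can be sketched with standard maximum-likelihood cue combination. The sketch assumes Gaussian auditory and visual representations on a single hypothetical phonetic axis (/ba/ at 0, /da/ at 1, /ga/ at 2), inverse-variance weighting of the two cues, and fixed, ordered category boundaries at 0.5 and 1.5; these values are illustrative assumptions, not the parameterization used in the study.

```python
import math


def mle_fuse(mu_a, sigma_a, mu_v, sigma_v):
    """Fuse two Gaussian cues by inverse-variance (maximum-likelihood) weighting."""
    w_a = (1 / sigma_a**2) / (1 / sigma_a**2 + 1 / sigma_v**2)
    mu_av = w_a * mu_a + (1 - w_a) * mu_v
    # The fused representation is less variable than either cue alone
    sigma_av = math.sqrt((sigma_a**2 * sigma_v**2) / (sigma_a**2 + sigma_v**2))
    return mu_av, sigma_av


def normal_cdf(x, mu, sigma):
    # math.erf handles +/-inf, so open-ended boundaries work directly
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def categorize(mu, sigma, boundaries):
    """Probability mass of N(mu, sigma) between successive ordered category boundaries."""
    edges = [-math.inf] + list(boundaries) + [math.inf]
    return [normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)
            for lo, hi in zip(edges[:-1], edges[1:])]


# Hypothetical McGurk stimulus: auditory /ba/ (0.0) with visual /ga/ (2.0), equally reliable
mu_av, sigma_av = mle_fuse(0.0, 0.5, 2.0, 0.5)
probs = categorize(mu_av, sigma_av, [0.5, 1.5])
print(mu_av, round(sigma_av, 4), [round(p, 3) for p in probs])
```

Because integration happens before categorization, only the cue means, the cue variances and the boundaries are free; the response categories are ordered along the axis, in contrast to the unordered categories of the FLMP.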
Kim, Heejung; Hahm, Jarang; Lee, Hyekyoung; Kang, Eunjoo; Kang, Hyejin; Lee, Dong Soo
The human brain naturally integrates audiovisual information to improve speech perception. However, in noisy environments, understanding speech is difficult and may require much effort. Although a brain network is thought to be engaged in speech perception, it is unclear how speech-related brain regions are connected during natural bimodal audiovisual or unimodal speech perception with counterpart irrelevant noise. To investigate the topological changes of speech-related brain networks at all possible thresholds, we used a persistent homology framework based on hierarchical clustering (single linkage distance) to analyze the connected components of the functional network during speech perception using functional magnetic resonance imaging. For speech perception, bimodal (audio-visual speech cue) or unimodal speech cues with counterpart irrelevant noise (auditory white noise or visual gum-chewing) were delivered to 15 subjects. In terms of positive relationships, similar connected components were observed in bimodal and unimodal speech conditions during filtration. However, during speech perception with congruent audiovisual stimuli, tighter couplings of the left anterior temporal gyrus-anterior insula component and the right premotor-visual components were observed than in the auditory or visual speech cue conditions, respectively. Interestingly, visual speech perceived under white noise was accompanied by tight negative coupling in the left inferior frontal region-right anterior cingulate, left anterior insula, and bilateral visual regions, including right middle temporal gyrus and right fusiform components. In conclusion, the speech brain network is tightly positively or negatively connected, which may reflect efficient or effortful processes during natural audiovisual integration or lip-reading, respectively, in speech perception. PMID:25495216
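The link between persistent homology and single linkage can be shown with a short sketch. It assumes that the functional network is summarized as a symmetric distance matrix between regions (for example, 1 minus correlation, a common but here hypothetical choice) and tracks only the zero-dimensional feature, the number of connected components, across all thresholds. A Kruskal-style union-find pass yields the thresholds at which components merge, which are exactly the single-linkage dendrogram heights. The four-region matrix is made up for illustration.

```python
def zero_dim_barcode(dist):
    """Merge thresholds of the Betti-0 filtration (equal to single-linkage merge heights).

    dist: symmetric matrix of pairwise distances between regions.
    Returns the sorted thresholds at which two connected components merge.
    """
    n = len(dist)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Process edges from shortest to longest, as the threshold grows
    edges = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    merges = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # edge joins two separate components
            parent[ri] = rj
            merges.append(d)
            if len(merges) == n - 1:  # everything is connected
                break
    return merges


def betti0(merges, n, threshold):
    """Number of connected components when edges with distance <= threshold are kept."""
    return n - sum(1 for d in merges if d <= threshold)


# Hypothetical distances between four regions
dist = [
    [0.0, 0.2, 0.9, 0.8],
    [0.2, 0.0, 0.7, 0.85],
    [0.9, 0.7, 0.0, 0.3],
    [0.8, 0.85, 0.3, 0.0],
]
merges = zero_dim_barcode(dist)
print(merges, [betti0(merges, 4, t) for t in (0.1, 0.25, 0.5, 1.0)])
```

Examining every threshold at once, rather than picking a single cut-off, is what lets the filtration compare how tightly components couple across conditions.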
Gender and age have been found to affect adults’ audio-visual (AV) speech perception. However, research on adult aging focuses on adults over 60 years, who have an increasing likelihood of cognitive and sensory decline, which may confound positive effects of age-related AV experience and its interaction with gender. Observed age and gender differences in AV speech perception may also depend on measurement sensitivity and AV task difficulty. Consequently, both AV benefit and visual influence were used to measure visual contribution for gender-balanced groups of young (20-30 years) and middle-aged adults (50-60 years), with task difficulty varied using AV syllables from different talkers in alternative auditory backgrounds. Females had better speech-reading performance than males. Whereas no gender differences in AV benefit or visual influence were observed for young adults, visually influenced responses were significantly greater for middle-aged females than middle-aged males. That speech-reading performance did not influence AV benefit may be explained by visual speech extraction and AV integration constituting independent abilities. In contrast, the gender difference in visually influenced responses in middle adulthood may reflect an experience-related shift in females’ general AV perceptual strategy. Although young females’ speech-reading proficiency may not readily contribute to greater visual influence, between young and middle adulthood, recurrent confirmation of the contribution of visual cues induced by speech-reading proficiency may gradually shift females’ AV perceptual strategy towards more visually dominated responses.
Speech perception under audiovisual conditions is well known to confer benefits to perception such as increased speed and accuracy. Here, we investigated how audiovisual training might benefit or impede auditory perceptual learning of speech degraded by vocoding. In Experiments 1 and 3, participants learned paired associations between vocoded spoken nonsense words and nonsense pictures in a protocol with a fixed number of trials. In Experiment 1, paired-associates (PA) audiovisual (AV) training of one group of participants was compared with audio-only (AO) training of another group. When tested under AO conditions, the AV-trained group was significantly more accurate than the AO-trained group. In addition, pre- and post-training AO forced-choice consonant identification with untrained nonsense words showed that AV-trained participants had learned significantly more than AO participants. The pattern of results pointed to their having learned at the level of the auditory phonetic features of the vocoded stimuli. Experiment 2, a no-training control with testing and re-testing on the AO consonant identification, showed that the controls were as accurate as the AO-trained participants in Experiment 1 but less accurate than the AV-trained participants. In Experiment 3, PA training alternated AV and AO conditions on a list-by-list basis within participants, and training was to criterion (92% correct). PA training with AO stimuli was reliably more effective than training with AV stimuli. We explain these discrepant results in terms of the so-called "reverse hierarchy theory" of perceptual learning and in terms of the diverse multisensory and unisensory processing resources available to speech perception. We propose that early audiovisual speech integration can potentially impede auditory perceptual learning, but visual top-down access to relevant auditory features can promote auditory perceptual learning.
Previous studies have found that perception in older people benefits from multisensory over uni-sensory information. As normal speech recognition is affected by both the auditory input and the visual lip movements of the speaker, we investigated the efficiency of audiovisual integration in an older population by manipulating the relative reliability of the auditory and visual information in speech. We also investigated the role of the semantic context of the sentence to assess whether audio-visual integration is affected by top-down semantic processing. We presented participants with audio-visual sentences in which the visual component was either blurred or not blurred. We found that there was a greater cost in recall performance for semantically meaningless speech in the audio-visual blur compared to the audio-visual no-blur condition, and this effect was specific to the older group. Our findings have implications for understanding how aging affects efficient multisensory integration for the perception of speech and suggest that multisensory inputs may benefit speech perception in older adults when the semantic content of the speech is unpredictable.
Background and Aim: Neuroimaging of audiovisual speech processing is an innovative approach in neuroscience that is steadily deepening our understanding of the mechanisms involved in this process. The purpose of this study was to evaluate brain activity via functional magnetic resonance imaging during audiovisual speech perception in the Persian language. Methods: Functional MRI was used to assess 19 healthy women aged 20-30 years while they were presented with the syllable /ka/ visually and /pa/ auditorily in a block design, which provided two series of images, functional and T1-weighted. The results were then analyzed and compared with the FSL software. Results: Both middle and cortical regions of the brain were activated by visual stimuli, whereas middle regions were activated in response to auditory stimuli. Specifically, the left anterior supramarginal gyrus and parts of the motor speech system, including the insular and cingulate cortex and the precentral cortex, were activated by the visual stimulus, while the left posterior and right supramarginal gyri were activated by the auditory stimulus. Moreover, the McGurk effect was behaviorally confirmed in fifteen subjects. Conclusion: It was hypothesized that activation of a single region, the supramarginal gyrus, by both auditory and visual stimuli indicates a common region for phonological processing of sensory inputs. In addition, auditory stimuli elicited more intense activity, whereas visual stimuli elicited a broader extent of activated voxels as well as additional regions. These findings may reflect the unfamiliarity of the typical brain with perceiving visual speech stimuli.
(1) To evaluate the recognition of words, phonemes and lexical tones in audiovisual (AV) and auditory-only (AO) modes in Mandarin-speaking adults with cochlear implants (CIs); (2) to understand the effect of presentation levels on AV speech perception; (3) to learn the effect of hearing experience on AV speech perception. Thirteen deaf adults (age = 29.1 ± 13.5 years; 8 male, 5 female) who had used CIs for >6 months and 10 normal-hearing (NH) adults participated in this study. Seven of the CI users were prelingually deaf, and 6 postlingually deaf. The Mandarin Monosyllabic Word Recognition Test was used to assess recognition of words, phonemes and lexical tones in AV and AO conditions at 3 presentation levels: speech detection threshold (SDT), speech recognition threshold (SRT), and 10 dB SL (re: SRT). The prelingual group had better phoneme recognition in the AV mode than in the AO mode at SDT and SRT (both p = 0.016), and so did the NH group at SDT (p = 0.004). No mode difference was noted in the postlingual group. None of the groups had significantly different tone recognition in the 2 modes. The prelingual and postlingual groups had significantly better phoneme and tone recognition than the NH group at SDT in the AO mode (p = 0.016 and p = 0.002 for phonemes; p = 0.001 and p < 0.001 for tones) but were outperformed by the NH group at 10 dB SL (re: SRT) in both modes (both p < 0.001 for phonemes; p < 0.001 and p = 0.002 for tones). The recognition scores had a significant correlation with group, with age and sex controlled (p < 0.001). Visual input may help prelingually deaf implantees to recognize phonemes but may not augment Mandarin tone recognition. The effect of presentation level seems minimal on CI users' AV perception. This indicates special considerations in developing audiological assessment protocols and rehabilitation strategies for implantees who speak tonal languages.
Williams, Joshua T; Darcy, Isabelle; Newman, Sharlene D
The aim of the present study was to characterize effects of learning a sign language on the processing of a spoken language. Specifically, audiovisual phoneme comprehension was assessed before and after 13 weeks of sign language exposure. Second-language (L2) learners of American Sign Language (ASL) performed this task in the fMRI scanner. Behaviorally, L2 ASL learners' classification of the speech sounds improved with time compared to hearing nonsigners. Neuroimaging results indicated increased activation in the supramarginal gyrus (SMG) after sign language exposure, which suggests concomitant increased phonological processing of speech. A multiple regression analysis indicated that learners' ratings of co-sign speech use and lipreading ability were correlated with SMG activation. This pattern of results indicates that the increased use of mouthing and possibly lipreading during sign language acquisition may concurrently improve audiovisual speech processing in budding hearing bimodal bilinguals. PMID:26740404
Stevenson, Ryan A.; Siemann, Justin K.; Woynaroski, Tiffany G.; Schneider, Brittany C.; Eberly, Haley E.; Camarata, Stephen M.; Wallace, Mark T.
Atypical communicative abilities are a core marker of Autism Spectrum Disorders (ASD). A number of studies have shown that, in addition to auditory comprehension differences, individuals with autism frequently show atypical responses to audiovisual speech, suggesting a multisensory contribution to these communicative differences from their…
Thomas, Sharon M.; Jordan, Timothy R.
Seeing a talker's face influences auditory speech recognition, but the visible input essential for this influence has yet to be established. Using a new seamless editing technique, the authors examined effects of restricting visible movement to oral or extraoral areas of a talking face. In Experiment 1, visual speech identification and visual…
… visual lip features is used. Phoneme-related receptive fields result on the SOM basis; they are speaker dependent and show individual locations and strain. Overlapping main slopes indicate a high similarity of respective units; distortion or extra peaks originate from the influence of other units. … Dependent on the training data, these other units may also be contextually immediate neighboring units. The poster demonstrates the idea with text material spoken by one individual subject using a set of simple audio-visual features. The data material for the training process consists of 44 labeled …
Van der Burg, Erik; Goodbourn, Patrick T.
The brain is adaptive. The speed of propagation through air, and of low-level sensory processing, differs markedly between auditory and visual stimuli; yet the brain can adapt to compensate for the resulting cross-modal delays. Studies investigating temporal recalibration to audiovisual speech have used prolonged adaptation procedures, suggesting that adaptation is sluggish. Here, we show that adaptation to asynchronous audiovisual speech occurs rapidly. Participants viewed a brief clip of an...
Megnin-Viggars, Odette; Goswami, Usha
Visual speech inputs can enhance auditory speech information, particularly in noisy or degraded conditions. The natural statistics of audiovisual speech highlight the temporal correspondence between visual and auditory prosody, with lip, jaw, cheek and head movements conveying information about the speech envelope. Low-frequency spatial and…
Martin, Jean-Remy; Kösem, Anne; van Wassenhove, Virginie
The effect of stimulation history on the perception of a current event can yield two opposite effects, namely: adaptation or hysteresis. The perception of the current event thus goes in the opposite or in the same direction as prior stimulation, respectively. In audiovisual (AV) synchrony perception, adaptation effects have primarily been reported. Here, we tested if perceptual hysteresis could also be observed over adaptation in AV timing perception by varying different experimental conditio...
Klitsch, Julia Ulrike
This dissertation investigates speech perception in three different groups of native adult speakers of Dutch: an aphasic group and two age-varying control groups. By means of two different experiments, it examines whether the availability of visual articulatory information is beneficial to the auditory speec...
Traditionally, second language (L2) instruction has emphasised auditory-based instruction methods. However, this approach is restrictive in the sense that speech perception by humans is not just an auditory phenomenon but a multimodal one, and specifically, a visual one as well. In the past decade, experimental studies have shown that the…
D'Souza, Dean; D'Souza, Hana; Johnson, Mark H; Karmiloff-Smith, Annette
Typically developing (TD) infants can construct unified cross-modal percepts, such as a speaking face, by integrating auditory-visual (AV) information. This skill is a key building block upon which higher-level skills, such as word learning, are built. Because word learning is seriously delayed in most children with neurodevelopmental disorders, we assessed the hypothesis that this delay partly results from a deficit in integrating AV speech cues. AV speech integration has rarely been investigated in neurodevelopmental disorders, and never previously in infants. We probed for the McGurk effect, which occurs when the auditory component of one sound (/ba/) is paired with the visual component of another sound (/ga/), leading to the perception of an illusory third sound (/da/ or /tha/). We measured AV integration in 95 infants/toddlers with Down, fragile X, or Williams syndrome, whom we matched on chronological and mental age to 25 TD infants. We also assessed a more basic AV perceptual ability: sensitivity to matching vs. mismatching AV speech stimuli. Infants with Williams syndrome failed to demonstrate a McGurk effect, indicating poor AV speech integration. Moreover, while the TD children discriminated between matching and mismatching AV stimuli, none of the other groups did, hinting at a basic deficit or delay in AV speech processing, which is likely to constrain subsequent language development. PMID:27498221
Van der Burg, Erik; Goodbourn, Patrick T
The brain is adaptive. The speed of propagation through air, and of low-level sensory processing, differs markedly between auditory and visual stimuli; yet the brain can adapt to compensate for the resulting cross-modal delays. Studies investigating temporal recalibration to audiovisual speech have used prolonged adaptation procedures, suggesting that adaptation is sluggish. Here, we show that adaptation to asynchronous audiovisual speech occurs rapidly. Participants viewed a brief clip of an actor pronouncing a single syllable. The voice was either advanced or delayed relative to the corresponding lip movements, and participants were asked to make a synchrony judgement. Although we did not use an explicit adaptation procedure, we demonstrate rapid recalibration based on a single audiovisual event. We find that the point of subjective simultaneity on each trial is highly contingent upon the modality order of the preceding trial. We find compelling evidence that rapid recalibration generalizes across different stimuli, and different actors. Finally, we demonstrate that rapid recalibration occurs even when auditory and visual events clearly belong to different actors. These results suggest that rapid temporal recalibration to audiovisual speech is primarily mediated by basic temporal factors, rather than higher-order factors such as perceived simultaneity and source identity. PMID:25716790
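The trial-history analysis described above can be sketched as follows. The sketch assumes synchrony judgements stored as `(soa_ms, judged_synchronous)` pairs, with negative SOAs meaning the voice leads the lips. The point of subjective simultaneity (PSS) is estimated crudely as the mean SOA of trials judged synchronous; the study may instead have fitted a psychometric function, and all names here are hypothetical.

```python
from statistics import mean

def pss_centroid(trials):
    """Crude PSS: mean SOA (ms) of the trials judged synchronous."""
    sync = [soa for soa, judged_sync in trials if judged_sync]
    return mean(sync) if sync else float("nan")

def pss_by_previous_order(trials):
    """PSS for trials preceded by an audio-leading vs. an audio-lagging trial.

    Sign convention (assumed): soa < 0 means audio leads, soa > 0 means audio lags.
    """
    after_audio_lead, after_audio_lag = [], []
    for prev, cur in zip(trials, trials[1:]):
        if prev[0] < 0:
            after_audio_lead.append(cur)
        elif prev[0] > 0:
            after_audio_lag.append(cur)
        # trials that follow a physically synchronous (0 ms) trial are skipped
    return pss_centroid(after_audio_lead), pss_centroid(after_audio_lag)
```

Rapid recalibration would appear as a PSS after audio-leading trials that is shifted towards audio lead relative to the PSS after audio-lagging trials.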
Alsius, Agnès; Navarra, Jordi; Campbell, Ruth; Soto-Faraco, Salvador
One of the most commonly cited examples of human multisensory integration occurs during exposure to natural speech, when the vocal and the visual aspects of the signal are integrated in a unitary percept. Audiovisual association of facial gestures and vocal sounds has been demonstrated in nonhuman primates and in prelinguistic children, arguing for a general basis for this capacity. One critical question, however, concerns the role of attention in such multisensory integration. Although both behavioral and neurophysiological studies have converged on a preattentive conceptualization of audiovisual speech integration, this mechanism has rarely been measured under conditions of high attentional load, when the observers' attentional resources are depleted. We tested the extent to which audiovisual integration was modulated by the amount of available attentional resources by measuring the observers' susceptibility to the classic McGurk illusion in a dual-task paradigm. The proportion of visually influenced responses was severely, and selectively, reduced if participants were concurrently performing an unrelated visual or auditory task. In contrast with the assumption that crossmodal speech integration is automatic, our results suggest that these multisensory binding processes are subject to attentional demands. PMID:15886102
Tobias Søren Andersen
Lesions to Broca’s area cause aphasia characterised by a severe impairment of the ability to speak, with comparatively intact speech perception. However, some studies have found effects on speech perception under adverse listening conditions, indicating that Broca’s area is also involved in speech perception. While these studies have focused on auditory speech perception, other studies have shown that Broca’s area is activated by visual speech perception. Furthermore, one preliminary report found that a patient with Broca’s aphasia did not experience the McGurk illusion, suggesting that an intact Broca’s area is necessary for audiovisual integration of speech. Here we describe a patient with Broca’s aphasia who experienced the McGurk illusion. This indicates that an intact Broca’s area is not necessary for audiovisual integration of speech. The McGurk illusions this patient experienced were atypical, which could be due to Broca’s area having a more subtle role in audiovisual integration of speech. However, the McGurk illusions of a control subject with Wernicke’s aphasia were also atypical. This indicates that the atypical McGurk illusions were due to deficits in speech processing that are not specific to Broca’s aphasia.
Hillock-Dunn, Andrea; Grantham, D Wesley; Wallace, Mark T
During a typical communication exchange, both auditory and visual cues contribute to speech comprehension. The influence of vision on speech perception can be measured behaviorally using a task where incongruent auditory and visual speech stimuli are paired to induce perception of a novel token reflective of multisensory integration (i.e., the McGurk effect). This effect is temporally constrained in adults, with illusion perception decreasing as the temporal offset between the auditory and visual stimuli increases. Here, we used the McGurk effect to investigate the development of the temporal characteristics of audiovisual speech binding in 7- to 24-year-olds. Surprisingly, results indicated that although older participants perceived the McGurk illusion more frequently, no age-dependent change in the temporal boundaries of audiovisual speech binding was observed. PMID:26920938
Lee, Hweeling; Noppeney, Uta
This psychophysics study used musicians as a model to investigate whether musical expertise shapes the temporal integration window for audiovisual speech, sinewave speech, or music. Musicians and non-musicians judged the audiovisual synchrony of speech, sinewave analogs of speech, and music stimuli at 13 audiovisual stimulus onset asynchronies (±360, ±300, ±240, ±180, ±120, ±60, and 0 ms). Further, we manipulated the duration of the stimuli by presenting sentences/melodies or syllables/tones. ...
D'Ausilio, Alessandro; Bartoli, Eleonora; Maffongelli, Laura; Berry, Jeffrey James; Fadiga, Luciano
Audiovisual speech perception is likely based on the association between auditory and visual information into stable audiovisual maps. Conflicting audiovisual inputs generate perceptual illusions such as the McGurk effect. Audiovisual mismatch effects could be driven either by the detection of violations in the standard audiovisual statistics or by the sensorimotor reconstruction of the distal articulatory event that generated the audiovisual ambiguity. In order to disambiguate between the two hypotheses, we exploit the fact that the tongue is hidden from view. For this reason, tongue movement encoding can be learned only via speech production, not via perception of others' speech alone. Here we asked participants to identify speech sounds while being shown matching or mismatching visual representations of tongue movements. Vision of congruent tongue movements facilitated auditory speech identification relative to incongruent trials. This result suggests that direct visual experience of an articulator movement is not necessary for the generation of audiovisual mismatch effects. Furthermore, we suggest that audiovisual integration in speech may benefit from speech production learning. PMID:25172391
Eskelund, Kasper; Tuomainen, Jyrki; Andersen, Tobias
signal. Here we show that identification of phonetic content and detection can be dissociated as speech-specific and non-specific audiovisual integration effects. To this end, we employed synthetically modified stimuli, sine wave speech (SWS), which is an impoverished speech signal that only observers...
Crosse, Michael J; Lalor, Edmund C
Visual speech can greatly enhance a listener's comprehension of auditory speech when they are presented simultaneously. Efforts to determine the neural underpinnings of this phenomenon have been hampered by the limited temporal resolution of hemodynamic imaging and the fact that EEG and magnetoencephalographic data are usually analyzed in response to simple, discrete stimuli. Recent research has shown that neuronal activity in human auditory cortex tracks the envelope of natural speech. Here, we exploit this finding by estimating a linear forward-mapping between the speech envelope and EEG data and show that the latency at which the envelope of natural speech is represented in cortex is shortened by >10 ms when continuous audiovisual speech is presented compared with audio-only speech. In addition, we use a reverse-mapping approach to reconstruct an estimate of the speech stimulus from the EEG data and, by comparing the bimodal estimate with the sum of the unimodal estimates, find no evidence of any nonlinear additive effects in the audiovisual speech condition. These findings point to an underlying mechanism that could account for enhanced comprehension during audiovisual speech. Specifically, we hypothesize that low-level acoustic features that are temporally coherent with the preceding visual stream may be synthesized into a speech object at an earlier latency, which may provide an extended period of low-level processing before extraction of semantic information. PMID:24401714
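A linear forward mapping of the kind described above is often estimated as a regularised regression from time-lagged copies of the stimulus envelope onto the EEG. The sketch below is a generic ridge-regression version under that assumption, not the authors' exact pipeline; the function names, the ridge parameter and the sampling rate are hypothetical.

```python
import numpy as np

def lagged_design(stim, max_lag):
    """Column k holds the stimulus delayed by k samples (k = 0..max_lag), zero-padded."""
    n = len(stim)
    X = np.zeros((n, max_lag + 1))
    for k in range(max_lag + 1):
        X[k:, k] = stim[: n - k]
    return X

def fit_forward_model(envelope, eeg, max_lag, ridge=1.0):
    """Ridge-regularised forward mapping: eeg[t] ~ sum_k w[k] * envelope[t - k].

    `eeg` may be 1-D (one channel) or 2-D (samples x channels).
    """
    X = lagged_design(np.asarray(envelope, dtype=float), max_lag)
    gram = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ eeg)

def peak_latency_ms(weights, fs):
    """Lag (ms) at which the absolute 1-D weight profile peaks."""
    return 1000.0 * int(np.argmax(np.abs(weights))) / fs
```

Under these assumptions, a latency shortening such as the one reported would show up as an earlier peak in weights fitted to audiovisual trials than in weights fitted to audio-only trials.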
Woynaroski, Tiffany G.; Kwakye, Leslie D.; Foss-Feig, Jennifer H.; Stevenson, Ryan A.; Stone, Wendy L.; Wallace, Mark T.
This study examined unisensory and multisensory speech perception in 8-17 year old children with autism spectrum disorders (ASD) and typically developing controls matched on chronological age, sex, and IQ. Consonant-vowel syllables were presented in visual only, auditory only, matched audiovisual, and mismatched audiovisual ("McGurk")…
Jeanne A Guiraud
The language difficulties often seen in individuals with autism might stem from an inability to integrate audiovisual information, a skill important for language development. We investigated whether 9-month-old siblings of older children with autism, who are at an increased risk of developing autism, are able to integrate audiovisual speech cues. We used an eye-tracker to record where infants looked when shown a screen displaying two faces of the same model, one articulating /ba/ and the other /ga/, with one face congruent with the syllable sound being presented simultaneously and the other incongruent. This method was successful in showing that infants at low risk can integrate audiovisual speech: they looked for the same amount of time at the mouths in both the fusible visual /ga/-audio /ba/ and the congruent visual /ba/-audio /ba/ displays, indicating that the auditory and visual streams fuse into a McGurk-type syllabic percept in the incongruent condition. It also showed that low-risk infants could perceive a mismatch between auditory and visual cues: they looked longer at the mouth in the mismatched, non-fusible visual /ba/-audio /ga/ display compared with the congruent visual /ga/-audio /ga/ display, demonstrating that they perceive an uncommon, and therefore interesting, speech-like percept when looking at the incongruent mouth (repeated-measures ANOVA, display × fusion/mismatch condition interaction: F(1,16) = 17.153, p = 0.001). The looking behaviour of high-risk infants did not differ according to the type of display, suggesting difficulties in matching auditory and visual information (repeated-measures ANOVA, display × condition interaction: F(1,25) = 0.09, p = 0.767), in contrast to low-risk infants (repeated-measures ANOVA, display × condition × low/high-risk group interaction: F(1,41) = 4.466, p = 0.041). In some cases this reduced ability might lead to the poor communication skills characteristic of autism.
Holt, Lori L.; Lotto, Andrew J.
Speech perception (SP) most commonly refers to the perceptual mapping from the highly variable acoustic speech signal to a linguistic representation, whether it be phonemes, diphones, syllables, or words. This is an example of categorization, in that potentially discriminable speech sounds are assigned to functionally equivalent classes. In this tutorial, we present some of the main challenges to our understanding of the categorization of speech sounds and the conceptualization of SP that has...
Mroueh, Youssef; Marcheret, Etienne; Goel, Vaibhava
In this paper, we present methods in deep multimodal learning for fusing speech and visual modalities for Audio-Visual Automatic Speech Recognition (AV-ASR). First, we study an approach where uni-modal deep networks are trained separately and their final hidden layers fused to obtain a joint feature space in which another deep network is built. While the audio network alone achieves a phone error rate (PER) of 41% under clean condition on the IBM large vocabulary audio-visual studio datase...
Audiovisual text-to-speech systems convert a written text into an audiovisual speech signal. Typically, the visual mode of the synthetic speech is synthesized separately from the audio, the latter being either natural or synthesized speech. However, the perception of mismatches between these two information streams requires experimental exploration since it could degrade the quality of the output. In order to increase the intermodal coherence in synthetic 2D photorealistic speech, we extended the well-known unit selection audio synthesis technique to work with multimodal segments containing original combinations of audio and video. Subjective experiments confirm that the audiovisual signals created by our multimodal synthesis strategy are indeed perceived as being more synchronous than those of systems in which both modes are not intrinsically coherent. Furthermore, it is shown that the degree of coherence between the auditory mode and the visual mode has an influence on the perceived quality of the synthetic visual speech fragment. In addition, the audio quality was found to have only a minor influence on the perceived visual signal's quality.
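Unit selection is commonly framed as a dynamic-programming search that picks one candidate unit per target position so as to minimise the sum of target costs and join (concatenation) costs. The sketch below shows that generic search; the unit representation `(label, clip_id, frame_index)` and both cost functions are hypothetical illustrations, not the cost functions used in this work. The join cost rewards units that were consecutive in the same original recording, which mirrors the idea of keeping original audio-video combinations intact.

```python
def select_units(candidates, target_cost, join_cost):
    """Viterbi search: minimise the sum of target costs plus join costs over a unit sequence.

    candidates: one list of candidate units per target position.
    target_cost(i, unit) and join_cost(prev_unit, unit) return non-negative floats.
    """
    # best[i][j] = (cumulative cost, index of best predecessor) for candidate j at position i
    best = [[(target_cost(0, u), None) for u in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for u in candidates[i]:
            prev_costs = [best[i - 1][k][0] + join_cost(p, u)
                          for k, p in enumerate(candidates[i - 1])]
            k_best = min(range(len(prev_costs)), key=prev_costs.__getitem__)
            row.append((prev_costs[k_best] + target_cost(i, u), k_best))
        best.append(row)
    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return path[::-1], total

def same_clip_join_cost(prev, unit):
    """Hypothetical join cost: free if the units were consecutive in the same AV recording."""
    return 0.0 if prev[1] == unit[1] and unit[2] == prev[2] + 1 else 1.0
```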
Lebib, Riadh; Papo, David; Douiri, Abdel; de Bode, Stella; Gillon Dowens, Margaret; Baudonnière, Pierre-Marie
Lipreading reliably improves speech perception during face-to-face conversation. Within the range of good dubbing, however, adults tolerate some audiovisual (AV) discrepancies, and lipreading can then give rise to confusion. We used event-related brain potentials (ERPs) to study the perceptual strategies governing the intermodal processing of dynamic and bimodal speech stimuli, either congruently dubbed or not. Electrophysiological analyses revealed that non-coherent audiovisual dubbings modulated the amplitude of an endogenous ERP component, the N300, which we compared to an 'N400-like effect' reflecting the difficulty of integrating these conflicting pieces of information. This result adds further support for the existence of a cerebral system underlying 'integrative processes' lato sensu. Further studies should take advantage of this 'N400-like effect' with AV speech stimuli to open new perspectives in the domain of psycholinguistics. PMID:15531091
An increasing number of neuroscience papers capitalize on the assumption, published in this journal, that visual speech is typically 150 ms ahead of auditory speech. However, the estimation of audiovisual asynchrony in the reference paper is valid only in very specific cases: for isolated consonant-vowel syllables or at the beginning of a speech utterance, in what we call "preparatory gestures". When syllables are chained in sequences, as they typically are in most parts of a natural speech utterance, asynchrony should be defined differently. This is what we call "comodulatory gestures", which provide auditory and visual events more or less in synchrony. We provide audiovisual data on sequences of plosive-vowel syllables (pa, ta, ka, ba, da, ga, ma, na) showing that audiovisual synchrony is actually rather precise, varying between 20 ms audio lead and 70 ms audio lag. We show how more complex speech material should result in a range typically varying between 40 ms audio lead and 200 ms audio lag, and we discuss how this natural coordination is reflected in the so-called temporal integration window for audiovisual speech perception. Finally, we present a toy model of auditory and audiovisual predictive coding, showing that visual lead is actually not necessary for visual prediction.
Humans rely on multiple sensory modalities to determine the emotional state of others. In fact, such multisensory perception may be one of the mechanisms explaining the ease and efficiency with which others’ emotions are recognized. But how and when exactly do the different modalities interact? One aspect of multisensory perception that has received increasing interest in recent years is the concept of crossmodal prediction. In emotion perception, as in most other settings, visual information precedes auditory information. Thus, leading visual information can facilitate subsequent auditory processing. While this mechanism has often been described in audiovisual speech perception, it has not yet been addressed in audiovisual emotion perception. Based on the current state of the art in (a) crossmodal prediction and (b) multisensory emotion perception research, we propose that it is essential to consider the former in order to fully understand the latter. Focusing on electroencephalographic (EEG) and magnetoencephalographic (MEG) studies, we provide a brief overview of the current research in both fields. In discussing these findings, we suggest that emotional visual information may allow for a more reliable prediction of auditory information compared to non-emotional visual information. In support of this hypothesis, we present a re-analysis of a previous data set that shows an inverse correlation between the N1 response in the EEG and the duration of visual emotional but not non-emotional information. If the assumption that emotional content allows for more reliable predictions can be corroborated in future studies, crossmodal prediction is a crucial factor in our understanding of multisensory emotion perception.
Hwee Ling Lee
This psychophysics study used musicians as a model to investigate whether musical expertise shapes the temporal integration window for audiovisual speech, sinewave speech or music. Musicians and non-musicians judged the audiovisual synchrony of speech, sinewave analogues of speech, and music stimuli at 13 audiovisual stimulus onset asynchronies (±360, ±300, ±240, ±180, ±120, ±60, and 0 ms). Further, we manipulated the duration of the stimuli by presenting sentences/melodies or syllables/tones. Critically, musicians relative to non-musicians exhibited significantly narrower temporal integration windows for both music and sinewave speech. Further, the temporal integration window for music decreased with the amount of music practice, but not with age of acquisition. In other words, the more musicians practiced piano in the past three years, the more sensitive they became to the temporal misalignment of visual and auditory signals. Collectively, our findings demonstrate that music practice fine-tunes the audiovisual temporal integration window to various extents depending on the stimulus class. While the effect of piano practice was most pronounced for music, it also generalized to other stimulus classes such as sinewave speech and, to a marginally significant degree, to natural speech.
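The 13 stimulus onset asynchronies follow from six positive offsets in 60 ms steps, their six negative counterparts and 0 ms (6 + 6 + 1 = 13). The sketch below builds that grid and computes a deliberately coarse window width, the span of SOAs whose proportion of "synchronous" responses reaches a criterion. The 0.5 criterion and the function names are assumptions; the study itself likely fitted psychometric functions rather than reading the width off the grid.

```python
def soa_grid(step_ms=60, max_ms=360):
    """Symmetric SOA grid in ms: 0 plus +/- multiples of step_ms up to max_ms."""
    pos = list(range(step_ms, max_ms + 1, step_ms))
    return sorted([-p for p in pos] + [0] + pos)

def window_width_ms(soas, p_sync, criterion=0.5):
    """Coarse window width: span of SOAs whose proportion of 'synchronous' responses meets criterion.

    No psychometric curve is fitted, so the width is quantised to the SOA grid.
    """
    passing = [s for s, p in zip(soas, p_sync) if p >= criterion]
    return max(passing) - min(passing) if passing else 0
```

A narrower window, as reported for musicians, would correspond to fewer grid points passing the criterion and hence a smaller width.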
The general aim of this thesis was to test the effects of paralinguistic (emotional) and prior contextual (topical) cues on perception of poorly specified visual, auditory, and audiovisual speech. The specific purposes were to (1) examine whether facially displayed emotions can facilitate speechreading performance; (2) study the mechanism for such facilitation; (3) map information-processing factors that are involved in processing of poorly specified speech; and (4) present a comprehensiv...
We present a modular framework for articulatory animation synthesis using speech motion capture data obtained with electromagnetic articulography (EMA). Adapting a skeletal animation approach, the articulatory motion data is applied to a three-dimensional (3D) model of the vocal tract, creating a portable resource that can be integrated in an audiovisual (AV) speech synthesis platform to provide realistic animation of the tongue and teeth for a virtual character. The framework also provides an interface to articulatory animation synthesis, as well as an example application to illustrate its use with a 3D game engine. We rely on cross-platform, open-source software and open standards to provide a lightweight, accessible, and portable workflow.
Events encoded in separate sensory modalities, such as audition and vision, can seem to be synchronous across a relatively broad range of physical timing differences. This may suggest that the precision of audio-visual timing judgments is inherently poor. Here we show that this is not necessarily true. We contrast timing sensitivity for isolated streams of audio and visual speech, and for streams of audio and visual speech accompanied by additional, temporally offset, visual speech streams. We find that the precision with which synchronous streams of audio and visual speech are identified is enhanced by the presence of additional streams of asynchronous visual speech. Our data suggest that timing perception is shaped by selective grouping processes, which can result in enhanced precision in temporally cluttered environments. The imprecision suggested by previous studies might therefore be a consequence of examining isolated pairs of audio and visual events. We argue that when an isolated pair of cross-modal events is presented, they tend to group perceptually and to seem synchronous as a consequence. We have revealed greater precision by providing multiple visual signals, possibly allowing a single auditory speech stream to group selectively with the most synchronous visual candidate. The grouping processes we have identified might be important in daily life, such as when we attempt to follow a conversation in a crowded room.
Evitts, Paul M.; Portugal, Lindsay; Van Dine, Ami; Holler, Aline
Background: There is minimal research on the contribution of visual information on speech intelligibility for individuals with a laryngectomy (IWL). Aims: The purpose of this project was to determine the effects of mode of presentation (audio-only, audio-visual) on alaryngeal speech intelligibility. Method: Twenty-three naive listeners were…
Kubicek, Claudia; Hillairet de Boisferon, Anne; Dupierrix, Eve; Pascalis, Olivier; Lœvenbruck, Hélène; Gervain, Judit; Schwarzer, Gudrun
The present study examined when and how the ability to cross-modally match audio-visual fluent speech develops in 4.5-, 6- and 12-month-old German-learning infants. In Experiment 1, 4.5- and 6-month-old infants' audio-visual matching ability of native (German) and non-native (French) fluent speech was assessed by presenting auditory and visual speech information sequentially, that is, in the absence of temporal synchrony cues. The results showed that 4.5-month-old infants were capable of matc...
Choudhury, N.; Amer, I; Daniels, M; Wareing, MJ
Introduction: Aural microsuction is a common ear, nose and throat procedure used in the outpatient setting. Some patients, however, find it difficult to tolerate owing to discomfort, pain or noise. This study evaluated the effect of audiovisual distraction on patients’ pain perception and overall satisfaction. Methods: A prospective study was conducted for patients attending our aural care clinic requiring aural toileting of bilateral mastoid cavities over a three-month period. All microsuction...
Andersen, Tobias; Starrfelt, Randi
Lesions to Broca's area cause aphasia characterized by a severe impairment of the ability to speak, with comparatively intact speech perception. However, some studies have found effects on speech perception under adverse listening conditions, indicating that Broca's area is also involved in speech perception. While these studies have focused on auditory speech perception, other studies have shown that Broca's area is activated by visual speech perception. Furthermore, one preliminary report found that a patient with Broca's aphasia did not experience the McGurk illusion, suggesting that an intact Broca...
Vouloumanos, Athena; Gelfand, Hanna M.
The ability to decode atypical and degraded speech signals as intelligible is a hallmark of speech perception. Human adults can perceive sounds as speech even when they are generated by a variety of nonhuman sources including computers and parrots. We examined how infants perceive the speech-like vocalizations of a parrot. Further, we examined how…
Maganti, Hari Krishna; Gatica-Perez, Daniel; McCowan, Iain A.
We address the problem of distant speech acquisition in multi-party meetings, using multiple microphones and cameras. Microphone array beamforming techniques present a potential alternative to close-talking microphones by providing speech enhancement through spatial filtering and directional discrimination. Beamforming techniques rely on the knowledge of a speaker location. In this paper, we present an integrated approach, in which an audio-visual multi-person tracker is used to track active ...
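The dependence of beamforming on speaker location can be made concrete with the simplest beamformer, delay-and-sum: given the tracked source position, each microphone channel is advanced by its extra propagation delay and the channels are averaged, so that the target speech adds coherently. This is a generic textbook sketch, not the paper's beamformer; it assumes free-field propagation, a speed of sound of 343 m/s and integer-sample delays.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def delay_and_sum(signals, mic_pos, src_pos, fs):
    """Steer a microphone array towards src_pos by delay-and-sum.

    signals: (n_mics, n_samples) array; mic_pos: (n_mics, 3) in metres; src_pos: (3,) in metres.
    Delays are rounded to whole samples and measured relative to the closest microphone.
    """
    dists = np.linalg.norm(mic_pos - src_pos, axis=1)
    delays = np.round((dists - dists.min()) / SPEED_OF_SOUND * fs).astype(int)
    n = signals.shape[1]
    out = np.zeros(n)
    for channel, d in zip(signals, delays):
        out[: n - d] += channel[d:]  # advance later-arriving channels into alignment
    return out / len(signals)
```

In an integrated system of the kind described, `src_pos` would come from the audio-visual tracker; fractional-delay filtering would be a natural refinement over rounding to whole samples.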
Adank, Patti; Nuttall, Helen E.; Banks, Briony; Kennedy-Higgins, Daniel
The recognition of unfamiliar regional and foreign accents represents a challenging task for the speech perception system (Floccia et al., 2006; Adank et al., 2009). Despite the frequency with which we encounter such accents, the neural mechanisms supporting successful perception of accented speech are poorly understood. Nonetheless, candidate neural substrates involved in processing speech in challenging listening conditions, including accented speech, are beginning to be identified. This re...
Attigodu, Ganesh; Berthommier, Frédéric; Nahorna, Olha; Schwartz, Jean-Luc
In a previous set of experiments we showed that audio-visual fusion during the McGurk effect may be modulated by context. A short context (2 to 4 syllables) composed of incoherent auditory and visual material significantly decreases the McGurk effect. We interpreted this as showing the existence of an audiovisual "binding" stage controlling the fusion process, and we also showed the existence of a "rebinding" process when incoherent material is followed by short coherent material. In thi...
Lidestam, Björn; Moradi, Shahram; Pettersson, Rasmus; Ricklefs, Theodor
The effects of audiovisual versus auditory training for speech-in-noise identification were examined in 60 young participants. The training conditions were audiovisual training, auditory-only training, and no training (n = 20 each). In the training groups, gated consonants and words were presented at 0 dB signal-to-noise ratio; stimuli were either audiovisual or auditory-only. The no-training group watched a movie clip without performing a speech identification task. Speech-in-noise identific...
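Presenting stimuli "at 0 dB signal-to-noise ratio" means the noise is scaled to carry the same average power as the speech. A minimal sketch of that scaling follows, assuming power is measured as the mean squared amplitude over the whole stimulus; the study's own calibration procedure is not described in the abstract, and the function name is hypothetical.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that 10*log10(P_speech / P_noise) equals snr_db, then add it to `speech`.

    Returns (mixture, scaled_noise). Power is the mean squared amplitude over the whole signal.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    scaled_noise = scale * noise
    return speech + scaled_noise, scaled_noise
```

At `snr_db=0` the scale factor simply equalises the two powers; negative values make the noise louder than the speech.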
Aleksic, Petar S
We describe an audio-visual automatic continuous speech recognition system, which significantly improves speech recognition performance over a wide range of acoustic noise levels, as well as under clean audio conditions. The system utilizes facial animation parameters (FAPs) supported by the MPEG-4 standard for the visual representation of speech. We also describe a robust and automatic algorithm we have developed to extract FAPs from visual data, which does not require hand labeling or extensive training procedures. Principal component analysis (PCA) was performed on the FAPs in order to decrease the dimensionality of the visual feature vectors, and the derived projection weights were used as visual features in the audio-visual automatic speech recognition (ASR) experiments. Both single-stream and multistream hidden Markov models (HMMs) were used to model the ASR system, integrate audio and visual information, and perform relatively large-vocabulary (approximately 1000-word) speech recognition experiments. The experiments use clean audio data and audio data corrupted by stationary white Gaussian noise at various SNRs. The proposed system reduces the word error rate (WER) by 20% to 23% relative to audio-only speech recognition WERs at various SNRs (0–30 dB) with additive white Gaussian noise, and by 19% relative to audio-only speech recognition WER under clean audio conditions.
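Two quantitative steps in this abstract can be sketched directly: projecting feature vectors onto their leading principal components, and expressing a WER improvement in relative terms. The sketch computes PCA via NumPy's SVD on mean-centred data; the number of components and the feature dimensionality are left to the caller because the abstract does not state them, and the function names are hypothetical.

```python
import numpy as np

def pca_project(features, n_components):
    """Project row-wise feature vectors (e.g. per-frame FAPs) onto their top principal components."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def relative_wer_reduction(wer_audio, wer_av):
    """Relative WER reduction (%) of an audio-visual system with respect to audio-only."""
    return 100.0 * (wer_audio - wer_av) / wer_audio
```

For example, an audio-only WER of 10% reduced to 8% is a 20% relative reduction, even though the absolute improvement is only 2 percentage points.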
Bruderer, Alison G; Danielson, D Kyle; Kandhadai, Padmapriya; Werker, Janet F
The influence of speech production on speech perception is well established in adults. However, because adults have a long history of both perceiving and producing speech, the extent to which the perception-production linkage is due to experience is unknown. We addressed this issue by asking whether articulatory configurations can influence infants' speech perception performance. To eliminate influences from specific linguistic experience, we studied preverbal, 6-mo-old infants and tested the discrimination of a nonnative, and hence never-before-experienced, speech sound distinction. In three experimental studies, we used teething toys to control the position and movement of the tongue tip while the infants listened to the speech sounds. Using ultrasound imaging technology, we verified that the teething toys consistently and effectively constrained the movement and positioning of infants' tongues. With a looking-time procedure, we found that temporarily restraining infants' articulators impeded their discrimination of a nonnative consonant contrast but only when the relevant articulator was selectively restrained to prevent the movements associated with producing those sounds. Our results provide striking evidence that even before infants speak their first words and without specific listening experience, sensorimotor information from the articulators influences speech perception. These results transform theories of speech perception by suggesting that even at the initial stages of development, oral-motor movements influence speech sound discrimination. Moreover, an experimentally induced "impairment" in articulator movement can compromise speech perception performance, raising the question of whether long-term oral-motor impairments may impact perceptual development. PMID:26460030
Lim, Jung-Hui; Oh, Do-Kwan; Lee, Soo-Young
In noisy environments, human speech perception uses visual lip-reading as well as audio phonetic classification. This audio-visual integration may be done by combining the two sensory features at an early stage; top-down attention may also integrate the two modalities. For sensory feature fusion, we introduce mapping functions between the audio and visual manifolds. In particular, we present an algorithm that provides a one-to-many mapping function for video-to-audio mapping. We also present a top-down attention mechanism that integrates both the sensory features and the classification results of the two modalities, and that is able to explain the McGurk effect. Each classifier is implemented separately as a hidden Markov model (HMM), but the two classifiers are combined at the top level and interact through top-down attention.
Tong Ming; Bian Zhengzhong; Li Xiaohui; Dai Qijun; Chen Yanpu
The perceptual effect of phase information in speech was studied with subjective listening tests. With the phase spectrum of the speech changed while the amplitude spectrum was kept unchanged, the tests show that: (1) if the envelope of the reconstructed speech signal is unchanged, the original and the reconstructed speech are perceptually indistinguishable; (2) the perceptual effect of the reconstructed speech depends mainly on the magnitude of the derivative of the added phase; (3) with t_d defined as the maximum relative time shift between different frequency components of the reconstructed speech signal, speech quality is excellent for t_d < 10 ms, good for 10 ms < t_d < 20 ms, fair for 20 ms < t_d < 35 ms, and poor for t_d > 35 ms.
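Reading t_d as the spread of the group delay (the negative frequency derivative of the added phase) ties finding (2) to finding (3). That reading is our interpretation, not something the abstract states. The sketch below works under that assumption and applies the reported quality bands; how values exactly at 10, 20, or 35 ms are classified is also an assumption, since the reported ranges leave the boundaries open.

```python
import math


def max_relative_time_shift(freqs_hz, added_phase_rad):
    """Estimate t_d in seconds as the spread of the group delay -dphi/domega.

    Assumes the added phase is sampled at increasing frequencies; the derivative
    is approximated by forward differences between neighbouring bins.
    """
    if len(freqs_hz) != len(added_phase_rad) or len(freqs_hz) < 2:
        raise ValueError("need matching frequency and phase lists of length >= 2")
    delays = []
    for k in range(len(freqs_hz) - 1):
        d_omega = 2 * math.pi * (freqs_hz[k + 1] - freqs_hz[k])
        d_phi = added_phase_rad[k + 1] - added_phase_rad[k]
        delays.append(-d_phi / d_omega)  # group delay of this band, in seconds
    return max(delays) - min(delays)


def quality_category(t_d_seconds):
    """Map t_d to the reported quality bands (boundary handling is assumed)."""
    t_d_ms = t_d_seconds * 1000
    if t_d_ms < 10:
        return "excellent"
    if t_d_ms < 20:
        return "good"
    if t_d_ms < 35:
        return "fair"
    return "poor"


freqs = [100.0 * i for i in range(41)]  # 0 to 4000 Hz in 100 Hz steps
# A pure delay (linear phase) shifts every component equally, so t_d is ~0.
linear = [-2 * math.pi * f * 0.005 for f in freqs]
# A quadratic phase gives a group delay that grows with frequency.
chirp_rate = 0.025 / 3900  # chosen so the forward-difference delays span 25 ms
quadratic = [-math.pi * chirp_rate * f ** 2 for f in freqs]
print(quality_category(max_relative_time_shift(freqs, linear)))     # prints excellent
print(quality_category(max_relative_time_shift(freqs, quadratic)))  # prints fair
```

The linear-phase case illustrates finding (1) in spirit: a uniform delay leaves the relative timing of components, and hence the envelope shape, intact.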
Ross, Lars A; Molholm, Sophie; Blanco, Daniella; Gomez-Ramirez, Manuel; Saint-Amour, Dave; Foxe, John J
Observing a speaker's articulations substantially improves the intelligibility of spoken speech, especially under noisy listening conditions. This multisensory integration of speech inputs is crucial to effective communication. Appropriate development of this ability has major implications for children in classroom and social settings, and deficits in it have been linked to a number of neurodevelopmental disorders, especially autism. It is clear from structural imaging studies that there is a prolonged maturational course within regions of the perisylvian cortex that persists into late childhood, and these regions have been firmly established as being crucial to speech and language functions. Given this protracted maturational timeframe, we reasoned that multisensory speech processing might well show a similarly protracted developmental course. Previous work in adults has shown that audiovisual enhancement in word recognition is most apparent within a restricted range of signal-to-noise ratios (SNRs). Here, we investigated when these properties emerge during childhood by testing multisensory speech recognition abilities in typically developing children aged between 5 and 14 years, and comparing them with those of adults. By parametrically varying SNRs, we found that children benefited significantly less from observing visual articulations, displaying considerably less audiovisual enhancement. The findings suggest that improvement in the ability to recognize speech-in-noise and in audiovisual integration during speech perception continues quite late into the childhood years. The implication is that a considerable amount of multisensory learning remains to be achieved during the later schooling years, and that explicit efforts to accommodate this learning may well be warranted. PMID:21615556
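The observation that enhancement is most apparent within a restricted SNR range can be made concrete by computing the audiovisual gain at each SNR and locating its maximum. The abstract does not state which enhancement metric was used; the simple difference score below and the example scores are assumptions for illustration only.

```python
def av_enhancement(audio_only, audio_visual):
    """Per-SNR enhancement in percentage points: AV score minus auditory-only score."""
    if audio_only.keys() != audio_visual.keys():
        raise ValueError("both conditions must cover the same SNRs")
    return {snr: audio_visual[snr] - audio_only[snr] for snr in audio_only}


def peak_enhancement_snr(audio_only, audio_visual):
    """Return the SNR (dB) at which the visual gain is largest."""
    gains = av_enhancement(audio_only, audio_visual)
    return max(gains, key=gains.get)


# Hypothetical percent-correct word scores keyed by SNR in dB (not the study's data).
a_only = {-24: 2, -16: 10, -12: 30, -8: 60, 0: 95}
a_vis = {-24: 10, -16: 40, -12: 62, -8: 80, 0: 97}
print(av_enhancement(a_only, a_vis))
print(peak_enhancement_snr(a_only, a_vis))  # prints -12
```

In this toy data the gain is small at both extremes (floor at very low SNR, ceiling at high SNR) and peaks at an intermediate level, which is the shape a restricted enhancement range implies.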
OBJECTIVE: To analyze speech reading through Internet video calls by profoundly hearing-impaired individuals and cochlear implant (CI) users. METHODS: Speech reading skills of 14 deaf adults and 21 CI users were assessed using the Hochmair Schulz Moser (HSM) sentence test. We presented video simulations using different video resolutions (1280 × 720, 640 × 480, 320 × 240, 160 × 120 px), frame rates (30, 20, 10, 7, 5 frames per second (fps)), speech velocities (three different speakers), webcams (Logitech Pro9000, C600 and C500) and image/sound delays (0-500 ms). All video simulations were presented with and without sound and in two screen sizes. Additionally, scores for a live Skype™ video connection and live face-to-face communication were assessed. RESULTS: Higher frame rate (>7 fps), higher camera resolution (>640 × 480 px) and shorter picture/sound delay (<100 ms) were associated with increased speech perception scores. Scores were strongly dependent on the speaker but were not influenced by physical properties of the camera optics or the full-screen mode. There is a significant median gain of +8.5 percentage points (p = 0.009) in speech perception for all 21 CI users when visual cues are additionally shown. CI users with poor open-set speech perception scores (n = 11) showed the greatest benefit under combined audio-visual presentation (median speech perception +11.8 percentage points, p = 0.032). CONCLUSION: Webcams have the potential to improve telecommunication for hearing-impaired individuals.
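A gain expressed in percentage points is an absolute difference between two percent-correct scores for the same listener. A minimal sketch of a median gain across listeners follows; the scores are hypothetical, and the significance test reported in the abstract is not reproduced.

```python
from statistics import median


def median_av_gain(audio_only_pct, audio_visual_pct):
    """Median per-listener gain, in percentage points, from adding visual cues."""
    if len(audio_only_pct) != len(audio_visual_pct):
        raise ValueError("need one audio-only and one audio-visual score per listener")
    gains = [av - a for a, av in zip(audio_only_pct, audio_visual_pct)]
    return median(gains)


# Hypothetical sentence-test scores (%) for five listeners, not the study's data.
print(median_av_gain([40, 55, 62, 30, 70], [50, 60, 75, 45, 72]))  # prints 10
```

Using the median of paired differences, rather than the difference of group means, keeps the summary robust to a few listeners with very large or very small gains.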
Jesse, A.; Janse, E.
Older listeners are more affected than younger listeners in their recognition of speech in adverse conditions, such as when they also hear a single-competing speaker. In the present study, we investigated with a speeded response task whether older listeners with various degrees of hearing loss benefit under such conditions from also seeing the speaker they intend to listen to. We also tested, at the same time, whether older adults need postperceptual processing to obtain an audiovisual benefi...
This paper presents an analytical review of the latest achievements in the field of audio-visual (AV) fusion (integration of multimodal information). We discuss the main challenges and report on approaches to address them. One of the most important tasks of AV integration is to understand how the modalities interact and influence each other. The paper addresses this problem in the context of AV speech processing and speech recognition. In the first part of the review, we set out the basic principles of AV speech recognition and classify the audio and visual features of speech. Special attention is paid to systematizing the existing techniques and AV data fusion methods. In the second part, based on our analysis of the research area, we provide a consolidated list of tasks and applications that use AV fusion, and we indicate the methods, techniques, and audio and video features they use. We propose a classification of AV integration and discuss the advantages and disadvantages of different approaches. We draw conclusions and offer our assessment of the future of AV fusion. In further research, we plan to implement an audio-visual Russian continuous speech recognition system using advanced methods of multimodal fusion.
Pisoni, David B.
The goal was to acquire new knowledge about speech perception and production in severe environments such as high masking noise, increased cognitive load or sustained attentional demands. Changes were examined in speech production under these adverse conditions through acoustic analysis techniques. One set of studies focused on the effects of noise on speech production. The experiments in this group were designed to generate a database of speech obtained in noise and in quiet. A second set of experiments was designed to examine the effects of cognitive load on the acoustic-phonetic properties of speech. Talkers were required to carry out a demanding perceptual motor task while they read lists of test words. A final set of experiments explored the effects of vocal fatigue on the acoustic-phonetic properties of speech. Both cognitive load and vocal fatigue are present in many applications where speech recognition technology is used, yet their influence on speech production is poorly understood.
Wang, DeLiang; Kjems, Ulrik; Pedersen, Michael Syskind;
For a given mixture of speech and noise, an ideal binary time-frequency mask is constructed by comparing speech energy and noise energy within local time-frequency units. It is observed that listeners achieve nearly perfect speech recognition from gated noise with binary gains prescribed by the ideal binary mask. Only 16 filter channels and a frame rate of 100 Hz are sufficient for high intelligibility. The results show that, despite a dramatic reduction of speech information, a pattern of binary gains provides an adequate basis for speech perception.
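The mask construction described above can be sketched on precomputed energy grids. The 0 dB local criterion is an assumed default, not a value stated in the abstract, and a real system would compute the energies from a filterbank (for example, the 16 channels mentioned) and resynthesize the gated noise, both of which this sketch omits.

```python
def ideal_binary_mask(speech_energy, noise_energy, lc_db=0.0):
    """Return 1 for time-frequency units where local SNR exceeds lc_db, else 0.

    speech_energy and noise_energy are [channel][frame] grids of energies.
    lc_db (the local criterion) is an assumed parameter.
    """
    threshold = 10 ** (lc_db / 10)  # convert the dB criterion to an energy ratio
    mask = []
    for s_row, n_row in zip(speech_energy, noise_energy):
        # Compare s > n * threshold rather than s / n > threshold to avoid dividing by zero.
        mask.append([1 if s > n * threshold else 0 for s, n in zip(s_row, n_row)])
    return mask


def gate_noise(noise_tf, mask):
    """Apply the binary gains to noise: keep masked-in units, silence the rest."""
    return [[x * g for x, g in zip(n_row, m_row)] for n_row, m_row in zip(noise_tf, mask)]


speech = [[4.0, 1.0, 0.0], [0.5, 9.0, 3.0]]
noise = [[1.0, 1.0, 2.0], [1.0, 1.0, 1.0]]
mask = ideal_binary_mask(speech, noise)
print(mask)  # prints [[1, 0, 0], [0, 1, 1]]
print(gate_noise(noise, mask))
```

Note that the gated signal contains only noise; the speech contributes nothing except the on/off pattern of the gains, which is what makes the intelligibility result striking.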
Clement, Bart Richard
Although speech perception has been considered a predominantly auditory phenomenon, large benefits from vision in degraded acoustic conditions suggest integration of audition and vision. More direct evidence of this comes from studies of audiovisual disparity that demonstrate vision can bias and even dominate perception (McGurk & MacDonald, 1976). It has been observed that hearing-impaired listeners demonstrate more visual biasing than normally hearing listeners (Walden et al., 1990). It is argued here that stimulus audibility must be equated across groups before true differences can be established. In the present investigation, effects of visual biasing on perception were examined as audibility was degraded for 12 young normally hearing listeners. Biasing was determined by quantifying the degree to which listener identification functions for a single synthetic auditory /ba-da-ga/ continuum changed across two conditions: (1) an auditory-only listening condition; and (2) an auditory-visual condition in which every item of the continuum was synchronized with visual articulations of the consonant-vowel (CV) tokens /ba/ and /ga/, as spoken by each of two talkers. Audibility was altered by presenting the conditions in quiet and in noise at each of three signal-to-noise (S/N) ratios. For the visual-/ba/ context, large effects of audibility were found. As audibility decreased, visual biasing increased. A large talker effect also was found, with one talker eliciting more biasing than the other. An independent lipreading measure demonstrated that this talker was more visually intelligible than the other. For the visual-/ga/ context, audibility and talker effects were less robust, possibly obscured by strong listener effects, which were characterized by marked differences in perceptual processing patterns among participants. Some demonstrated substantial biasing whereas others demonstrated little, indicating a strong reliance on audition even in severely degraded acoustic...
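The abstract quantifies biasing as the degree to which identification functions change between the auditory-only and auditory-visual conditions, but it does not give a formula. One plausible index, assumed here rather than taken from the study, is the mean absolute change in response proportion across continuum steps.

```python
def visual_bias(auditory_only, auditory_visual):
    """Mean absolute shift in response proportion across continuum steps.

    Each list holds, per continuum step, the proportion of trials identified
    as a given category (e.g. /ba/). Larger values mean more visual biasing.
    """
    if len(auditory_only) != len(auditory_visual) or not auditory_only:
        raise ValueError("need matching, non-empty identification functions")
    shifts = [abs(av - a) for a, av in zip(auditory_only, auditory_visual)]
    return sum(shifts) / len(shifts)


# Hypothetical /ba/ proportions along a five-step /ba/-/da/-/ga/ continuum.
ba_auditory = [0.95, 0.80, 0.40, 0.10, 0.05]
ba_with_visual_ba = [1.00, 0.95, 0.70, 0.40, 0.20]
print(round(visual_bias(ba_auditory, ba_with_visual_ba), 2))  # prints 0.19
```

Computing this index separately at each S/N ratio would give the audibility-by-biasing relationship the study examines.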
Lynne E Bernstein
This paper examines the questions: what levels of speech can be perceived visually, and how is visual speech represented by the brain? Review of the literature leads to the conclusions that every level of psycholinguistic speech structure (i.e., phonetic features, phonemes, syllables, words, and prosody) can be perceived visually, although individuals differ in their abilities to do so; and that there are visual modality-specific representations of speech qua speech in higher-level vision brain areas. That is, the visual system represents the modal patterns of visual speech. The suggestion that the auditory speech pathway receives and represents visual speech is examined in light of neuroimaging evidence on the auditory speech pathways. We outline the generally agreed-upon organization of the visual ventral and dorsal pathways and examine several types of visual processing that might be related to speech through those pathways, specifically, face and body, orthography, and sign language processing. In this context, we examine the visual speech processing literature, which reveals widespread, diverse patterns of activity in posterior temporal cortices in response to visual speech stimuli. We outline a model of the visual and auditory speech pathways and make several suggestions: (1) The visual perception of speech relies on visual pathway representations of speech qua speech. (2) A proposed site of these representations, the temporal visual speech area (TVSA), has been demonstrated in posterior temporal cortex, ventral and posterior to multisensory posterior superior temporal sulcus (pSTS). (3) Given that visual speech has dynamic and configural features, its representations in feedforward visual pathways are expected to integrate these features, possibly in TVSA.
One view of speech perception is that acoustic signals are transformed into representations for pattern matching to determine linguistic structure. This process can be taken as a statistical pattern-matching problem, assuming relatively stable linguistic categories are characterized by neural representations related to auditory properties of speech that can be compared to speech input. This kind of pattern matching can be termed a passive process, which implies rigidity of processing with few demands on cognitive processing. An alternative view is that speech recognition, even in early stages, is an active process in which speech analysis is attentionally guided. Note that this does not mean consciously guided, but that information-contingent changes in early auditory encoding can occur as a function of context and experience. Active processing assumes that attention, plasticity, and listening goals are important in considering how listeners cope with adverse circumstances that impair hearing, such as masking noise in the environment or hearing loss. Although theories of speech perception have begun to incorporate some active processing, they seldom treat early speech encoding as plastic and attentionally guided. Recent research has suggested that speech perception is the product of both feedforward and feedback interactions between a number of brain regions that include descending projections perhaps as far downstream as the cochlea. It is important to understand how the ambiguity of the speech signal and constraints of context dynamically determine the cognitive resources recruited during perception, including focused attention, learning, and working memory. Theories of speech perception need to go beyond the current corticocentric approach in order to account for the intrinsic dynamics of the auditory encoding of speech. In doing so, this may provide new insights into ways in which hearing disorders and loss may be treated either through augmentation or...
Howard Charles Nusbaum
One view of speech perception is that acoustic signals are transformed into representations for pattern matching to determine linguistic structure. This process can be taken as a statistical pattern-matching problem, assuming relatively stable linguistic categories are characterized by neural representations related to auditory properties of speech that can be compared to speech input. This kind of pattern matching can be termed a passive process, which implies rigidity of processing with few...
Little is known about the perception of speech sounds by native Danish listeners. However, the Danish sound system differs in several interesting ways from the sound systems of other languages. For instance, Danish is characterized, among other features, by a rich vowel inventory and by different reductions of speech sounds evident in the pronunciation of the language. This book (originally a PhD thesis) consists of three studies based on the results of two experiments. The experiments were designed to provide knowledge of the perception of Danish speech sounds by Danish adults and infants, in light of the rich and complex Danish sound system. The first two studies report on native adults’ perception of Danish speech sounds in quiet and noise. The third study examined the development of language-specific perception in native Danish infants at 6, 9 and 12 months of age. The book points...
Homae, Fumitaka; Watanabe, Hama; Taga, Gentaro
Infants often pay special attention to speech sounds, and they appear to detect key features of these sounds. To investigate the neural foundation of speech perception in infants, we measured cortical activation using near-infrared spectroscopy. We presented the following three types of auditory stimuli while 3-month-old infants watched a silent…
Lotto, Andrew J.; Hickok, Gregory S.; Holt, Lori L.
The discovery of mirror neurons, a class of neurons that respond when a monkey performs an action and also when the monkey observes others producing the same action, has promoted a renaissance for the Motor Theory (MT) of speech perception. This is because mirror neurons seem to accomplish the same kind of one-to-one mapping between perception and action that MT theorizes to be the basis of human speech communication. However, this seeming correspondence is superficial, and there are theoreti...
Lusk, Laina G; Mitchel, Aaron D
Speech is inextricably multisensory: both auditory and visual components provide critical information for all aspects of speech processing, including speech segmentation, the visual components of which have been the target of a growing number of studies. In particular, a recent study (Mitchel and Weiss, 2014) established that adults can utilize facial cues (i.e., visual prosody) to identify word boundaries in fluent speech. The current study expanded upon these results, using an eye tracker to identify highly attended facial features of the audiovisual display used in Mitchel and Weiss (2014). Subjects spent the most time watching the eyes and mouth. A significant trend in gaze durations was found with the longest gaze duration on the mouth, followed by the eyes and then the nose. In addition, eye-gaze patterns changed across familiarization as subjects learned the word boundaries, showing decreased attention to the mouth in later blocks while attention on other facial features remained consistent. These findings highlight the importance of the visual component of speech processing and suggest that the mouth may play a critical role in visual speech segmentation. PMID:26869959
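Gaze-duration comparisons such as the mouth > eyes > nose ordering start from total dwell time per facial area of interest (AOI). A minimal aggregation sketch over a hypothetical fixation log follows; the study's eye-tracking pipeline and statistical analysis are not reproduced.

```python
from collections import defaultdict


def dwell_time_by_aoi(fixations):
    """Sum fixation durations (ms) per area of interest (AOI)."""
    totals = defaultdict(int)
    for aoi, duration_ms in fixations:
        totals[aoi] += duration_ms
    return dict(totals)


def rank_aois(fixations):
    """AOIs ordered from longest to shortest total gaze duration."""
    totals = dwell_time_by_aoi(fixations)
    return sorted(totals, key=totals.get, reverse=True)


# Hypothetical fixation log as (AOI label, duration in ms); not the study's data.
log = [("mouth", 300), ("eyes", 200), ("mouth", 250), ("nose", 100), ("eyes", 150)]
print(dwell_time_by_aoi(log))  # prints {'mouth': 550, 'eyes': 350, 'nose': 100}
print(rank_aois(log))  # prints ['mouth', 'eyes', 'nose']
```

Tracking the change across familiarization blocks would amount to running the same aggregation on each block's fixations separately.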
van Hoesel, Richard J. M.
One of the key benefits of using cochlear implants (CIs) in both ears rather than just one is improved localization. It is likely that in complex listening scenes, improved localization allows bilateral CI users to orient toward talkers to improve signal-to-noise ratios and gain access to visual cues, but to date, that conjecture has not been tested. To obtain an objective measure of that benefit, seven bilateral CI users were assessed for both auditory-only and audio-visual speech intelligib...
It has been shown that integration of acoustic and visual information, especially in noisy conditions, yields improved speech recognition results. This raises the question of how to weight the two modalities in different noise conditions. In this paper, we develop a weighting process adaptive to various background noise situations. In the presented recognition system, audio and video data are combined following a Separate Integration (SI) architecture. A hybrid Artificial Neural Network/Hidden Markov Model (ANN/HMM) system is used for the experiments. The neural networks were in all cases trained on clean data. First, we evaluate the performance of different weighting schemes in a manually controlled recognition task with different types of noise. Next, we compare different criteria to estimate the reliability of the audio stream. Based on this, a mapping between the measurements and the free parameter of the fusion process is derived and its applicability is demonstrated. Finally, the possibilities and limitations of adaptive weighting are compared and discussed.
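A common way to realize such a weighting is to scale each stream's log-likelihood by a reliability weight that depends on the estimated audio SNR. The sketch below is a simplified, per-class version of that idea; the logistic SNR-to-weight mapping and its parameters are hypothetical placeholders, not the mapping derived in the paper, and the hybrid ANN/HMM decoding is omitted.

```python
import math


def audio_weight_from_snr(snr_db, midpoint_db=10.0, slope=0.3):
    """Hypothetical logistic mapping from an estimated SNR to the audio stream weight."""
    return 1.0 / (1.0 + math.exp(-slope * (snr_db - midpoint_db)))


def fuse_and_classify(audio_log_lik, video_log_lik, snr_db):
    """Weight per-class log-likelihoods of the two streams and pick the best class."""
    lam = audio_weight_from_snr(snr_db)
    fused = {
        label: lam * audio_log_lik[label] + (1.0 - lam) * video_log_lik[label]
        for label in audio_log_lik
    }
    return max(fused, key=fused.get)


# Hypothetical scores: the audio stream favours "ba", the video stream favours "ga".
audio_scores = {"ba": -1.0, "ga": -5.0}
video_scores = {"ba": -6.0, "ga": -1.0}
print(fuse_and_classify(audio_scores, video_scores, snr_db=30.0))   # prints ba
print(fuse_and_classify(audio_scores, video_scores, snr_db=-10.0))  # prints ga
```

The single free parameter of the fusion is the weight; the paper's contribution is how to estimate it from audio reliability measures, which this sketch replaces with a fixed curve.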
This paper proposes an audio-visual speech recognition method using lip information extracted from side-face images, in an attempt to increase noise robustness in mobile environments. Our proposed method assumes that lip images can be captured using a small camera installed in a handset. Two different kinds of lip features, lip-contour geometric features and lip-motion velocity features, are used individually or jointly, in combination with audio features. Phoneme HMMs modeling the audio and visual features are built based on the multistream HMM technique. Experiments conducted using Japanese connected-digit speech contaminated with white noise in various SNR conditions show the effectiveness of the proposed method. Recognition accuracy is improved by using the visual information in all SNR conditions. These visual features were confirmed to be effective even when the audio HMM was adapted to noise by the MLLR method.
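The abstract uses lip-contour geometric features and lip-motion velocity features, individually or jointly. As a rough stand-in, assumed here rather than taken from the paper (which may derive velocity differently), velocity can be approximated by differencing contour features across frames and concatenated with them for the joint condition.

```python
def velocity_features(contour_frames, fps=1.0):
    """Approximate lip-motion velocity as frame-to-frame differences of contour features.

    The first frame has no predecessor, so its velocity is set to zeros.
    Multiplying by fps converts per-frame differences to per-second rates.
    """
    if not contour_frames:
        return []
    dims = len(contour_frames[0])
    velocities = [[0.0] * dims]
    for prev, cur in zip(contour_frames, contour_frames[1:]):
        velocities.append([(c - p) * fps for p, c in zip(prev, cur)])
    return velocities


def joint_visual_features(contour_frames, fps=1.0):
    """Concatenate contour and velocity features frame by frame."""
    velocities = velocity_features(contour_frames, fps)
    return [list(c) + v for c, v in zip(contour_frames, velocities)]


# Hypothetical two-dimensional lip features (e.g. width, opening) for three frames.
frames = [[1.0, 2.0], [2.0, 2.5], [4.0, 2.0]]
print(joint_visual_features(frames))
# prints [[1.0, 2.0, 0.0, 0.0], [2.0, 2.5, 1.0, 0.5], [4.0, 2.0, 2.0, -0.5]]
```

In a multistream setup these joint vectors would form the visual stream, modeled alongside a separate audio stream.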
Rhone, Ariane E.; Nourski, Kirill V.; Oya, Hiroyuki; Kawasaki, Hiroto; Howard, Matthew A.; McMurray, Bob
In everyday conversation, viewing a talker's face can provide information about the timing and content of an upcoming speech signal, resulting in improved intelligibility. Using electrocorticography, we tested whether human auditory cortex in Heschl's gyrus (HG) and on superior temporal gyrus (STG) and motor cortex on precentral gyrus (PreC) were responsive to visual/gestural information prior to the onset of sound and whether early stages of auditory processing were sensitive to the visual content (speech syllable versus non-speech motion). Event-related band power (ERBP) in the high gamma band was content-specific prior to acoustic onset on STG and PreC, and ERBP in the beta band differed in all three areas. Following sound onset, we found no evidence for content-specificity in HG, evidence for visual specificity in PreC, and specificity for both modalities in STG. These results support models of audio-visual processing in which sensory information is integrated in non-primary cortical areas. PMID:27182530
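Event-related band power expresses power in a frequency band relative to a pre-stimulus baseline. A minimal sketch follows, assuming band power has already been extracted per sample (filtering and envelope estimation are omitted) and using a dB ratio to the trial-averaged baseline; this is one common normalization, not necessarily the one used in the study.

```python
import math


def erbp_db(trials, baseline):
    """Trial-averaged band power in dB relative to the mean power in a baseline window.

    trials: list of equal-length lists of band power (e.g. high gamma) per sample.
    baseline: slice selecting the pre-stimulus samples used as the reference.
    """
    n_samples = len(trials[0])
    mean_power = [sum(trial[t] for trial in trials) / len(trials) for t in range(n_samples)]
    reference_samples = mean_power[baseline]
    reference = sum(reference_samples) / len(reference_samples)
    return [10 * math.log10(p / reference) for p in mean_power]


# Hypothetical band power: flat baseline, then a tenfold increase after onset.
toy_trials = [[1.0, 1.0, 10.0, 10.0], [1.0, 1.0, 10.0, 10.0]]
print(erbp_db(toy_trials, slice(0, 2)))  # prints [0.0, 0.0, 10.0, 10.0]
```

Comparing such curves between conditions (speech syllable versus non-speech motion) in a window before acoustic onset is the kind of contrast the content-specificity result rests on.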
Gordon, Peter C.
Four experiments addressing the role of attention in phonetic perception are reported. The first experiment shows that the relative importance of two cues to the voicing distinction changes when subjects must perform an arithmetic distractor task at the same time as identifying a speech stimulus. The voice onset time cue loses phonetic significance when subjects are distracted, while the F0 onset frequency cue does not. The second experiment shows a similar pattern for two cues to the distinction between the vowels /i/ (as in 'beat') and /I/ (as in 'bit'). Together these experiments indicate that careful attention to speech perception is necessary for strong acoustic cues to achieve their full phonetic impact, while weaker acoustic cues achieve their full phonetic impact without close attention. Experiment 3 shows that this pattern is obtained when the distractor task places little demand on verbal short term memory. Experiment 4 provides a large data set for testing formal models of the role of attention in speech perception. Attention is shown to influence the signal to noise ratio in phonetic encoding. This principle is instantiated in a network model in which the role of attention is to reduce noise in the phonetic encoding of acoustic cues. Implications of this work for understanding speech perception and general theories of the role of attention in perception are discussed.
Alan James Power
Auditory cortical oscillations have been proposed to play an important role in speech perception. It is suggested that the brain may take temporal ‘samples’ of information from the speech stream at different rates, phase-resetting ongoing oscillations so that they are aligned with similar frequency bands in the input (‘phase locking’). Information from these frequency bands is then bound together for speech perception. To date, there are no explorations of neural phase locking and entrainment to speech input in children. However, it is clear from studies of language acquisition that infants use both visual speech information and auditory speech information in learning. In order to study neural entrainment to speech in typically developing children, we use a rhythmic entrainment paradigm (underlying 2 Hz or delta rate) based on repetition of the syllable ba, presented in either the auditory modality alone, the visual modality alone, or as auditory-visual speech (via a talking head). To ensure attention to the task, children aged 13 years were asked to press a button as fast as possible when the ba stimulus violated the rhythm for each stream type. Rhythmic violation depended on delaying the occurrence of a ba in the isochronous stream. Neural entrainment was demonstrated for all stream types, and individual differences in standardized measures of language processing were related to auditory entrainment at the theta rate. Further, there was significant modulation of the preferred phase of auditory entrainment in the theta band when visual speech cues were present, indicating cross-modal phase resetting. The rhythmic entrainment paradigm developed here offers a method for exploring individual differences in oscillatory phase locking during development. In particular, a method for assessing neural entrainment and cross-modal phase resetting would be useful for exploring developmental learning difficulties thought to involve temporal sampling...
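Phase locking and a shift in preferred phase are often summarized with circular statistics: the length of the mean resultant vector of single-trial phases, and its angle. A minimal sketch follows, assuming phases have already been extracted at stimulus onsets for a given band (e.g. theta); the study's exact analysis pipeline is not described in the abstract.

```python
import cmath


def phase_locking(phases_rad):
    """Return (resultant vector length, mean phase) for phases in radians.

    A length near 1 means phases cluster tightly across trials; near 0, they are spread out.
    """
    if not phases_rad:
        raise ValueError("need at least one phase")
    mean_vector = sum(cmath.exp(1j * p) for p in phases_rad) / len(phases_rad)
    return abs(mean_vector), cmath.phase(mean_vector)


def preferred_phase_shift(phases_a, phases_av):
    """Wrapped difference (radians) between the mean phases of two conditions."""
    _, mean_a = phase_locking(phases_a)
    _, mean_av = phase_locking(phases_av)
    return cmath.phase(cmath.exp(1j * (mean_av - mean_a)))


# Hypothetical theta-band phases at stimulus onsets, auditory-only vs audiovisual.
auditory = [0.1, 0.2, 0.3]
audiovisual = [0.6, 0.7, 0.8]
length, mean = phase_locking(auditory)
print(round(length, 3), round(mean, 3))  # prints 0.997 0.2
print(round(preferred_phase_shift(auditory, audiovisual), 3))  # prints 0.5
```

Wrapping the difference through the complex exponential keeps the shift in the range -π to π, so phases on either side of ±π are compared correctly.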
Fava, Eswen; Hull, Rachel; Bortfeld, Heather
Initially, infants are capable of discriminating phonetic contrasts across the world's languages. Starting between seven and ten months of age, they gradually lose this ability through a process of perceptual narrowing. Although traditionally investigated with isolated speech sounds, such narrowing occurs in a variety of perceptual domains (e.g., faces, visual speech). Thus far, tracking the developmental trajectory of this tuning process has been focused primarily on auditory speech alone, and generally using isolated sounds. But infants learn from speech produced by people talking to them, meaning they learn from a complex audiovisual signal. Here, we use near-infrared spectroscopy to measure blood concentration changes in the bilateral temporal cortices of infants in three different age groups: 3-to-6 months, 7-to-10 months, and 11-to-14 months. Critically, all three groups of infants were tested with continuous audiovisual speech in both their native and another, unfamiliar language. We found that at each age range, infants showed different patterns of cortical activity in response to the native and non-native stimuli. Infants in the youngest group showed bilateral cortical activity that was greater overall in response to non-native relative to native speech; the oldest group showed left lateralized activity in response to native relative to non-native speech. These results highlight perceptual tuning as a dynamic process that happens across modalities and at different levels of stimulus complexity. PMID:25116572
The use of visual cues during the processing of audiovisual speech is known to be less efficient in children and adults with language difficulties, and such difficulties are known to be more prevalent in children from low-income populations. In the present study, we followed an economically diverse group of thirty-seven infants longitudinally from 6-9 months to 14-16 months of age. We used eye-tracking to examine whether individual differences in visual attention during audiovisual processing of speech in 6- to 9-month-old infants, particularly when processing congruent and incongruent auditory and visual speech cues, might be indicative of their later language development. Twenty-two of these 6- to 9-month-old infants also participated in an event-related potential (ERP) audiovisual task within the same experimental session. Language development was then followed up at the age of 14-16 months, using two measures of language development, the Preschool Language Scale (PLS) and the Oxford Communicative Development Inventory (CDI). The results show that those infants who were less efficient in auditory speech processing at the age of 6-9 months had lower receptive language scores at 14-16 months. A correlational analysis revealed that the pattern of face scanning and ERP responses to audio-visually incongruent stimuli at 6-9 months were both significantly associated with language development at 14-16 months. These findings add to the understanding of individual differences in neural signatures of audiovisual processing and associated looking behaviour in infants.
McGowan, Kevin B
Listeners' use of social information during speech perception was investigated by measuring transcription accuracy of Chinese-accented speech in noise while listeners were presented with a congruent Chinese face, an incongruent Caucasian face, or an uninformative silhouette. When listeners were presented with a Chinese face they transcribed more accurately than when presented with the Caucasian face. This difference existed both for listeners with a relatively high level of experience and for listeners with a relatively low level of experience with Chinese-accented English. Overall, these results are inconsistent with a model of social speech perception in which listener bias reduces attendance to the acoustic signal. These results are generally consistent with exemplar models of socially indexed speech perception predicting that activation of a social category will raise base activation levels of socially appropriate episodic traces, but the similar performance of more and less experienced listeners suggests the need for a more nuanced view with a role for both detailed experience and listener stereotypes. PMID:27483742
The emergence of heterogeneous networks and the rapid increase of Voice over IP (VoIP) applications provide important opportunities for the telecommunications market. These opportunities come at the price of increased complexity in the monitoring of the quality of service (QoS) and the need for adaptation of transmission systems to the changing environmental conditions. This thesis contains three papers concerned with quality assessment and enhancement of speech communication systems in adver...
Mitterer, Holger; Scharenborg, Odette; McQueen, James M
Recent evidence shows that listeners use abstract prelexical units in speech perception. Using the phenomenon of lexical retuning in speech processing, we ask whether those units are necessarily phonemic. Dutch listeners were exposed to a Dutch speaker producing ambiguous phones between the Dutch syllable-final allophones approximant [r] and dark [l]. These ambiguous phones replaced either final /r/ or final /l/ in words in a lexical-decision task. This differential exposure affected perception of ambiguous stimuli on the same allophone continuum in a subsequent phonetic-categorization test: Listeners exposed to ambiguous phones in /r/-final words were more likely to perceive test stimuli as /r/ than listeners with exposure in /l/-final words. This effect was not found for test stimuli on continua using other allophones of /r/ and /l/. These results confirm that listeners use phonological abstraction in speech perception. They also show that context-sensitive allophones can play a role in this process, and hence that context-insensitive phonemes are not necessary. We suggest there may be no one unit of perception. PMID:23973464
Recent magneto-encephalographic and electro-encephalographic studies provide evidence for cross-modal integration during audio-visual and audio-haptic speech perception, with speech gestures viewed or felt through manual tactile contact with the speaker's face. Given that the haptic and visual signals precede the acoustic signal in these studies, the observed modulation of N1/P2 auditory evoked responses during bimodal compared to unimodal speech perception suggests that relevant and predictive visual and haptic cues may facilitate auditory speech processing. To further investigate this hypothesis, auditory evoked potentials were compared here during auditory-only, audio-visual and audio-haptic speech perception in live dyadic interactions between a listener and a speaker. In line with previous studies, auditory evoked potentials were attenuated and speeded up during both audio-haptic and audio-visual compared to auditory-only speech perception. Importantly, the observed latency and amplitude reductions did not significantly depend on the degree of visual and haptic recognition of the speech targets. Altogether, these results further demonstrate cross-modal interactions between the auditory, visual and haptic speech signals. Although they do not contradict the hypothesis that visual and haptic sensory inputs convey predictive information with respect to the incoming auditory speech input, these results suggest that, at least in live conversational interactions, systematic conclusions about sensory predictability in bimodal speech integration should be drawn with caution, since the extraction of predictive cues likely depends on the variability of the speech stimuli.
This book presents a new approach to examining the perceived quality of audiovisual sequences. It uses electroencephalography (EEG) to understand how exactly user quality judgments are formed within a test participant, and what the physiologically based implications of exposure to lower-quality media might be. The book redefines experimental paradigms for using EEG in quality assessment so that they better suit the requirements of standard subjective quality testing; experimental protocols and stimuli are adjusted accordingly.
Chen, Yi-Chuan; Shore, David I; Lewis, Terri L; Maurer, Daphne
We measured the typical developmental trajectory of the window of audiovisual simultaneity by testing four age groups of children (5, 7, 9, and 11 years) and adults. We presented a visual flash and an auditory noise burst at various stimulus onset asynchronies (SOAs) and asked participants to report whether the two stimuli were presented at the same time. Compared with adults, children aged 5 and 7 years made more simultaneous responses when the SOAs were beyond ± 200 ms but made fewer simultaneous responses at the 0 ms SOA. The point of subjective simultaneity was located at the visual-leading side, as in adults, by 5 years of age, the youngest age tested. However, the window of audiovisual simultaneity became narrower and response errors decreased with age, reaching adult levels by 9 years of age. Experiment 2 ruled out the possibility that the adult-like performance of 9-year-old children was caused by the testing of a wide range of SOAs. Together, the results demonstrate that the adult-like precision of perceiving audiovisual simultaneity is developed by 9 years of age, the youngest age that has been reported to date. PMID:26897264
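The point of subjective simultaneity (PSS) and the width of the simultaneity window can be summarized from the proportion of "simultaneous" responses at each SOA. The sketch below is a simplified illustration, not the authors' analysis (studies of this kind usually fit a psychometric function): it assumes negative SOAs mean the sound led and positive SOAs mean the flash led, takes the PSS as the response-weighted mean SOA, and takes the window as the span over which the interpolated proportion stays at or above a hypothetical 0.5 criterion, assuming a single-peaked response curve. The helper `summarize_sj` and the example data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimultaneitySummary:
    pss_ms: float     # point of subjective simultaneity (positive = visual leading)
    window_ms: float  # width of the SOA range judged "simultaneous" at or above criterion


def _crossing(s0: float, p0: float, s1: float, p1: float, criterion: float) -> float:
    """Linearly interpolate the SOA at which the proportion equals `criterion`."""
    return s0 + (criterion - p0) * (s1 - s0) / (p1 - p0)


def summarize_sj(soas_ms, p_simultaneous, criterion=0.5):
    """Summarize simultaneity-judgment data (hypothetical helper, not from the study).

    soas_ms: SOAs in ms; negative = auditory leading, positive = visual leading (assumed).
    p_simultaneous: proportion of "simultaneous" responses at each SOA.
    """
    pairs = sorted(zip(soas_ms, p_simultaneous))
    total = sum(p for _, p in pairs)
    if total == 0:
        raise ValueError("no 'simultaneous' responses")
    # PSS as the centroid of the response distribution over SOA.
    pss = sum(s * p for s, p in pairs) / total

    above = [i for i, (_, p) in enumerate(pairs) if p >= criterion]
    if not above:
        raise ValueError("proportion never reaches the criterion")
    i, j = above[0], above[-1]
    # Left edge: where the curve rises through the criterion (or the first SOA tested).
    left = pairs[0][0] if i == 0 else _crossing(*pairs[i - 1], *pairs[i], criterion)
    # Right edge: where the curve falls through the criterion (or the last SOA tested).
    right = pairs[-1][0] if j == len(pairs) - 1 else _crossing(*pairs[j], *pairs[j + 1], criterion)
    return SimultaneitySummary(pss_ms=pss, window_ms=right - left)


# Hypothetical data for one observer.
summary = summarize_sj([-300, -200, -100, 0, 100, 200, 300],
                       [0.1, 0.3, 0.7, 0.9, 0.9, 0.5, 0.2])
print(summary)
```

With these made-up numbers the window runs from about -150 ms to 200 ms (350 ms wide) and the PSS lies on the visual-leading side at +25 ms. A window that narrows with age, as reported above, would show up as a smaller `window_ms`.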
Holt, Lori L.; Lotto, Andrew J.
The complexities of the acoustic speech signal pose many significant challenges for listeners. Although perceiving speech begins with auditory processing, investigation of speech perception has progressed mostly independently of study of the auditory system. Nevertheless, a growing body of evidence demonstrates that cross-fertilization between the two areas of research can be productive. We briefly describe research bridging the study of general auditory processing and speech perception, show...
Fernandez Pradier, Melanie
This thesis deals with emotion recognition from speech signals. The aim is to improve the feature extraction step by looking at the perception of music. In music theory, different pitch intervals (consonant, dissonant) and chords are believed to invoke different feelings in listeners. The question is whether there is a similar mechanism between perception of music and perception of emotional speech. Our research will follow three stages. First, the relationship between speech and music at segment...
Lolli, Sydney; Lewenstein, Ari D.; Basurto, Julian; Winnik, Sean; Loui, Psyche
Congenital amusics, or tone-deaf individuals, show difficulty in perceiving and producing small pitch differences. While amusia has marked effects on music perception, its impact on speech perception is less clear. Here we test the hypothesis that individual differences in pitch perception affect judgment of emotion in speech, by applying band-pass filters to spoken statements of emotional speech. A norming study was first conducted on Mechanical Turk to ensure that the intended emotions from...
Campos Sanchez, Antonio; López Núñez, Juan-Antonio; Scionti, Giuseppe; Garzón, Ingrid; González Andrades, Miguel; Alaminos, Miguel; Sola, Tomás
Videos can be used as didactic tools for self-learning under several circumstances, including those cases in which students are responsible for the development of this resource as an audiovisual notebook. We compared students' and teachers' perceptions regarding the main features that an audiovisual notebook should include. Four questionnaires with items about information, images, text and music, and filmmaking were used to investigate students' (n = 115) and teachers' perceptions (n = 28) re...
Kubicek, Claudia; Gervain, Judit; Hillairet de Boisferon, Anne; Pascalis, Olivier; Lœvenbruck, Hélène; Schwarzer, Gudrun
The present study examined whether infant-directed (ID) speech facilitates intersensory matching of audio-visual fluent speech in 12-month-old infants. German-learning infants' audio-visual matching ability of German and French fluent speech was assessed by using a variant of the intermodal matching procedure, with auditory and visual speech information presented sequentially. In Experiment 1, the sentences were spoken in an adult-directed (AD) manner. Results showed that 12-month-old ...
Kafaligonul, Hulusi; Oluk, Can
Motion perception is a pervasive aspect of vision and is affected both by the immediate pattern of sensory inputs and by prior experiences acquired through associations. Recently, several studies reported that an association can be established quickly between directions of visual motion and static sounds of distinct frequencies. After the association is formed, sounds are able to change the perceived direction of visual motion. To determine whether such rapidly acquired audiovisual associations and their subsequent influences on visual motion perception depend on the involvement of higher-order attentive tracking mechanisms, we designed psychophysical experiments using regular and reverse-phi random dot motions isolating low-level pre-attentive motion processing. Our results show that an association between the directions of low-level visual motion and static sounds can be formed and that this audiovisual association alters the subsequent perception of low-level visual motion. These findings support the view that audiovisual associations are not restricted to the high-level, attention-based motion system and that early-level visual motion processing may also play a role.
Preston, Jonathan L; Irwin, Julia R; Turcios, Jacqueline
Children with speech sound disorders may perceive speech differently than children with typical speech development. The nature of these speech differences is reviewed with an emphasis on assessing phoneme-specific perception for speech sounds that are produced in error. Category goodness judgment, or the ability to judge accurate and inaccurate tokens of speech sounds, plays an important role in phonological development. The software Speech Assessment and Interactive Learning System, which has been effectively used to assess preschoolers' ability to perform goodness judgments, is explored for school-aged children with residual speech errors (RSEs). However, data suggest that this particular task may not be sensitive to perceptual differences in school-aged children. The need for the development of clinical tools for assessment of speech perception in school-aged children with RSE is highlighted, and clinical suggestions are provided. PMID:26458198
Burfin, Sabine; Pascalis, Olivier; Ruiz Tada, Elisa; Costa, Albert; Savariaux, Christophe; Kandel, Sonia
We all go through a process of perceptual narrowing for phoneme identification. As we become experts in the languages we hear in our environment we lose the ability to identify phonemes that do not exist in our native phonological inventory. This research examined how linguistic experience—i.e., the exposure to a double phonological code during childhood—affects the visual processes involved in non-native phoneme identification in audiovisual speech perception. We conducted a phoneme identifi...
Ito, Takayuki; Gracco, Vincent; Ostry, David J.
Speech perception is known to rely on both auditory and visual information. However, sound-specific somatosensory input has also been shown to influence speech perceptual processing (Ito et al., 2009). In the present study we further examined the relationship between somatosensory information and speech perceptual processing by testing the hypothesis that the temporal relationship between orofacial movement and sound processing contributes to somatosensory-auditory interaction in speech p...
Viciana-Abad, Raquel; Marfil, Rebeca; Perez-Lorenzo, Jose M; Bandera, Juan P; Romero-Garces, Adrian; Reche-Lopez, Pedro
One of the main issues within the field of social robotics is to endow robots with the ability to direct attention to people with whom they are interacting. Different approaches follow bio-inspired mechanisms, merging audio and visual cues to localize a person using multiple sensors. However, most of these fusion mechanisms have been used in fixed systems, such as those used in video-conference rooms, and thus they may run into difficulties when restricted to the sensors with which a robot can be equipped. In addition, for interactive autonomous robots, the benefits of audio-visual attention mechanisms, compared to audio-only or visual-only approaches, have rarely been evaluated in real scenarios. Most of the tests conducted have been within controlled environments, at short distances and/or with off-line performance measurements. With the goal of demonstrating the benefit of fusing sensory information with Bayesian inference for interactive robotics, this paper presents a system for localizing a person by processing visual and audio data. Moreover, the performance of this system is evaluated and compared with unimodal systems, taking their technical limitations into account. The experiments show the promise of the proposed approach for the proactive detection and tracking of speakers in a human-robot interactive framework. PMID:24878593
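Bayesian fusion of audio and visual location estimates can be illustrated with the textbook Gaussian case: with a flat prior and independent Gaussian noise on each sensor, the posterior over the speaker's azimuth is Gaussian, its mean is the precision-weighted average of the unimodal estimates, and its standard deviation is smaller than either input's. This is a minimal sketch under those assumptions, not the system described in the paper; the `Estimate` type and `fuse` function are hypothetical, and angle wrap-around is ignored.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    mean_deg: float  # estimated azimuth of the speaker, in degrees
    sd_deg: float    # standard deviation of that estimate (sensor uncertainty)


def fuse(*estimates: Estimate) -> Estimate:
    """Combine independent Gaussian azimuth estimates under a flat prior.

    Each estimate is weighted by its precision (1 / variance). Passing a single
    estimate returns it (the unimodal case). Wrap-around at +/-180 deg is ignored.
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    precisions = [1.0 / e.sd_deg ** 2 for e in estimates]
    total = sum(precisions)
    mean = sum(w * e.mean_deg for w, e in zip(precisions, estimates)) / total
    return Estimate(mean_deg=mean, sd_deg=math.sqrt(1.0 / total))


audio = Estimate(mean_deg=20.0, sd_deg=10.0)   # e.g. a noisy microphone-array estimate
visual = Estimate(mean_deg=10.0, sd_deg=5.0)   # e.g. a sharper face-detection estimate
fused = fuse(audio, visual)
print(fused)
```

The fused estimate is pulled toward the more reliable visual cue and is more certain than either cue alone. Calling `fuse` with one estimate mirrors the unimodal fallback, for example when the speaker is outside the camera's field of view.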
Martínez-Montes, Eduardo; Hernández-Pérez, Heivet; Chobert, Julie; Morgado-Rodríguez, Lisbet; Suárez-Murias, Carlos; Valdés-Sosa, Pedro A.; Besson, Mireille
The aim of this experiment was to investigate the influence of musical expertise on the automatic perception of foreign syllables and harmonic sounds. Participants were Cuban students with a high level of expertise in either music or visual arts and with the same level of general education and socio-economic background. We used a multi-feature Mismatch Negativity (MMN) design with sequences of either syllables in Mandarin Chinese or harmonic sounds, both comprising deviants in pitch contour, duration and Voice Onset Time (VOT) or equivalent that were either far from (Large deviants) or close to (Small deviants) the standard. For both Mandarin syllables and harmonic sounds, results were clear-cut in showing larger MMNs to pitch contour deviants in musicians than in visual artists. Results were less clear for duration and VOT deviants, possibly because of the specific characteristics of the stimuli. Results are interpreted as reflecting similar processing of pitch contour in speech and non-speech sounds. The implications of these results for understanding the influence of intense musical training from childhood to adulthood and of genetic predispositions for music on foreign language perception are discussed. PMID:24294193
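The MMN is usually measured on a difference wave, obtained by subtracting the ERP to the standard from the ERP to the deviant and taking the most negative value within a latency window. The sketch below assumes the two averaged ERPs share one time axis and uses a 100-250 ms search window as an illustrative choice; the function names, window, and data are hypothetical and are not taken from the study.

```python
def difference_wave(deviant_uv, standard_uv):
    """Deviant-minus-standard ERP (in microvolts), sample by sample."""
    if len(deviant_uv) != len(standard_uv):
        raise ValueError("ERPs must have the same number of samples")
    return [d - s for d, s in zip(deviant_uv, standard_uv)]


def mmn_peak(times_ms, diff_uv, start_ms=100.0, end_ms=250.0):
    """Return (latency_ms, amplitude_uv) of the most negative point in the window."""
    in_window = [(v, t) for t, v in zip(times_ms, diff_uv) if start_ms <= t <= end_ms]
    if not in_window:
        raise ValueError("no samples in the search window")
    amplitude, latency = min(in_window)  # most negative voltage wins
    return latency, amplitude


# Hypothetical averaged ERPs sampled at a few latencies.
times = [0, 100, 150, 200, 300]
deviant = [0.0, -2.0, -5.0, -1.0, 0.0]
standard = [0.0, -1.0, -1.0, -1.0, 0.0]
diff = difference_wave(deviant, standard)
print(mmn_peak(times, diff))
```

In this framing, a larger MMN (as reported above for musicians on pitch contour deviants) would correspond to a more negative peak amplitude in the difference wave.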
Bertoncini, J; Cabrera, L
The development of speech perception relies upon early auditory capacities (i.e. discrimination, segmentation and representation). Infants are able to discriminate most of the phonetic contrasts occurring in natural languages, and at the end of the first year, this universal ability starts to narrow down to the contrasts used in the environmental language. During the second year, this specialization is characterized by the development of comprehension, lexical organization and word production. This process now appears to be the result of multiple interactions between perceptual, cognitive and social developing abilities. Distinct factors like word acquisition, sensitivity to the statistical properties of the input, or even the nature of the social interactions, might play a role at one time or another during the acquisition of phonological patterns. Experience with the native language is necessary for phonetic segments to be functional units of perception and for speech sound representations (words, syllables) to be more specified and phonetically organized. This evolution goes on beyond 24 months of age in a learning context characterized from the early stages by the interaction with other developing (linguistic and non-linguistic) capacities. PMID:25218761
Foundations of Voice and Speech Quality Perception starts out with the fundamental question: "How do listeners perceive voice and speech quality and how can these processes be modeled?" Any quantitative answers require measurements. This is natural for physical quantities but harder to imagine for perceptual measurands. This book approaches the problem by actually identifying major perceptual dimensions of voice and speech quality perception, defining units wherever possible and offering paradigms to position these dimensions into a structural skeleton of perceptual speech and voice quality. The emphasis is placed on voice and speech quality assessment of systems in artificial scenarios. Many scientific fields are involved. This book bridges the gap between two quite diverse fields, engineering and humanities, and establishes the new research area of Voice and Speech Quality Perception.
Shahin, Antoine J
Does musical training affect our perception of speech? For example, does learning to play a musical instrument modify the neural circuitry for auditory processing in a way that improves one's ability to perceive speech more clearly in noisy environments? If so, can speech perception in individuals with hearing loss (HL), who struggle in noisy situations, benefit from musical training? While music and speech exhibit some specialization in neural processing, there is evidence suggesting that skills acquired through musical training for specific acoustical processes may transfer to, and thereby improve, speech perception. The neurophysiological mechanisms underlying the influence of musical training on speech processing and the extent of this influence remain a rich area to be explored. A prerequisite for such transfer is the facilitation of greater neurophysiological overlap between speech and music processing following musical training. This review first establishes a neurophysiological link between musical training and speech perception, and subsequently provides further hypotheses on the neurophysiological implications of musical training on speech perception in adverse acoustical environments and in individuals with HL. PMID:21716639
Venezia, Jonathan H; Fillmore, Paul; Matchin, William; Isenberg, A Lisette; Hickok, Gregory; Fridriksson, Julius
Sensory information is critical for movement control, both for defining the targets of actions and providing feedback during planning or ongoing movements. This holds for speech motor control as well, where both auditory and somatosensory information have been shown to play a key role. Recent clinical research demonstrates that individuals with severe speech production deficits can show a dramatic improvement in fluency during online mimicking of an audiovisual speech signal suggesting the existence of a visuomotor pathway for speech motor control. Here we used fMRI in healthy individuals to identify this new visuomotor circuit for speech production. Participants were asked to perceive and covertly rehearse nonsense syllable sequences presented auditorily, visually, or audiovisually. The motor act of rehearsal, which is prima facie the same whether or not it is cued with a visible talker, produced different patterns of sensorimotor activation when cued by visual or audiovisual speech (relative to auditory speech). In particular, a network of brain regions including the left posterior middle temporal gyrus and several frontoparietal sensorimotor areas activated more strongly during rehearsal cued by a visible talker versus rehearsal cued by auditory speech alone. Some of these brain regions responded exclusively to rehearsal cued by visual or audiovisual speech. This result has significant implications for models of speech motor control, for the treatment of speech output disorders, and for models of the role of speech gesture imitation in development. PMID:26608242
Lu, Chunming; Long, Yuhang; Zheng, Lifen; Shi, Guang; Liu, Li; Ding, Guosheng; Howell, Peter
Speech production difficulties are apparent in people who stutter (PWS). PWS also have difficulties in speech perception compared to controls. It is unclear whether the speech perception difficulties in PWS are independent of, or related to, their speech production difficulties. To investigate this issue, functional MRI data were collected on 13 PWS and 13 controls whilst the participants performed a speech production task and a speech perception task. PWS performed more poorly than controls in the perception task and the poorer performance was associated with a functional activity difference in the left anterior insula (part of the speech motor area) compared to controls. PWS also showed a functional activity difference in this and the surrounding area [left inferior frontal cortex (IFC)/anterior insula] in the production task compared to controls. Conjunction analysis showed that the functional activity differences between PWS and controls in the left IFC/anterior insula coincided across the perception and production tasks. Furthermore, Granger Causality Analysis on the resting-state fMRI data of the participants showed that the causal connection from the left IFC/anterior insula to an area in the left primary auditory cortex (Heschl’s gyrus) differed significantly between PWS and controls. The strength of this connection correlated significantly with performance in the perception task. These results suggest that speech perception difficulties in PWS are associated with anomalous functional activity in the speech motor area, and the altered functional connectivity from this area to the auditory area plays a role in the speech perception difficulties of PWS.
Cabbage, Kathryn L; Farquharson, Kelly; Hogan, Tiffany P
Some children with residual deficits in speech production also display characteristics of dyslexia; however, the causes of these disorders, whether they occur in isolation or comorbidly, remain unknown. Presently, the role of phonological representations is an important construct for considering how the underlying system of phonology functions. In particular, two related skills, speech perception and phonological working memory, may provide insight into the nature of phonological representations. This study provides an exploratory investigation into the profiles of three 9-year-old children: one with residual speech errors, one with residual speech errors and dyslexia, and one who demonstrated typical, age-appropriate speech sound production and reading skills. We provide an in-depth examination of their relative abilities in the areas of speech perception, phonological working memory, vocabulary, and word reading. Based on these preliminary explorations, we suggest implications for the assessment and treatment of children with residual speech errors and/or dyslexia. PMID:26458199
Kaganovich, Natalya; Schumaker, Jennifer
Sensitivity to the temporal relationship between auditory and visual stimuli is key to efficient audiovisual integration. However, even adults vary greatly in their ability to detect audiovisual temporal asynchrony. What underlies this variability is currently unknown. We recorded event-related potentials (ERPs) while participants performed a simultaneity judgment task on a range of audiovisual (AV) and visual-auditory (VA) stimulus onset asynchronies (SOAs) and compared ERP responses in good and poor performers to the 200 ms SOA, which showed the largest individual variability in the number of synchronous perceptions. Analysis of ERPs to the VA200 stimulus yielded no significant results. However, those individuals who were more sensitive to the AV200 SOA had significantly more positive voltage between 210 and 270 ms following the sound onset. In a follow-up analysis, we showed that the mean voltage within this window predicted approximately 36% of the variability in sensitivity to AV temporal asynchrony in a larger group of participants. The relationship between the ERP measure in the 210-270 ms window and accuracy on the simultaneity judgment task also held for two other AV SOAs with significant individual variability: 100 and 300 ms. Because the identified window was time-locked to the onset of sound in the AV stimulus, we conclude that sensitivity to AV temporal asynchrony is shaped to a large extent by the efficiency of the neural encoding of sound onsets. PMID:27094850
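The predictive relationship described above can be expressed in two steps: average each participant's ERP over the 210-270 ms post-sound window, then ask how much of the between-participant variance in asynchrony sensitivity that mean explains. For a simple linear fit, explained variance is the squared Pearson correlation, so roughly 36% corresponds to |r| of about 0.6. The sketch assumes samples time-locked to sound onset in ms and one sensitivity score per participant; the helper names and data are hypothetical, and the study's actual statistics may differ.

```python
def window_mean(times_ms, voltages_uv, start_ms=210.0, end_ms=270.0):
    """Mean voltage of samples with start_ms <= t <= end_ms (inclusive)."""
    selected = [v for t, v in zip(times_ms, voltages_uv) if start_ms <= t <= end_ms]
    if not selected:
        raise ValueError("no samples in window")
    return sum(selected) / len(selected)


def variance_explained(x, y):
    """Squared Pearson correlation (R^2 of a simple linear regression of y on x)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant variable has no defined correlation")
    return sxy * sxy / (sxx * syy)


# Hypothetical single-participant ERP: only the 210, 240 and 270 ms samples are averaged.
erp_mean = window_mean([200, 210, 240, 270, 280], [9.0, 1.0, 2.0, 3.0, 9.0])

# Hypothetical group data: window means vs. sensitivity scores, one pair per participant.
r2 = variance_explained([1.0, 2.0, 3.0], [1.0, 3.0, 2.0])
print(erp_mean, r2)
```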
Congenital amusics, or tone-deaf individuals, show difficulty in perceiving and producing small pitch differences. While amusia has marked effects on music perception, its impact on speech perception is less clear. Here we test the hypothesis that individual differences in pitch perception affect judgment of emotion in speech, by applying band-pass filters to spoken statements of emotional speech. A norming study was first conducted on Mechanical Turk to ensure that the intended emotions from the Macquarie Battery for Evaluation of Prosody (MBEP) were reliably identifiable by US English speakers. The most reliably identified emotional speech samples were used in Experiment 1, in which subjects performed a psychophysical pitch discrimination task, and an emotion identification task under band-pass and unfiltered speech conditions. Results showed a significant correlation between pitch discrimination threshold and emotion identification accuracy for band-pass filtered speech, with amusics (defined here as those with a pitch discrimination threshold > 16 Hz) performing worse than controls. This relationship with pitch discrimination was not seen in unfiltered speech conditions. Given the dissociation between band-pass filtered and unfiltered speech conditions, we inferred that amusics may be compensating for poorer pitch perception by using speech cues that are filtered out in this manipulation.
Kushnerenko, Elena; Tomalski, Przemyslaw; Ballieux, Haiko; Potton, Anita; Birtles, Deidre; Frostick, Caroline; Moore, Derek G.
The use of visual cues during the processing of audiovisual (AV) speech is known to be less efficient in children and adults with language difficulties and difficulties are known to be more prevalent in children from low-income populations. In the present study, we followed an economically diverse group of thirty-seven infants longitudinally from 6–9 months to 14–16 months of age. We used eye-tracking to examine whether individual differences in visual attention during AV processing of speech in 6–9 month old infants, particularly when processing congruent and incongruent auditory and visual speech cues, might be indicative of their later language development. Twenty-two of these 6–9 month old infants also participated in an event-related potential (ERP) AV task within the same experimental session. Language development was then followed up at the age of 14–16 months, using two measures of language development, the Preschool Language Scale and the Oxford Communicative Development Inventory. The results show that those infants who were less efficient in auditory speech processing at the age of 6–9 months had lower receptive language scores at 14–16 months. A correlational analysis revealed that the pattern of face scanning and ERP responses to audiovisually incongruent stimuli at 6–9 months were both significantly associated with language development at 14–16 months. These findings add to the understanding of individual differences in neural signatures of AV processing and associated looking behavior in infants. PMID:23882240
Lolli, Sydney L.; Lewenstein, Ari D.; Basurto, Julian; Winnik, Sean; Loui, Psyche
Congenital amusics, or “tone-deaf” individuals, show difficulty in perceiving and producing small pitch differences. While amusia has marked effects on music perception, its impact on speech perception is less clear. Here we test the hypothesis that individual differences in pitch perception affect judgment of emotion in speech, by applying low-pass filters to spoken statements of emotional speech. A norming study was first conducted on Mechanical Turk to ensure that the intended emotions from the Macquarie Battery for Evaluation of Prosody were reliably identifiable by US English speakers. The most reliably identified emotional speech samples were used in Experiment 1, in which subjects performed a psychophysical pitch discrimination task, and an emotion identification task under low-pass and unfiltered speech conditions. Results showed a significant correlation between pitch-discrimination threshold and emotion identification accuracy for low-pass filtered speech, with amusics (defined here as those with a pitch discrimination threshold >16 Hz) performing worse than controls. This relationship with pitch discrimination was not seen in unfiltered speech conditions. Given the dissociation between low-pass filtered and unfiltered speech conditions, we inferred that amusics may be compensating for poorer pitch perception by using speech cues that are filtered out in this manipulation. To assess this potential compensation, Experiment 2 was conducted using high-pass filtered speech samples intended to isolate non-pitch cues. No significant correlation was found between pitch discrimination and emotion identification accuracy for high-pass filtered speech. Results from these experiments suggest an influence of low frequency information in identifying emotional content of speech. PMID:26441718
Audio-visual speech recognition (AVSR) using acoustic and visual signals of speech has received attention recently because of its robustness in noisy environments. Perceptual studies also support this approach by emphasizing the importance of visual information for speech recognition in humans. An important issue in decision-fusion-based AVSR systems is how to obtain the appropriate integration weight for the two speech modalities, so that the combined AVSR system performs better than the audio-only and visual-only systems under various noise conditions. To solve this issue, we present a genetic algorithm (GA)-based optimization scheme to obtain the appropriate integration weight from the relative reliability of each modality. The performance of the proposed GA-optimized, reliability-ratio-based weight estimation scheme is demonstrated via single-speaker, mobile-function, isolated-word recognition experiments. The results show that the proposed scheme improves robust recognition accuracy over the conventional unimodal systems and the baseline reliability-ratio-based AVSR system under various signal-to-noise ratio conditions.
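As a rough illustration of the baseline that the abstract compares against, the sketch below fuses per-word audio and visual log-likelihoods with a reliability-ratio weight. Several details are assumptions, not taken from the abstract: reliability is measured as the mean gap between a stream's best score and its other scores, the weight is λ = R_A / (R_A + R_V), and the word labels and scores are hypothetical. The GA step that the authors use to refine the weight is not shown.

```python
from typing import Dict, Tuple


def reliability(log_likes: Dict[str, float]) -> float:
    """Dispersion-based reliability of one stream (an assumed, commonly used measure).

    Mean gap between the best log-likelihood and every other candidate's score:
    a sharply peaked distribution (one clear winner) gives a high value.
    """
    if len(log_likes) < 2:
        raise ValueError("need at least two candidate words")
    best = max(log_likes.values())
    return sum(best - v for v in log_likes.values()) / (len(log_likes) - 1)


def fuse(audio: Dict[str, float], video: Dict[str, float]) -> Tuple[str, float, Dict[str, float]]:
    """Weighted decision fusion: lam * log P_A + (1 - lam) * log P_V for each word."""
    r_a, r_v = reliability(audio), reliability(video)
    # Reliability-ratio weight; fall back to equal weighting if both streams are flat.
    lam = r_a / (r_a + r_v) if (r_a + r_v) > 0 else 0.5
    combined = {w: lam * audio[w] + (1.0 - lam) * video[w] for w in audio}
    best_word = max(combined, key=combined.get)
    return best_word, lam, combined


# Hypothetical scores: the noisy audio stream is nearly flat, the visual stream is peaked.
audio_scores = {"call": -10.0, "text": -10.5, "menu": -11.0}
video_scores = {"call": -12.0, "text": -6.0, "menu": -14.0}
word, weight, _ = fuse(audio_scores, video_scores)
print(word, round(weight, 3))
```

Audio alone would choose "call"; because the visual stream is judged far more reliable here, its scores dominate and the fused decision is "text".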
Aubanel, Vincent; Davis, Chris; Kim, Jeesun
A growing body of evidence shows that brain oscillations track speech. This mechanism is thought to maximize processing efficiency by allocating resources to important speech information, effectively parsing speech into units of appropriate granularity for further decoding. However, some aspects of this mechanism remain unclear. First, while periodicity is an intrinsic property of this physiological mechanism, speech is only quasi-periodic, so it is not clear whether periodicity would present an advantage in processing. Second, it is still a matter of debate which aspect of speech triggers or maintains cortical entrainment, from bottom-up cues such as fluctuations of the amplitude envelope of speech to higher level linguistic cues such as syntactic structure. We present data from a behavioral experiment assessing the effect of isochronous retiming of speech on speech perception in noise. Two types of anchor points were defined for retiming speech, namely syllable onsets and amplitude envelope peaks. For each anchor point type, retiming was implemented at two hierarchical levels, a slow time scale around 2.5 Hz and a fast time scale around 4 Hz. Results show that while any temporal distortion resulted in reduced speech intelligibility, isochronous speech anchored to P-centers (approximated by stressed syllable vowel onsets) was significantly more intelligible than a matched anisochronous retiming, suggesting a facilitative role of periodicity defined on linguistically motivated units in processing speech in noise.
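Isochronous retiming can be expressed as moving each anchor point (for example, an approximate P-center) onto an evenly spaced grid and computing how much each inter-anchor interval must be stretched or compressed. This is a sketch under assumptions the abstract does not state: the grid keeps the first and last anchor times fixed (so total duration is preserved), anchor times are in seconds, and the actual signal manipulation would be done by a separate time-scale modification algorithm, which is not shown.

```python
from typing import List


def isochronous_targets(anchors: List[float]) -> List[float]:
    """Map anchor times (s) onto an evenly spaced grid that keeps the first and last anchors fixed."""
    if len(anchors) < 2:
        raise ValueError("need at least two anchor points")
    if any(b <= a for a, b in zip(anchors, anchors[1:])):
        raise ValueError("anchor times must be strictly increasing")
    period = (anchors[-1] - anchors[0]) / (len(anchors) - 1)
    return [anchors[0] + k * period for k in range(len(anchors))]


def stretch_factors(anchors: List[float]) -> List[float]:
    """Per-interval time-scale factors (target duration / original duration); > 1 means lengthening."""
    targets = isochronous_targets(anchors)
    return [
        (t1 - t0) / (a1 - a0)
        for a0, a1, t0, t1 in zip(anchors, anchors[1:], targets, targets[1:])
    ]


# Hypothetical anchor times, e.g. stressed-vowel onsets approximating P-centers, in seconds.
anchors = [0.00, 0.20, 0.50, 0.80]
targets = isochronous_targets(anchors)
factors = stretch_factors(anchors)
rate_hz = 1.0 / (targets[1] - targets[0])  # resulting isochronous rate
print([round(t, 3) for t in targets], [round(f, 3) for f in factors], round(rate_hz, 2))
```

A matched anisochronous control, as in the study, would apply distortions of comparable size without aligning anchors to a regular grid; that variant is not sketched here.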
BU Fanliang; CHEN Yanpu
Because the human ear is relatively insensitive to phase, little attention has been paid to phase information in speech coding. In fact, the perceptual quality of speech may be degraded if the phase distortion is very large. The perceptual effect of the STFT (short-time Fourier transform) phase spectrum is studied by subjective listening tests. Three main conclusions are drawn: (1) if the phase information is neglected completely, the subjective quality of the reconstructed speech may be very poor; (2) whether the neglected phase is in the low-frequency band or the high-frequency band, the difference from the original speech can be perceived by ear; (3) it is very difficult for the human ear to perceive a difference in speech quality between the original speech and the reconstructed speech when the phase quantization step size is smaller than π/7.
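To make the π/7 figure concrete: with a uniform quantizer, each phase value is rounded to the nearest multiple of the step, so the phase error per STFT coefficient is at most half a step (π/14 rad, about 12.9°). The sketch below assumes uniform scalar quantization of each coefficient's phase with the magnitude left untouched; the abstract does not state which quantizer the authors used, and the coefficient values are hypothetical.

```python
import cmath
import math


def quantize_phase(phi: float, step: float) -> float:
    """Uniform scalar quantizer: round a phase (radians) to the nearest multiple of step."""
    return step * round(phi / step)


def quantize_coefficient(z: complex, step: float) -> complex:
    """Keep the STFT magnitude exactly and replace the phase with its quantized value."""
    return cmath.rect(abs(z), quantize_phase(cmath.phase(z), step))


STEP = math.pi / 7  # step size at which the abstract reports differences become hard to hear
coeffs = [complex(3.0, 1.0), complex(-0.5, 2.0), complex(0.2, -1.5)]
for z in coeffs:
    q = quantize_coefficient(z, STEP)
    err = abs(quantize_phase(cmath.phase(z), STEP) - cmath.phase(z))
    print(f"|z|={abs(z):.3f} |q|={abs(q):.3f} phase error={err:.3f} rad")
```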
José Antonio Palao Errando
In the so-called «information society», film studies have been diluted into pragmatic and technological approaches to audiovisual discourse, just as the enjoyment of cinema itself has been caught in the web of DVD and hypertext. Cinema responds to this through complex narrative structures that move it away from standard audiovisual discourse. The role of film studies, and of their teaching at university, should be to reintroduce the subject rejected by informational knowledge, through the interpretation of the film text.
This book interconnects two essential disciplines to study the perception of speech: Neuroscience and Quality of Experience, which to date have rarely been used together for the purposes of research on speech quality perception. In five key experiments, the book demonstrates the application of standard clinical methods in neurophysiology on the one hand, and of methods used in fields of research concerned with speech quality perception on the other. Using this combination, the book shows that speech stimuli with different lengths and different quality impairments are accompanied by physiological reactions related to quality variations, e.g., a positive peak in an event-related potential. Furthermore, it demonstrates that – in most cases – quality impairment intensity has an impact on the intensity of physiological reactions.
Andersen, Tobias S.
Information processing in the sensory modalities is not segregated but interacts strongly. The exact nature of this interaction is not known and might differ for different multisensory phenomena. Here, we investigate two cases of categorical audiovisual perception: speech perception and the perception of rapid flashes and beeps. It is known that multisensory interactions in general depend on physical factors, such as information reliability and modality appropriateness, but it is not know...
Pisoni, David B.
Over the past few years, there has been increased interest in studying some of the cognitive factors that affect speech perception performance of cochlear implant patients. In this paper, I provide a brief theoretical overview of the fundamental assumptions of the information-processing approach to cognition and discuss the role of perception, learning, and memory in speech perception and spoken language processing. The information-processing framework provides researchers and clinicians with...
Wagner, Petra; Windmann, Andreas
Time shrinking denotes the psycho-acoustic phenomenon that an acoustic event is perceived as shorter if it follows an even shorter acoustic event. Previous work has shown that time shrinking can be traced in speech-like phrases and may lead to the impression of a higher speech rate and syllable isochrony. This paper provides experimental evidence that time shrinking is effective on foot level as well as phrase level. Some examples from natural speech are given, where time shrinking effe...
This fMRI study examines shared and distinct cortical areas involved in the auditory perception of song and speech at the level of their underlying constituents: words, pitch and rhythm. Univariate and multivariate analyses were performed on the brain activity patterns of six conditions, arranged in a subtractive hierarchy: sung sentences including words, pitch and rhythm; hummed speech prosody and song melody containing only pitch patterns and rhythm; as well as the pure musical or speech rhythm. Systematic contrasts between these balanced conditions following their hierarchical organization showed a great overlap between song and speech at all levels in the bilateral temporal lobe, but suggested a differential role of the inferior frontal gyrus (IFG) and intraparietal sulcus (IPS) in processing song and speech. The left IFG was involved in word- and pitch-related processing in speech, the right IFG in processing pitch in song. Furthermore, the IPS showed sensitivity to discrete pitch relations in song as opposed to the gliding pitch in speech. Finally, the superior temporal gyrus and premotor cortex coded for general differences between words and pitch patterns, irrespective of whether they were sung or spoken. Thus, song and speech share many features which are reflected in a fundamental similarity of brain areas involved in their perception. However, fine-grained acoustic differences on word and pitch level are reflected in the activity of IFG and IPS.
Stacey, Paula C; Kitterick, Pádraig T; Morris, Saffron D; Sumner, Christian J
Understanding what is said in demanding listening situations is assisted greatly by looking at the face of a talker. Previous studies have observed that normal-hearing listeners can benefit from this visual information when a talker's voice is presented in background noise. These benefits have also been observed in quiet listening conditions in cochlear-implant users, whose device does not convey the informative temporal fine structure cues in speech, and when normal-hearing individuals listen to speech processed to remove these informative temporal fine structure cues. The current study (1) characterised the benefits of visual information when listening in background noise; and (2) used sine-wave vocoding to compare the size of the visual benefit when speech is presented with or without informative temporal fine structure. The accuracy with which normal-hearing individuals reported words in spoken sentences was assessed across three experiments. The availability of visual information and informative temporal fine structure cues was varied within and across the experiments. The results showed that visual benefit was observed using open- and closed-set tests of speech perception. The size of the benefit increased when informative temporal fine structure cues were removed. This finding suggests that visual information may play an important role in the ability of cochlear-implant users to understand speech in many everyday situations. Models of audio-visual integration were able to account for the additional benefit of visual information when speech was degraded and suggested that auditory and visual information was being integrated in a similar way in all conditions. The modelling results were consistent with the notion that audio-visual benefit is derived from the optimal combination of auditory and visual sensory cues. PMID:27085797
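The "optimal combination of auditory and visual sensory cues" mentioned above is often formalized as inverse-variance (maximum-likelihood) weighting of independent Gaussian cues. The sketch below is a generic textbook illustration of that idea, not the specific integration models fitted in the study; the cue values and variances are hypothetical.

```python
from typing import Tuple


def combine_cues(x_a: float, var_a: float, x_v: float, var_v: float) -> Tuple[float, float, float]:
    """Inverse-variance (maximum-likelihood) combination of two independent Gaussian cues.

    Returns (combined estimate, combined variance, weight given to the auditory cue).
    """
    if var_a <= 0 or var_v <= 0:
        raise ValueError("variances must be positive")
    prec_a, prec_v = 1.0 / var_a, 1.0 / var_v  # precision = reliability of each cue
    w_a = prec_a / (prec_a + prec_v)
    estimate = w_a * x_a + (1.0 - w_a) * x_v
    variance = 1.0 / (prec_a + prec_v)
    return estimate, variance, w_a


# Hypothetical numbers: degrading the auditory cue (larger variance) shifts weight to vision.
clear = combine_cues(10.0, 1.0, 4.0, 1.0)
degraded = combine_cues(10.0, 4.0, 4.0, 1.0)
print(clear, degraded)
```

Under this scheme the combined variance is always smaller than either single-cue variance, and the weight given to the visual cue grows as the auditory cue becomes less reliable.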
Coady, Jeffry A; Kluender, Keith R; Evans, Julia L
Previous research has suggested that children with specific language impairments (SLI) have deficits in basic speech perception abilities, and this may be an underlying source of their linguistic deficits. These findings have come from studies in which perception of synthetic versions of meaningless syllables was typically examined in tasks with high memory demands. In this study, 20 children with SLI (mean age = 9 years, 3 months) and 20 age-matched peers participated in a categorical perception task. Children identified and discriminated digitally edited versions of naturally spoken real words in tasks designed to minimize memory requirements. Both groups exhibited all hallmarks of categorical perception: a sharp labeling function, discontinuous discrimination performance, and discrimination predicted from identification. There were no group differences for identification data, but children with SLI showed lower peak discrimination values. Children with SLI still discriminated phonemically contrastive pairs at levels significantly better than chance, with discrimination of same-label pairs at chance. These data suggest that children with SLI perceive natural speech tokens comparably to age-matched controls when listening to words under conditions that minimize memory load. Further, poor performance on speech perception tasks may not be due to a speech perception deficit, but rather to a consequence of task demands. PMID:16378484
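The third hallmark listed above, discrimination predicted from identification, is classically computed with the Haskins covert-labeling model: listeners are assumed to discriminate only through the category labels they assign, and to guess otherwise. The sketch below uses the standard ABX form of that prediction; the abstract does not say which discrimination paradigm or prediction formula this study used, and the identification values are hypothetical.

```python
from typing import List


def predicted_abx(p_a: float, p_b: float) -> float:
    """Covert-labeling prediction of ABX proportion correct from labeling probabilities.

    p_a, p_b: probability that stimulus A / B is labeled as category 1.
    """
    for p in (p_a, p_b):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return 0.5 * (1.0 + (p_a - p_b) ** 2)


def predicted_curve(ident: List[float], step: int = 1) -> List[float]:
    """Predicted discrimination for every pair of continuum stimuli `step` positions apart."""
    return [predicted_abx(ident[i], ident[i + step]) for i in range(len(ident) - step)]


# Hypothetical identification function along a 6-step continuum with a sharp boundary.
ident = [0.98, 0.96, 0.90, 0.10, 0.04, 0.02]
print([round(p, 3) for p in predicted_curve(ident)])
```

The predicted curve peaks for the pair straddling the category boundary and stays near chance (0.5) for same-label pairs, which is the pattern the abstract describes for both groups.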
Biau, Emmanuel; Soto-Faraco, Salvador
Spontaneous beat gestures are an integral part of the paralinguistic context during face-to-face conversations. Here we investigated the time course of beat-speech integration in speech perception by measuring ERPs evoked by words pronounced with or without an accompanying beat gesture, while participants watched a spoken discourse. Words accompanied by beats elicited a positive shift in ERPs at an early sensory stage (before 100 ms) and at a later time window coinciding with the auditory com...
Vlčková-Mejvaldová, J.; Horák, Petr
Berlin: Springer, 2011 (Travieso-González, C.; Alonso-Hernández, J., eds.), pp. 170-176. ISBN 978-3-642-25019-4. ISSN 0302-9743. [5th International Conference on Nonlinear Speech Processing (NOLISP 2011), Las Palmas de Gran Canaria (ES), 07.11.2011-09.11.2011] Institutional research plan: CEZ:AV0Z20670512. Keywords: emotions * perception tests * speech synthesis. Subject RIV: JA - Electronics; Optoelectronics, Electrical Engineering
Anderson, Melinda C; Arehart, Kathryn H; Kates, James M
Speech perception depends on access to spectral and temporal acoustic cues. Temporal cues include slowly varying amplitude changes (i.e. temporal envelope, TE) and quickly varying amplitude changes associated with the center frequency of the auditory filter (i.e. temporal fine structure, TFS). This study quantifies the effects of TFS randomization through noise vocoding on the perception of speech quality by parametrically varying the amount of original TFS available above 1500 Hz. The two research aims were: 1) to establish the role of TFS in quality perception, and 2) to determine if the role of TFS in quality perception differs between subjects with normal hearing and subjects with sensorineural hearing loss. Ratings were obtained from 20 subjects (10 with normal hearing and 10 with hearing loss) using an 11-point quality scale. Stimuli were processed in three different ways: 1) A 32-channel noise-excited vocoder with random envelope fluctuations in the noise carrier, 2) a 32-channel noise-excited vocoder with the noise-carrier envelope smoothed, and 3) removal of high-frequency bands. Stimuli were presented in quiet and in babble noise at 18 dB and 12 dB signal-to-noise ratios. TFS randomization had a measurable detrimental effect on quality ratings for speech in quiet and a smaller effect for speech in background babble. Subjects with normal hearing and subjects with sensorineural hearing loss provided similar quality ratings for noise-vocoded speech. PMID:24333929
Lametti, Daniel R.; Rochet-Capellan, Amélie; Neufeld, Emily; Shiller, Douglas M.; Ostry, David J.
Recent studies of human speech motor learning suggest that learning is accompanied by changes in auditory perception. But what drives the perceptual change? Is it a consequence of changes in the motor system? Or is it a result of sensory inflow during learning? Here, subjects participated in a speech motor-learning task involving adaptation to altered auditory feedback and they were subsequently tested for perceptual change. In two separate experiments, involving two different auditory percep...
Messaoud-Galusi, Souhila; Hazan, Valerie; Rosen, Stuart
Purpose: The claim that speech perception abilities are impaired in dyslexia was investigated in a group of 62 children with dyslexia and 51 average readers matched in age. Method: To test whether there was robust evidence of speech perception deficits in children with dyslexia, speech perception in noise and quiet was measured using 8 different…
Environmental statistics are known to be important factors shaping our perceptual system. The visual and auditory systems have evolved to be efficient for processing natural images or speech. The common characteristics between natural images and speech are that they are both highly structured, therefore having much redundancy. Our perceptual system may use redundancy reduction and sparse coding strategies to deal with complex stimuli every day. Both redundancy reduction ...
Slavova, Velina; Verhelst, Werner; Sahli, Hichem
In this report we summarize the state of the art of speech emotion recognition from the signal processing point of view. On the basis of multi-corpus experiments with machine-learning classifiers, the observation is made that existing approaches for supervised machine learning lead to database-dependent classifiers, which cannot be applied to multi-language speech emotion recognition without additional training because they discriminate the emotion classes following the use...
Jin, Yu; Díaz, Begoña; Colomer, Marc; Sebastián Gallés, Núria
Individual differences in second language (L2) phoneme perception (within the normal population) have been related to speech perception abilities, also observed in the native language, in studies assessing the electrophysiological response mismatch negativity (MMN). Here, we investigate the brain oscillatory dynamics in the theta band, the spectral correlate of the MMN, that underpin success in phoneme learning. Using previous data obtained in an MMN paradigm, the dynamics of cort...
Jiang, Jintao; Bernstein, Lynne E.
When the auditory and visual components of spoken audiovisual nonsense syllables are mismatched, perceivers produce four different types of perceptual responses: auditory correct, visual correct, fusion (the so-called McGurk effect), and combination (i.e., two consonants are reported). Here, quantitative measures were developed to account for the distribution of types of perceptual responses to 384 different stimuli from four talkers. The measures included mutual information, the presented acoustic signal versus the acoustic signal recorded with the presented video, and the correlation between the presented acoustic and video stimuli. In Experiment 1, open-set perceptual responses were obtained for acoustic /bA/ or /lA/ dubbed to video /bA, dA, gA, vA, zA, lA, wA, ΔA/. The talker, the video syllable, and the acoustic syllable significantly influenced the type of response. In Experiment 2, the best predictors of response category proportions were a subset of the physical stimulus measures, with the variance accounted for in the perceptual response category proportions between 17% and 52%. That audiovisual stimulus relationships can account for response distributions supports the possibility that internal representations are based on modality-specific stimulus relationships. PMID:21574741
Nicholls, Michael E. R.; Searle, Dara A.
This study explored asymmetries for movement, expression and perception of visual speech. Sixteen dextral models were videoed as they articulated: "bat," "cat," "fat," and "sat." Measurements revealed that the right side of the mouth was opened wider and for a longer period than the left. The asymmetry was accentuated at the beginning and ends of…
Badino, Leonardo; D'Ausilio, Alessandro; Fadiga, Luciano; Metta, Giorgio
Action perception and recognition are core abilities fundamental for human social interaction. A parieto-frontal network (the mirror neuron system) matches visually presented biological motion information onto observers' motor representations. This process of matching the actions of others onto our own sensorimotor repertoire is thought to be important for action recognition, providing a non-mediated "motor perception" based on a bidirectional flow of information along the mirror parieto-frontal circuits. State-of-the-art machine learning strategies for hand action identification have shown better performance when sensorimotor data, as opposed to visual information only, are available during learning. As speech is a particular type of action (with acoustic targets), it is expected to activate a mirror neuron mechanism. Indeed, in speech perception, motor centers have been shown to be causally involved in the discrimination of speech sounds. In this paper, we review recent neurophysiological and machine learning-based studies showing (a) the specific contribution of the motor system to speech perception and (b) that automatic phone recognition is significantly improved when motor data are used during training of classifiers (as opposed to learning from purely auditory data). PMID:24935820
Boatman, Dana F.
Recent brain mapping studies have provided new insights into the cortical systems that mediate human speech perception. Electrocortical stimulation mapping (ESM) is a brain mapping method that is used clinically to localize cortical functions in neurosurgical patients. Recent ESM studies have yielded new insights into the cortical systems that…
Zhang, Juan; McBride-Chang, Catherine
While the importance of phonological sensitivity for understanding reading acquisition and impairment across orthographies is well documented, what underlies deficits in phonological sensitivity is not well understood. Some researchers have argued that speech perception underlies variability in phonological representations. Others have…
Rance, Gary; Fava, Rosanne; Baldock, Heath; Chong, April; Barker, Elizabeth; Corben, Louise; Delatycki
The aim of this study was to investigate auditory pathway function and speech perception ability in individuals with Friedreich ataxia (FRDA). Ten subjects confirmed by genetic testing as being homozygous for a GAA expansion in intron 1 of the FXN gene were included. While each of the subjects demonstrated normal, or near normal sound detection, 3…
Accounts of speech perception disagree on whether listeners perceive the acoustic signal (Diehl, Lotto, & Holt, 2004) or the vocal tract gestures that produce the signal (e.g., Fowler, 1986). In this dissertation, I outline a research program using a phenomenon called "perceptual compensation for coarticulation" (Mann, 1980) to examine this…
Fowler, Carol A; Shankweiler, Donald; Studdert-Kennedy, Michael
We revisit an article, "Perception of the Speech Code" (PSC), published in this journal 50 years ago (Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967) and address one of its legacies concerning the status of phonetic segments, which persists in theories of speech today. In the perspective of PSC, segments both exist (in language as known) and do not exist (in articulation or the acoustic speech signal). Findings interpreted as showing that speech is not a sound alphabet, but, rather, phonemes are encoded in the signal, coupled with findings that listeners perceive articulation, led to the motor theory of speech perception, a highly controversial legacy of PSC. However, a second legacy, the paradoxical perspective on segments has been mostly unquestioned. We remove the paradox by offering an alternative supported by converging evidence that segments exist in language both as known and as used. We support the existence of segments in both language knowledge and in production by showing that phonetic segments are articulatory and dynamic and that coarticulation does not eliminate them. We show that segments leave an acoustic signature that listeners can track. This suggests that speech is well-adapted to public communication in facilitating, not creating a barrier to, exchange of language forms. PMID:26301536
ZHOU Li; NIE Yong-Wei
The paper tries to analyze speech perception in terms of its structure, process, levels and models. Some problems concerning speech perception have been touched upon. The paper aims at providing some reference for oral English teaching and learning in the light of speech perception. It is intended to arouse readers' reflection upon the effect of speech perception on oral English teaching.
Anderson, Samira; Chandrasekaran, Bharath; Yi, Han-Gyol; Kraus, Nina
Children are known to be particularly vulnerable to the effects of noise on speech perception, and it is commonly acknowledged that failure of central auditory processes can lead to these difficulties with speech-in-noise (SIN) perception. Still, little is known about the mechanistic relationship between central processes and the perception of speech in noise. Our aims were two-fold: to examine the effects of noise on the central encoding of speech through measurement of cortical event-relate...
Allard, Emily R.; Williams, Dale F.
Using semantic differential scales with nine trait pairs, 445 adults rated five audio-taped speech samples, one depicting an individual without a disorder and four portraying communication disorders. Statistical analyses indicated that the no disorder sample was rated higher with respect to the trait of employability than were the articulation,…
Wong, Patrick C. M.; Uppunda, Ajith K.; Parrish, Todd B.; Dhar, Sumitrajit
Purpose: The present study examines the brain basis of listening to spoken words in noise, which is a ubiquitous characteristic of communication, with the focus on the dorsal auditory pathway. Method: English-speaking young adults identified single words in 3 listening conditions while their hemodynamic response was measured using fMRI: speech in…
Genovese, E; Orzan, E; Turrini, M; Babighian, G; Arslan, E
Speech perception tests are an important part of procedures for diagnosing pre-verbal hearing loss. Merely establishing a child's hearing threshold with and without a hearing aid is not sufficient to ensure an adequate evaluation with a view to selecting cases suitable for cochlear implants, because it fails to reliably indicate the real benefit obtained from using a conventional hearing aid. Speech perception tests have proved useful not only for patient selection, but also for subsequent evaluation of the efficacy of new hearing aids, such as tactile devices and cochlear implants. In clinical practice, the tests most commonly adopted with small children are the Auditory Comprehension Test (ACT), Discrimination after Training (DAT), the Monosyllable, Trochee, Spondee test (MTS), the Glendonald Auditory Screening Procedure (GASP), and the Early Speech Perception Test (ESP). Rather than considering specific results achieved in individual cases, reference is generally made to the four speech perception classes proposed by Moog and Geers of the CID of St. Louis. The purpose of this classification, made on the results obtained with suitably differentiated tests according to the child's age and language ability, is to detect differences in perception of a spoken message in ideal listening conditions. To date, no Italian-language speech perception test has been designed to assess the speech perception level of children with profound hearing impairment. We attempted, therefore, to adapt the existing English tests to the Italian language, taking into consideration the differences between the two languages. Our attention focused on the ESP test since it can be applied to even very small children (2 years old). The ESP is proposed in a standard version for hearing-impaired children over the age of 6 years and in a simplified version for younger children. The rationale we used for selecting Italian words reflects the rationale established for the original version, but the
Audiovisual translation is now a well-established sub-discipline of Translation Studies (TS), a position that it has reached over the last twenty years or so. Italian scholars and professionals in the field have made a substantial contribution to this successful development, a brief overview of which will be given in the first part of this article, inevitably concentrating on dubbing in the Italian context. Special attention will be devoted to the question of target audience perception, an area in which researchers at the University of Bologna at Forlì have excelled. The second part of the article applies the methodology followed by these researchers in a case study of how Italian end users perceive the dubbed version of the British film The History Boys (2006), which contains a plethora of culture-specific verbal and visual references to the English education system. The aim of the study was to ascertain (a) whether translation/adaptation allows the transmission, in this admittedly constrained medium, of all the intended culture-bound issues, only too well known to the source audience, and, if so, to what extent, and (b) whether the target-audience respondents to the e-questionnaire were aware that they were missing information. The linked, albeit controversial, issue of quality assessment will also be addressed.
Bazon, Aline Cristine; Mantello, Erika Barioni; Gonçales, Alina Sanches; Isaac, Myriam de Lima; Hyppolito, Miguel Angelo; Reis, Ana Cláudia Mirândola Barbosa
Introduction: The objective of the evaluation of auditory perception of cochlear implant users is to determine how the acoustic signal is processed, leading to the recognition and understanding of sound. Objective: To investigate the differences in the process of auditory speech perception in individuals with postlingual hearing loss wearing a cochlear implant, using two different speech coding strategies, and to analyze speech perception and handicap perception in relation to the strategy us...
Stephens, Joseph D.; Holt, Lori L.
Data from Japanese quail suggest that the effect of preceding liquids (/l/ or /r/) on responses to subsequent stops (/g/ or /d/) arises from general auditory processes sensitive to the spectral structure of sound [A. J. Lotto, K. R. Kluender, and L. L. Holt, J. Acoust. Soc. Am. 102, 1134-1140 (1997)]. If spectral content is key, appropriate nonspeech sounds should influence perception of speech sounds and vice versa. The former effect has been demonstrated [A. J. Lotto and K. R. Kluender, Percept. Psychophys. 60, 602-619 (1998)]. The current experiment investigated the influence of speech on the perception of nonspeech sounds. Nonspeech stimuli were 80-ms chirps modeled after the F2 and F3 transitions in /ga/ and /da/. F3 onset was increased in equal steps from 1800 Hz (/ga/ analog) to 2700 Hz (/da/ analog) to create a ten-member series. During AX discrimination trials, listeners heard chirps that were three steps apart on the series. Each chirp was preceded by a synthesized /al/ or /ar/. Results showed context effects predicted from differences in spectral content between the syllables and chirps. These results are consistent with the hypothesis that spectral contrast influences context effects in speech perception. [Work supported by ONR, NOHR, and CNBC.]
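The ten-member chirp series can be made concrete with a small sketch. Assuming "equal steps" means linear spacing in Hz, the step size is (2700 − 1800) / 9 = 100 Hz, and "different" AX pairs three steps apart yield seven comparisons. The function names are hypothetical, and the sketch covers only the F3 onset values, not the synthesis of the chirps.

```python
# Hypothetical reconstruction of the F3-onset continuum described above.
# Assumption: "equal steps" = linear spacing in Hz between the two endpoints.

F3_START_HZ = 1800.0   # /ga/ analog endpoint (from the abstract)
F3_END_HZ = 2700.0     # /da/ analog endpoint (from the abstract)
N_MEMBERS = 10         # ten-member series

def f3_onsets(start=F3_START_HZ, end=F3_END_HZ, n=N_MEMBERS):
    """Return n linearly spaced F3 onset frequencies, endpoints included."""
    step = (end - start) / (n - 1)   # 900 Hz / 9 intervals = 100 Hz
    return [start + i * step for i in range(n)]

def ax_pairs(n=N_MEMBERS, separation=3):
    """Return 1-based (A, X) member indices that are `separation` steps apart."""
    return [(i, i + separation) for i in range(1, n - separation + 1)]

onsets = f3_onsets()
for a, x in ax_pairs():
    # Each "different" trial compares two chirps 300 Hz apart in F3 onset.
    print(f"member {a} ({onsets[a - 1]:.0f} Hz) vs member {x} ({onsets[x - 1]:.0f} Hz)")
```

Seven such pairs cover the series, from members 1 and 4 through members 7 and 10; how "same" trials were constructed is not stated in the abstract.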
Schellenberg, E Glenn
Claims of beneficial side effects of music training are made for many different abilities, including verbal and visuospatial abilities, executive functions, working memory, IQ, and speech perception in particular. Such claims assume that music training causes the associations even though children who take music lessons are likely to differ from other children in music aptitude, which is associated with many aspects of speech perception. Music training in childhood is also associated with cognitive, personality, and demographic variables, and it is well established that IQ and personality are determined largely by genetics. Recent evidence also indicates that the role of genetics in music aptitude and music achievement is much larger than previously thought. In short, music training is an ideal model for the study of gene-environment interactions but far less appropriate as a model for the study of plasticity. Children seek out environments, including those with music lessons, that are consistent with their predispositions; such environments exaggerate preexisting individual differences. PMID:25773632
Nisreen Naji Al-Khawaldeh
This paper presents the findings of an empirical study which compares Jordanian and English native speakers’ perceptions about the speech act of thanking. The forty interviews conducted revealed some similarities but also remarkable cross-cultural differences relating to the significance of thanking, the variables affecting it, and the appropriate linguistic and paralinguistic choices, as well as their impact on the interpretation of thanking behaviour. The most important theoretical finding is that the data, while consistent with many views found in the existing literature, do not support Brown and Levinson’s (1987) claim that thanking is a speech act which intrinsically threatens the speaker’s negative face because it involves overt acceptance of an imposition on the speaker. Rather, thanking should be viewed as a means of establishing and sustaining social relationships. The study findings suggest that cultural variation in thanking is due to the high degree of sensitivity of this speech act to the complex interplay of a range of social and contextual variables, and point to some promising directions for further research. Keywords: Linguistic Variation, Cross-Cultural Pragmatics, Speech Act of Thanking, Perceptions of Politeness
Nisreen Naji Al-Khawaldeh; Vladimir Žegarac
This paper presents the findings of an empirical study which compares Jordanian and English native speakers’ perceptions about the speech act of thanking. The forty interviews conducted revealed some similarities but also remarkable cross-cultural differences relating to the significance of thanking, the variables affecting it, and the appropriate linguistic and paralinguistic choices, as well as their impact on the interpretation of thanking behaviour. The most important theoretical findi...
Patricia Kuhl; Samu Taulu; Alexis Bosseler; Elina Pihko; Jyrki Mäkelä
The development of speech perception shows a dramatic transition between infancy and adulthood. Between 6 and 12 months, infants' initial ability to discriminate all phonetic units across the world's languages narrows—native discrimination increases while non-native discrimination shows a steep decline. We used magnetoencephalography (MEG) to examine whether brain oscillations in the theta band (4–8 Hz), reflecting increases in attention and cognitive effort, would provide a neural measure of...
Gick, Bryan; Derrick, Donald
Visual information from a speaker’s face can enhance [1] or interfere with [2] accurate auditory perception. This integration of information across auditory and visual streams has been observed in functional imaging studies [3,4], and has typically been attributed to the frequency and robustness with which perceivers jointly encounter event-specific information from these two modalities [5]. Adding the tactile modality has long been considered a crucial next step in understanding multisensory integration...
Heinrich, Antje; Henshaw, Helen; Ferguson, Melanie A
Listeners vary in their ability to understand speech in noisy environments. Hearing sensitivity, as measured by pure-tone audiometry, can only partly explain these results, and cognition has emerged as another key concept. Although cognition relates to speech perception, the exact nature of the relationship remains to be fully understood. This study investigates how different aspects of cognition, particularly working memory and attention, relate to speech intelligibility for various tests. Perceptual accuracy of speech perception represents just one aspect of functioning in a listening environment. Activity and participation limits imposed by hearing loss, in addition to the demands of a listening environment, are also important and may be better captured by self-report questionnaires. Understanding how speech perception relates to self-reported aspects of listening forms the second focus of the study. Forty-four listeners aged between 50 and 74 years with mild sensorineural hearing loss were tested on speech perception tests differing in complexity from low (phoneme discrimination in quiet), to medium (digit triplet perception in speech-shaped noise) to high (sentence perception in modulated noise); cognitive tests of attention, memory, and non-verbal intelligence quotient; and self-report questionnaires of general health-related and hearing-specific quality of life. Hearing sensitivity and cognition related to intelligibility differently depending on the speech test: neither was important for phoneme discrimination, hearing sensitivity alone was important for digit triplet perception, and hearing and cognition together played a role in sentence perception. Self-reported aspects of auditory functioning were correlated with speech intelligibility to different degrees, with digit triplets in noise showing the richest pattern. The results suggest that intelligibility tests can vary in their auditory and cognitive demands and their sensitivity to the challenges that
In many natural audiovisual events (e.g., a clap of the two hands), the visual signal precedes the sound and thus allows observers to predict when, where, and which sound will occur. Previous studies have already reported that there are distinct neural correlates of temporal (when) versus phonetic/semantic (which) content on audiovisual integration. Here we examined the effect of visual prediction of auditory location (where) in audiovisual biological motion stimuli by varying the spatial congruency between the auditory and visual parts of the audiovisual stimulus. Visual stimuli were presented centrally, whereas auditory stimuli were presented either centrally or at 90° azimuth. Typical subadditive amplitude reductions (AV − V < A) were found for the auditory N1 and P2 for spatially congruent and incongruent conditions. The new finding is that the N1 suppression was larger for spatially congruent stimuli. A very early audiovisual interaction was also found at 30–50 ms in the spatially congruent condition, while no effect of congruency was found on the suppression of the P2. This indicates that visual prediction of auditory location can be coded very early in auditory processing.
van de Rijt, Luuk P. H.; van Opstal, A. John; Mylanus, Emmanuel A. M.; Straatman, Louise V.; Hu, Hai Yin; Snik, Ad F. M.; van Wanrooij, Marc M.
Background: Speech understanding may rely not only on auditory, but also on visual information. Non-invasive functional neuroimaging techniques can expose the neural processes underlying the integration of multisensory processes required for speech understanding in humans. Nevertheless, acoustic noise from functional MRI (fMRI) limits its usefulness in auditory experiments, and electromagnetic artifacts caused by electronic implants worn by subjects can severely distort the scans (EEG, fMRI). We therefore assessed audiovisual activation of temporal cortex with a silent, optical neuroimaging technique: functional near-infrared spectroscopy (fNIRS). Methods: We studied temporal cortical activation, as represented by concentration changes of oxy- and deoxy-hemoglobin in four easy-to-apply fNIRS optical channels, in 33 normal-hearing adult subjects and five post-lingually deaf cochlear implant (CI) users in response to supra-threshold unisensory auditory and visual stimuli, as well as to congruent auditory-visual speech stimuli. Results: Activation effects were not visible from single fNIRS channels. However, after discounting physiological noise through reference channel subtraction (RCS), auditory, visual and audiovisual (AV) speech stimuli evoked concentration changes for all sensory modalities in both cohorts (p < 0.001). Auditory stimuli evoked larger concentration changes than visual stimuli (p < 0.001). A saturation effect was observed for the AV condition. Conclusions: Physiological, systemic noise can be removed from fNIRS signals by RCS. The observed multisensory enhancement of an auditory cortical channel can be plausibly described by a simple addition of the auditory and visual signals with saturation. PMID:26903848
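Reference channel subtraction can be illustrated with a minimal sketch. The abstract does not specify the procedure, so this assumes one common variant: a least-squares scale factor is fitted between a reference channel (assumed to carry mainly systemic physiological noise) and the signal channel, and the scaled reference is subtracted. The function name and the toy signals are hypothetical.

```python
from statistics import fmean

def reference_channel_subtraction(signal, reference):
    """Remove the part of `signal` that is linearly explained by `reference`.

    Assumption (not from the study): the reference channel carries mostly
    systemic noise (heartbeat, respiration, blood-pressure waves), so a
    least-squares scale factor `beta` is fitted and beta * reference is
    subtracted from the demeaned signal.
    """
    if len(signal) != len(reference):
        raise ValueError("signal and reference must have the same length")
    ms, mr = fmean(signal), fmean(reference)
    s = [v - ms for v in signal]      # demean both channels
    r = [v - mr for v in reference]
    denom = sum(v * v for v in r)
    beta = sum(a * b for a, b in zip(s, r)) / denom if denom else 0.0
    return [a - beta * b for a, b in zip(s, r)]

# Toy example: a stimulus-evoked component plus twice the systemic noise.
noise = [1.0, -1.0, 1.0, -1.0]    # hypothetical reference-channel trace
task = [1.0, 1.0, -1.0, -1.0]     # hypothetical evoked response
signal = [t + 2.0 * n for t, n in zip(task, noise)]
print(reference_channel_subtraction(signal, noise))  # [1.0, 1.0, -1.0, -1.0]
```

Because the toy evoked component is uncorrelated with the reference, the fitted scale is exactly 2 and the evoked component is recovered; with real data the two are rarely orthogonal, so some evoked signal may also be removed.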
Cristia, Alejandrina; Seidl, Amanda; Junge, Caroline; Soderstrom, Melanie; Hagoort, Peter
There are increasing reports that individual variation in behavioral and neurophysiological measures of infant speech processing predicts later language outcomes, and specifically concurrent or subsequent vocabulary size. If such findings hold up under scrutiny, they could both illuminate theoretical models of language development and contribute to the prediction of communicative disorders. A qualitative, systematic review of this emergent literature illustrated the variety of approaches that have been used and highlighted some conceptual problems regarding the measurements. A quantitative analysis of the same data established that the bivariate relation was significant, with correlations of similar strength to those found for well-established nonlinguistic predictors of language. Further exploration of infant speech perception predictors, particularly from a methodological perspective, is recommended. PMID:24320112
Bidelman, Gavin M
Neural oscillations have been linked to various perceptual and cognitive brain operations. Here, we examined the role of these induced brain responses in categorical speech perception (CP), a phenomenon in which similar features are mapped to discrete, common identities despite their equidistant/continuous physical spacing. We recorded neuroelectric activity while participants rapidly classified sounds along a vowel continuum (/u/ to /a/). Time-frequency analyses applied to the EEG revealed distinct temporal dynamics in induced (non-phase-locked) oscillations; increased β (15–30 Hz) coded prototypical vowel sounds carrying well-defined phonetic categories, whereas increased γ (50–70 Hz) accompanied ambiguous tokens near the categorical boundary. Notably, changes in β activity were strongly correlated with the slope of listeners' psychometric identification functions, a measure of the "steepness" of their categorical percept. Our findings demonstrate that in addition to previously observed evoked (phase-locked) correlates of CP, induced brain activity in the β-band codes the ambiguity and strength of categorical speech percepts. PMID:25540857
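The "steepness" of an identification function is often summarized by the slope of a fitted sigmoid. As a sketch, assuming a two-parameter logistic over continuum steps (the abstract does not say which function the study fitted), the slope at the category boundary equals k/4, where k is the logistic rate parameter. The function names and parameter values are hypothetical.

```python
import math

def identification(x, boundary, k):
    """Hypothetical logistic identification function: P(/a/ response) at step x."""
    return 1.0 / (1.0 + math.exp(-k * (x - boundary)))

def boundary_slope(k):
    """Analytic slope of the logistic at its midpoint (the category boundary)."""
    return k / 4.0

def numeric_slope(boundary, k, h=1e-5):
    """Central-difference estimate of the slope at the boundary, as a check."""
    return (identification(boundary + h, boundary, k)
            - identification(boundary - h, boundary, k)) / (2.0 * h)

# Two hypothetical listeners sharing a boundary at step 3 but differing in k:
# the larger k gives a steeper, "more categorical" identification function.
for k in (1.0, 3.0):
    print(k, boundary_slope(k), round(numeric_slope(3.0, k), 6))
```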
Heinrich, Antje; Henshaw, Helen; Ferguson, Melanie A.
Good speech perception and communication skills in everyday life are crucial for participation and well-being, and are therefore an overarching aim of auditory rehabilitation. Both behavioral and self-report measures can be used to assess these skills. However, correlations between behavioral and self-report speech perception measures are often low. One possible explanation is that there is a mismatch between the specific situations used in the assessment of these skills in each method, and a more careful matching across situations might improve consistency of results. The role that cognition plays in specific speech situations may also be important for understanding communication, as speech perception tests vary in their cognitive demands. In this study, the role of executive function, working memory (WM) and attention in behavioral and self-report measures of speech perception was investigated. Thirty existing hearing aid users with mild-to-moderate hearing loss aged between 50 and 74 years completed a behavioral test battery with speech perception tests ranging from phoneme discrimination in modulated noise (easy) to words in multi-talker babble (medium) and keyword perception in a carrier sentence against a distractor voice (difficult). In addition, a self-report measure of aided communication, residual disability from the Glasgow Hearing Aid Benefit Profile, was obtained. Correlations between speech perception tests and self-report measures were higher when specific speech situations across both were matched. Cognition correlated with behavioral speech perception test results but not with self-report. Only the most difficult speech perception test, keyword perception in a carrier sentence with a competing distractor voice, engaged executive functions in addition to WM. In conclusion, any relationship between behavioral and self-report speech perception is not mediated by a shared correlation with cognition. PMID:27242564
In speech perception, a functional hierarchy has been proposed by recent functional neuroimaging studies: core auditory areas on the dorsal plane of the superior temporal gyrus (STG) are sensitive to basic acoustic characteristics, whereas downstream regions, specifically the left superior temporal sulcus (STS) and middle temporal gyrus (MTG) ventral to Heschl's gyrus (HG), are responsive to abstract phonological features. What is unclear so far is the relationship between the dorsal and ventral processes, especially with regard to whether low-level acoustic processing is modulated by high-level phonological processing. To address the issue, we assessed the sensitivity of core auditory and downstream regions to acoustic and phonological variations by using within- and across-category lexical tonal continua with equal physical intervals. We found that, relative to within-category variation, across-category variation elicited stronger activation in the left middle MTG (mMTG), apparently reflecting the abstract phonological representations. At the same time, activation in the core auditory region decreased, resulting from the top-down influences of phonological processing. These results support a hierarchical organization of the ventral acoustic-phonological processing stream, which originates in the right HG/STG and projects to the left mMTG. Furthermore, our study provides direct evidence that low-level acoustic analysis is modulated by high-level phonological representations, revealing the cortical dynamics of acoustic and phonological processing in speech perception. Our findings confirm the existence of reciprocal progression projections in the auditory pathways and the roles of both feed-forward and feedback mechanisms in speech perception.
Patro, Chhayakanta; Mendel, Lisa Lucks
Understanding speech within an auditory scene is constantly challenged by interfering noise in suboptimal listening environments when noise hinders the continuity of the speech stream. In such instances, a typical auditory-cognitive system perceptually integrates available speech information and "fills in" missing information in the light of semantic context. However, individuals with cochlear implants (CIs) find it difficult and effortful to understand interrupted speech compared to their normal hearing counterparts. This inefficiency in perceptual integration of speech could be attributed to further degradations in the spectral-temporal domain imposed by CIs making it difficult to utilize the contextual evidence effectively. To address these issues, 20 normal hearing adults listened to speech that was spectrally reduced and spectrally reduced interrupted in a manner similar to CI processing. The Revised Speech Perception in Noise test, which includes contextually rich and contextually poor sentences, was used to evaluate the influence of semantic context on speech perception. Results indicated that listeners benefited more from semantic context when they listened to spectrally reduced speech alone. For the spectrally reduced interrupted speech, contextual information was not as helpful under significant spectral reductions, but became beneficial as the spectral resolution improved. These results suggest top-down processing facilitates speech perception up to a point, and it fails to facilitate speech understanding when the speech signals are significantly degraded. PMID:27586760
Mealings, Kiri T.; Demuth, Katherine; Buchholz, Jörg; Dillon, Harvey
Purpose: Open-plan classroom styles are increasingly being adopted in Australia despite evidence that their high intrusive noise levels adversely affect learning. The aim of this study was to develop a new Australian speech perception task (the Mealings, Demuth, Dillon, and Buchholz Classroom Speech Perception Test) and use it in an open-plan…
Ziegler, Johannes C.; Pech-Georgel, Catherine; George, Florence; Lorenzi, Christian
Speech perception of four phonetic categories (voicing, place, manner, and nasality) was investigated in children with specific language impairment (SLI) (n=20) and age-matched controls (n=19) in quiet and various noise conditions using an AXB two-alternative forced-choice paradigm. Children with SLI exhibited robust speech perception deficits in…
Mirman, Daniel; McClelland, James L; Holt, Lori L.
We describe an account of lexically guided tuning of speech perception based on interactive processing and Hebbian learning. Interactive feedback provides lexical information to prelexical levels, and Hebbian learning uses that information to retune the mapping from auditory input to prelexical representations of speech. Simulations of an extension of the TRACE model of speech perception are presented that demonstrate the efficacy of this mechanism. Further simulations show that acoustic simi...
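The mechanism can be illustrated with a toy Hebbian update. This is not the TRACE extension from the study, only a minimal sketch assuming a single layer of weights from auditory input features to prelexical (phoneme) units, with lexical feedback represented as the post-synaptic activation used for learning. The unit labels, learning rate, and inputs are hypothetical.

```python
# Toy illustration of Hebbian retuning (hypothetical; not the TRACE model itself).

def activation(weights, x):
    """Prelexical unit activations: weighted sum of input features per unit."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def hebbian_update(weights, pre, post, lr=0.1):
    """One Hebbian step: weights[j][i] += lr * post[j] * pre[i]."""
    return [[w + lr * post_j * pre_i for w, pre_i in zip(row, pre)]
            for row, post_j in zip(weights, post)]

# weights[j][i] links input feature i to phoneme unit j (hypothetical /s/ and /f/).
weights = [[1.0, 0.0],   # /s/ unit driven by feature 0
           [0.0, 1.0]]   # /f/ unit driven by feature 1
ambiguous = [0.5, 0.5]   # input halfway between the two categories

print(activation(weights, ambiguous))   # equal activation before learning

# Lexical feedback (a word context favoring /s/) drives the /s/ unit,
# so the post-synaptic pattern used for learning is [1.0, 0.0].
weights = hebbian_update(weights, ambiguous, post=[1.0, 0.0])

print(activation(weights, ambiguous))   # /s/ unit now more active than /f/
```

After one update, the same ambiguous input is mapped more strongly to the lexically supported category, which is the qualitative effect the abstract describes as retuning the mapping from auditory input to prelexical representations.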
Nabelek, Anna K.; Tampas, Joanna W.; Burchfield, Samuel B.
Background noise is a significant factor influencing hearing-aid satisfaction and is a major reason for rejection of hearing aids. Attempts have been made by previous researchers to relate the use of hearing aids to speech perception in noise (SPIN), with an expectation of improved speech perception followed by an…
Naturally produced English clear speech has been shown to be more intelligible than English conversational speech. However, little is known about the extent of the clear speech effects in the production of nonnative English, and perception of foreign-accented English by younger and older listeners. The present study examined whether Cantonese speakers would employ the same strategies as those used by native English speakers in producing clear speech in their second language. Also, the clear s...
Oded Ghitza; Anne-Lise Giraud; David Poeppel
A recent opinion article (Neural oscillations in speech: do not be enslaved by the envelope. Obleser et al., 2012) questions the validity of a class of speech perception models inspired by the possible role of neuronal oscillations in decoding speech (e.g., Ghitza, 2011; Giraud and Poeppel, 2012). The authors criticize, in particular, what they see as an overemphasis on the role of temporal speech envelope information, and an overemphasis on entrainment to the input rhythm while neglecting ...
Hu, Yi; Tahmina, Qudsia; Runge, Christina; Friedland, David R.
This study assesses the effects of adding low- or high-frequency information to the band-limited telephone-processed speech on bimodal listeners’ telephone speech perception in quiet environments. In the proposed experiments, bimodal users were presented under quiet listening conditions with wideband speech (WB), bandpass-filtered telephone speech (300–3,400 Hz, BP), high-pass filtered speech (f > 300 Hz, HP, i.e., distorted frequency components above 3,400 Hz in telephone speech were restore...
Listeners must accomplish two complementary perceptual feats in extracting a message from speech. They must discriminate linguistically-relevant acoustic variability and generalize across irrelevant variability. Said another way, they must categorize speech. Since the mapping of acoustic variability is language-specific, these categories must be learned from experience. Thus, understanding how, in general, the auditory system acquires and represents categories can inform us about the toolbox of mechanisms available to speech perception. This perspective invites consideration of findings from cognitive neuroscience literatures outside of the speech domain as a means of constraining models of speech perception. Although neurobiological models of speech perception have mainly focused on cerebral cortex, research outside the speech domain is consistent with the possibility of significant subcortical contributions in category learning. Here, we review the functional role of one such structure, the basal ganglia. We examine research from animal electrophysiology, human neuroimaging, and behavior to consider characteristics of basal ganglia processing that may be advantageous for speech category learning. We also present emerging evidence for a direct role for basal ganglia in learning auditory categories in a complex, naturalistic task intended to model the incidental manner in which speech categories are acquired. To conclude, we highlight new research questions that arise in incorporating the broader neuroscience research literature in modeling speech perception, and suggest how understanding contributions of the basal ganglia can inform attempts to optimize training protocols for learning non-native speech categories in adulthood.
Beasley, Augie E.; And Others
Six articles on the use of audiovisual materials in the school library media center cover how to develop an audiovisual production center; audiovisual forms; a checklist for effective video/16mm use in the classroom; slides in learning; hazards of videotaping in the library; and putting audiovisuals on the shelf. (EJS)
Lyxell, B; Rönnberg, J; Andersson, J; Linderoth, E
The study investigated the initial effects of the implementation of vibrotactile support on the individual's speech perception ability. Thirty-two subjects participated in the study; 16 with an acquired deafness and 16 with normal hearing. At a general level, the results indicated no immediate and direct improvement as a function of the implementation across all speech perception tests. However, when the subjects were divided into Skilled and Less Skilled groups, based on their performance in the visual condition of each test, it was found that the performance of the Skilled subjects deteriorated while that of the Less Skilled subjects improved when tactile information was provided in two conditions (word-discrimination and word-decoding conditions). It was concluded that tactile information interferes with Skilled subjects' automaticity of these functions. Furthermore, intercorrelations between discrimination and decoding tasks suggest that there are similarities between visually and tactilely supported speechreading in how they relate to sentence-based speechreading. Clinical implications of the results were discussed. PMID:8210957
Lev-Ari, Shiri; Peperkamp, Sharon
Speech perception is known to be influenced by listeners' expectations of the speaker. This paper tests whether the demographic makeup of individuals' communities can influence their perception of foreign sounds by influencing their expectations of the language. Using online experiments with participants from all across the U.S. and matched census data on the proportion of Spanish and other foreign language speakers in participants' communities, this paper shows that the demographic makeup of individuals' communities influences their expectations of foreign languages to have an alveolar trill versus a tap (Experiment 1), as well as their consequent perception of these sounds (Experiment 2). Thus, the paper shows that while individuals' expectations of foreign language to have a trill occasionally lead them to misperceive a tap in a foreign language as a trill, a higher proportion of non-trill language speakers in one's community decreases this likelihood. These results show that individuals' environment can influence their perception by shaping their linguistic expectations. PMID:27369129
Human locomotion typically creates noise, a possible consequence of which is the masking of sound signals originating in the surroundings. When walking side by side, people often subconsciously synchronize their steps. The neurophysiological and evolutionary background of this behavior is unclear. The present study investigated the potential of sound created by walking to mask perception of speech and compared the masking produced by walking in step with that produced by unsynchronized walking. The masking sound (footsteps on gravel) and the target sound (speech) were presented through the same speaker to 15 normal-hearing subjects. The original recorded walking sound was modified to mimic the sound of two individuals walking in pace or walking out of synchrony. The participants were instructed to adjust the sound level of the target sound until they could just comprehend the speech signal ("just follow conversation" or JFC level) when presented simultaneously with synchronized or unsynchronized walking sound at 40 dBA, 50 dBA, 60 dBA, or 70 dBA. Synchronized walking sounds produced slightly less masking of speech than did unsynchronized sound. The median JFC threshold in the synchronized condition was 38.5 dBA, while the corresponding value for the unsynchronized condition was 41.2 dBA. Combined results at all sound pressure levels showed an improvement in the signal-to-noise ratio (SNR) for synchronized footsteps; the median difference was 2.7 dB and the mean difference was 1.2 dB (P < 0.001, repeated-measures analysis of variance [RM-ANOVA]). The difference was significant for masker levels of 50 dBA and 60 dBA, but not for 40 dBA or 70 dBA. This study provides evidence that synchronized walking may reduce the masking potential of footsteps.
The perception of human languages is inherently a multimodal process, in which visual information can compensate for degraded audio information to improve recognition performance. This phenomenon has been studied in English, German, Spanish, and other languages, but it has not yet been reported for Chinese. In our experiment, 14 syllables (/ba, bi, bian, biao, bin, de, di, dian, duo, dong, gai, gan, gen, gu/), extracted from the Chinese audiovisual bimodal speech database CAVSR-1.0, were pronounced by 10 subjects. The audio-only, audiovisual, and visual-only stimuli were recognized by 20 observers. The audio-only and audiovisual stimuli were both presented under 5 conditions: no noise, and SNRs of 0 dB, −8 dB, −12 dB, and −16 dB. From the experimental results, the following conclusions were reached for Chinese speech. Human beings can recognize visual-only stimuli rather well. The place of articulation determines the visual distinction. In noisy environments, visual information can substantially compensate for degraded audio information, and as a result recognition performance is greatly improved.
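Presenting stimuli at fixed SNRs requires scaling the masker relative to the speech. A minimal sketch, assuming SNR is defined on mean power over the whole token (the study's exact mixing procedure is not described here), computes the noise gain for a target SNR in dB. The function names and sample values are hypothetical.

```python
import math

def mean_power(x):
    """Mean power (mean of squared samples) of a signal."""
    return sum(v * v for v in x) / len(x)

def snr_db(speech, noise):
    """SNR in dB, defined here as 10 * log10(P_speech / P_noise)."""
    return 10.0 * math.log10(mean_power(speech) / mean_power(noise))

def noise_gain_for_snr(speech, noise, target_db):
    """Amplitude gain for `noise` so that snr_db(speech, gain * noise) == target_db."""
    return math.sqrt(mean_power(speech) / (mean_power(noise) * 10 ** (target_db / 10.0)))

# Hypothetical short signals; the noisy conditions listed in the abstract:
speech = [0.5, -0.3, 0.8, -0.6]
noise = [0.1, 0.2, -0.1, -0.2]
for target in (0, -8, -12, -16):
    g = noise_gain_for_snr(speech, noise, target)
    scaled = [g * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled)]   # audio-only stimulus in noise
    print(target, round(g, 3), round(snr_db(speech, scaled), 3))
```

Scaling only the noise keeps the speech level constant across conditions; an alternative design fixes the overall mixture level instead, which this sketch does not cover.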
Production and comprehension of speech are closely interwoven. For example, the ability to detect an error in one's own speech, halt speech production, and finally correct the error can be explained by assuming an inner speech loop which continuously compares the word representations induced by production to those induced by perception at various cognitive levels (e.g., conceptual, word, or phonological levels). Because spontaneous speech errors are relatively rare, a picture naming and halt paradigm can be used to evoke them. In this paradigm, picture presentation (target word initiation) is followed by an auditory stop signal (distractor word) for halting speech production. The current study seeks to understand the neural mechanisms governing self-detection of speech errors by developing a biologically inspired neural model of the inner speech loop. The neural model is based on the Neural Engineering Framework (NEF) and consists of a network of about 500,000 spiking neurons. In the first experiment we induce simulated speech errors semantically and phonologically. In the second experiment, we simulate a picture naming and halt task. Target-distractor word pairs were balanced with respect to variation of phonological and semantic similarity. The results of the first experiment show that speech errors are successfully detected by a monitoring component in the inner speech loop. The results of the second experiment show that the model correctly reproduces human behavioral data on the picture naming and halt task. In particular, the halting rate in the production of target words was lower for phonologically similar words than for semantically similar or fully dissimilar distractor words. We thus conclude that the neural architecture proposed here to model the inner speech loop reflects important interactions in production and perception at phonological and semantic levels.
Audio-visual speech recognition is a natural and robust approach to improving human-robot interaction in noisy environments. Although the multi-stream Dynamic Bayesian Network and the coupled HMM are widely used for audio-visual speech recognition, they fail to learn the shared features between modalities and ignore the dependency of features among the frames within each discrete state. In this paper, we propose a Deep Dynamic Bayesian Network (DDBN) to perform unsupervised extraction of spatial-temporal multimodal features from Tibetan audio-visual speech data and to build an accurate audio-visual speech recognition model without assuming frame independence. Experimental results on Tibetan speech data from real-world environments showed that the proposed DDBN outperforms state-of-the-art methods in word recognition accuracy.
Mair, K. R.
High-functioning adults with autism spectrum disorder (ASD) frequently report difficulties with speech perception in background noise, which cannot be explained simply by an impairment in peripheral hearing or structural language ability. In spite of the apparent prevalence of this problem, however, only a handful of studies so far have evaluated speech reception thresholds (SRTs) in this group under controlled conditions, and then only with a limited range of (mainly non-speech) masking soun...
Mitterer, Holger; Ernestus, Mirjam
This study reports a shadowing experiment, in which one has to repeat a speech stimulus as fast as possible. We tested claims about a direct link between perception and production based on speech gestures, and obtained two types of counterevidence. First, shadowing is not slowed down by a gestural mismatch between stimulus and response. Second,…
Viswanathan, Navin; Magnuson, James S.; Fowler, Carol A.
According to one approach to speech perception, listeners perceive speech by applying general pattern matching mechanisms to the acoustic signal (e.g., Diehl, Lotto, & Holt, 2004). An alternative is that listeners perceive the phonetic gestures that structured the acoustic signal (e.g., Fowler, 1986). The two accounts have offered different…
Lavie, Limor; Banai, Karen; Karni, Avi; Attias, Joseph
Purpose: We tested whether using hearing aids can improve unaided performance in speech perception tasks in older adults with hearing impairment. Method: Unaided performance was evaluated in dichotic listening and speech-in-noise tests in 47 older adults with hearing impairment; 36 participants in 3 study groups were tested before hearing aid…
Casserly, Elizabeth D.
Real-time use of spoken language is a fundamentally interactive process involving speech perception, speech production, linguistic competence, motor control, neurocognitive abilities such as working memory, attention, and executive function, environmental noise, conversational context, and--critically--the communicative interaction between…
Guediche, Sara; Blumstein, Sheila E.; Fiez, Julie A.; Holt, Lori L.
Adult speech perception reflects the long-term regularities of the native language, but it is also flexible such that it accommodates and adapts to adverse listening conditions and short-term deviations from native-language norms. The purpose of this article is to examine how the broader neuroscience literature can inform and advance research efforts in understanding the neural basis of flexibility and adaptive plasticity in speech perception. Specifically, we highlight the potential role of ...
Lori Astheimer; Monika Janus; Sylvain Moreno; Ellen Bialystok
Event-related potential (ERP) evidence demonstrates that preschool-aged children selectively attend to informative moments such as word onsets during speech perception. Although this observation indicates a role for attention in language processing, it is unclear whether this type of attention is part of basic speech perception mechanisms, higher-level language skills, or general cognitive abilities. The current study examined these possibilities by measuring ERPs from 5-year-old children lis...
Eskelund, Kasper; MacDonald, Ewen; Andersen, Tobias
We perceive identity, expression and speech from faces. While perception of identity and expression depends crucially on the configuration of facial features, it is less clear whether this holds for visual speech perception. Facial configuration is poorly perceived for upside-down faces as demonstrated by the Thatcher illusion in which the orientation of the eyes and mouth with respect to the face is inverted (Thatcherization). This gives the face a grotesque appearance but this is only seen wh...
Barbulescu, Adela; Bailly, Gérard; Ronfard, Rémi; Pouget, Maël
The focus of this study is the generation of expressive audiovisual speech from neutral utterances for 3D virtual actors. Taking into account the segmental and suprasegmental aspects of audiovisual speech, we propose and compare several computational frameworks for the generation of expressive speech and face animation. We notably evaluate a standard frame-based conversion approach with two other methods that postulate the existence of global prosodic audiovisual patterns that are characteris...
Shojaei, Elahe; Ashayeri, Hassan; Jafari, Zahra; Zarrin Dast, Mohammad Reza; Kamali, Koorosh
Background: Speech perception ability depends on auditory and extra-auditory elements. The signal-to-noise ratio (SNR) is an extra-auditory element that affects the ability to follow speech normally and maintain a conversation. Difficulty perceiving speech in noise is a common complaint of the elderly. In this study, the importance of SNR magnitude as an extra-auditory effect on speech perception in noise was examined in the elderly. Methods: The speech perception in noise (SPIN) test was conducted at three SNRs, in the presence of ipsilateral white noise, on 25 elderly participants who had bilateral normal hearing thresholds at low to mid frequencies. Participants were selected by convenience sampling. Cognitive screening was done using the Persian Mini Mental State Examination (MMSE) test. Results: Independent t-tests, ANOVA, and Pearson correlation were used for statistical analysis. There was a significant difference in word discrimination scores among the silent condition and the three SNRs in both ears (p≤0.047). Moreover, there was a significant difference in word discrimination scores between paired SNRs (0 and +5, 0 and +10, and +5 and +10 dB; p≤0.04). No significant correlation was found between age and word recognition scores in silence or at the three SNRs in either ear (p≥0.386). Conclusion: Our results revealed that decreasing the signal level and increasing the competing noise considerably reduced speech perception ability in elderly listeners with normal low- to mid-frequency hearing thresholds. These results support the critical role of SNR in speech perception ability in the elderly. Furthermore, our results revealed that normal-hearing elderly participants required compensatory strategies to maintain normal speech perception in challenging acoustic situations. PMID:27390712
Anderson, Samira; Kraus, Nina
Numerous factors contribute to understanding speech in noisy listening environments. There is a clinical need for objective biological assessment of auditory factors that contribute to the ability to hear speech in noise, factors that are free from the demands of attention and memory. Subcortical processing of complex sounds such as speech (auditory brainstem responses to speech and other complex stimuli [cABRs]) reflects the integrity of auditory function. Because cABRs physically resemble t...
The present study used electrophysiological methods to investigate the preattentive perception of native and non-native speech sounds. We recorded the mismatch negativity elicited by single-syllable changes in both native and non-native speech-sound contrasts in tonal languages. EEGs were recorded and low-resolution brain electromagnetic tomography (LORETA) was used to explore the neural electrical activity. Our results suggested that the left hemisphere was predominant in the perception of native speech sounds, whereas non-native speech sounds were perceived predominantly by the right hemisphere, which may be explained by this hemisphere's specialization for processing the prosodic and emotional components of speech.
Crew, Joseph D; Galvin, John J; Landsberger, David M; Fu, Qian-Jie
Cochlear implant (CI) users have difficulty understanding speech in noisy listening conditions and perceiving music. Aided residual acoustic hearing in the contralateral ear can mitigate these limitations. The present study examined contributions of electric and acoustic hearing to speech understanding in noise and melodic pitch perception. Data was collected with the CI only, the hearing aid (HA) only, and both devices together (CI+HA). Speech reception thresholds (SRTs) were adaptively measured for simple sentences in speech babble. Melodic contour identification (MCI) was measured with and without a masker instrument; the fundamental frequency of the masker was varied to be overlapping or non-overlapping with the target contour. Results showed that the CI contributes primarily to bimodal speech perception and that the HA contributes primarily to bimodal melodic pitch perception. In general, CI+HA performance was slightly improved relative to the better ear alone (CI-only) for SRTs but not for MCI, with some subjects experiencing a decrease in bimodal MCI performance relative to the better ear alone (HA-only). Individual performance was highly variable, and the contribution of either device to bimodal perception was both subject- and task-dependent. The results suggest that individualized mapping of CIs and HAs may further improve bimodal speech and music perception. PMID:25790349
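Adaptive SRT measurement can be illustrated with a simple 1-down/1-up staircase, which lowers the SNR after a correct response and raises it after an error, so that it converges on the SNR giving about 50% correct. This is a generic sketch, not the authors' procedure: the step size, reversal counts, trial limit, and the logistic `make_simulated_listener` are hypothetical choices.

```python
import math
import random


def make_simulated_listener(true_srt_db, slope_per_db, seed=0):
    """Return a hypothetical listener whose probability of repeating a sentence
    correctly is a logistic function of SNR, equal to 0.5 at true_srt_db."""
    rng = random.Random(seed)

    def respond(snr_db):
        p_correct = 1.0 / (1.0 + math.exp(-slope_per_db * (snr_db - true_srt_db)))
        return rng.random() < p_correct

    return respond


def adaptive_srt(respond, start_snr_db=10.0, step_db=2.0, n_reversals=8,
                 n_average=6, max_trials=500):
    """Estimate the SRT with a 1-down/1-up staircase.

    The SNR is lowered after each correct response and raised after each
    incorrect one, so the track oscillates around the 50%-correct point.
    The estimate is the mean SNR at the last `n_average` reversals.
    """
    if n_average > n_reversals:
        raise ValueError("n_average cannot exceed n_reversals")
    snr = start_snr_db
    last_step = 0.0
    reversal_snrs = []
    trials = 0
    while len(reversal_snrs) < n_reversals:
        if trials >= max_trials:
            raise RuntimeError("staircase did not reach the requested number of reversals")
        step = -step_db if respond(snr) else step_db
        if last_step and (step > 0) != (last_step > 0):
            reversal_snrs.append(snr)  # track changed direction at this SNR
        last_step = step
        snr += step
        trials += 1
    return sum(reversal_snrs[-n_average:]) / n_average


if __name__ == "__main__":
    listener = make_simulated_listener(true_srt_db=-4.0, slope_per_db=0.5)
    print(adaptive_srt(listener, n_reversals=20, n_average=16))
```

A deterministic listener that is correct only at or above 0 dB SNR makes the track alternate between -2 and 0 dB, so the estimate is -1 dB, halfway between the last correct and first incorrect levels.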
Judge, S.; Robertson, Z.; Hawley, M.
This study set out to collect data from assistive technology professionals about their provision of speech-driven environmental control systems. This study is part of a larger study looking at developing a new speech-driven environmental control system.
Objective: To determine speech-perception-in-noise performance (with speech and noise spatially distinct and coincident) and the bilateral spatial benefits of head-shadow effect, summation, squelch and spatial release from masking in adults with delayed sequential cochlear implants. Study design: A cross-sectional, one-group, post-test-only exploratory design was employed. Eleven adults (mean age 47 years; range 21–69 years) of the Pretoria Cochlear Implant Programme (PCIP) in South Africa with a bilateral severe-to-profound sensorineural hearing loss were recruited. Prerecorded Everyday Speech Sentences of The Central Institute for the Deaf (CID) were used to evaluate participants' speech-in-noise perception at sentence level. An adaptive procedure was used to determine the signal-to-noise ratio (SNR, in dB) at which the participant's speech reception threshold (SRT) was achieved. Specific calculations were used to estimate bilateral spatial benefit effects. Results: A minimal bilateral benefit for speech-in-noise perception was observed with noise directed to the first implant (CI 1) (1.69 dB) and in the speech and noise spatial listening condition (0.78 dB), but it was not statistically significant. The head-shadow effect at 180° was the most robust bilateral spatial benefit. An improvement in speech perception with spatially distinct speech and noise indicates that the contribution of the second implant (CI 2) to bilateral spatial benefit is greater than that of the first implant (CI 1). Conclusion: Bilateral benefit for delayed sequentially implanted adults is less than previously reported for simultaneously and sequentially implanted adults. Delayed sequential implantation benefit seems to relate to the availability of the ear with the most favourable SNR.
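Bilateral spatial benefits of this kind are derived as differences between SRTs measured in different listening configurations. Definitions vary between studies, so this sketch assumes one common convention: speech from the front, and benefit = reference SRT minus test SRT, so positive values mean improvement. The `SRTs` field names and the example values are hypothetical, not data from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SRTs:
    """Hypothetical container for one listener's SRTs in dB SNR (lower is better).

    Speech is assumed to come from the front (S0).
    """
    ci1_s0n0: float                  # first implant alone, speech and noise from the front
    bilateral_s0n0: float            # both implants, speech and noise from the front
    ci1_noise_ci1_side: float        # first implant alone, noise on its own side
    ci1_noise_ci2_side: float        # first implant alone, noise on the opposite side
    bilateral_noise_ci2_side: float  # both implants, noise on the second implant's side


def spatial_benefits(s: SRTs) -> dict:
    """Benefits in dB; positive values mean the test condition is better."""
    return {
        # Moving the noise from the implant's side to the far side of the head.
        "head_shadow": s.ci1_noise_ci1_side - s.ci1_noise_ci2_side,
        # Adding the second implant when speech and noise are co-located.
        "summation": s.ci1_s0n0 - s.bilateral_s0n0,
        # Adding the implant that is nearer the noise (binaural squelch).
        "squelch": s.ci1_noise_ci2_side - s.bilateral_noise_ci2_side,
        # Separating speech and noise while listening with both implants.
        "spatial_release": s.bilateral_s0n0 - s.bilateral_noise_ci2_side,
    }


if __name__ == "__main__":
    example = SRTs(4.0, 3.0, 8.0, 2.0, 1.5)  # made-up values, not study data
    print(spatial_benefits(example))
    # {'head_shadow': 6.0, 'summation': 1.0, 'squelch': 0.5, 'spatial_release': 1.5}
```

Because every benefit is a difference of two SRTs, each inherits the test-retest variability of both measurements, which is one reason small effects such as summation and squelch are hard to establish statistically.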
Kleinschmidt, Dave F; Jaeger, T Florian
Successful speech perception requires that listeners map the acoustic signal to linguistic categories. These mappings are not only probabilistic, but change depending on the situation. For example, one talker's /p/ might be physically indistinguishable from another talker's /b/ (cf. lack of invariance). We characterize the computational problem posed by such a subjectively nonstationary world and propose that the speech perception system overcomes this challenge by (a) recognizing previously encountered situations, (b) generalizing to other situations based on previous similar experience, and (c) adapting to novel situations. We formalize this proposal in the ideal adapter framework: (a) to (c) can be understood as inference under uncertainty about the appropriate generative model for the current talker, thereby facilitating robust speech perception despite the lack of invariance. We focus on 2 critical aspects of the ideal adapter. First, in situations that clearly deviate from previous experience, listeners need to adapt. We develop a distributional (belief-updating) learning model of incremental adaptation. The model provides a good fit against known and novel phonetic adaptation data, including perceptual recalibration and selective adaptation. Second, robust speech recognition requires that listeners learn to represent the structured component of cross-situation variability in the speech signal. We discuss how these 2 aspects of the ideal adapter provide a unifying explanation for adaptation, talker-specificity, and generalization across talkers and groups of talkers (e.g., accents and dialects). The ideal adapter provides a guiding framework for future investigations into speech perception and adaptation, and more broadly language comprehension. PMID:25844873
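Incremental belief updating of this kind can be illustrated with a much simpler model than the one described: a single cue (for example, voice onset time), Gaussian categories whose within-category variance is treated as known, labelled exposure tokens, and a conjugate normal update of each category mean. This is an assumption-laden sketch, not Kleinschmidt and Jaeger's implementation; `CategoryBelief`, `observe`, `prob_first`, and all parameter values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CategoryBelief:
    """Belief about one phonetic category's mean cue value (e.g., VOT in ms).

    The mean is uncertain, Normal(mu, tau2); the within-category variance
    sigma2 is treated as known, which is a simplification of this sketch.
    """
    mu: float
    tau2: float
    sigma2: float

    def update(self, x: float) -> "CategoryBelief":
        """Conjugate normal-normal update after observing one cue value x."""
        posterior_precision = 1.0 / self.tau2 + 1.0 / self.sigma2
        new_tau2 = 1.0 / posterior_precision
        new_mu = new_tau2 * (self.mu / self.tau2 + x / self.sigma2)
        return CategoryBelief(new_mu, new_tau2, self.sigma2)

    def predictive_logpdf(self, x: float) -> float:
        """Log density of x under the predictive Normal(mu, tau2 + sigma2)."""
        var = self.tau2 + self.sigma2
        return -0.5 * (math.log(2.0 * math.pi * var) + (x - self.mu) ** 2 / var)


def observe(belief: CategoryBelief, cues) -> CategoryBelief:
    """Incrementally adapt a category belief to a sequence of labelled tokens."""
    for x in cues:
        belief = belief.update(x)
    return belief


def prob_first(x: float, a: CategoryBelief, b: CategoryBelief, prior_a: float = 0.5) -> float:
    """Posterior probability that cue value x came from category a rather than b."""
    d = (math.log(prior_a) + a.predictive_logpdf(x)) - (
        math.log(1.0 - prior_a) + b.predictive_logpdf(x))
    # Numerically stable logistic of the log-odds d.
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


if __name__ == "__main__":
    b = CategoryBelief(mu=0.0, tau2=400.0, sigma2=100.0)   # hypothetical /b/ belief
    p = CategoryBelief(mu=60.0, tau2=400.0, sigma2=100.0)  # hypothetical /p/ belief
    print(prob_first(30.0, p, b))            # midpoint token before adaptation
    p_adapted = observe(p, [40.0] * 10)      # talker whose /p/ tokens sit at 40 ms
    print(prob_first(30.0, p_adapted, b))    # the same token after adaptation
```

With /b/ centred at 0 ms and /p/ at 60 ms, a 30 ms token starts out exactly ambiguous; after ten exposures to a talker whose /p/ tokens sit at 40 ms, the /p/ mean moves toward 40 ms and the same token becomes more likely to be heard as /p/, a toy version of perceptual recalibration.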
Evans, S; McGettigan, C; Agnew, ZK; Rosen, S; Scott, SK
Spoken conversations typically take place in noisy environments, and different kinds of masking sounds place differing demands on cognitive resources. Previous studies, examining the modulation of neural activity associated with the properties of competing sounds, have shown that additional speech streams engage the superior temporal gyrus. However, the absence of a condition in which target speech was heard without additional masking made it difficult to identify brain networks specific to masking and to ascertain the extent to which competing speech was processed equivalently to target speech. In this study, we scanned young healthy adults with continuous functional Magnetic Resonance Imaging (fMRI), whilst they listened to stories masked by sounds that differed in their similarity to speech. We show that auditory attention and control networks are activated during attentive listening to masked speech in the absence of an overt behavioural task. We demonstrate that competing speech is processed predominantly in the left hemisphere within the same pathway as target speech but is not treated equivalently within that stream, and that individuals who perform better in speech-in-noise tasks activate the left mid-posterior superior temporal gyrus more. Finally, we identify neural responses associated with the onset of sounds in the auditory environment; this activity was found within right-lateralised frontal regions, consistent with a phasic alerting response. Taken together, these results provide a comprehensive account of the neural processes involved in listening in noise. PMID:26696297
For deaf individuals with residual low-frequency acoustic hearing, combined use of a cochlear implant (CI) and hearing aid (HA) typically provides better speech understanding than either device alone. Because of their coarse spectral resolution, CIs do not provide the fundamental frequency (F0) information that contributes to understanding of tonal languages such as Mandarin Chinese. The HA can provide good representation of F0 and, depending on the range of aided acoustic hearing, first and second formant (F1 and F2) information. In this study, Mandarin tone, vowel, and consonant recognition in quiet and noise was measured in 12 adult Mandarin-speaking bimodal listeners with the CI-only and with the CI+HA. Tone recognition was significantly better with the CI+HA in noise, but not in quiet. Vowel recognition was significantly better with the CI+HA in quiet, but not in noise. There was no significant difference in consonant recognition between the CI-only and the CI+HA in quiet or in noise. There was a wide range in bimodal benefit, with improvements often greater than 20 percentage points in some tests and conditions. The bimodal benefit was compared to CI subjects' HA-aided pure-tone average (PTA) thresholds between 250 and 2000 Hz; subjects were divided into two groups: "better" PTA (<50 dB HL) and "poorer" PTA (>50 dB HL). The bimodal benefit differed significantly between groups only for consonant recognition. The bimodal benefit for tone recognition in quiet was significantly correlated with CI experience, suggesting that bimodal CI users learn to better combine low-frequency spectro-temporal information from acoustic hearing with temporal envelope information from electric hearing. Given the small number of subjects in this study (n = 12), further research with Chinese bimodal listeners may provide more information regarding the contribution of acoustic and electric hearing to tonal language perception.
Pyschny, Verena; Landwehr, Markus; Hahn, Moritz; Walger, Martin; von Wedel, Hasso; Meister, Hartmut
Purpose: The objective of the study was to investigate the influence of bimodal stimulation upon hearing ability for speech recognition in the presence of a single competing talker. Method: Speech recognition was measured in 3 listening conditions: hearing aid (HA) alone, cochlear implant (CI) alone, and both devices together (CI + HA). To examine…
The general topic addressed by this dissertation is that of bilingualism, and more specifically, the topic of bilingual acquisition of speech sounds. The central question in this study is the following: does bilingualism affect children’s perceptual development of speech sounds? The term bilingual i
Carlin, Charles H.; Milam, Jennifer L.; Carlin, Emily L.; Owen, Ashley
E-supervision has a potential role in addressing speech-language personnel shortages in rural and difficult to staff school districts. The purposes of this article are twofold: to determine how e-supervision might support graduate speech-language pathologist (SLP) interns placed in rural, remote, and difficult to staff public school districts; and, to investigate interns’ perceptions of in-person supervision compared to e-supervision. The study used a mixed methodology approach and collected ...
Ouni, Slim; Cohen, Michael M.; Ishak, Hope; Massaro, Dominic W.
Animated agents are becoming increasingly frequent in research and applications in speech science. An important challenge is to evaluate the effectiveness of the agent in terms of the intelligibility of its visible speech. In three experiments, we extend and test the Sumby and Pollack (1954) metric to allow the comparison of an agent relative to a standard or reference, and also propose a new metric based on the fuzzy logical model of perception (FLMP) to describe the benefit provided b...
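For reference, Sumby and Pollack's measure expresses the audiovisual gain as a fraction of the room for improvement over auditory-only performance, and the FLMP combines independent auditory and visual support multiplicatively. The sketch below shows the standard two-alternative FLMP formula and that relative-benefit measure; it does not reproduce the new FLMP-based metric proposed in the paper, whose details are truncated here, and the function names are hypothetical.

```python
def flmp_two_choice(a: float, v: float) -> float:
    """Two-alternative FLMP: probability of choosing the response supported by
    auditory support a and visual support v (both truth values in [0, 1])."""
    if not (0.0 <= a <= 1.0 and 0.0 <= v <= 1.0):
        raise ValueError("support values must lie in [0, 1]")
    support = a * v                   # evidence for the response
    against = (1.0 - a) * (1.0 - v)   # evidence for the alternative
    if support + against == 0.0:
        raise ValueError("support values are contradictory (0 and 1)")
    return support / (support + against)


def relative_visual_benefit(p_auditory: float, p_audiovisual: float) -> float:
    """Sumby and Pollack (1954) style gain: the audiovisual improvement as a
    fraction of the improvement still possible over auditory-only accuracy."""
    if p_auditory >= 1.0:
        raise ValueError("no room for improvement when auditory-only is perfect")
    return (p_audiovisual - p_auditory) / (1.0 - p_auditory)
```

In this formulation a neutral auditory value of 0.5 leaves the visual value unchanged (0.5 and 0.9 give 0.9), while two moderately supportive sources of 0.8 each combine to about 0.94; auditory-only accuracy of 0.4 rising to 0.7 audiovisually is a relative benefit of 0.5.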
Naiphinich Kotchabhakdi; Chittin Chindaduangratn; Wichian Sittiprapaporn
The present study was intended to make electrophysiological investigations into the preattentive perception of native and non-native speech sounds. We recorded the mismatch negativity, elicited by single syllable change of both native and non-native speech-sound contrasts in tonal languages. EEGs were recorded and low-resolution brain electromagnetic tomography (LORETA) was utilized to explore the neural electrical activity. Our results suggested that the left hemisphere was predominant in th...
Tobias Weissgerber; Tobias Rader; Uwe Baumann
Objectives: Previous studies investigating speech perception in noise have typically been conducted with static masker positions. The aim of this study was to investigate the effect of spatial separation of source and masker (spatial release from masking, SRM) in a moving masker setup and to evaluate the impact of adaptive beamforming in comparison with fixed directional microphones in cochlear implant (CI) users. Design: Speech reception thresholds (SRT) were measured in S0N0 and in a mov...
Dole, Marjorie; Hoen, Michel; Meunier, Fanny
Developmental dyslexia is associated with impaired speech-in-noise perception. The goal of the present research was to further characterize this deficit in dyslexic adults. In order to specify the mechanisms and processing strategies used by adults with dyslexia during speech-in-noise perception, we explored the influence of background type, presenting single target-words against backgrounds made of cocktail party sounds, modulated speech-derived noise or stationary noise. We also evaluated t...
Portnova, Galina; Martynova, Olga
The perception of complex auditory information such as complete speech sequences develops during human ontogeny. In order to explore age differences in the auditory perception of predictable speech sequences we compared event-related potentials (ERPs) recorded in 5- to 6-year-old children (N = 15) and adults (N = 15) in response to anticipated speech sequences as successive and reverse digital series with randomly omitted digits. The ERPs obtained from the omitted digits significantly differe...
Plant, G L
Four subjects fitted with single-channel vibrotactile aids and provided with training in their use took part in a testing programme aimed at assessing their aided and unaided lipreading performance, their ability to detect segmental and suprasegmental features of speech, and the discrimination of common environmental sounds. The results showed that the vibrotactile aid provided very useful information as to speech and non-speech stimuli with the subjects performing best on those tasks where time/intensity cues provided sufficient information to enable identification. The implications of the study are discussed and a comparison made with those results reported for subjects using cochlear implants. PMID:6897619
Munson, Benjamin; Johnson, Julie M.; Edwards, Jan
Purpose: This study examined whether experienced speech-language pathologists (SLPs) differ from inexperienced people in their perception of phonetic detail in children's speech. Method: Twenty-one experienced SLPs and 21 inexperienced listeners participated in a series of tasks in which they used a visual-analog scale (VAS) to rate children's…
Musicians have a more accurate temporal and tonal representation of auditory stimuli than their non-musician counterparts (Kraus & Chandrasekaran, 2010; Parbery-Clark, Skoe, & Kraus, 2009; Zendel & Alain, 2008; Musacchia, Sams, Skoe, & Kraus, 2007). Musicians who are adept at the production and perception of music are also more sensitive to key acoustic features of speech such as voice onset time and pitch. Together, these data suggest that musical training may enhance the processing of acoustic information for speech sounds. In the current study, we sought to provide neural evidence that musicians process speech and music in a similar way. We hypothesized that for musicians, right hemisphere areas traditionally associated with music are also engaged for the processing of speech sounds. In contrast, we predicted that in non-musicians processing of speech sounds would be localized to traditional left hemisphere language areas. Speech stimuli differing in voice onset time were presented using a dichotic listening paradigm. Subjects either indicated aural location for a specified speech sound or identified a specific speech sound from a directed aural location. Musical training effects and organization of acoustic features were reflected by activity in source generators of the P50. This included greater activation of the right middle temporal gyrus (MTG) and superior temporal gyrus (STG) in musicians. The findings demonstrate recruitment of the right hemisphere in musicians for discriminating speech sounds and a putative broadening of their language network. Musicians appear to have an increased sensitivity to acoustic features and enhanced selective attention to temporal features of speech that is facilitated by musical training and supported, in part, by right hemisphere homologues of established speech processing regions of the brain.
Bhat, Jyoti; Miller, Lee M; Pitt, Mark A; Shahin, Antoine J
Audiovisual (AV) speech perception is robust to temporal asynchronies between visual and auditory stimuli. We investigated the neural mechanisms that facilitate tolerance for audiovisual stimulus onset asynchrony (AVOA) with EEG. Individuals were presented with AV words that were asynchronous in onsets of voice and mouth movement and judged whether they were synchronous or not. Behaviorally, individuals tolerated (perceived as synchronous) longer AVOAs when mouth movement preceded the speech (V-A) stimuli than when the speech preceded mouth movement (A-V). Neurophysiologically, the P1-N1-P2 auditory evoked potentials (AEPs), time-locked to sound onsets and known to arise in and surrounding the primary auditory cortex (PAC), were smaller for the in-sync than the out-of-sync percepts. Spectral power of oscillatory activity in the beta band (14-30 Hz) following the AEPs was larger during the in-sync than out-of-sync perception for both A-V and V-A conditions. However, alpha power (8-14 Hz), also following AEPs, was larger for the in-sync than out-of-sync percepts only in the V-A condition. These results demonstrate that AVOA tolerance is enhanced by inhibiting low-level auditory activity (e.g., AEPs representing generators in and surrounding PAC) that code for acoustic onsets. By reducing sensitivity to acoustic onsets, visual-to-auditory onset mapping is weakened, allowing for greater AVOA tolerance. In contrast, beta and alpha results suggest the involvement of higher-level neural processes that may code for language cues (phonetic, lexical), selective attention, and binding of AV percepts, allowing for wider neural windows of temporal integration, i.e., greater AVOA tolerance. PMID:25505102
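Band power of the kind compared above (alpha 8–14 Hz, beta 14–30 Hz) can be estimated from the spectrum of an EEG epoch. The following is a minimal standard-library sketch assuming a single-channel epoch and a Hann-windowed DFT; the abstract does not describe the study's time-frequency method, and `band_power` and the synthetic test signal are hypothetical.

```python
import cmath
import math


def band_power(epoch, fs, low_hz, high_hz):
    """Mean power of a Hann-windowed DFT over bins with low_hz <= f < high_hz.

    Only the DFT bins inside the band are evaluated, each directly in O(N),
    which is adequate for short single-channel epochs.
    """
    n = len(epoch)
    if n < 2:
        raise ValueError("epoch must contain at least two samples")
    mean = sum(epoch) / n
    # Remove the DC offset and taper the edges with a (symmetric) Hann window.
    tapered = [
        (sample - mean) * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)))
        for i, sample in enumerate(epoch)
    ]
    powers = []
    for k in range(n // 2 + 1):
        freq = k * fs / n
        if low_hz <= freq < high_hz:
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(tapered)
            )
            powers.append(abs(coeff) ** 2)
    if not powers:
        raise ValueError("no DFT bins fall inside the requested band")
    return sum(powers) / len(powers)


if __name__ == "__main__":
    fs = 250  # Hz, hypothetical sampling rate
    # Synthetic 2 s epoch dominated by a 20 Hz (beta-range) oscillation.
    epoch = [math.sin(2.0 * math.pi * 20.0 * i / fs) for i in range(2 * fs)]
    alpha = band_power(epoch, fs, 8.0, 14.0)
    beta = band_power(epoch, fs, 14.0, 30.0)
    print(f"alpha={alpha:.3g}, beta={beta:.3g}")
```

In practice, condition contrasts such as in-sync versus out-of-sync percepts would average band power over many trials and channels, and usually over a post-stimulus time window rather than a whole epoch.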
Bowers, Andrew Lee; Jenson, David; Cuellar, Megan
Activity in premotor and sensorimotor cortices is found in speech production and some perception tasks. Yet, how sensorimotor integration supports these functions is unclear due to a lack of data examining the timing of activity from these regions. Beta (~20Hz) and alpha (~10Hz) spectral power within the EEG µ rhythm are considered indices of motor and somatosensory activity, respectively. In the current study, perception conditions required discrimination (same/different) of syllables pai...
Berisha, Visar; Liss, Julie; Sandoval, Steven; Utianski, Rene; Spanias, Andreas
The current state of the art in judging pathological speech intelligibility is subjective assessment performed by trained speech-language pathologists (SLPs). These tests, however, are inconsistent and costly, and they often suffer from poor intra- and inter-judge reliability. Consistent, reliable, and perceptually relevant objective evaluations of pathological speech are therefore critical. Here, we propose a data-driven approach to this problem. We propose new cost functions for examining data from a series of experiments in which certified SLPs rate pathological speech along the perceptual dimensions that contribute to decreased intelligibility. We consider qualitative feedback from SLPs in the form of relative comparisons such as "Is Speaker A's rhythm more similar to Speaker B or Speaker C?" Data of this form are common in behavioral research but differ from the traditional data structures expected in supervised (data matrix + class labels) or unsupervised (data matrix) machine learning. The proposed method identifies relevant acoustic features that correlate with the ordinal data collected during the experiment. Using these features, we show that we can develop objective measures of speech signal degradation that correlate well with SLP responses. PMID:25435817
Mayo, L H; Florentine, M; Buus, S
To determine how age of acquisition influences perception of second-language speech, the Speech Perception in Noise (SPIN) test was administered to native Mexican-Spanish-speaking listeners who learned fluent English before age 6 (early bilinguals) or after age 14 (late bilinguals) and monolingual American-English speakers (monolinguals). Results show that the levels of noise at which the speech was intelligible were significantly higher and the benefit from context was significantly greater for monolinguals and early bilinguals than for late bilinguals. These findings indicate that learning a second language at an early age is important for the acquisition of efficient high-level processing of it, at least in the presence of noise. PMID:9210123
Fowler, Jennifer R.; Eggleston, Jessica L.; Reavis, Kelly M.; McMillan, Garnett P.; Reiss, Lina A. J.
Purpose: The objective was to determine whether speech perception could be improved for bimodal listeners (those using a cochlear implant [CI] in one ear and hearing aid in the contralateral ear) by removing low-frequency information provided by the CI, thereby reducing acoustic-electric overlap. Method: Subjects were adult CI subjects with at…
Blood, Gordon W.; Boyle, Michael P.; Blood, Ingrid M.; Nalesnik, Gina R.
Bullying in school-age children is a global epidemic. School personnel play a critical role in eliminating this problem. The goals of this study were to examine speech-language pathologists' (SLPs) perceptions of bullying, endorsement of potential strategies for dealing with bullying, and associations among SLPs' responses and specific demographic…
Ofe, Erin E.; Plumb, Allison M.; Plexico, Laura W.; Haak, Nancy J.
Purpose: The purpose of the current investigation was to examine speech-language pathologists' (SLPs') knowledge and perceptions of bullying, with an emphasis on autism spectrum disorder (ASD). Method: A 46-item, web-based survey was used to address the purposes of this investigation. Participants were recruited through e-mail and electronic…
Eskelund, Kasper; MacDonald, Ewen; Andersen, Tobias
demonstrated by the Thatcher illusion in which the orientation of the eyes and mouth with respect to the face is inverted (Thatcherization). This gives the face a grotesque appearance but this is only seen when the face is upright. Thatcherization can likewise disrupt visual speech perception but only when the...
Feldstein, Stanley; Dohm, Faith-Anne; Crown, Cynthia L.
Presents a study that explores (1) whether listeners regard speakers with similar global speech rates as more competent and attractive and (2) the influence of gender on their perceptions. Explains that the judges consisted of 17 male and 28 female listeners.
Lee, Andrew H.; Lyster, Roy
To what extent do second language (L2) learners benefit from instruction that includes corrective feedback (CF) on L2 speech perception? This article addresses this question by reporting the results of a classroom-based experimental study conducted with 32 young adult Korean learners of English. An instruction-only group and an instruction + CF…
experiment, subjects rated the signals with regard to loudness, speech clarity, noisiness, and overall acceptance. Based on the results, a criterion for selecting compression parameters that yield some level variation in the output signal, while still keeping overall user acceptance at a tolerable level, is... [...] ...they become audible again for the hearing-impaired person. The general goal is to place all sounds within the hearing-aid user's audible range, so that speech intelligibility and listening comfort are as good as possible. Amplification strategies in hearing aids are in many cases based on... [...] ...empirical research, for example investigations of loudness perception in hearing-impaired listeners. Most research has focused on speech and sounds at medium input levels (e.g., 60-65 dB SPL). It is well documented that for speech at conversational levels, hearing-aid users prefer the signal to be...
Knowland, Victoria C. P.; Evans, Sam; Snell, Caroline; Rosen, Stuart
Purpose: The purpose of the study was to assess the ability of children with developmental language learning impairments (LLIs) to use visual speech cues from the talking face. Method: In this cross-sectional study, 41 typically developing children (mean age: 8 years 0 months, range: 4 years 5 months to 11 years 10 months) and 27 children with…
Approaches to music and audiovisual meaning in film appear to be very different in nature and scope when considered from the point of view of experimental psychology or humanistic studies. Nevertheless, this article argues that experimental studies square with ideas of audiovisual perception and ...
Biau, Emmanuel; Torralba, Mireia; Fuentemilla, Lluis; de Diego Balaguer, Ruth; Soto-Faraco, Salvador
Speakers often accompany speech with spontaneous beat gestures in natural spoken communication. These gestures are usually aligned with lexical stress and can modulate the saliency of their affiliate words. Here we addressed the consequences of beat gestures on the neural correlates of speech perception. Previous studies have highlighted the role played by theta oscillations in temporal prediction of speech. We hypothesized that the sight of beat gestures may influence ongoing low-frequency neural oscillations around the onset of the corresponding words. Electroencephalographic (EEG) recordings were acquired while participants watched a continuous, naturally recorded discourse. The phase-locking value (PLV) at word onset was calculated from the EEG from pairs of identical words that had been pronounced with and without a concurrent beat gesture in the discourse. We observed an increase in PLV in the 5-6 Hz theta range as well as a desynchronization in the 8-10 Hz alpha band around the onset of words preceded by a beat gesture. These findings suggest that beats help tune low-frequency oscillatory activity at relevant moments during natural speech perception, providing new insight into how speech and paralinguistic information are integrated. PMID:25595613
Young, N M; Grohne, K M; Carrasco, V N; Brown, C
This study compares the auditory perceptual skill development of 23 congenitally deaf children who received the Nucleus 22-channel cochlear implant with the SPEAK speech coding strategy, and 20 children who received the CLARION Multi-Strategy Cochlear Implant with the Continuous Interleaved Sampler (CIS) speech coding strategy. All were under 5 years old at implantation. Preimplantation, there were no significant differences between the groups in age, length of hearing aid use, or communication mode. Auditory skills were assessed at 6 months and 12 months after implantation. Postimplantation, the mean scores on all speech perception tests were higher for the Clarion group. These differences were statistically significant for the pattern perception and monosyllable subtests of the Early Speech Perception battery at 6 months, and for the Glendonald Auditory Screening Procedure at 12 months. Multiple regression analysis revealed that device type accounted for the greatest variance in performance after 12 months of implant use. We conclude that children using the CIS strategy implemented in the Clarion implant may develop better auditory perceptual skills during the first year postimplantation than children using the SPEAK strategy with the Nucleus device. PMID:10214811
Stevenson, Ryan A; Segers, Magali; Ferber, Susanne; Barense, Morgan D; Camarata, Stephen; Wallace, Mark T
A growing area of interest and relevance in the study of autism spectrum disorder (ASD) focuses on the relationship between multisensory temporal function and the behavioral, perceptual, and cognitive impairments observed in ASD. Atypical sensory processing is becoming increasingly recognized as a core component of autism, with evidence of atypical processing across a number of sensory modalities. These deviations from typical processing underscore the value of interpreting ASD within a multisensory framework. Furthermore, converging evidence illustrates that these differences in audiovisual processing may be specifically related to temporal processing. This review seeks to bridge the connection between temporal processing and audiovisual perception, and to elaborate on emerging data showing differences in audiovisual temporal function in autism. We also discuss the consequence of such changes, the specific impact on the processing of different classes of audiovisual stimuli (e.g. speech vs. nonspeech, etc.), and the presumptive brain processes and networks underlying audiovisual temporal integration. Finally, possible downstream behavioral implications, and possible remediation strategies are outlined. Autism Res 2016, 9: 720-738. © 2015 International Society for Autism Research, Wiley Periodicals, Inc. PMID:26402725
Cristia, A.; Seidl, A.; Junge, C.; Soderstrom, M.; Hagoort, P.
There are increasing reports that individual variation in behavioral and neurophysiological measures of infant speech processing predicts later language outcomes, and specifically concurrent or subsequent vocabulary size. If such findings hold up under scrutiny, they could both illuminate theoretical models of language development and contribute to the prediction of communicative disorders. A qualitative, systematic review of this emergent literature illustrated the variety of approaches ...
Davis, Matthew H.; Coleman, Martin R.; Absalom, Anthony R.; Rodd, Jennifer M.; Johnsrude, Ingrid S.; Matta, Basil F.; Owen, Adrian M.; Menon, David K.
We used functional MRI and the anesthetic agent propofol to assess the relationship among neural responses to speech, successful comprehension, and conscious awareness. Volunteers were scanned while listening to sentences containing ambiguous words, matched sentences without ambiguous words, and signal-correlated noise (SCN). During three scanning sessions, participants were nonsedated (awake), lightly sedated (a slowed response to conversation), and deeply sedated (no conversational response...
Miller, James D; Watson, Charles S; Dubno, Judy R; Leek, Marjorie R
Following an overview of theoretical issues in speech-perception training and of previous efforts to enhance hearing aid use through training, a multisite study, designed to evaluate the efficacy of two types of computerized speech-perception training for adults who use hearing aids, is described. One training method focuses on the identification of 109 syllable constituents (45 onsets, 28 nuclei, and 36 codas) in quiet and in noise, and on the perception of words in sentences presented in various levels of noise. In a second type of training, participants listen to 6- to 7-minute narratives in noise and are asked several questions about each narrative. Two groups of listeners are trained, each using one of these types of training, performed in a laboratory setting. The training for both groups is preceded and followed by a series of speech-perception tests. Subjects listen in a sound field while wearing their hearing aids at their usual settings. The training continues over 15 to 20 visits, with subjects completing at least 30 hours of focused training with one of the two methods. The two types of training are described in detail, together with a summary of other perceptual and cognitive measures obtained from all participants. PMID:27587914
Dang, Jianwu; Akagi, Masato; Honda, Kiyoshi
Realization of an intelligent human-machine interface requires us to investigate human mechanisms and learn from them. This study focuses on communication between speech production and perception within the human brain and on realizing it in an artificial system. A physiological study based on electromyographic signals (Honda, 1996) suggested that speech communication in the human brain might be based on a topological mapping between speech production and perception, according to an analogous topology between motor and sensory representations. Following this hypothesis, this study first investigated the topologies of the vowel system across the motor, kinematic, and acoustic spaces by means of a model simulation, and then examined the linkage between vowel production and perception in a transformed auditory feedback (TAF) experiment. The model simulation indicated that there exists an invariant mapping from muscle activations (motor space) to articulations (kinematic space) via a coordinate system consisting of force-dependent equilibrium positions, and that the mapping from the motor space to the kinematic space is unique. The motor-kinematic-acoustic deduction in the model simulation showed that the topologies were compatible from one space to another. In the TAF experiment, vowel production exhibited a compensatory response to a perturbation in the feedback sound. This implies that vowel production is controlled with reference to perceptual monitoring.
Ten Oever, Sanne; Sack, Alexander T
The role of oscillatory phase for perceptual and cognitive processes is being increasingly acknowledged. To date, little is known about the direct role of phase in categorical perception. Here we show in two separate experiments that the identification of ambiguous syllables that can either be perceived as /da/ or /ga/ is biased by the underlying oscillatory phase as measured with EEG and sensory entrainment to rhythmic stimuli. The measured phase difference in which perception is biased toward /da/ or /ga/ exactly matched the different temporal onset delays in natural audiovisual speech between mouth movements and speech sounds, which last 80 ms longer for /ga/ than for /da/. These results indicate the functional relationship between prestimulus phase and syllable identification, and signify that the origin of this phase relationship could lie in exposure and subsequent learning of unique audiovisual temporal onset differences. PMID:26668393
Whistled speech is a little-studied local use of language shaped by several cultures of the world, either for distant dialogues or for rendering traditional songs. This practice consists of an emulation of the voice by means of a simple modulated pitch. It is therefore the result of a transformation of the vocal signal that implies simplifications in the frequency domain. Whistlers adapt their productions to the way each language combines the qualities of height perceived simultaneously by the human ear in the complex frequency spectrum of the spoken or sung voice (pitch, timbre). As a consequence, this practice highlights key acoustic cues for the intelligibility of the languages concerned. The present study provides an analysis of the acoustic and phonetic features selected by whistled speech in several traditions, either in purely oral whistles (Spanish, Turkish, Mazatec) or in whistles produced with an instrument such as a leaf (Akha, Hmong). It underlines the convergences with the strategies of the singing ...
Clausen, Thomas Wolff
This paper develops a theoretical understanding of the effects of rhetoric, metaphors and dehumanisations in political speeches. This theoretical framework is used to analyse selected speeches by Barack Obama, David Cameron, Donald Trump and Hillary Clinton. The aim of the analysis is to understand how rhetoric, metaphors and dehumanisations in these speeches influence the image and perception of Muslims and IS. An understanding of what effect the modern m...
Given that Chinese learners of English are greatly influenced by their mother tongue, which is a tone language rather than an intonation language, learning and coping with authentic English speech seems more difficult for them than for learners with other first languages. The focus of the current research, based on an analysis of the nature of spoken English and spoken Chinese, is to help Chinese learners benefit from ICT technologies developed by the Dublin Institute of Technology (DIT). The thesis ...
Palao Errando, José Antonio
In the so-called "information society", film studies have become diluted in the pragmatic and technological approach to audiovisual discourse, just as the very enjoyment of cinema has become caught in the web of the DVD and hypertext. Cinema itself reacts to this through complex narrative structures that distance it from standard audiovisual discourse. The function of film studies and of their university teaching must be the reintroduction of the rejected subject ...
Jenson, David; Harkrider, Ashley W; Thornton, David; Bowers, Andrew L; Saltuklaroglu, Tim
Sensorimotor integration (SMI) across the dorsal stream enables online monitoring of speech. Jenson et al. (2014) used independent component analysis (ICA) and event related spectral perturbation (ERSP) analysis of electroencephalography (EEG) data to describe anterior sensorimotor (e.g., premotor cortex, PMC) activity during speech perception and production. The purpose of the current study was to identify and temporally map neural activity from posterior (i.e., auditory) regions of the dorsal stream in the same tasks. Perception tasks required "active" discrimination of syllable pairs (/ba/ and /da/) in quiet and noisy conditions. Production conditions required overt production of syllable pairs and nouns. ICA performed on concatenated raw 68 channel EEG data from all tasks identified bilateral "auditory" alpha (α) components in 15 of 29 participants localized to pSTG (left) and pMTG (right). ERSP analyses were performed to reveal fluctuations in the spectral power of the α rhythm clusters across time. Production conditions were characterized by significant α event related synchronization (ERS; pFDR < 0.05) concurrent with EMG activity from speech production, consistent with speech-induced auditory inhibition. Discrimination conditions were also characterized by α ERS following stimulus offset. Auditory α ERS in all conditions temporally aligned with PMC activity reported in Jenson et al. (2014). These findings are indicative of speech-induced suppression of auditory regions, possibly via efference copy. The presence of the same pattern following stimulus offset in discrimination conditions suggests that sensorimotor contributions following speech perception reflect covert replay, and that covert replay provides one source of the motor activity previously observed in some speech perception tasks. To our knowledge, this is the first time that inhibition of auditory regions by speech has been observed in real-time with the ICA/ERSP technique. PMID:26500519
Jenson, David E.; Bowers, Andrew L.
Sensorimotor integration within the dorsal stream enables online monitoring of speech. Jenson et al. (2014) used independent component analysis (ICA) and event related spectral perturbation (ERSP) analysis of EEG data to describe anterior sensorimotor (e.g., premotor cortex; PMC) activity during speech perception and production. The purpose of the current study was to identify and temporally map neural activity from posterior (i.e., auditory) regions of the dorsal stream in the same tasks. ...
LaCroix, Arianna N.; Diaz, Alvaro F.; Rogalsky, Corianne
The relationship between the neurobiology of speech and music has been investigated for more than a century. There remains no widespread agreement regarding how (or to what extent) music perception utilizes the neural circuitry that is engaged in speech processing, particularly at the cortical level. Prominent models such as Patel's Shared Syntactic Integration Resource Hypothesis (SSIRH) and Koelsch's neurocognitive model of music perception suggest a high degree of overlap, particularly in ...
Foote, Jennifer A; Trofimovich, Pavel
Second language speech learning is predicated on learners' ability to notice differences between their own language output and that of their interlocutors. Because many learners interact primarily with other second language users, it is crucial to understand which dimensions underlie the perception of second language speech by learners, compared to native speakers. For this study, 15 non-native and 10 native English speakers rated 30-s language audio-recordings from controlled reading and interview tasks for dissimilarity, using all pairwise combinations of recordings. PROXSCAL multidimensional scaling analyses revealed fluency and aspects of speakers' pronunciation as components underlying listener judgments but showed little agreement across listeners. Results contribute to an understanding of why second language speech learning is difficult and provide implications for language training. PMID:27166328
Physiology Teacher, 1976
Lists and reviews recent audiovisual materials in medical, dental, nursing and allied health, and veterinary medicine, as well as undergraduate and high school studies. Each is classified as to level, type of instruction, usefulness, and source of availability. Topics include respiration, renal physiology, muscle mechanics, anatomy, evolution,…
Rébillat, Marc; Corteel, Etienne; Katz, Brian; Boutillon, Xavier
Virtual reality aims at providing users with audio-visual worlds in which they will behave and learn as if they were in the real world. In this context, specific acoustic transducers are needed to fulfill simultaneous spatial requirements on visual and audio rendering in order to make them coherent. Large multi-actuator panels (LaMAPs) allow for the combined construction of a projection screen and loudspeaker array, and thus allow for the coherent creation of an audio ...
Feijoo, Sergio; Fernandez, Santiago; Alvarez, Jose Manuel
The combined effects of excessive ambient noise and reverberation in classrooms interfere with speech recognition and tend to degrade the learning process of young children. This paper reports a detailed analysis of a speech recognition test carried out with two populations of children, aged 8-9 and 10-11. Unlike English, Spanish has few minimal pairs that can be used for phoneme recognition in a closed-set manner. The test consisted of a series of two-syllable nonsense words formed by combining all possible syllables in Spanish. It was administered to the children as a dictation task in which they had to write down the words spoken by their female teacher, in two blocks on different days, and was later repeated to analyze its consistency. The rationale for this procedure was that (a) the test should reproduce normal academic situations, (b) all phonological and lexical context effects should be avoided, and (c) errors in both words and phonemes should be scored to reveal any possible acoustic basis for them. Although word recognition scores were similar across age groups and repetitions, phoneme errors showed high variability, calling into question the validity of such a test for classroom assessment.
Manis, Franklin R.; And Others
Administered phonological awareness and phoneme identification tasks to dyslexic children and chronological age (CA) and reading-level (RL) comparison groups. Found no real differences in categorical perception between dyslexic and RL groups; however, more dyslexics (7 of 25) had abnormal identification functions. Results suggest that some…
Guediche, Sara; Fiez, Julie A; Holt, Lori L
When listeners encounter speech under adverse listening conditions, adaptive adjustments in perception can improve comprehension over time. In some cases, these adaptive changes require the presence of external information that disambiguates the distorted speech signals, whereas in other cases mere exposure is sufficient. Both external (e.g., written feedback) and internal (e.g., prior word knowledge) sources of information can be used to generate predictions about the correct mapping of a distorted speech signal. We hypothesize that these predictions provide a basis for determining the discrepancy between the expected and actual speech signal that can be used to guide adaptive changes in perception. This study provides the first empirical investigation that manipulates external and internal factors through (a) the availability of explicit external disambiguating information via the presence or absence of postresponse orthographic information paired with a repetition of the degraded stimulus, and (b) the accuracy of internally generated predictions; an acoustic distortion is introduced either abruptly or incrementally. The results demonstrate that the impact of external information on adaptive plasticity is contingent upon whether the intelligibility of the stimuli permits accurate internally generated predictions during exposure. External information sources enhance adaptive plasticity only when input signals are severely degraded and cannot reliably access internal predictions. This is consistent with a computational framework for adaptive plasticity in which error-driven supervised learning relies on the ability to compute sensory prediction error signals from both internal and external sources of information. PMID:26854531
A part of becoming a mature perceiver involves learning what signal properties provide relevant information about objects and events in the environment. Regarding speech perception, evidence supports the position that allocation of attention to various signal properties changes as children gain experience with their native language, and so learn what information is relevant to recognizing phonetic structure in that language. However, one weakness in that work has been that data have largely c...
In obstetrics, postoperative cognitive dysfunction may occur after caesarean section and vaginal delivery, with poor outcomes for both mother and child. The goal was to study the influence of anesthesia technique for caesarean section on memory, perception, and speech. With the approval of the local ethics committee and informed consent, pregnant women were divided into 2 groups according to anesthesia method: group 1 (n = 31) received spinal anesthesia, and group 2 (n = 34) received total intravenous anesthesia (TIVA). Spinal anesthesia: 1.8-2.2 mL of hyperbaric 0.5% bupivacaine. TIVA: thiopental sodium (4 mg kg-1) and succinylcholine (1-1.5 mg kg-1); fentanyl (10-5-3 µg kg-1 h-1) and diazepam (10 mg) were used after extraction of the newborn. Memory was assessed with Luria's test, perception with the "recognition of time" test, and speech with the "name of fingers" test. Assessment points were: 1, before surgery; 2, 24 h after caesarean section; 3, on day 3 after surgery; 4, at discharge from hospital (day 5-7). Memory, which was initially decreased in the expectant mothers, improved over time after caesarean section and was restored within 3 days after surgery regardless of anesthesia technique. On postoperative day 5-7, memory scores after spinal anesthesia exceeded those after total intravenous anesthesia. Perception and speech did not depend on the time elapsed after surgery. Anesthesia technique did not influence the recovery of perception and speech after caesarean section.
Maré, M J; Dreschler, W A; Verschuure, H
Speech perception was tested through a broad-band syllabic compressor with four different static input-output configurations. All other parameters of the compressor were held constant. The compressor was implemented digitally and incorporated a delay to reduce overshoot. We studied four different input-output configurations, including a linear reference condition. Normal-hearing and hearing-impaired subjects participated in the experiments testing perception of meaningful sentences as well as nonsense CVCs in carrier phrases. The speech materials were presented in quiet and in noise. The results from the CVCs were analyzed quantitatively in terms of scores and qualitatively in terms of phoneme confusions. Differences in speech perception due to the different input-output configurations were small. The input-output configuration with the highest amplification of low amplitude sounds yielded the best results. Detailed analysis of the results included a correlational analysis with a number of auditory functions characterizing the ears tested. The pure-tone audiogram provided parameters of auditory sensitivity: average audiometric loss and audiometric slope. Psychophysical tests provided parameters of temporal resolution and frequency selectivity: the temporal resolution factor, temporal gap detection, and auditory filter shape. The correlational analysis showed that the subjects with better temporal acuity obtained better results. PMID:1608260
Bidelman, Gavin M; Howell, Megan
Previous studies suggest that at poorer signal-to-noise ratios (SNRs), auditory cortical event-related potentials are weakened, prolonged, and show a shift in the functional lateralization of cerebral processing from left to right hemisphere. Increased right hemisphere involvement during speech-in-noise (SIN) processing may reflect the recruitment of additional brain resources to aid speech recognition or alternatively, the progressive loss of involvement from left linguistic brain areas as speech becomes more impoverished (i.e., nonspeech-like). To better elucidate the brain basis of SIN perception, we recorded neuroelectric activity in normal hearing listeners to speech sounds presented at various SNRs. Behaviorally, listeners obtained superior SIN performance for speech presented to the right compared to the left ear (i.e., right ear advantage). Source analysis of neural data assessed the relative contribution of region-specific neural generators (linguistic and auditory brain areas) to SIN processing. We found that left inferior frontal brain areas (e.g., Broca's area) partially disengage at poorer SNRs but responses do not right lateralize with increasing noise. In contrast, auditory sources showed more resilience to noise in left compared to right primary auditory cortex but also a progressive shift in dominance from left to right hemisphere at lower SNRs. Region- and ear-specific correlations revealed that listeners' right ear SIN advantage was predicted by source activity emitted from inferior frontal gyrus (but not primary auditory cortex). Our findings demonstrate changes in the functional asymmetry of cortical speech processing during adverse acoustic conditions and suggest that "cocktail party" listening skills depend on the quality of speech representations in the left cerebral hemisphere rather than compensatory recruitment of right hemisphere mechanisms. PMID:26386346
Arsenault, Jessica S; Buchsbaum, Bradley R
A fundamental goal of the human auditory system is to map complex acoustic signals onto stable internal representations of the basic sound patterns of speech. Phonemes and the distinctive features that they comprise constitute the basic building blocks from which higher-level linguistic representations, such as words and sentences, are formed. Although the neural structures underlying phonemic representations have been well studied, there is considerable debate regarding frontal-motor cortical contributions to speech as well as the extent of lateralization of phonological representations within auditory cortex. Here we used functional magnetic resonance imaging (fMRI) and multivoxel pattern analysis to investigate the distributed patterns of activation that are associated with the categorical and perceptual similarity structure of 16 consonant exemplars in the English language used in Miller and Nicely's (1955) classic study of acoustic confusability. Participants performed an incidental task while listening to phonemes in the MRI scanner. Neural activity in bilateral anterior superior temporal gyrus and supratemporal plane was correlated with the first two components derived from a multidimensional scaling analysis of a behaviorally derived confusability matrix. We further showed that neural representations corresponding to the categorical features of voicing, manner of articulation, and place of articulation were widely distributed throughout bilateral primary, secondary, and association areas of the superior temporal cortex, but not motor cortex. Although classification of phonological features was generally bilateral, we found that multivariate pattern information was moderately stronger in the left compared with the right hemisphere for place but not for voicing or manner of articulation. PMID:25589757
Foti, Dan; Roberts, Felicia
The neural circuitry for speech perception is well-characterized, yet the temporal dynamics therein are largely unknown. This timing information is critical in that spoken language almost always occurs in the context of joint speech (i.e., conversations) where effective communication requires the precise timing of speaker turn-taking, a core aspect of prosody. Here, we used event-related potentials to characterize neural activity elicited by conversation stimuli within a large, unselected adult sample (N=115). We focused on two stages of speech perception: inter-speaker gaps and speaker responses. We found activation in two known speech perception networks, with functional and neuroanatomical specificity: silence during inter-speaker gaps primarily activated the posterior pathway involving the supramarginal gyrus and premotor cortex, whereas hearing speaker responses primarily activated the anterior pathway involving the superior temporal gyrus. These data provide the first direct evidence that the posterior pathway is uniquely involved in monitoring speaker turn-taking. PMID:27177112
Callan, Daniel; Callan, A.
The finding that premotor areas are active not only during action production but also during observation of action (the Mirror Neuron System) has led to considerable conjecture regarding the neurophysiological mechanisms underlying a variety of abilities ranging from perception to social cognition. Despite these findings, the relationship of this activity to perceptual performance has not been demonstrated. Without such evidence, it may be argued that this activity does not reflect neural process...
Previous studies investigating speech perception in noise have typically been conducted with static masker positions. The aim of this study was to investigate the effect of spatial separation of source and masker (spatial release from masking, SRM) in a moving masker setup and to evaluate the impact of adaptive beamforming in comparison with fixed directional microphones in cochlear implant (CI) users. Speech reception thresholds (SRT) were measured in S0N0 and in a moving masker setup (S0Nmove) in 12 normal hearing participants and 14 CI users (7 subjects bilateral, 7 bimodal with a hearing aid in the contralateral ear). Speech processor settings were a moderately directional microphone, a fixed beamformer, or an adaptive beamformer. The moving noise source was generated by means of wave field synthesis and was smoothly moved in the shape of a half-circle from one ear to the contralateral ear. Noise was presented in either of two conditions: continuous or modulated. SRTs in the S0Nmove setup were significantly improved compared to the S0N0 setup for both the normal hearing control group and the bilateral group in continuous noise, and for the control group in modulated noise. There was no effect of subject group. A significant effect of directional sensitivity was found in the S0Nmove setup. In the bilateral group, the adaptive beamformer achieved lower SRTs than the fixed beamformer setting. Adaptive beamforming improved SRT in both CI user groups substantially by about 3 dB (bimodal group) and 8 dB (bilateral group) depending on masker type. CI users showed SRM that was comparable to normal hearing subjects. In listening situations of everyday life with spatial separation of source and masker, directional microphones significantly improved speech perception with individual improvements of up to 15 dB SNR. Users of bilateral speech processors with both directional microphones obtained the highest benefit.
Liu, Ming; Xu, Xun; Huang, Thomas S.
Combining different modalities for pattern recognition tasks is a very promising field. Humans routinely fuse information from different modalities to recognize objects and perform inference. Audio-visual gender recognition is one of the most common tasks in human social communication: humans can identify gender by facial appearance, by speech, and also by body gait. Indeed, human gender recognition is a multimodal data acquisition and processing procedure. However, computational multimodal gender recognition has not been extensively investigated in the literature. In this paper, speech and facial images are fused to perform multimodal gender recognition, in order to explore the improvement gained by combining different modalities.
The present study attempts to investigate Indonesian EFL teachers’ and native English speakers’ perceptions of mispronunciations of English sounds by Indonesian EFL learners. For this purpose, a paper-form questionnaire consisting of 32 target mispronunciations was distributed to Indonesian secondary school teachers of English and also to native English speakers. An analysis of the respondents’ perceptions has discovered that 14 out of the 32 target mispronunciations are pedagogically significant in pronunciation instruction. A further analysis of the reasons for these major mispronunciations has reconfirmed the prevalence of interference of learners’ native language in their English pronunciation as a major cause of mispronunciations. It has also revealed Indonesian EFL teachers’ tendency to overestimate the seriousness of their learners’ pronunciations. Based on these findings, the study makes suggestions for better English pronunciation teaching in Indonesia or other EFL countries.
The relationship between the neurobiology of speech and music has been investigated for more than a century. There remains no widespread agreement regarding how (or to what extent) music perception utilizes the neural circuitry that is engaged in speech processing, particularly at the cortical level. Prominent models such as Patel’s Shared Syntactic Integration Resource Hypothesis (SSIRH) and Koelsch’s neurocognitive model of music perception suggest a high degree of overlap, particularly in the frontal lobe, but also perhaps more distinct representations in the temporal lobe with hemispheric asymmetries. The present meta-analysis study used activation likelihood estimate analyses to identify the brain regions consistently activated for music as compared to speech across the functional neuroimaging (fMRI and PET) literature. Eighty music and 91 speech neuroimaging studies of healthy adult control subjects were analyzed. Peak activations reported in the music and speech studies were divided into four paradigm categories: passive listening, discrimination tasks, error/anomaly detection tasks and memory-related tasks. We then compared activation likelihood estimates within each category for music versus speech, and each music condition with passive listening. We found that listening to music and to speech preferentially activate distinct temporo-parietal bilateral cortical networks. We also found music and speech to have shared resources in the left pars opercularis but speech-specific resources in the left pars triangularis. The extent to which music recruited speech-activated frontal resources was modulated by task. While there are certainly limitations to meta-analysis techniques particularly regarding sensitivity, this work suggests that the extent of shared resources between speech and music may be task-dependent and highlights the need to consider how task effects may be affecting conclusions regarding the neurobiology of speech and music.
Trude, Alison M; Duff, Melissa C; Brown-Schmidt, Sarah
A hallmark of human speech perception is the ability to comprehend speech quickly and effortlessly despite enormous variability across talkers. However, current theories of speech perception do not make specific claims about the memory mechanisms involved in this process. To examine whether declarative memory is necessary for talker-specific learning, we tested the ability of amnesic patients with severe declarative memory deficits to learn and distinguish the accents of two unfamiliar talkers by monitoring their eye-gaze as they followed spoken instructions. Analyses of the time-course of eye fixations showed that amnesic patients rapidly learned to distinguish these accents and tailored perceptual processes to the voice of each talker. These results demonstrate that declarative memory is not necessary for this ability and point to the involvement of non-declarative memory mechanisms. These results are consistent with findings that other social and accommodative behaviors are preserved in amnesia and contribute to our understanding of the interactions of multiple memory systems in the use and understanding of spoken language. PMID:24657480
Li, Xing; Nie, Kaibao; Imennov, Nikita S; Won, Jong Ho; Drennan, Ward R; Rubinstein, Jay T; Atlas, Les E
Harmonic and temporal fine structure (TFS) information are important cues for speech perception in noise and music perception. However, due to the inherently coarse spectral and temporal resolution in electric hearing, the question of how to deliver harmonic and TFS information to cochlear implant (CI) users remains unresolved. A harmonic-single-sideband-encoder [(HSSE); Nie et al. (2008). Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing; Li et al. (2010). Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing] strategy has been proposed that explicitly tracks the harmonics in speech and transforms them into modulators conveying both amplitude modulation and fundamental frequency information. For unvoiced speech, HSSE transforms the TFS into a slowly varying yet still noise-like signal. To investigate its potential, four- and eight-channel vocoder simulations of HSSE and the continuous-interleaved-sampling (CIS) strategy were implemented, respectively. Using these vocoders, five normal-hearing subjects' speech recognition performance was evaluated under different masking conditions; another five normal-hearing subjects' Mandarin tone identification performance was also evaluated. Additionally, the neural discharge patterns evoked by HSSE- and CIS-encoded Mandarin tone stimuli were simulated using an auditory nerve model. All subjects scored significantly higher with HSSE than with CIS vocoders. The modeling analysis demonstrated that HSSE can convey temporal pitch cues better than CIS. Overall, the results suggest that HSSE is a promising strategy to enhance speech perception with CIs. PMID:23145619
Rufener, Katharina S; Zaehle, Tino; Oechslin, Mathias S; Meyer, Martin
The present study investigated the functional relevance of gamma oscillations for the processing of rapidly changing acoustic features in speech signals. For this purpose we analyzed repetition-induced perceptual learning effects in 18 healthy adult participants. The participants received either 6Hz or 40Hz tACS over the bilateral auditory cortex, while repeatedly performing a phoneme categorization task. We found that 40Hz tACS led to a specific alteration in repetition-induced perceptual learning. While participants in the non-stimulated control group as well as those in the experimental group receiving 6Hz tACS considerably improved their perceptual performance, the application of 40Hz tACS selectively attenuated the repetition-induced improvement in phoneme categorization abilities. Our data provide causal evidence for a functional relevance of gamma oscillations during the perceptual learning of acoustic speech features. Moreover, we demonstrate that even less than twenty minutes of alternating current stimulation below the individual perceptual threshold is sufficient to affect speech perception. This finding is relevant in that this novel approach might have implications with respect to impaired speech processing in dyslexics and older adults. PMID:26779822
In this dissertation I present a model that captures categorical effects in both first language (L1) and second language (L2) speech perception. In L1 perception, categorical effects range from extremely strong for consonants to nearly continuous perception of vowels. I treat the problem of speech perception as a statistical inference problem and by quantifying categoricity I obtain a unified model of both strong and weak categorical effects. In this optimal inference mechanism, the listener uses their knowledge of categories and the acoustics of the signal to infer the intended productions of the speaker. The model splits up speech variability into meaningful category variance and perceptual noise variance. The ratio of these two variances, which I call Tau, directly correlates with the degree of categorical effects for a given phoneme or continuum. By fitting the model to behavioral data from different phonemes, I show how a single parametric quantitative variation can lead to the different degrees of categorical effects seen in perception experiments with different phonemes. In L2 perception, L1 categories have been shown to exert an effect on how L2 sounds are identified and how well the listener is able to discriminate them. Various models have been developed to relate the state of L1 categories with both the initial and eventual ability to process the L2. These models largely lacked a formalized metric to measure perceptual distance, a means of making a priori predictions of behavior for a new contrast, and a way of describing non-discrete gradient effects. In the second part of my dissertation, I apply the same computational model that I used to unify L1 categorical effects to examining L2 perception. I show that we can use the model to make the same type of predictions as other SLA models, but also provide a quantitative framework while formalizing all measures of similarity and bias. Further, I show how using this model to consider L2 learners at
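The abstract does not spell out the model's equations. The sketch below assumes the standard Gaussian formulation often used for this kind of optimal-inference account: equal-prior categories with means mu_c sharing a variance (category variance), and Gaussian perceptual noise on the signal. These are assumptions, not necessarily the dissertation's exact parameterization. Under them, the listener's estimate of the intended production is a posterior mean, and Tau (category variance divided by noise variance) controls how strongly perception is pulled toward category centres.

```python
import math
from typing import Sequence


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def tau(var_category: float, var_noise: float) -> float:
    """Ratio of meaningful category variance to perceptual noise variance."""
    return var_category / var_noise


def perceived_target(s: float, category_means: Sequence[float],
                     var_category: float, var_noise: float) -> float:
    """Posterior mean of the speaker's intended production T given signal s.

    Assumes equal-prior Gaussian categories T ~ N(mu_c, var_category) and
    Gaussian perceptual noise S | T ~ N(T, var_noise).
    """
    # Within one category the estimate moves from mu_c toward s by weight tau / (1 + tau).
    t = tau(var_category, var_noise)
    w = t / (1.0 + t)
    # Category posterior: marginal likelihood of s under category c is N(mu_c, var_c + var_n).
    likes = [gaussian_pdf(s, m, var_category + var_noise) for m in category_means]
    total = sum(likes)
    return sum((lk / total) * (m + w * (s - m))
               for lk, m in zip(likes, category_means))


if __name__ == "__main__":
    means = [-1.0, 1.0]  # hypothetical category centres on an arbitrary acoustic scale
    for var_c, var_n in ((0.1, 1.0), (10.0, 0.1)):
        print(f"tau={tau(var_c, var_n):g}: s=1.5 is perceived as "
              f"{perceived_target(1.5, means, var_c, var_n):.3f}")
```

With a small Tau the estimate for a signal at 1.5 is pulled well back toward the category centre at 1.0 (strongly categorical behaviour); with a large Tau it stays close to 1.5 (nearly continuous behaviour), which is the single-parameter contrast the abstract describes.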
The perception of isochrony and phonetic synchronisation in dubbing. An introduction to how Spanish cinema-goers perceive French and English dubbed films in terms of the audio-visual matching experience
Iturregui Gallardo, Gonzalo
The McGurk-MacDonald effect explains speech perception as a duality perceived separately by the cognitive system. Dubbing combines two stimuli of different linguistic origin. The study is an analysis of the perception of the auditory and visual stimuli in speech and of the dyschronies in dubbing synchronisation. English and French scenes dubbed into Spanish were selected. The experiment reveals that Spanish viewers show great acceptance of dyschronies in dubbing. Furthermore, subjec...
Animated agents are becoming increasingly frequent in research and applications in speech science. An important challenge is to evaluate the effectiveness of the agent in terms of the intelligibility of its visible speech. In three experiments, we extend and test the Sumby and Pollack (1954) metric to allow the comparison of an agent relative to a standard or reference, and also propose a new metric based on the fuzzy logical model of perception (FLMP) to describe the benefit provided by a synthetic animated face relative to the benefit provided by a natural face. A valid metric would allow direct comparisons across different experiments and would give measures of the benefit of a synthetic animated face relative to a natural face (or indeed any two conditions) and of how this benefit varies as a function of the type of synthetic face, the test items (e.g., syllables versus sentences), different individuals, and applications.
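As background for the two metrics named above, this sketch shows the commonly used normalized form of the Sumby and Pollack visual-benefit measure, (AV − A) / (1 − A), and the FLMP combination rule for a two-alternative task, a·v / (a·v + (1 − a)(1 − v)). The authors' own extensions are not specified in the abstract. The `benefit_ratio` helper, which compares a synthetic face against a natural face via the ratio of their normalized benefits, is a hypothetical illustration only, not the proposed FLMP-based metric.

```python
def relative_benefit(p_av: float, p_a: float) -> float:
    """Normalized audiovisual gain (AV - A) / (1 - A), after Sumby and Pollack (1954)."""
    if not 0.0 <= p_a < 1.0:
        raise ValueError("p_a must lie in [0, 1)")
    return (p_av - p_a) / (1.0 - p_a)


def flmp_two_choice(a: float, v: float) -> float:
    """FLMP probability of choosing alternative 1 in a two-alternative task.

    a and v are the auditory and visual degrees of support (0..1) for alternative 1.
    """
    support_1 = a * v
    support_2 = (1.0 - a) * (1.0 - v)
    total = support_1 + support_2
    if total == 0.0:
        raise ValueError("sources give contradictory all-or-none support")
    return support_1 / total


def benefit_ratio(p_av_synthetic: float, p_av_natural: float, p_a: float) -> float:
    """Hypothetical comparison: synthetic-face benefit relative to natural-face benefit."""
    return relative_benefit(p_av_synthetic, p_a) / relative_benefit(p_av_natural, p_a)
```

For example, if auditory-alone accuracy is 0.4, a natural face raises it to 0.7 (normalized benefit 0.5) and a synthetic face to 0.55 (benefit 0.25), so the synthetic face gives half the natural face's benefit. Note that in the FLMP rule an ambiguous source (a = 0.5) leaves the other source's support unchanged.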
This study examined how the lateralisation of emotional speech intonation perception changes over the course of perceptual learning in adult listeners. Subjects were required to recognise and identify test stimuli in a changing acoustical environment: either against a “white noise” background or without noise. The sample consisted of 38 adults (23 females and 15 males) with a mean age of 21.1 ± 0.4 years. The reaction time (RT) and accuracy of recognition (AR) for each subject were recorded in two sequential sessions of trials, after which a generalising index was calculated: relative recognition efficiency (RRE), where RRE = AR/RT. An analysis of variance revealed that session sequence was highly significant for RRE. Lateralisation during emotional speech tone recognition over the course of perceptual learning in adults was found to depend on the valence of the emotional intonation and the acoustical environment. These findings confirm the dynamic behaviour of functional asymmetry.
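The generalising index above is a simple ratio. The sketch below computes it, assuming (since the abstract does not state units) that AR is a proportion correct and RT is in seconds. The per-session values are hypothetical and only show that RRE rises when a listener becomes more accurate, faster, or both.

```python
def relative_recognition_efficiency(accuracy: float, reaction_time_s: float) -> float:
    """RRE = AR / RT, with AR as proportion correct and RT in seconds (assumed units)."""
    if reaction_time_s <= 0:
        raise ValueError("reaction time must be positive")
    return accuracy / reaction_time_s


# Hypothetical values for one listener across the two sessions.
session_1 = relative_recognition_efficiency(0.80, 1.6)
session_2 = relative_recognition_efficiency(0.90, 1.2)

if __name__ == "__main__":
    print(f"RRE session 1: {session_1:.2f}, session 2: {session_2:.2f}")
```

Combining speed and accuracy into one index avoids treating a faster but less accurate session as an unambiguous improvement, although it also hides which of the two changed.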
Behavioral and neuroimaging studies have demonstrated that brain regions involved with speech production also support speech perception, especially under degraded conditions. The premotor cortex has been shown to be active during both observation and execution of action (‘Mirror System’ properties), and may facilitate speech perception by mapping unimodal and multimodal sensory features onto articulatory speech gestures. For this functional magnetic resonance imaging (fMRI) study, participants identified vowels produced by a speaker in audio-visual (saw the speaker’s articulating face and heard her voice), visual only (only saw the speaker’s articulating face), and audio only (only heard the speaker’s voice) conditions with varying audio signal-to-noise ratios in order to determine the regions of the premotor cortex involved with multisensory and modality specific processing of visual speech gestures. The task was designed so that identification could be made with a high level of accuracy from visual only stimuli to control for task difficulty and differences in intelligibility. The results of the fMRI analysis for visual only and audio-visual conditions showed overlapping activity in inferior frontal gyrus and premotor cortex. The left ventral inferior premotor cortex showed properties of multimodal (audio-visual) enhancement with a degraded auditory signal. The left inferior parietal lobule and right cerebellum also showed these properties. The left ventral superior and dorsal premotor cortex did not show this multisensory enhancement effect, but there was greater activity for the visual only over audio-visual conditions in these areas. The results suggest that the inferior regions of the ventral premotor cortex are involved with integrating multisensory information, whereas, more superior and dorsal regions of the premotor cortex are involved with mapping unimodal (in this case visual) sensory features of the speech signal with
Massaro, Dominic W
I review 2 seminal research reports published in this journal during its second decade more than a century ago. Given psychology's subdisciplines, they would not normally be reviewed together because one involves reading and the other speech perception. The small amount of interaction between these domains might have limited research and theoretical progress. In fact, the 2 early research reports revealed common processes involved in these 2 forms of language processing. Their illustration of the role of Wundt's apperceptive process in reading and speech perception anticipated descriptions of contemporary theories of pattern recognition, such as the fuzzy logical model of perception. Based on the commonalities between reading and listening, one can question why they have been viewed so differently. It is commonly believed that learning to read requires formal instruction and schooling, whereas spoken language is acquired from birth onward through natural interactions with people who talk. Most researchers and educators believe that spoken language is acquired naturally from birth onward and even prenatally. Learning to read, on the other hand, is not possible until the child has acquired spoken language, reaches school age, and receives formal instruction. If an appropriate form of written text is made available early in a child's life, however, the current hypothesis is that reading will also be learned inductively and emerge naturally, with no significant negative consequences. If this proposal is true, it should soon be possible to create an interactive system, Technology Assisted Reading Acquisition, to allow children to acquire literacy naturally. PMID:22953690
Gagné, Jean-Pierre; Charest, Monique; Le Monday, K; Desbiens, C
A research program was undertaken to evaluate the efficacy of an audiovisual-FM system as a speechreading aid. The present study investigated the effects of the distance between the talker and the speechreader on a visual-speech perception task. Sentences were recorded simultaneously with a conventional Hi8 mm video camera, and with the microcamera of an audiovisual-FM system. The recordings were obtained from two talkers at three different distances: 1.83 m, 3.66 m, and 7.32 m. Sixteen subjects completed a visual-keyword recognition task. The main results of the investigation were as follows: For the recordings obtained with the conventional video camera, there was a significant decrease in speechreading performance as the distance between the talker and the camera increased. For the recordings obtained with the microcamera of the audiovisual-FM system, there were no differences in speechreading as a function of the test distances. The findings of the investigation confirm that in a classroom setting the use of an audiovisual-FM system may constitute an effective way of overcoming the deleterious effects of distance on speechreading performance. PMID:16717020
Brain-computer interfaces (BCIs) are systems that use real-time analysis of neuroimaging data to determine the mental state of their user for purposes such as providing neurofeedback. Here, we investigate the feasibility of a BCI based on speech perception. Multivariate pattern classification methods were applied to single-trial EEG data collected during speech perception by native and non-native speakers. Two principal questions were asked: (1) Can differences in the perceived categories of pairs of phonemes be decoded at the single-trial level? (2) Can these same categorical differences be decoded across participants, within or between native-language groups? Results indicated that classification performance progressively increased with respect to the categorical status (within, boundary or across) of the stimulus contrast, and was also influenced by the native language of individual participants. Classifier performance showed strong relationships with traditional event-related potential measures and behavioral responses. The results of the cross-participant analysis indicated an overall increase in average classifier performance when trained on data from all participants (native and non-native). A second cross-participant classifier trained only on data from native speakers led to an overall improvement in performance for native speakers, but a reduction in performance for non-native speakers. We also found that the native language of a given participant could be decoded on the basis of EEG data with accuracy above 80%. These results indicate that electrophysiological responses underlying speech perception can be decoded at the single-trial level, and that decoding performance systematically reflects graded changes in the responses related to the phonological status of the stimuli. This approach could be used in extensions of the BCI paradigm to support perceptual learning during second language acquisition.
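The abstract does not name the classifier or the EEG features used. As a generic illustration of single-trial decoding, the sketch below substitutes a nearest-centroid classifier evaluated with leave-one-out cross-validation, applied to synthetic feature vectors that stand in for single trials. The data generator and all parameter values are hypothetical; real pipelines would use extracted EEG features and typically a regularized linear classifier.

```python
import math
import random


def make_toy_trials(rng, n_per_class, n_features, shift):
    """Synthetic 'single-trial' feature vectors for two phoneme categories (hypothetical data)."""
    X, y = [], []
    for label, offset in ((0, 0.0), (1, shift)):
        for _ in range(n_per_class):
            X.append([offset + rng.gauss(0.0, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


def centroid(rows):
    """Feature-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def loo_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    correct = 0
    for i, (x_test, y_test) in enumerate(zip(X, y)):
        # Build class centroids from every trial except the held-out one.
        groups = {}
        for j, (x, label) in enumerate(zip(X, y)):
            if j != i:
                groups.setdefault(label, []).append(x)
        centroids = {label: centroid(rows) for label, rows in groups.items()}
        predicted = min(centroids, key=lambda lab: math.dist(x_test, centroids[lab]))
        if predicted == y_test:
            correct += 1
    return correct / len(X)


if __name__ == "__main__":
    rng = random.Random(0)
    X, y = make_toy_trials(rng, n_per_class=40, n_features=4, shift=1.5)
    print(f"LOO accuracy: {loo_accuracy(X, y):.2f}")
```

Holding out each trial in turn keeps the test trial out of its own class centroid; the cross-participant analyses described above would instead hold out whole participants.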
Dial, Heather Raye
When the lexical and sublexical stimuli were matched in discriminability, scores were highly correlated and no individual demonstrated substantially better performance on lexical than sublexical perception (Figures 1a-c). However, when the word discriminations were easier (as in prior studies; e.g., Miceli et al., 1980), patients with impaired syllable discrimination were within the control range on word discrimination (Figure 1d). Finally, digit matching showed no significant relation to perception tasks (e.g., Figure 1e). Moreover, there was a wide range of digit matching spans for patients performing well on speech perception tasks (e.g., > 1.5 on syllable discrimination and digit matching ranging from 3.6 to 6.0). These data fail to support dual route claims, suggesting that lexical processing depends on sublexical perception and suggesting that phonological STM depends on a buffer separate from speech perception mechanisms.
Varnet, Léo; Knoblauch, Kenneth; Serniclaes, Willy; Meunier, Fanny; Hoen, Michel
Although there is a large consensus regarding the involvement of specific acoustic cues in speech perception, the precise mechanisms underlying the transformation from continuous acoustical properties into discrete perceptual units remains undetermined. This gap in knowledge is partially due to the lack of a turnkey solution for isolating critical speech cues from natural stimuli. In this paper, we describe a psychoacoustic imaging method known as the Auditory Classification Image technique t...
The work presented in this book focuses on modeling audiovisual quality as perceived by the users of IP-based solutions for video communication like videotelephony. It also extends the current framework for the parametric prediction of audiovisual call quality. The book addresses several aspects related to the quality perception of entire video calls, namely, the quality estimation of the single audio and video modalities in an interactive context, the audiovisual quality integration of these modalities and the temporal pooling of short sample-based quality scores to account for the perceptual quality impact of time-varying degradations.
Rummukainen, Olli; Radun, Jenni; Virtanen, Toni; Pulkki, Ville
This work analyzed the perceptual attributes of natural dynamic audiovisual scenes. We presented thirty participants with 19 natural scenes in a similarity categorization task, followed by a semi-structured interview. The scenes were reproduced with an immersive audiovisual display. Natural scene perception has been studied mainly with unimodal settings, which have identified motion as one of the most salient attributes related to visual scenes, and sound intensity along with pitch trajectori...
Recent models of speech perception propose a dual stream processing network, with a dorsal stream, extending from the posterior temporal lobe of the left hemisphere through inferior parietal areas into the left inferior frontal gyrus, and a ventral stream that is assumed to originate in the primary auditory cortex in the upper posterior part of the temporal lobe and to extend towards the anterior part of the temporal lobe, where it may connect to the ventral part of the inferior frontal gyrus. This article describes and reviews the results from a series of complementary functional magnetic resonance imaging (fMRI) studies that aimed to trace the hierarchical processing network for speech comprehension within the left and right hemisphere with a particular focus on the temporal lobe and the ventral stream. As hypothesised, the results demonstrate a bilateral involvement of the temporal lobes in the processing of speech signals. However, an increasing leftward asymmetry was detected from auditory-phonetic to lexico-semantic processing and along the posterior-anterior axis, thus forming a “lateralisation” gradient. This increasing leftward lateralisation was particularly evident for the left superior temporal sulcus (STS) and more anterior parts of the temporal lobe.
Wang, Hsiao-Lan S; Chen, I-Chen; Chiang, Chun-Han; Lai, Ying-Hui; Tsao, Yu
The current study examined the associations between basic auditory perception, speech prosodic processing, and vocabulary development in Chinese kindergartners, specifically, whether early basic auditory perception may be related to linguistic prosodic processing in Chinese Mandarin vocabulary acquisition. A series of language, auditory, and linguistic prosodic tests were given to 100 preschool children who had not yet learned how to read Chinese characters. The results suggested that lexical tone sensitivity and intonation production were significantly correlated with children's general vocabulary abilities. In particular, tone awareness was associated with comprehensive language development, whereas intonation production was associated with both comprehensive and expressive language development. Regression analyses revealed that tone sensitivity accounted for 36% of the unique variance in vocabulary development, whereas intonation production accounted for 6% of the variance in vocabulary development. Moreover, auditory frequency discrimination was significantly correlated with lexical tone sensitivity, syllable duration discrimination, and intonation production in Mandarin Chinese. It also contributed significantly to tone sensitivity and intonation production. Auditory frequency discrimination may indirectly affect early vocabulary development through Chinese speech prosody. PMID:27519239
Tremblay, Pascale; Deschamps, Isabelle; Baroni, Marco; Hasson, Uri
Many factors affect our ability to decode the speech signal, including its quality, the complexity of the elements that compose it, as well as their frequency of occurrence and co-occurrence in a language. Syllable frequency effects have been described in the behavioral literature, including facilitatory effects during speech production and inhibitory effects during word recognition, but the neural mechanisms underlying these effects remain largely unknown. The objective of this study was to examine, using functional neuroimaging, the neurobiological correlates of three different distributional statistics in simple 2-syllable nonwords: the frequency of the first and second syllables, and the mutual information between the syllables. We examined these statistics during nonword perception and production using a powerful single-trial analytical approach. We found that repetition accuracy was higher for nonwords in which the frequency of the first syllable was high. In addition, brain responses to distributional statistics were widespread and almost exclusively cortical. Importantly, brain activity was modulated in a distinct manner for each statistic, with the strongest facilitatory effects associated with the frequency of the first syllable and mutual information. These findings show that distributional statistics modulate nonword perception and production. We discuss the common and unique impact of each distributional statistic on brain activity, as well as task differences. PMID:27184201
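The abstract does not define how the distributional statistics were computed. A per-item "mutual information between the syllables" is most naturally read as pointwise mutual information, log2 P(s1, s2) / (P(s1) P(s2)). The sketch below computes position-specific syllable frequencies and PMI from a toy corpus of disyllabic tokens; both the pointwise reading and the position-specific counting are assumptions, and the corpus is hypothetical.

```python
import math
from collections import Counter


def positional_counts(tokens):
    """tokens: (first_syllable, second_syllable) pairs from a disyllabic corpus."""
    first = Counter(s1 for s1, _ in tokens)
    second = Counter(s2 for _, s2 in tokens)
    pairs = Counter(tokens)
    return first, second, pairs, len(tokens)


def pointwise_mi(s1, s2, first, second, pairs, n):
    """log2 P(s1, s2) / (P(s1) P(s2)); -inf if the pair never occurs."""
    joint = pairs[(s1, s2)] / n
    if joint == 0:
        return float("-inf")
    return math.log2(joint / ((first[s1] / n) * (second[s2] / n)))


# Hypothetical toy corpus.
corpus = [("ba", "ta"), ("ba", "ta"), ("ba", "ko"), ("mi", "ko")]
first, second, pairs, n = positional_counts(corpus)

if __name__ == "__main__":
    for s1, s2 in (("ba", "ta"), ("ba", "ko"), ("mi", "ko")):
        print(s1 + s2, f"first freq={first[s1]}, second freq={second[s2]}, "
              f"PMI={pointwise_mi(s1, s2, first, second, pairs, n):.3f}")
```

Positive PMI means the two syllables co-occur more often than their individual frequencies predict; negative PMI means less often. This is why PMI can vary independently of the first- and second-syllable frequencies, as the study's separate statistics require.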
Bierer, Julie A; Litvak, Leonid
Speech perception among cochlear implant (CI) listeners is highly variable. High degrees of channel interaction are associated with poorer speech understanding. Two methods for reducing channel interaction, focusing electrical fields and deactivating subsets of channels, were assessed by the change in vowel and consonant identification scores across different program settings. The main hypotheses were that (a) focused stimulation will improve phoneme recognition and (b) speech perception will improve when channels with high thresholds are deactivated. To select high-threshold channels for deactivation, subjects' threshold profiles were processed to enhance the peaks and troughs, and then an exclusion or inclusion criterion based on the mean and standard deviation was applied. Low-threshold channels were selected manually and matched in number and apex-to-base distribution. Nine ears in eight adult CI listeners with Advanced Bionics HiRes90k devices were tested with six experimental programs. Two all-channel programs, (a) 14-channel partial tripolar (pTP) and (b) 14-channel monopolar (MP), and four variable-channel programs derived from these two base programs, (c) pTP with high- and (d) low-threshold channels deactivated, and (e) MP with high- and (f) low-threshold channels deactivated, were created. Across subjects, performance was similar with pTP and MP programs. However, poorer performing subjects (as defined by their vowel identification scores) tended to perform better with the all-channel pTP program than with the all-channel MP program (a > b). These same subjects showed slightly more benefit with the reduced-channel MP programs (e and f). Subjective ratings were consistent with performance. These findings suggest that reducing channel interaction may benefit poorer performing CI listeners. PMID:27317668
Spoken words are highly variable. A single word may never be uttered the same way twice. As listeners, we regularly encounter speakers of different ages, genders, and accents, increasing the amount of variation we face. How listeners understand spoken words as quickly and adeptly as they do despite this variation remains an issue central to linguistic theory. We propose that learned acoustic patterns are mapped simultaneously to linguistic representations and to social representations. In doing so, we illuminate a paradox that, we argue, arises in the literature from its focus on representations and its peripheral treatment of word-level phonetic variation. We consider phonetic variation more fully and highlight a growing body of work that is problematic for current theory: words with different pronunciation variants are recognized equally well in immediate processing tasks, while an atypical, infrequent, but socially idealized form is remembered better in the long term. We suggest that the perception of spoken words is socially weighted, resulting in sparse but high-resolution clusters of socially idealized episodes that are robust in immediate processing and are more strongly encoded, predicting memory inequality. Our proposal includes a dual-route approach to speech perception in which listeners map acoustic patterns in speech to linguistic and social representations in tandem. This approach makes novel predictions about the extraction of information from the speech signal and provides a framework with which we can ask new questions. We propose that language comprehension, broadly, results from the integration of both linguistic and social information.
Rodríguez Peñarroja, Manuel
This thesis describes the teaching and learning of multiple speech acts from an interlanguage pragmatics perspective, because existing materials for that purpose have been considered impoverished in how they reflect the use of language in context. The first chapter, "Pragmatics and Speech Act theory", describes pragmatics as the main area of study on which this thesis is based. It also describes concepts related to pragmatics, such as speec...
Foreign-accented speech often presents a challenging listening condition. In addition to deviations from the target speech norms related to the inexperience of the nonnative speaker, listener characteristics may play a role in determining intelligibility levels. We have previously shown that an implicit visual bias for associating East Asian faces and foreignness predicts listeners’ perceptual ability to process Korean-accented English audiovisual speech (Yi et al., 2013). Here, we examine the neural mechanism underlying the influence of listener bias toward foreign faces on speech perception. In a functional magnetic resonance imaging (fMRI) study, native English speakers listened to native- and Korean-accented English sentences, with or without faces. The participants’ Asian-foreign association was measured using an implicit association test (IAT), conducted outside the scanner. We found that foreign-accented speech evoked greater activity in the bilateral primary auditory cortices and the inferior frontal gyri, potentially reflecting greater computational demand. Higher IAT scores, indicating greater bias, were associated with increased BOLD response to foreign-accented speech with faces in the primary auditory cortex, the early node for spectrotemporal analysis. We conclude that (1) foreign-accented speech perception places greater demand on the neural systems underlying speech perception; (2) the talker's face can exaggerate the perceived foreignness of foreign-accented speech; and (3) implicit Asian-foreign association is associated with decreased neural efficiency in early spectrotemporal processing.
Julio Montero Díaz
This article analyzes the possibilities of presenting audiovisual history in a society in which audiovisual media have progressively gained prominence. We analyze specific cases of films and historical documentaries and assess the difficulties historians face in understanding the key features of audiovisual language, and those filmmakers face in understanding and incorporating history into their productions. We conclude that it would not be possible to disseminate history in the Western world without audiovisual resources circulated through various types of screens (cinema, television, computer, mobile phone, video games).
Scott, Sophie K.
Our understanding of the neurobiological basis for human speech production and perception has benefited from insights from psychology, neuropsychology and neurology. In this overview, I outline some of the ways that functional imaging has added to this knowledge and argue that, as a neuroanatomical tool, functional imaging has led to some…
Stacey, Paula C.; Summerfield, A. Quentin
Purpose: To compare the effectiveness of 3 self-administered strategies for auditory training that might improve speech perception by adult users of cochlear implants. The strategies are based, respectively, on discriminating isolated words, words in sentences, and phonemes in nonsense syllables. Method: Participants were 18 normal-hearing adults…
Weir, Kristy A.
Speech pathology students readily identify the importance of a sound understanding of anatomical structures central to their intended profession. In contrast, they often do not recognize the relevance of a broader understanding of structure and function. This study aimed to explore students' perceptions of the relevance of anatomy to speech…
Zhang, Juan; McBride-Chang, Catherine
A 4-stage developmental model, in which auditory sensitivity is fully mediated by speech perception at both the segmental and suprasegmental levels, which are further related to word reading through their associations with phonological awareness, rapid automatized naming, verbal short-term memory and morphological awareness, was tested with…
Gou, J.; Smith, J.; Valero, J.; Rubio, I.
This paper reports on a clinical trial evaluating outcomes of a frequency-lowering technique for adolescents and young adults with severe to profound hearing impairment. Outcomes were defined by changes in aided thresholds, speech perception, and acceptance. The participants comprised seven young people aged between 13 and 25 years. They were…
McMurray, Bob; Munson, Cheyenne; Tomblin, J. Bruce
Purpose: The authors examined speech perception deficits associated with individual differences in language ability, contrasting auditory, phonological, or lexical accounts by asking whether lexical competition is differentially sensitive to fine-grained acoustic variation. Method: Adolescents with a range of language abilities (N = 74, including…
Most, Tova; Rothem, Hilla; Luntz, Michal
The researchers evaluated the contribution of cochlear implants (CIs) to speech perception by a sample of prelingually deaf individuals implanted after age 8 years. This group was compared with a group with profound hearing impairment (HA-P), and with a group with severe hearing impairment (HA-S), both of which used hearing aids. Words and…
Ordin, Mikhail; Polyanskaya, Leona
We investigated the perception of developmental changes in timing patterns that occur in the course of second language (L2) acquisition when the learner's native and target languages are rhythmically similar (German and English). Speech rhythm in L2 English produced by German learners was found to become increasingly stress-timed as acquisition progresses. This development is captured by tempo-normalized rhythm measures of durational variability. Advanced learners also deliver speech at a faster rate. However, when native speakers have to classify the timing patterns characteristic of L2 English of German learners at different proficiency levels, they attend to speech rate cues and ignore the differences in speech rhythm. PMID:25859228
Kanai, Ryota; Sheth, Bhavin R.; Verstraten, Frans A J; Shimojo, Shinsuke
Background: The timing at which sensory input reaches the level of conscious perception is an intriguing question still awaiting an answer. It is often assumed that both visual and auditory percepts have a modality-specific processing delay and that their difference determines perceptual temporal offset. Methodology/Principal Findings: Here, we show that the perception of audiovisual simultaneity can change flexibly and fluctuates over a short period of time while subjects observe a constant ...
Isabel Fernandes Silva
Over the last decades, audiovisual translation has gained increased significance in Translation Studies and has also become an interdisciplinary subject within other fields (media, cinema studies, etc.). Although many articles have been published on communicative aspects of translation such as politeness, only recently have scholars taken an interest in the translation of compliments. This study focuses on both these areas from a multimodal and pragmatic perspective, emphasizing the links between these fields and how this multidisciplinary approach evidences the polysemiotic nature of the translation process. In audiovisual translation both text and image are at play; therefore, the translation of speech produced by the characters may either omit information (because it is provided by visual-gestural signs) or emphasize it. We selected the compliments present in the film What Women Want, focusing on subtitles that did not successfully convey the compliment expressed in the source text, and analyzed the reasons for this, namely differences in register, Culture Specific Items, and repetitions. These differences lead to a different portrayal/identity/perception of the main character in the English version (original soundtrack) and in the subtitled versions in Portuguese and Italian.
Abdala, Carolina; Dhar, Sumitrajit; Ahmadi, Mahnaz; Luo, Ping
The medial olivocochlear reflex (MOCR) modulates cochlear amplifier gain and is thought to facilitate the detection of signals in noise. High-resolution distortion product otoacoustic emissions (DPOAEs) were recorded in teens, young, middle-aged, and elderly adults at moderate levels using primary tones swept from 0.5 to 4 kHz with and without a contralateral acoustic stimulus (CAS) to elicit medial efferent activation. Aging effects on magnitude and phase of the 2f1-f2 DPOAE and on its components were examined, as was the link between speech-in-noise performance and MOCR strength. Results revealed a mild aging effect on the MOCR through middle age for frequencies below 1.5 kHz. Additionally, positive correlations were observed between strength of the MOCR and performance on select measures of speech perception parsed into features. The elderly group showed unexpected results including relatively large effects of CAS on DPOAE, and CAS-induced increases in DPOAE fine structure as well as increases in the amplitude and phase accumulation of DPOAE reflection components. Contamination of MOCR estimates by middle ear muscle contractions cannot be ruled out in the oldest subjects. The findings reiterate that DPOAE components should be unmixed when measuring medial efferent effects to better consider and understand these potential confounds. PMID:25234884
Activity in premotor and sensorimotor cortices is found in speech production and some perception tasks. Yet, how sensorimotor integration supports these functions is unclear due to a lack of data examining the timing of activity from these regions. Beta (~20 Hz) and alpha (~10 Hz) spectral power within the EEG µ rhythm are considered indices of motor and somatosensory activity, respectively. In the current study, perception conditions required discrimination (same/different) of syllable pairs (/ba/ and /da/) in quiet and noisy conditions. Production conditions required covert and overt syllable productions and overt word production. Independent component analysis was performed on EEG data obtained during these conditions to (1) identify clusters of µ components common to all conditions and (2) examine real-time event-related spectral perturbations (ERSP) within alpha and beta bands. Of the 20 participants, 17 and 15 produced left and right µ-components, respectively, localized to precentral gyri. Discrimination conditions were characterized by significant (pFDR < .05) early alpha event-related synchronization (ERS) prior to and during stimulus presentation and later alpha event-related desynchronization (ERD) following stimulus offset. Beta ERD began early and gained strength across time. Differences were found between quiet and noisy discrimination conditions. Both overt syllable and word productions yielded similar alpha/beta ERD that began prior to production and was strongest during muscle activity. Findings during covert production were weaker than during overt production. One explanation for these findings is that µ-beta ERD indexes early predictive coding (e.g., internal modeling) and/or overt and covert attentional/motor processes. µ-alpha ERS may index inhibitory input to the premotor cortex from sensory regions prior to and during discrimination, while µ-alpha ERD may index re-afferent sensory feedback during speech rehearsal and production.
Jenson, David; Bowers, Andrew L; Harkrider, Ashley W; Thornton, David; Cuellar, Megan; Saltuklaroglu, Tim
Activity in anterior sensorimotor regions is found in speech production and some perception tasks. Yet, how sensorimotor integration supports these functions is unclear due to a lack of data examining the timing of activity from these regions. Beta (~20 Hz) and alpha (~10 Hz) spectral power within the EEG μ rhythm are considered indices of motor and somatosensory activity, respectively. In the current study, perception conditions required discrimination (same/different) of syllable pairs (/ba/ and /da/) in quiet and noisy conditions. Production conditions required covert and overt syllable productions and overt word production. Independent component analysis was performed on EEG data obtained during these conditions to (1) identify clusters of μ components common to all conditions and (2) examine real-time event-related spectral perturbations (ERSP) within alpha and beta bands. Of the 20 participants, 17 and 15 produced left and right μ-components, respectively, localized to precentral gyri. Discrimination conditions were characterized by significant (pFDR < 0.05) early alpha event-related synchronization (ERS) prior to and during stimulus presentation and later alpha event-related desynchronization (ERD) following stimulus offset. Beta ERD began early and gained strength across time. Differences were found between quiet and noisy discrimination conditions. Both overt syllable and word productions yielded similar alpha/beta ERD that began prior to production and was strongest during muscle activity. Findings during covert production were weaker than during overt production. One explanation for these findings is that μ-beta ERD indexes early predictive coding (e.g., internal modeling) and/or overt and covert attentional/motor processes. μ-alpha ERS may index inhibitory input to the premotor cortex from sensory regions prior to and during discrimination, while μ-alpha ERD may index sensory feedback during speech rehearsal and production. PMID:25071633
Bidelman, Gavin M; Lee, Chia-Cheng
Categorical perception (CP) represents a fundamental process in converting continuous speech acoustics into invariant percepts. Using scalp-recorded event-related brain potentials (ERPs), we investigated how tone-language experience and stimulus context influence CP for lexical tones, the pitch patterns used by a majority of the world's languages to signal word meaning. Stimuli were vowel pairs overlaid with a high-level tone (T1) followed by a pitch continuum spanning between the dipping (T3) and rising (T2) contours of the Mandarin tonal space. To vary context, T1 either preceded or followed the critical T2/T3 continuum. Behaviorally, native Chinese listeners showed stronger CP than native English-speaking controls, as evident in their steeper, more dichotomous psychometric functions and faster identification of linguistic pitch patterns. Stimulus context shifted the categorical boundary in both groups, but the shift was more exaggerated in native listeners. Analysis of source activity extracted from primary auditory cortex revealed overall stronger neural encoding of tone in Chinese than in English listeners, indicating experience-dependent plasticity in cortical pitch processing. More critically, "neurometric" functions derived from multidimensional scaling and clustering of source ERPs established that (i) early auditory cortical activity could accurately predict listeners' psychometric speech identification and contextual shifts in the perceptual boundary, and (ii) neurometric profiles were organized more categorically in native speakers. Our data show that tone-language experience refines early auditory cortical brain representations so as to supply more faithful templates to neural mechanisms subserving lexical pitch categorization. We infer that contextual influence on CP for tones is determined by language experience and the frequency of pitch patterns as they occur in listeners' native lexicon. PMID:26146197
Zaar, Johannes; Dau, Torsten
The present study investigated the influence of various sources of response variability in consonant perception. A distinction was made between source-induced variability and receiver-related variability. The former refers to perceptual differences induced by differences in the ... and of similar magnitude. Even time-shifts in the waveforms of white masking noise produced a significant effect, which was well above the within-listener variability (the smallest effect). Two auditory-inspired models in combination with a template ... confusions. Both models captured the source-induced perceptual distance remarkably well. However, the modulation-based approach showed a better agreement with the data in terms of consonant recognition and confusions. The results indicate that low-frequency modulations up to 16 Hz play a crucial role in consonant perception.
Gjaja, Marin N.
Neural networks for supervised and unsupervised learning are developed and applied to problems in remote sensing, continuous map learning, and speech perception. Adaptive Resonance Theory (ART) models are real-time neural networks for category learning, pattern recognition, and prediction. Unsupervised fuzzy ART networks synthesize fuzzy logic and neural networks, and supervised ARTMAP networks incorporate ART modules for prediction and classification. New ART and ARTMAP methods resulting from analyses of data structure, parameter specification, and category selection are developed. Architectural modifications providing flexibility for a variety of applications are also introduced and explored. A new methodology for automatic mapping from Landsat Thematic Mapper (TM) and terrain data, based on fuzzy ARTMAP, is developed. System capabilities are tested on a challenging remote sensing problem, prediction of vegetation classes in the Cleveland National Forest from spectral and terrain features. After training at the pixel level, performance is tested at the stand level, using sites not seen during training. Results are compared to those of maximum likelihood classifiers, back propagation neural networks, and K-nearest neighbor algorithms. Best performance is obtained using a hybrid system based on a convex combination of fuzzy ARTMAP and maximum likelihood predictions. This work forms the foundation for additional studies exploring fuzzy ARTMAP's capability to estimate class mixture composition for non-homogeneous sites. Exploratory simulations apply ARTMAP to the problem of learning continuous multidimensional mappings. A novel system architecture retains basic ARTMAP properties of incremental and fast learning in an on-line setting while adding components to solve this class of problems. The perceptual magnet effect is a language-specific phenomenon arising early in infant speech development that is characterized by a warping of speech sound perception. An
Speech processing addresses various scientific and technological areas. It includes speech analysis and variable-rate coding, for storing or transmitting speech. It also covers speech synthesis, especially from text; speech recognition, including speaker and language identification; and spoken language understanding. This book covers the following topics: how to realize speech production and perception systems, and how to synthesize and understand speech using state-of-the-art methods in signal processing, pattern recognition, stochastic modelling, computational linguistics and human factor studies
Waaramaa, Teija; Laukkanen, Anne-Maria; Airas, Matti; Alku, Paavo
This study aimed to investigate the role of voice source and formant frequencies in the perception of emotional valence and psychophysiological activity level from short vowel samples (approximately 150 milliseconds). Nine professional actors (five males and four females) read a prose passage simulating joy, tenderness, sadness, anger, and a neutral emotional state. The stress-carrying vowel [a:] was extracted from continuous speech during the Finnish word [ta:k:ahan] and analyzed for duration, fundamental frequency (F0), equivalent sound level (Leq), alpha ratio, and formant frequencies F1-F4. Alpha ratio was calculated by subtracting the Leq (dB) in the range 50 Hz-1 kHz from the Leq in the range 1-5 kHz. The samples were inverse filtered by Iterative Adaptive Inverse Filtering, and the resulting glottal flow estimates were parameterized with the normalized amplitude quotient (NAQ = f_AC/(d_peak · T)). Fifty listeners (mean age 28.5 years) identified the emotional valences from the randomized samples. Multinomial logistic regression analysis was used to study the interrelations of the parameters for perception. Valences appeared to be identifiable from vowel samples of short duration (approximately 150 milliseconds). NAQ tended to differentiate between the valences and activity levels perceived in both genders. Voice source may not only reflect variations of F0 and Leq but may also have an independent role in expression, reflecting phonation types. To some extent, formant frequencies appeared to be related to valence perception, but no clear patterns could be identified. Coding of valence tends to be a complicated multiparameter phenomenon with wide individual variation. PMID:19111438
Schmidt, Juliane; Janse, Esther; Scharenborg, Odette
This study investigated whether age and/or differences in hearing sensitivity influence the perception of the emotion dimensions arousal (calm vs. aroused) and valence (positive vs. negative attitude) in conversational speech. To that end, this study specifically focused on the relationship between participants’ ratings of short affective utterances and the utterances’ acoustic parameters (pitch, intensity, and articulation rate) known to be associated with the emotion dimensions arousal and valence. Stimuli consisted of short utterances taken from a corpus of conversational speech. In two rating tasks, younger and older adults either rated arousal or valence using a 5-point scale. Mean intensity was found to be the main cue participants used in the arousal task (i.e., higher mean intensity cueing higher levels of arousal) while mean F0 was the main cue in the valence task (i.e., higher mean F0 being interpreted as more negative). Even though there were no overall age group differences in arousal or valence ratings, compared to younger adults, older adults responded less strongly to mean intensity differences cueing arousal and responded more strongly to differences in mean F0 cueing valence. Individual hearing sensitivity among the older adults did not modify the use of mean intensity as an arousal cue. However, individual hearing sensitivity generally affected valence ratings and modified the use of mean F0. We conclude that age differences in the interpretation of mean F0 as a cue for valence are likely due to age-related hearing loss, whereas age differences in rating arousal do not seem to be driven by hearing sensitivity differences between age groups (as measured by pure-tone audiometry). PMID:27303340
Getzmann, Stephan; Falkenstein, Michael; Wascher, Edmund
The ability to understand speech under adverse listening conditions deteriorates with age. In addition to genuine hearing deficits, age-related declines in attentional and inhibitory control are assumed to contribute to these difficulties. Here, the impact of task-irrelevant distractors on speech perception was studied in 28 younger and 24 older participants in a simulated "cocktail party" scenario. In a two-alternative forced-choice word discrimination task, the participants responded to a rapid succession of short speech stimuli ("on" and "off") presented at a frequent standard location or at a rare deviant location, in silence or with a concurrent distractor speaker. Behavioral responses and event-related potentials (mismatch negativity MMN, P3a, and reorienting negativity RON) were analyzed to study the interplay of distraction, orientation, and refocusing in the presence of changes in target location. While shifts in target location decreased performance in both age groups, this effect was more pronounced in the older group. Especially in the distractor condition, the electrophysiological measures indicated delayed attention capture and delayed refocusing of attention toward the task-relevant stimulus feature in the older group, relative to the younger group. In sum, the results suggest that a delay in the attention-switching mechanism contributes to the age-related difficulties in speech perception in dynamic listening situations with multiple speakers. PMID:25447300
Been, Pieter H; Zwarts, Frans
At the behavioral level, one of the primary disturbances involved in congenital dyslexia concerns phonological processing. At the neuroarchitectural level, autopsies have revealed ectopies, e.g., a reduced number of neurons in the upper layers of the cortex and an increased number in the lower ones. In dynamic models of interacting neuronal populations, the behavioral level can be related to the neurophysiological level. In this study, an attempt is made to do so at the cortical level. The first focus of this model study is on the results of a Finnish experiment assessing geminate stop perception in quasi-speech stimuli by 6-month-old infants using a head-turning paradigm and evoked potentials. The second focus is on the results of a Dutch experiment assessing discrimination of transients in speech stimuli by adult dyslexics and controls and by 2-month-old infants. As revealed in the Finnish study, there appears to be a difference in the phonemic perceptual boundaries of children at genetic risk for dyslexia and control children. Assuming a lowered neuronal density in the 'dyslexic' model, reflecting ectopies, it may be postulated that less neuronal surface is available for synaptic connections, resulting in a lowered synaptic density and thus a lowered amount of available neurotransmitter. A lowered synaptic density also implies a reduced amount of membrane surface available for neurotransmitter metabolism. By assuming both a reduced upper bound of neurotransmitter and a reduced metabolic transmitter rate in the dynamic model, the Finnish experimental results can be approximated closely. This applies both to the behavioral head-turning data and to the evoked potential data. In the Dutch study, adult dyslexics show poor performance in discriminating transients in the speech signal compared to the controls. The same stimuli were used in a study comparing infants from dyslexic families and controls. Using the same transmitter parameters as in modeling the
Speech perception (SP), verbal working memory (WM) and auditory temporal resolution (ATR) have been studied in children with attention deficit hyperactivity disorder (ADHD) and language impairment (LI), as well as in reference groups of typically developing children. A computerised method was developed in which discrimination of same or different pairs of stimuli was tested. In a functional magnetic resonance imaging (fMRI) study, a similar test was used to explore the neural...
Brain imaging studies indicate that speech motor areas are recruited for auditory speech perception, especially when intelligibility is low due to environmental noise or when speech is accented. The purpose of the present study was to determine the relative contribution of brain regions to the processing of speech containing phonetic categories from one’s own language, speech with accented samples of one’s native phonetic categories, and speech with unfamiliar phonetic categories. To that end, native English and Japanese speakers identified the speech sounds /r/ and /l/ produced by native English speakers (unaccented) and Japanese speakers (foreign-accented) while functional magnetic resonance imaging measured their brain activity. For native English speakers, the Japanese-accented speech was more difficult to categorize than the unaccented English speech. In contrast, Japanese speakers have difficulty distinguishing between /r/ and /l/, so both the Japanese-accented and the unaccented English speech were difficult for them to categorize. Brain regions involved with listening to foreign-accented productions of a first language included primarily the right cerebellum, left ventral inferior premotor cortex (PMvi), and Broca’s area. Brain regions most involved with listening to a second-language phonetic contrast (foreign-accented and unaccented productions) also included the left PMvi and the right cerebellum. Additionally, increased activity was observed in the right PMvi, the left and right ventral superior premotor cortex (PMvs), and the left cerebellum. These results support a role for speech motor regions during the perception of foreign-accented native speech and for perception of difficult second-language phonetic contrasts.
Background: How does the brain repair obliterated speech and cope with acoustically ambivalent situations? A widely discussed possibility is to use top-down information to solve the ambiguity problem. In the case of speech, this may lead to a match of bottom-up sensory input with lexical expectations, resulting in resonant states that are reflected in induced gamma-band activity (GBA). Methods: In the present EEG study, we compared subjects' pre-attentive GBA responses to obliterated speech segments presented after a series of correct words. The words were a minimal pair in German and differed with respect to the degree of specificity of segmental phonological information. Results: The induced GBA was larger when the expected lexical information was phonologically fully specified than in the underspecified condition. Thus, the degree of specificity of phonological information in the mental lexicon correlates with the intensity of the matching process of bottom-up sensory input with lexical information. Conclusions: These results, together with those of a behavioural control experiment, support the notion of multi-level mechanisms involved in the repair of deficient speech. The delineated alignment of pre-existing knowledge with sensory input is in accordance with recent ideas about the role of internal forward models in speech perception.
Léo Varnet; Tianyun Wang; Chloe Peter; Fanny Meunier; Michel Hoen
It is now well established that extensive musical training percolates to higher levels of cognition, such as speech processing. However, the lack of a precise technique to investigate the specific listening strategy involved in speech comprehension has made it difficult to determine how musicians’ higher performance in non-speech tasks contributes to their enhanced speech comprehension. The recently developed Auditory Classification Image approach reveals the precise time-frequency regions us...
Ramirez, Joshua; Mann, Virginia
Both dyslexics and auditory neuropathy (AN) subjects show inferior consonant-vowel (CV) perception in noise, relative to controls. To better understand these impairments, natural acoustic speech stimuli that were masked in speech-shaped noise at various intensities were presented to dyslexic, AN, and control subjects either in isolation or accompanied by visual articulatory cues. AN subjects were expected to benefit from the pairing of visual articulatory cues and auditory CV stimuli, provided that their speech perception impairment reflects a relatively peripheral auditory disorder. Assuming that dyslexia reflects a general impairment of speech processing rather than a disorder of audition, dyslexics were not expected to similarly benefit from an introduction of visual articulatory cues. The results revealed an increased effect of noise masking on the perception of isolated acoustic stimuli by both dyslexic and AN subjects. More importantly, dyslexics showed less effective use of visual articulatory cues in identifying masked speech stimuli and lower visual baseline performance relative to AN subjects and controls. Last, a significant positive correlation was found between reading ability and the ameliorating effect of visual articulatory cues on speech perception in noise. These results suggest that some reading impairments may stem from a central deficit of speech processing.
Tajima, Keiichi; Akahane-Yamada, Reiko
Past studies on second-language (L2) speech perception have suggested that L2 learners have difficulty exploiting contextual information when perceiving L2 utterances, and that they exhibit greater difficulty than native listeners when faced with variability in temporal context. The present study investigated the extent to which native Japanese listeners, who are known to have difficulties perceiving English syllables, are influenced by changes in speaking rate when asked to count syllables in spoken English words. The stimuli consisted of a set of English words and nonwords varying in syllable structure, spoken at three rates by a native English speaker. The stimuli produced at the three rates were presented to native Japanese listeners in random order. Results indicated that listeners' identification accuracy did not vary as a function of speaking rate, although it decreased significantly as the syllable structure of the stimuli became more complex. Moreover, even though speaking rate varied from trial to trial, Japanese listeners' performance did not decline compared to a condition in which the speaking rate was fixed. Theoretical and practical implications of these findings will be discussed. [Work supported by JSPS and NICT.]
Purdy, Suzanne C; Kelly, Andrea S
Speech perception varies widely across cochlear implant (CI) users and typically improves over time after implantation. There is also some evidence for improved auditory evoked potentials (shorter latencies, larger amplitudes) after implantation, but few longitudinal studies have examined the relationship between behavioral and evoked potential measures after implantation in postlingually deaf adults. The relationship between speech perception and auditory evoked potentials was investigated in newly implanted cochlear implant users from the day of implant activation to 9 months postimplantation, on five occasions, in 10 adults aged 27 to 57 years who had been bilaterally profoundly deaf for 1 to 30 years prior to receiving a unilateral CI24 cochlear implant. Changes over time in middle latency response (MLR), mismatch negativity, and obligatory cortical auditory evoked potentials and in word and sentence speech perception scores were examined. Speech perception improved significantly over the 9-month period. MLRs varied and showed no consistent change over time. Three participants aged in their 50s had absent MLRs. The pattern of change in N1 amplitudes over the five visits varied across participants. P2 area increased significantly for 1,000- and 4,000-Hz tones but not for 250 Hz. The greatest change in P2 area occurred after 6 months of implant experience. Although there was a trend for mismatch negativity peak latency to decrease and width to increase after 3 months of implant experience, there was considerable variability and these changes were not significant. Only 60% of participants had a detectable mismatch initially; this increased to 100% at 9 months. The continued change in P2 area over the period evaluated, with a trend for greater change for right hemisphere recordings, is consistent with the pattern of incremental change in speech perception scores over time. MLR, N1, and mismatch negativity changes were inconsistent and hence P2 may be a more robust measure
Mayer, Jennifer L; Hannent, Ian; Heaton, Pamela F
Whilst enhanced perception has been widely reported in individuals with Autism Spectrum Disorders (ASDs), relatively little is known about the developmental trajectory and impact of atypical auditory processing on speech perception in intellectually high-functioning adults with ASD. This paper presents data on perception of complex tones and speech pitch in adult participants with high-functioning ASD and typical development, and compares these with pre-existing data using the same paradigm with groups of children and adolescents with and without ASD. As perceptual processing abnormalities are likely to influence behavioural performance, regression analyses were carried out on the adult data set. The findings revealed markedly different pitch discrimination trajectories and language correlates across diagnostic groups. While pitch discrimination increased with age and correlated with receptive vocabulary in groups without ASD, it was enhanced in childhood and stable across development in ASD. Pitch discrimination scores did not correlate with receptive vocabulary scores in the ASD group, and for adults with ASD superior pitch perception was associated with sensory atypicalities and diagnostic measures of symptom severity. We conclude that the development of pitch discrimination and its associated mechanisms markedly distinguish those with and without ASD. PMID:25106823
Francis, Alexander L; MacPherson, Megan K; Chandrasekaran, Bharath; Alvar, Ann M
Typically, understanding speech seems effortless and automatic. However, a variety of factors may, independently or interactively, make listening more effortful. Physiological measures may help to distinguish between the application of different cognitive mechanisms whose operation is perceived as effortful. In the present study, physiological and behavioral measures associated with task demand were collected along with behavioral measures of performance while participants listened to and repeated sentences. The goal was to measure psychophysiological reactivity associated with three degraded listening conditions, each of which differed in terms of the source of the difficulty (distortion, energetic masking, and informational masking) and was therefore expected to engage different cognitive mechanisms. These conditions were chosen to be matched for overall performance (keywords correct), and were compared to listening to unmasked speech produced by a natural voice. The three degraded conditions were: (1) unmasked speech produced by a computer speech synthesizer, (2) speech produced by a natural voice and masked by speech-shaped noise, and (3) speech produced by a natural voice and masked by two-talker babble. Both masked conditions were presented at a −8 dB signal-to-noise ratio (SNR), a level shown in previous research to result in comparable levels of performance for these stimuli and maskers. Performance was measured as the proportion of keywords identified correctly, and task demand or effort was quantified subjectively by self-report. Measures of psychophysiological reactivity included electrodermal (skin conductance) response frequency and amplitude, blood pulse amplitude and pulse rate. Results suggest that the two masked conditions evoked stronger psychophysiological reactivity than did the two unmasked conditions even when behavioral measures of listening performance and listeners' subjective perception of task demand were comparable across the three
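Both masked conditions were presented at −8 dB SNR. As a minimal sketch, assuming SNR is defined from the RMS levels of the whole speech and masker waveforms (the abstract does not say how levels were computed), a masker can be rescaled to reach a target SNR as follows. The function names and the stand-in signals are illustrative, not taken from the study.

```python
import numpy as np


def rms(x: np.ndarray) -> float:
    """Root-mean-square amplitude of a waveform."""
    return float(np.sqrt(np.mean(np.square(x))))


def mix_at_snr(speech: np.ndarray, masker: np.ndarray, snr_db: float) -> np.ndarray:
    """Rescale `masker` so that 20*log10(rms(speech) / rms(masker)) equals
    `snr_db`, then add it to `speech` (RMS-based SNR is an assumption)."""
    if speech.shape != masker.shape:
        raise ValueError("speech and masker must have the same shape")
    target_rms = rms(speech) / 10.0 ** (snr_db / 20.0)
    return speech + masker * (target_rms / rms(masker))


# Stand-in signals: a 220 Hz tone for "speech" and Gaussian noise for the masker.
fs = 16_000
t = np.arange(fs) / fs
speech = 0.1 * np.sin(2 * np.pi * 220 * t)
noise = np.random.default_rng(0).standard_normal(fs)

mixture = mix_at_snr(speech, noise, snr_db=-8.0)
achieved = 20 * np.log10(rms(speech) / rms(mixture - speech))
```

At −8 dB the rescaled masker's RMS level is 10^(8/20) ≈ 2.51 times that of the speech. A real speech-shaped noise or babble masker would be substituted for the Gaussian noise here.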
Using a priming paradigm with Chinese listeners as participants, this study examined whether non-speech sounds affect the perception of speech sounds. Experiment 1 examined the effect of pure tones on the perception of a consonant category continuum and found that the pure tones influenced perception of the continuum, showing a spectral contrast effect. Experiment 2 examined the effect of pure tones and complex tones on vowel perception and found that pure or complex tones matching the vowel formant frequencies speeded vowel identification, showing a priming effect. Both experiments consistently found that non-speech sounds can affect the perception of speech sounds, indicating that speech sound perception also involves a pre-linguistic stage of spectral feature analysis, which is consistent with the auditory theory of speech perception.
A long-standing debate in the field of speech perception concerns whether specialized processing mechanisms are necessary to perceive speech sounds. The motor theory argues that speech perception is a special process and that non-speech sounds do not affect the perception of speech sounds. The auditory theory suggests that speech perception can be understood in terms of general auditory processes that are shared with the perception of non-speech sounds. Findings from English-speaking participants indicate that the processing of non-speech sounds affects the perception of speech sounds, but few studies have been conducted with Chinese listeners. The present study conducted two experiments to examine whether the processing of non-speech sounds could affect the perception of speech segments by Chinese listeners. In Experiment 1, the speech sounds were a continuum of synthesized consonants ranging from /ba/ to /da/. The non-speech sounds were two sine-wave tones with frequencies equal to the F2 onset frequencies of /ba/ and /da/, respectively. The /ba/-/da/ series was presented after the tones with a 50 ms ISI. Undergraduate participants were asked to identify the speech sounds. The results showed that the non-speech tones influenced identification of the speech targets: when the frequency of the tone was equal to the F2 onset frequency of /ba/, participants were more likely to identify consonant
Goswami, Usha; Cumming, Ruth; Chait, Maria; Huss, Martina; Mead, Natasha; Wilson, Angela M.; Barnes, Lisa; Fosker, Tim
Here we use two filtered speech tasks to investigate children’s processing of slow (<4 Hz) versus faster (∼33 Hz) temporal modulations in speech. We compare groups of children with either developmental dyslexia (Experiment 1) or speech and language impairments (SLIs, Experiment 2) to groups of typically-developing (TD) children age-matched to each disorder group. Ten nursery rhymes were filtered so that their modulation frequencies were either low-pass filtered (<4 Hz) or band-pass filt...
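One of the two tasks kept only slow (<4 Hz) temporal modulations. The abstract is cut off before the filtering details, so the following is only a sketch of one common approach, assumed here rather than taken from the study: take the Hilbert envelope of a signal, then zero every modulation component above the cutoff in the frequency domain. It uses plain NumPy FFTs and leaves out the band splitting and resynthesis that filtering real nursery-rhyme recordings would require.

```python
import numpy as np


def hilbert_envelope(x: np.ndarray) -> np.ndarray:
    """Magnitude of the analytic signal, built by zeroing negative frequencies."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1 : n // 2] = 2.0
    else:
        weights[1 : (n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * weights))


def lowpass_modulations(envelope: np.ndarray, fs: float, cutoff_hz: float) -> np.ndarray:
    """Brick-wall low-pass of the envelope: keep modulation components <= cutoff_hz."""
    spectrum = np.fft.rfft(envelope)
    freqs = np.fft.rfftfreq(len(envelope), d=1.0 / fs)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(envelope))


# Toy amplitude-modulated tone: 2 Hz ("slow") plus 20 Hz ("fast") modulation on a 1 kHz carrier.
fs = 16_000
t = np.arange(fs) / fs
slow = 0.5 * np.sin(2 * np.pi * 2 * t)
fast = 0.3 * np.sin(2 * np.pi * 20 * t)
signal = (1.0 + slow + fast) * np.sin(2 * np.pi * 1000 * t)

env = hilbert_envelope(signal)
slow_env = lowpass_modulations(env, fs, cutoff_hz=4.0)  # the 20 Hz modulation is removed
```

A brick-wall filter is used only to keep the example short; a smoother filter would normally be preferred to avoid ringing on real speech envelopes.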
Schatzer, Reinhold; Koroleva, Inna; Griessner, Andreas; Levin, Sergey; Kusovkov, Vladislav; Yanov, Yuri; Zierhofer, Clemens
Early multi-channel designs in the history of cochlear implant development were based on vocoder-type processing of frequency channels and presented bands of compressed analog stimulus waveforms simultaneously on multiple tonotopically arranged electrodes. The realization that the direct summation of electrical fields resulting from simultaneous electrode stimulation exacerbates interactions among the stimulation channels and limits cochlear implant outcome led to the breakthrough in the development of cochlear implants: the continuous interleaved sampling (CIS) coding strategy. By interleaving stimulation pulses across electrodes, CIS activates only a single electrode at each point in time, preventing a direct summation of electrical fields and hence the primary component of channel interactions. In this paper we show that a previously presented approach of simultaneous stimulation with channel interaction compensation (CIC) may also ameliorate the deleterious effects of simultaneous channel interaction on speech perception. In an acute study conducted in eleven experienced MED-EL implant users, configurations involving simultaneous stimulation with CIC and doubled pulse phase durations were investigated. As pairs of electrodes were activated simultaneously and pulse durations were doubled, carrier rates remained the same. Comparison conditions involved both CIS and fine structure (FS) strategies, with either strictly sequential or paired-simultaneous stimulation. Results showed no statistical difference in the perception of sentences in noise and monosyllables between sequential and paired-simultaneous stimulation with doubled phase durations. This suggests that CIC can largely compensate for the effects of simultaneous channel interaction, for both CIS and FS coding strategies. A simultaneous stimulation paradigm has a number of potential advantages over a traditional sequential interleaved design. The flexibility gained when dropping the requirement of
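The abstract notes that firing electrodes in pairs while doubling phase durations left carrier rates unchanged. The arithmetic can be checked with a simplified timing model. The model assumes symmetric biphasic pulses with no interphase gap or inter-pulse dead time, and pairs electrode e with electrode e + N/2. That pairing is hypothetical, and so are the 12-electrode, 30 µs example values. CIC itself is not modelled.

```python
def frame_schedule(n_electrodes: int, paired: bool) -> list[tuple[int, ...]]:
    """Electrodes fired in each time slot of one stimulation frame (1-based numbering)."""
    if not paired:
        # CIS-style: one electrode per slot, strictly sequential.
        return [(e,) for e in range(1, n_electrodes + 1)]
    if n_electrodes % 2:
        raise ValueError("paired stimulation needs an even number of electrodes")
    half = n_electrodes // 2
    # Hypothetical pairing: electrode e fires together with the spatially distant e + half.
    return [(e, e + half) for e in range(1, half + 1)]


def per_channel_rate_hz(n_electrodes: int, phase_us: float, paired: bool) -> float:
    """Pulses per second on each electrode when every electrode fires once per frame.

    Each slot holds one biphasic pulse lasting 2 * phase_us (no gaps assumed)."""
    slots = len(frame_schedule(n_electrodes, paired))
    frame_us = slots * 2 * phase_us
    return 1e6 / frame_us


sequential = per_channel_rate_hz(12, phase_us=30.0, paired=False)
paired = per_channel_rate_hz(12, phase_us=60.0, paired=True)
print(round(sequential, 2), round(paired, 2))
```

Pairing halves the number of slots per frame while doubling the phase duration doubles each slot's length, so the frame length (720 µs in this example) and hence the per-electrode rate (about 1389 pulses/s) are the same in both configurations.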
Santarelli, Rosamaria; del Castillo, Ignacio; Cama, Elona; Scimemi, Pietro; Starr, Arnold
Mutations in the OTOF gene encoding otoferlin result in disrupted function of the ribbon synapses, with impairment of multivesicular glutamate release. Most affected subjects present with congenital hearing loss and abnormal auditory brainstem potentials associated with preserved cochlear hair cell activities (otoacoustic emissions, cochlear microphonics [CMs]). Transtympanic electrocochleography (ECochG) has recently been proposed for defining the details of potentials arising in both the cochlea and auditory nerve in this disorder, with a view to shedding light on the pathophysiological mechanisms underlying auditory dysfunction. We review the audiological and electrophysiological findings in children with congenital profound deafness carrying two mutant alleles of the OTOF gene. We show that cochlear microphonic (CM) amplitude and summating potential (SP) amplitude and latency are normal, consistent with preserved outer and inner hair cell function. In the majority of OTOF children, the SP component is followed by a markedly prolonged low-amplitude negative potential replacing the compound action potential (CAP) recorded in normally hearing children. This potential is identified at intensities as low as 90 dB below the behavioral threshold. In some ears, a synchronized CAP is superimposed on the prolonged responses at high intensity. Stimulation at high rates reduces the amplitude and duration of the prolonged potentials, consistent with their neural generation. In some children, however, the ECochG response consists only of the SP, with no prolonged potential. Cochlear implants restore hearing sensitivity, speech perception and neural CAP by electrically stimulating the auditory nerve fibers. These findings indicate that impaired multivesicular glutamate release in OTOF-related disorders leads to abnormal auditory nerve fiber activation and a consequent impairment of spike generation. The magnitude of these effects seems to vary, ranging from
Burnham, Denis; Dodd, Barbara
The McGurk effect, in which auditory [ba] dubbed onto [ga] lip movements is perceived as "da" or "tha," was employed in a real-time task to investigate auditory-visual speech perception in prelingual infants. Experiments 1A and 1B established the validity of real-time dubbing for producing the effect. In Experiment 2, 4½-month-olds were tested in a habituation-test paradigm, in which an auditory-visual stimulus was presented contingent upon visual fixation of a live face. The experimental group was habituated to a McGurk stimulus (auditory [ba], visual [ga]), and the control group to matching auditory-visual [ba]. Each group was then presented with three auditory-only test trials, [ba], [da], and [ða] (as in then). Visual-fixation durations in test trials showed that the experimental group treated the emergent percept in the McGurk effect, [da] or [ða], as familiar (even though they had not heard these sounds previously) and [ba] as novel. For control group infants, [da] and [ða] were no more familiar than [ba]. These results are consistent with infants' perception of the McGurk effect, and support the conclusion that prelinguistic infants integrate auditory and visual speech information. PMID:15549685
Baese-Berk, Melissa M; Dilley, Laura C; Schmidt, Stephanie; Morrill, Tuuli H; Pitt, Mark A
Neil Armstrong insisted that his quote upon landing on the moon was misheard, and that he had said "one small step for a man," instead of "one small step for man." What he said is unclear in part because function words like "a" can be reduced and spectrally indistinguishable from the preceding context. Therefore, their presence can be ambiguous, and they may disappear perceptually depending on the rate of surrounding speech. Two experiments are presented examining production and perception of reduced tokens of "for" and "for a" in spontaneous speech. Experiment 1 investigates the distributions of several acoustic features of "for" and "for a". The results suggest that the distributions of "for" and "for a" overlap substantially, both in terms of temporal and spectral characteristics. Experiment 2 examines perception of these same tokens when the context speaking rate differs. The perceptibility of the function word "a" varies as a function of this context speaking rate. These results demonstrate that substantial ambiguity exists in the original quote from Armstrong, and that this ambiguity may be understood through context speaking rate. PMID:27603209
In binocular rivalry (BR), sensory input remains the same, yet subjective experience fluctuates irremediably between two mutually exclusive representations. We investigated the perceptual stabilization effect of an additional sound on BR dynamics using speech stimuli known to involve robust audiovisual (AV) interactions at several cortical levels. Subjects sensitive to the McGurk effect were presented with looping videos of rivaling faces uttering /aba/ and /aga/, respectively, while synchronously hearing the voice /aba/. They continuously reported the dominant percept, either observing passively or actively trying to promote one of the faces. The few studies that investigated the influence of information from an external modality on perceptual competition reported results that seem at first sight inconsistent. Since these differences could stem from how well the modalities matched, we addressed this by comparing two levels of AV congruence: real (/aba/ viseme) vs. illusory (/aga/ viseme producing the /ada/ McGurk fusion). First, adding the voice /aba/ stabilized both the real and the illusory congruent lip percepts. Second, real congruence of the added voice improved volitional control whereas illusory congruence did not, suggesting a graded contribution to the top-down sensitivity control of selective attention. In conclusion, a congruent sound considerably enhanced attentional control over perceptual outcome selection; however, differences between passive stabilization and active control according to AV congruency suggest that these are governed by two distinct mechanisms. Based on existing theoretical models of BR, selective attention and AV interaction in speech perception, we provide a general interpretation of our findings.
Background: Upon graduation, newly qualified speech-language therapists are expected to provide services independently. This study describes new graduates’ perceptions of their preparedness to provide services across the scope of the profession and explores associations between perceptions of dysphagia theory and clinical learning curricula and preparedness for adult and paediatric dysphagia service delivery. Methods: New graduates of six South African universities were recruited to participate in a survey by completing an electronic questionnaire exploring their perceptions of the dysphagia curricula and their preparedness to practise across the scope of the profession of speech-language therapy. Results: Eighty graduates participated in the study, yielding a response rate of 63.49%. Participants perceived themselves to be well prepared in some areas (e.g. child language: 100%; articulation and phonology: 97.26%), but less prepared in other areas (e.g. adult dysphagia: 50.70%; paediatric dysarthria: 46.58%; paediatric dysphagia: 38.36%), and most unprepared to provide services requiring sign language (23.61%) and African languages (20.55%). There was a significant relationship between perceptions of adequate theory and clinical learning opportunities in the assessment and management of dysphagia and perceptions of preparedness to provide dysphagia services. Conclusion: There is a need for review of existing curricula and consideration of developing a standard speech-language therapy curriculum across universities, particularly in service provision to a multilingual population, and in both the theory and clinical learning of the assessment and management of adult and paediatric dysphagia, to better equip graduates for practice.
Today, huge quantities of digital audiovisual resources are already available, everywhere and at any time, through Web portals, online archives and libraries, and video blogs. One central question with respect to this huge amount of audiovisual data is how they can be used in specific (social, pedagogical, etc.) contexts and what their potential interest is for target groups (communities, professionals, students, researchers, etc.). This book examines the question of the (creative) exploitation of digital audiovisual archives from a theoretical, methodological, technical and practical
Ohms, Verena Regina
Birdsong and human speech are both complex behaviours which show striking similarities mainly thought to be present in the area of development and learning. The most important parameters in human speech are vocal tract resonances, called formants. Different formant patterns characterize different vo
Students exhibiting speech deficits may not have the appropriate skills or support structures necessary to obtain adequate or acceptable literacy development as mixed results from past research have indicated that some students with speech impairments have the capacity to gain appropriate literacy skills. The purpose of the qualitative holistic…
Varnet, Léo; Knoblauch, Kenneth; Serniclaes, Willy; Meunier, Fanny; Hoen, Michel
Although there is a broad consensus regarding the involvement of specific acoustic cues in speech perception, the precise mechanisms underlying the transformation from continuous acoustical properties into discrete perceptual units remain undetermined. This gap in knowledge is partially due to the lack of a turnkey solution for isolating critical speech cues from natural stimuli. In this paper, we describe a psychoacoustic imaging method known as the Auditory Classification Image technique t...
The work presents a methodology for the analysis of journalistic audiovisual narratives, an instrument for the critical reading of news contents and formats that use audiovisual language and multimedia resources on TV and on the web. It is assumed that comprehension of the dynamic combinations of the elements that constitute the audiovisual text contributes to a better perception of the meanings of the news, and that the critical and creative use of digital tools can collabora...
Robins, Diana L.; Hunyadi, Elinora; Schultz, Robert T.
Perception of emotion is critical for successful social interaction, yet the neural mechanisms underlying the perception of dynamic, audiovisual emotional cues are poorly understood. Evidence from language and sensory paradigms suggests that the superior temporal sulcus and gyrus (STS/STG) play a key role in the integration of auditory and visual cues. Emotion perception research has focused on static facial cues; however, dynamic audiovisual (AV) cues mimic real-world social cues more accura...
Kim, Jongin; Lee, Suh-Kyung; Lee, Boreom
Objective. The objective of this study is to find components that might be related to phoneme representation in the brain and to discriminate EEG responses for each speech sound on a trial basis. Approach. We used multivariate empirical mode decomposition (MEMD) and common spatial pattern for feature extraction. We chose three vowel stimuli, /a/, /i/ and /u/, based on previous findings, such that the brain can detect change in formant frequency (F2) of vowels. EEG activity was recorded from seven native Korean speakers at Gwangju Institute of Science and Technology. We applied MEMD over EEG channels to extract speech-related brain signal sources, and looked for the intrinsic mode functions which were dominant in the alpha bands. After the MEMD procedure, we applied the common spatial pattern algorithm for enhancing the classification performance, and used linear discriminant analysis (LDA) as a classifier. Main results. The brain responses to the three vowels could be classified as one of the learned phonemes on a single-trial basis with our approach. Significance. The results of our study show that brain responses to vowels can be classified for single trials using MEMD and LDA. This approach may not only become a useful tool for the brain-computer interface but it could also be used for discriminating the neural correlates of categorical speech perception.
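After MEMD, the pipeline applies the common spatial pattern (CSP) algorithm and an LDA classifier. The sketch below covers only those last two steps, for two classes, on synthetic multichannel "trials" rather than EEG. MEMD is not implemented. Trace-normalised covariances and log-variance features are common CSP choices, assumed here rather than taken from the paper. Extending it to the three vowels would need a multiclass scheme such as one-vs-rest, which is also an assumption and not the authors' stated method.

```python
import numpy as np


def class_covariance(trials: np.ndarray) -> np.ndarray:
    """Mean trace-normalised spatial covariance; trials: (n_trials, n_channels, n_samples)."""
    covs = [x @ x.T / np.trace(x @ x.T) for x in trials]
    return np.mean(covs, axis=0)


def csp_filters(trials_a: np.ndarray, trials_b: np.ndarray, n_pairs: int = 1) -> np.ndarray:
    """Return 2*n_pairs spatial filters (rows) with extreme class-variance ratios."""
    cov_a, cov_b = class_covariance(trials_a), class_covariance(trials_b)
    # Whiten the composite covariance, then diagonalise class A in the whitened space.
    evals, evecs = np.linalg.eigh(cov_a + cov_b)
    whitener = np.diag(evals ** -0.5) @ evecs.T
    _, rot = np.linalg.eigh(whitener @ cov_a @ whitener.T)  # ascending eigenvalues
    filters = rot.T @ whitener
    # First rows: low variance for class A; last rows: high variance for class A.
    picks = list(range(n_pairs)) + list(range(-n_pairs, 0))
    return filters[picks]


def csp_features(trials: np.ndarray, filters: np.ndarray) -> np.ndarray:
    """Log-variance of each spatially filtered trial."""
    return np.array([np.log(np.var(filters @ x, axis=1)) for x in trials])


def fit_lda(X: np.ndarray, y: np.ndarray):
    """Two-class Fisher LDA: predicts class 1 when X @ w + b > 0."""
    m0, m1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    within = np.cov(X[y == 0], rowvar=False) + np.cov(X[y == 1], rowvar=False)
    w = np.linalg.solve(within, m1 - m0)
    return w, -w @ (m0 + m1) / 2


# Synthetic 4-channel trials: class A has extra variance on channel 0, class B on channel 3.
rng = np.random.default_rng(0)


def make_trials(n: int, loud_channel: int) -> np.ndarray:
    x = rng.standard_normal((n, 4, 200))
    x[:, loud_channel, :] *= 3.0
    return x


trials_a, trials_b = make_trials(40, 0), make_trials(40, 3)
filters = csp_filters(trials_a, trials_b)
X = np.vstack([csp_features(trials_a, filters), csp_features(trials_b, filters)])
y = np.array([0] * 40 + [1] * 40)
w, b = fit_lda(X, y)
accuracy = np.mean((X @ w + b > 0).astype(int) == y)  # training accuracy only
```

This reports training accuracy on easily separable toy data. A single-trial evaluation like the one described in the abstract would need held-out trials.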
Slater, Jessica; Skoe, Erika; Strait, Dana L; O'Connell, Samantha; Thompson, Elaine; Kraus, Nina
Music training may strengthen auditory skills that help children not only in musical performance but in everyday communication. Comparisons of musicians and non-musicians across the lifespan have provided some evidence for a "musician advantage" in understanding speech in noise, although reports have been mixed. Controlled longitudinal studies are essential to disentangle effects of training from pre-existing differences, and to determine how much music training is necessary to confer benefits. We followed a cohort of elementary school children for 2 years, assessing their ability to perceive speech in noise before and after musical training. After the initial assessment, participants were randomly assigned to one of two groups: one group began music training right away and completed 2 years of training, while the second group waited a year and then received 1 year of music training. Outcomes provide the first longitudinal evidence that speech-in-noise perception improves after 2 years of group music training. The children were enrolled in an established and successful community-based music program and followed the standard curriculum, therefore these findings provide an important link between laboratory-based research and real-world assessment of the impact of music training on everyday communication skills. PMID:26005127
Research on the development of learners' phonological competence has been conducted mainly from the perspectives of the physical properties of phonology and the interlanguage of L2 acquisition, ignoring the effect of speech perception and production on it. Based on theories of cognition and psychology, this paper attempts to explore the properties and patterns of L2 oral reading speech perception. It indicates that the learner is the subject of L2 oral reading speech perception, which is constrained by the speech organs, cognitive ability, and the pattern of L1 speech perception. In addition, L2 readers differ from native speakers in phonetic, affective and conceptual perception. L2 oral reading is essentially a physical and cognitive experience, which is the basis for constructing an experiential-cognitive teaching model that combines listening, reading and speaking.
Prodi, Nicola; Visentin, Chiara; Feletti, Alice
It is well documented that noise in the classroom puts younger pupils at a disadvantage in speech perception tasks. Nevertheless, the dependence of this phenomenon on the type of noise, and the way it is realized for each class by a specific combination of intelligibility and effort, have not been fully investigated. Following on a previous laboratory study on "listening efficiency," which stems from a combination of accuracy and latency measures, this work tackles these problems to better understand the basic mechanisms governing the speech perception performance of pupils in noisy classrooms. Listening tests were conducted in real classrooms with a substantial number of students, and tests in quiet were also developed. The statistical analysis is based on stochastic ordering and clarifies the behavior of the classes and the different impacts of noises on performance. Joint babble and activity noise was found to have the worst effect on performance, whereas tapping and external traffic noises were less disruptive. PMID:23297900
Kartushina, Natalia; Hervais-Adelman, Alexis; Frauenfelder, Ulrich Hans; Golestani, Narly
Second-language learners often experience major difficulties in producing non-native speech sounds. This paper introduces a training method that uses a real-time analysis of the acoustic properties of vowels produced by non-native speakers to provide them with immediate, trial-by-trial visual feedback about their articulation alongside that of the same vowels produced by native speakers. The Mahalanobis acoustic distance between non-native productions and target native acoustic spaces was used to assess L2 production accuracy. The experiment shows that 1 h of training per vowel improves the production of four non-native Danish vowels: the learners' productions were closer to the corresponding Danish target vowels after training. The production performance of a control group remained unchanged. Comparisons of pre- and post-training vowel discrimination performance in the experimental group showed improvements in perception. Correlational analyses of training-related changes in production and perception revealed no relationship. These results suggest, first, that this training method is effective in improving non-native vowel production. Second, training purely on production improves perception. Finally, it appears that improvements in production and perception do not systematically progress at equal rates within individuals. PMID:26328698
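The training method scored each production by its Mahalanobis distance from the native acoustic space. A minimal sketch follows. It assumes that space is summarised by the mean and covariance of native tokens in an (F1, F2) plane; the abstract does not list the exact acoustic dimensions. The formant values are made up for illustration.

```python
import numpy as np


def mahalanobis(x, native_tokens: np.ndarray) -> float:
    """sqrt((x - mu)^T Sigma^-1 (x - mu)) for the native tokens' mean mu and covariance Sigma."""
    mu = native_tokens.mean(axis=0)
    sigma = np.cov(native_tokens, rowvar=False)  # rows are tokens, columns are dimensions
    diff = np.asarray(x, dtype=float) - mu
    return float(np.sqrt(diff @ np.linalg.solve(sigma, diff)))


# Made-up native (F1, F2) tokens in Hz for one target vowel.
native = np.array([[300.0, 2200.0], [340.0, 2200.0], [320.0, 2150.0], [320.0, 2250.0]])

before = mahalanobis([420.0, 2200.0], native)  # hypothetical pre-training production
after = mahalanobis([340.0, 2200.0], native)   # hypothetical post-training production
print(round(before, 2), round(after, 2))
```

Unlike a Euclidean distance, this scales each deviation by the native spread. Here F2 varies more than F1 across native tokens, so a 50 Hz F2 deviation counts for as much as a 20 Hz F1 deviation.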
Ross, Lars A.; Molholm, Sophie; Blanco, Daniella; Gomez-Ramirez, Manuel; Saint-Amour, Dave; Foxe, John J.
Observing a speaker’s articulations substantially improves intelligibility of spoken speech, especially under noisy listening conditions. This multisensory integration of speech inputs is crucial to effective communication. Appropriate development of this ability has major implications for children in classroom and social settings, and deficits in it have been linked to a number of neurodevelopmental disorders, especially autism. It is clear from structural imaging studies that there is a pro...
Kelly Cristina Lira de Andrade; Pedro de Lemos Menezes; Aline Tenório Lins Carnaúba; Renato Glauco de Sousa Rodrigues; Mariana de Carvalho Leal; Liliane Desgualdo Pereira
OBJECTIVE: The audibility thresholds for the sound frequency of 137 upward- and downward-sloping audiograms showing sensorineural hearing loss were selected and analyzed in conjunction with speech recognition thresholds obtained from individuals seen at a public otolaryngology clinic to determine which frequencies in slope audiograms best represent speech recognition thresholds. METHOD: The linear regression model and mean square error were used to determine the associations between the thr...
Lim, Sung-joo; Fiez, Julie A.; Holt, Lori L.
Listeners must accomplish two complementary perceptual feats in extracting a message from speech. They must discriminate linguistically-relevant acoustic variability and generalize across irrelevant variability. Said another way, they must categorize speech. Since the mapping of acoustic variability is language-specific, these categories must be learned from experience. Thus, understanding how, in general, the auditory system acquires and represents categories can inform us about the toolbox ...
A number of studies showed that infants reorganize their perception of speech sounds according to their native language categories during their first year of life. Still, information is lacking about the contribution of basic auditory mechanisms to this process. This study aimed to evaluate when native language experience starts to noticeably affect the perceptual processing of basic acoustic cues (i.e., frequency-modulation (FM) and amplitude-modulation (AM) information) known to be crucial for speech perception in adults. The discrimination of a lexical-tone contrast (rising versus low) was assessed in 6- and 10-month-old infants learning either French or Mandarin using a visual habituation paradigm. The lexical tones were presented in two conditions designed to either keep intact or to severely degrade the FM and fine spectral cues needed to accurately perceive voice-pitch trajectory. A third condition was designed to assess the discrimination of the same voice-pitch trajectories using click trains containing only the FM cues related to the fundamental frequency (F0) in French- and Mandarin-learning 10-month-old infants. Results showed that the younger infants of both language groups and the Mandarin-learning 10-month-olds discriminated the intact lexical-tone contrast while French-learning 10-month-olds failed. However, only the French 10-month-olds discriminated degraded lexical tones when FM, and thus voice-pitch cues, were reduced. Moreover, Mandarin-learning 10-month-olds were found to discriminate the pitch trajectories as presented in click trains better than French infants. Altogether, these results reveal that the perceptual reorganization occurring during the first year of life for lexical tones is coupled with changes in the auditory ability to use speech modulation cues.
Berding, Georg; Wilke, Florian; Rode, Thilo; Haense, Cathleen; Joseph, Gert; Meyer, Geerd J; Mamach, Martin; Lenarz, Minoo; Geworski, Lilli; Bengel, Frank M; Lenarz, Thomas; Lim, Hubert H
Considerable progress has been made in the treatment of hearing loss with auditory implants. However, there are still many implanted patients that experience hearing deficiencies, such as limited speech understanding or vanishing perception with continuous stimulation (i.e., abnormal loudness adaptation). The present study aims to identify specific patterns of cerebral cortex activity involved with such deficiencies. We performed O-15-water positron emission tomography (PET) in patients implanted with electrodes within the cochlea, brainstem, or midbrain to investigate the pattern of cortical activation in response to speech or continuous multi-tone stimuli directly inputted into the implant processor that then delivered electrical patterns through those electrodes. Statistical parametric mapping was performed on a single subject basis. Better speech understanding was correlated with a larger extent of bilateral auditory cortex activation. In contrast to speech, the continuous multi-tone stimulus elicited mainly unilateral auditory cortical activity in which greater loudness adaptation corresponded to weaker activation and even deactivation. Interestingly, greater loudness adaptation was correlated with stronger activity within the ventral prefrontal cortex, which could be up-regulated to suppress the irrelevant or aberrant signals into the auditory cortex. The ability to detect these specific cortical patterns and differences across patients and stimuli demonstrates the potential for using PET to diagnose auditory function or dysfunction in implant patients, which in turn could guide the development of appropriate stimulation strategies for improving hearing rehabilitation. Beyond hearing restoration, our study also reveals a potential role of the frontal cortex in suppressing irrelevant or aberrant activity within the auditory cortex, and thus may be relevant for understanding and treating tinnitus. PMID:26046763
Lőcsei, Gusztáv; Pedersen, Julie H; Laugesen, Søren; Santurette, Sébastien; Dau, Torsten; MacDonald, Ewen N
This study investigated the relationship between speech perception performance in spatially complex, lateralized listening scenarios and temporal fine-structure (TFS) coding at low frequencies. Young normal-hearing (NH) and two groups of elderly hearing-impaired (HI) listeners with mild or moderate hearing loss above 1.5 kHz participated in the study. Speech reception thresholds (SRTs) were estimated in the presence of either speech-shaped noise, two-, four-, or eight-talker babble played reversed, or a nonreversed two-talker masker. Target audibility was ensured by applying individualized linear gains to the stimuli, which were presented over headphones. The target and masker streams were lateralized to the same or to opposite sides of the head by introducing 0.7-ms interaural time differences between the ears. TFS coding was assessed by measuring frequency discrimination thresholds and interaural phase difference thresholds at 250 Hz. NH listeners had clearly better SRTs than the HI listeners. However, when maskers were spatially separated from the target, the amount of SRT benefit due to binaural unmasking differed only slightly between the groups. Neither the frequency discrimination threshold nor the interaural phase difference threshold tasks showed a correlation with the SRTs or with the amount of masking release due to binaural unmasking, respectively. The results suggest that, although HI listeners with normal hearing thresholds below 1.5 kHz experienced difficulties with speech understanding in spatially complex environments, these limitations were unrelated to TFS coding abilities and were only weakly associated with a reduction in binaural-unmasking benefit for spatially separated competing sources. PMID:27601071
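The lateralization manipulation above (a 0.7-ms interaural time difference over headphones) can be sketched as delaying one ear's copy of a mono signal. This minimal sketch rounds the delay to whole samples; at 44.1 kHz, 0.7 ms becomes 31 samples (about 0.703 ms). The study may have used a fractional-delay method instead, which is not shown here, and the function names are hypothetical. The second helper gives the interaural phase difference that a pure delay produces at a given frequency, which links the ITD to the 250-Hz IPD task.

```python
def apply_itd(signal, itd_s, fs):
    """Lateralize a mono signal by delaying one ear by |itd_s| seconds.
    Returns (left, right). A positive ITD delays the right ear, so the
    sound is heard toward the left (leading) ear. The delay is rounded
    to a whole number of samples."""
    d = round(abs(itd_s) * fs)
    delayed = [0.0] * d + list(signal)   # lagging ear
    padded = list(signal) + [0.0] * d    # leading ear, padded to equal length
    return (padded, delayed) if itd_s >= 0 else (delayed, padded)

def itd_to_ipd_deg(itd_s, freq_hz):
    """Interaural phase difference (degrees) produced by a pure delay."""
    return 360.0 * freq_hz * itd_s
```

For example, `itd_to_ipd_deg(0.0007, 250)` is about 63 degrees.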
Møller, Anders Kalsgaard; Hoffmann, Pablo F.; Carrozzino, Marcello;
State-of-the-art speech intelligibility tests are designed to evaluate acoustic communication devices, not audio-visual virtual reality systems. This paper presents a novel method to evaluate a communication situation based on both the speech intelligibility...
Tse, Chun-Yu; Gratton, Gabriele; Garnsey, Susan M; Novak, Michael A; Fabiani, Monica
Information from different modalities is initially processed in different brain areas, yet real-world perception often requires the integration of multisensory signals into a single percept. An example is the McGurk effect, in which people viewing a speaker whose lip movements do not match the utterance perceive the spoken sounds incorrectly, hearing them as more similar to those signaled by the visual rather than the auditory input. This indicates that audiovisual integration is important for generating the phoneme percept. Here we asked when and where the audiovisual integration process occurs, providing spatial and temporal boundaries for the processes generating phoneme perception. Specifically, we wanted to separate audiovisual integration from other processes, such as simple deviance detection. Building on previous work employing ERPs, we used an oddball paradigm in which task-irrelevant audiovisually deviant stimuli were embedded in strings of non-deviant stimuli. We also recorded the event-related optical signal, an imaging method combining spatial and temporal resolution, to investigate the time course and neuroanatomical substrate of audiovisual integration. We found that audiovisual deviants elicit a short duration response in the middle/superior temporal gyrus, whereas audiovisual integration elicits a more extended response involving also inferior frontal and occipital regions. Interactions between audiovisual integration and deviance detection processes were observed in the posterior/superior temporal gyrus. These data suggest that dynamic interactions between inferior frontal cortex and sensory regions play a significant role in multimodal integration. PMID:25848682
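The oddball design above embeds task-irrelevant deviants in strings of non-deviant stimuli. The abstract gives neither the deviant probability nor the ordering constraints, so the sketch below uses hypothetical parameters. It builds a trial sequence in which every deviant ('D') is preceded by at least `min_standards` standards ('S'). Because of that constraint, the realized deviant rate ends up below `p_deviant`.

```python
import random

def oddball_sequence(n_trials, p_deviant, min_standards=2, seed=None):
    """Return a list of 'S' (standard) and 'D' (deviant) trials in which
    each deviant follows at least `min_standards` consecutive standards."""
    rng = random.Random(seed)
    seq, run = [], 0  # run = standards since the last deviant
    for _ in range(n_trials):
        if run >= min_standards and rng.random() < p_deviant:
            seq.append("D")
            run = 0
        else:
            seq.append("S")
            run += 1
    return seq
```

Passing a fixed `seed` makes a sequence reproducible across participants or sessions.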
Chandrasekaran, Chandramouli; Trubanova, Andrea; Stillittano, Sébastien; Caplier, Alice; Ghazanfar, Asif A.
Humans, like other animals, are exposed to a continuous stream of signals, which are dynamic, multimodal, extended, and time varying in nature. This complex input space must be transduced and sampled by our sensory systems and transmitted to the brain where it can guide the selection of appropriate actions. To simplify this process, it's been suggested that the brain exploits statistical regularities in the stimulus space. Tests of this idea have largely been confined to unimodal signals and ...
Barlow, Nathan; Purdy, Suzanne C; Sharma, Mridula; Giles, Ellen; Narne, Vijay
This study investigated whether a short intensive psychophysical auditory training program is associated with speech perception benefits and changes in cortical auditory evoked potentials (CAEPs) in adult cochlear implant (CI) users. Ten adult implant recipients trained approximately 7 hours on psychophysical tasks (Gap-in-Noise Detection, Frequency Discrimination, Spectral Rippled Noise [SRN], Iterated Rippled Noise, Temporal Modulation). Speech performance was assessed before and after training using Lexical Neighborhood Test (LNT) words in quiet and in eight-speaker babble. CAEPs evoked by a natural speech stimulus /baba/ with varying syllable stress were assessed pre- and post-training, in quiet and in noise. SRN psychophysical thresholds showed a significant improvement (78% on average) over the training period, but performance on other psychophysical tasks did not change. LNT scores in noise improved significantly post-training by 11% on average compared with three pretraining baseline measures. N1P2 amplitude changed post-training for /baba/ in quiet (p = 0.005, visit 3 pretraining versus visit 4 post-training). CAEP changes did not correlate with behavioral measures. CI recipients' clinical records indicated a plateau in speech perception performance prior to participation in the study. A short period of intensive psychophysical training produced small but significant gains in speech perception in noise and spectral discrimination ability. There remain questions about the most appropriate type of training and the duration or dosage of training that provides the most robust outcomes for adults with CIs. PMID:27587925
Chuen, Lorraine; Schutz, Michael
An observer's inference that multimodal signals originate from a common underlying source facilitates cross-modal binding. This 'unity assumption' causes asynchronous auditory and visual speech streams to seem simultaneous (Vatakis & Spence, Perception & Psychophysics, 69(5), 744-756, 2007). Subsequent tests of non-speech stimuli such as musical and impact events found no evidence for the unity assumption, suggesting the effect is speech-specific (Vatakis & Spence, Acta Psychologica, 127(1), 12-23, 2008). However, the role of amplitude envelope (the changes in energy of a sound over time) was not previously appreciated within this paradigm. Here, we explore whether previous findings suggesting speech-specificity of the unity assumption were confounded by similarities in the amplitude envelopes of the contrasted auditory stimuli. Experiment 1 used natural events with clearly differentiated envelopes: single notes played on either a cello (bowing motion) or marimba (striking motion). Participants performed an unspeeded temporal order judgment task, viewing audio-visually matched (e.g., marimba auditory with marimba video) and mismatched (e.g., cello auditory with marimba video) versions of stimuli at various stimulus onset asynchronies, and indicated which modality was presented first. As predicted, participants were less sensitive to temporal order in matched conditions, demonstrating that the unity assumption can facilitate the perception of synchrony outside of speech stimuli. Results from Experiments 2 and 3 revealed that when spectral information was removed from the original auditory stimuli, amplitude envelope alone could not facilitate the influence of audiovisual unity. We propose that both amplitude envelope and spectral acoustic cues affect the percept of audiovisual unity, working in concert to help an observer determine when to integrate across modalities. PMID:27084701
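The amplitude envelope defined above (the changes in energy of a sound over time) can be approximated by the RMS energy in short frames. The sketch below contrasts a percussive, marimba-like envelope (fast attack, quick decay) with a sustained, cello-like one (slow attack, slow decay). Both are synthetic tones with hypothetical parameters, not the study's recorded stimuli. A Hilbert-transform envelope is a common alternative and is not shown.

```python
import math

def synth_tone(freq, dur, fs, attack, decay):
    """Sine tone with a linear attack ramp (`attack` s) followed by an
    exponential decay with time constant `decay` s."""
    out = []
    for i in range(int(dur * fs)):
        t = i / fs
        gain = t / attack if t < attack else math.exp(-(t - attack) / decay)
        out.append(gain * math.sin(2 * math.pi * freq * t))
    return out

def rms_envelope(signal, frame):
    """Amplitude envelope: RMS energy in consecutive non-overlapping frames."""
    env = []
    for start in range(0, len(signal), frame):
        chunk = signal[start:start + frame]
        env.append(math.sqrt(sum(x * x for x in chunk) / len(chunk)))
    return env

def time_to_peak(signal, frame, fs):
    """Start time (s) of the frame with the largest RMS value."""
    env = rms_envelope(signal, frame)
    return env.index(max(env)) * frame / fs

fs = 8000
percussive = synth_tone(440, 0.5, fs, attack=0.005, decay=0.1)  # marimba-like
sustained = synth_tone(440, 0.5, fs, attack=0.2, decay=2.0)     # cello-like
```

With 10-ms frames (`frame=80`), the percussive tone peaks within the first few frames. The sustained tone peaks near the end of its 200-ms attack.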
Nittrouer, Susan; Kuess, Jamie; Lowenstein, Joanna H
Children need to discover linguistically meaningful structures in the acoustic speech signal. Being attentive to recurring, time-varying formant patterns helps in that process. However, that kind of acoustic structure may not be available to children with cochlear implants (CIs), thus hindering development. The major goal of this study was to examine whether children with CIs are as sensitive to time-varying formant structure as children with normal hearing (NH) by asking them to recognize sine-wave speech. The same materials were presented as speech in noise, as well, to evaluate whether any group differences might simply reflect general perceptual deficits on the part of children with CIs. Vocabulary knowledge, phonemic awareness, and "top-down" language effects were all also assessed. Finally, treatment factors were examined as possible predictors of outcomes. Results showed that children with CIs were as accurate as children with NH at recognizing sine-wave speech, but poorer at recognizing speech in noise. Phonemic awareness was related to that recognition. Top-down effects were similar across groups. Having had a period of bimodal stimulation near the time of receiving a first CI facilitated these effects. Results suggest that children with CIs have access to the important time-varying structure of vocal-tract formants. PMID:25994709
It is now widely accepted that speech perception is more effective in an audiovisual context than in a visual one (Benoît, Mohamadi and Kandel, 1994; Schwartz, Berthommier and Savariaux, 2004). Visual information in this situation often consists of the speaker's articulatory and facial gestures provided by the face-to-face interaction. However, when learning a foreign language, another type of visual aid is generally available for identifying oral forms: their written forms. And yet, in the...
Kelly Cristina Lira de Andrade
OBJECTIVE: The audibility thresholds for the sound frequency of 137 upward- and downward-sloping audiograms showing sensorineural hearing loss were selected and analyzed in conjunction with speech recognition thresholds obtained from individuals seen at a public otolaryngology clinic to determine which frequencies in slope audiograms best represent speech recognition thresholds. METHOD: The linear regression model and mean square error were used to determine the associations between the threshold values. RESULT: The mean square error identified larger errors when using thresholds of 500, 1000, and 2000 Hz than when using audibility thresholds of 500, 1000, 2000, and 4000 Hz. The linear regression model showed a higher correlation (91%) between the audiogram thresholds for frequencies of 500, 1000, 2000, and 4000 Hz than for the frequencies of 500, 1000, and 2000 Hz (88%). CONCLUSION: Frequencies of 500, 1000, 2000, and 4000 Hz were the most significant in predicting the speech recognition threshold.
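The comparison above can be sketched as two pure-tone averages, one over three frequencies and one over four, each scored against the measured speech recognition thresholds (SRTs) by mean square error. This sketch assumes the MSE is computed directly between each average and the SRT. The abstract does not say whether the errors came from the raw averages or from regression residuals, and the function names are hypothetical.

```python
def pure_tone_average(thresholds, freqs):
    """Mean hearing threshold (dB HL) over the given frequencies (Hz).
    `thresholds` maps frequency in Hz to threshold in dB HL."""
    return sum(thresholds[f] for f in freqs) / len(freqs)

def mse(predicted, observed):
    """Mean square error between paired predicted and observed values."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)

def compare_pta(audiograms, srts):
    """Return (MSE of the 3-frequency PTA, MSE of the 4-frequency PTA)
    against the measured SRTs, one audiogram per listener."""
    pta3 = [pure_tone_average(a, (500, 1000, 2000)) for a in audiograms]
    pta4 = [pure_tone_average(a, (500, 1000, 2000, 4000)) for a in audiograms]
    return mse(pta3, srts), mse(pta4, srts)
```

The frequency set with the lower error is the better predictor of the SRT.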
Here we use two filtered speech tasks to investigate children’s processing of slow (<4 Hz) versus faster (~33 Hz) temporal modulations in speech. We compare groups of children with either developmental dyslexia (Experiment 1) or speech and language impairments (SLIs, Experiment 2) to groups of typically-developing (TD) children age-matched to each disorder group. Ten nursery rhymes were filtered so that their modulation frequencies were either low-pass filtered (<4 Hz) or band-pass filtered (22–40 Hz). Recognition of the filtered nursery rhymes was tested in a picture recognition multiple choice paradigm. Children with dyslexia aged 10 years showed equivalent recognition overall to TD controls for both the low-pass and band-pass filtered stimuli, but showed significantly impaired acoustic learning during the experiment from low-pass filtered targets. Children with oral speech and language impairments (SLIs) aged 9 years showed significantly poorer recognition of band pass filtered targets compared to their TD controls, and showed comparable acoustic learning effects to TD children during the experiment. The SLI sample were also divided into children with and without phonological difficulties. The children with both SLI and phonological difficulties were impaired in recognising both kinds of filtered speech. These data are suggestive of impaired temporal sampling of the speech signal at different modulation rates by children with different kinds of developmental language disorder. Both SLI and dyslexic samples showed impaired discrimination of amplitude rise times. Implications of these findings for a temporal sampling framework for understanding developmental language disorders are discussed.
Weidema, Joey L.; Roncaglia-Denissen, M. P.; Honing, Henkjan
Whether pitch in language and music is governed by domain-specific or domain-general cognitive mechanisms is contentiously debated. The aim of the present study was to investigate whether mechanisms governing pitch contour perception operate differently when pitch information is interpreted as either speech or music. By modulating listening mode, this study aspired to demonstrate that pitch contour perception relies on domain-specific cognitive mechanisms, which are regulated by top–down influences from language and music. Three groups of participants (Mandarin speakers, Dutch speaking non-musicians, and Dutch musicians) were exposed to identical pitch contours, and tested on their ability to identify these contours in a language and musical context. Stimuli consisted of disyllabic words spoken in Mandarin, and melodic tonal analogs, embedded in a linguistic and melodic carrier phrase, respectively. Participants classified identical pitch contours as significantly different depending on listening mode. Top–down influences from language appeared to alter the perception of pitch contour in speakers of Mandarin. This was not the case for non-musician speakers of Dutch. Moreover, this effect was lacking in Dutch speaking musicians. The classification patterns of pitch contours in language and music seem to suggest that domain-specific categorization is modulated by top–down influences from language and music. PMID:27313552
Rafaela Bezerra Façanha Correia
Objective: To evaluate the school hearing health actions developed in the Listen Sobral Project. Methods: Qualitative study, conducted at the Department of Hearing Health Care (SASA) of the city of Sobral, CE, Brazil, from April to June 2010. Study participants were the Listen Sobral Project's coordinator and four speech therapists in the Multidisciplinary Residency in Family Health program, working in partnership with the project. Data were collected through semi-structured interviews and analyzed with content analysis according to the convergence of discourse, from which the following categories emerged: school hearing health actions; benefits of the actions; difficulties in developing the actions; and changes to improve the actions. Results: According to the speech therapists' accounts, school hearing health actions are centered on health promotion, prevention, and early identification of hearing loss. However, weak points were identified, especially regarding teacher training; the partnership between school and speech therapists; ear, nose, and throat care; and suitable facilities. Conclusion: School hearing health actions have become part of reality in the city of Sobral, although they are not yet fully established. It is therefore necessary to maintain these actions, with some changes toward a more organized structure, in order to provide higher-quality care for schoolchildren.
Natural sounds, including vocal communication sounds, contain critical information at multiple time scales. Two essential temporal modulation rates in speech have been argued to be in the low gamma band (~20–80 ms duration information) and the theta band (~150–300 ms), corresponding to segmental and syllabic modulation rates, respectively. On one hypothesis, auditory cortex implements temporal integration using time constants closely related to these values. The neural correlates of a proposed dual temporal window mechanism in human auditory cortex remain poorly understood. We recorded MEG responses from participants listening to non-speech auditory stimuli with different temporal structures, created by concatenating frequency-modulated segments of varied segment durations. We show that these non-speech stimuli with temporal structure matching speech-relevant scales (~25 ms and ~200 ms) elicit reliable phase tracking in the corresponding associated oscillatory frequencies (low gamma and theta bands). In contrast, stimuli with non-matching temporal structure do not. Furthermore, the topography of theta band phase tracking shows rightward lateralization while gamma band phase tracking occurs bilaterally. The results support the hypothesis that there exists multi-time resolution processing in cortex on discontinuous scales and provide evidence for an asymmetric organization of temporal analysis (asymmetrical sampling in time, AST). The data argue for a macroscopic-level neural mechanism underlying multi-time resolution processing: the sliding and resetting of intrinsic temporal windows on privileged time scales.
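Phase tracking like that described above is often quantified as inter-trial phase coherence (ITPC): the length of the average unit phasor across trials at a given frequency. The abstract does not name its exact measure, so ITPC here is an assumption. The sketch also substitutes a single-frequency Fourier projection for the band-pass filtering or wavelet analysis a real MEG pipeline would use. Both function names are hypothetical.

```python
import cmath
import math

def phase_at(signal, freq, fs):
    """Phase (radians) of the Fourier component of `signal` at `freq` Hz."""
    z = sum(x * cmath.exp(-2j * math.pi * freq * n / fs)
            for n, x in enumerate(signal))
    return cmath.phase(z)

def itpc(phases):
    """Inter-trial phase coherence: 1 when every trial has the same phase,
    near 0 when phases are spread uniformly around the circle."""
    return abs(sum(cmath.exp(1j * p) for p in phases)) / len(phases)
```

If responses are phase-locked to a theta-rate stimulus, the per-trial theta phases cluster and ITPC approaches 1. Unrelated phases give values near 0.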
Goswami, Usha; Cumming, Ruth; Chait, Maria; Huss, Martina; Mead, Natasha; Wilson, Angela M.; Barnes, Lisa; Fosker, Tim
Here we use two filtered speech tasks to investigate children’s processing of slow (<4 Hz) versus faster (~33 Hz) temporal modulations in speech. We compare groups of children with either developmental dyslexia (Experiment 1) or speech and language impairments (SLIs, Experiment 2) to groups of typically-developing (TD) children age-matched to each disorder group. Ten nursery rhymes were filtered so that their modulation frequencies were either low-pass filtered (<4 Hz) or band-pass filtered (22–40 Hz). Recognition of the filtered nursery rhymes was tested in a picture recognition multiple-choice paradigm. Children with dyslexia aged 10 years showed equivalent recognition overall to TD controls for both the low-pass and band-pass filtered stimuli, but showed significantly impaired acoustic learning during the experiment from low-pass filtered targets. Children with oral SLIs aged 9 years showed significantly poorer recognition of band-pass filtered targets compared to their TD controls, and showed comparable acoustic learning effects to TD children during the experiment. The SLI samples were also divided into children with and without phonological difficulties. The children with both SLI and phonological difficulties were impaired in recognizing both kinds of filtered speech. These data are suggestive of impaired temporal sampling of the speech signal at different modulation rates by children with different kinds of developmental language disorder. Both SLI and dyslexic samples showed impaired discrimination of amplitude rise times. Implications of these findings for a temporal sampling framework for understanding developmental language disorders are discussed. PMID:27303348
Mitterer, Holger; Kim, Sahyang; Cho, Taehong
In connected speech, phonological assimilation to neighboring words can lead to pronunciation variants (e.g., "garden bench" → "garde'm' bench"). A large body of literature suggests that listeners use the phonetic context to reconstruct the intended word for assimilation types that often lead to incomplete assimilations (e.g., a…
Blood, Gordon W.; Blood, Ingrid M.; Coniglio, Amy D.; Finke, Erinn H.; Boyle, Michael P.
Children with autism spectrum disorders (ASD) are primary targets of bullying and victimization. Research shows school personnel may be uneducated about bullying and ways to intervene. Speech-language pathologists (SLPs) in schools often work with children with ASD and may have victims of bullying on their caseloads. These victims may feel most…
Compton, Mary V.; Tucker, Denise A.; Flynn, Perry F.
This study examined the level of preparedness of North Carolina speech-language pathologists (SLPs) who serve school-aged children with cochlear implants (CIs). A survey distributed to 190 school-based SLPs in North Carolina revealed that 79% of the participants felt they had little to no confidence in managing CI technology or in providing…
Hazan, Valerie; Messaoud-Galusi, Souhila; Rosen, Stuart
Purpose: In this study, the authors aimed to determine whether children with dyslexia (hereafter referred to as "DYS children") are more affected than children with average reading ability (hereafter referred to as "AR children") by talker and intonation variability when perceiving speech in noise. Method: Thirty-four DYS and 25 AR children were…
Lansing, Charissa R.; McConkie, George W.
Two experiments were conducted to test the hypothesis that visual information related to segmental versus prosodic aspects of speech is distributed differently on the face of the talker. Results indicate that information in the upper part of the talker's face is more critical for intonation pattern decisions than for decisions about word segments…
Obrig, Hellmuth; Mentzel, Julia; Rossi, Sonja
See Cappa (doi:10.1093/brain/aww090) for a scientific commentary on this article. The phonological structure of speech supports the highly automatic mapping of sound to meaning. While it is uncontroversial that phonotactic knowledge acts upon lexical access, it is unclear at what stage these combinatorial rules, governing phonological well-formedness in a given language, shape speech comprehension. Moreover, few studies have investigated the neuronal network affording this important step in speech comprehension. Therefore, we asked 70 participants, half of whom suffered from a chronic left-hemispheric lesion, to listen to 252 different monosyllabic pseudowords. The material models universal preferences of phonotactic well-formedness by including naturally spoken pseudowords and digitally reversed exemplars. The latter partially violate the phonological structure of all human speech and are rich in universally dispreferred phoneme sequences while preserving basic auditory parameters. Language-specific constraints were modelled in that half of the naturally spoken pseudowords complied with the phonotactics of the native language of the monolingual participants (German) while the other half did not. To ensure universal well-formedness and naturalness, the latter stimuli complied with Slovak phonotactics, and all stimuli were produced by an early bilingual speaker. To maximally attenuate lexico-semantic influences, transparent pseudowords were avoided and participants had to detect immediate repetitions, a task orthogonal to the contrasts of interest. The results show that phonological 'well-formedness' modulates implicit processing of speech at different levels: universally dispreferred phonological structure elicits early, medium and late latency differences in the evoked potential. By contrast, the language-specific phonotactic contrast selectively modulates a medium latency component of the event-related potentials around 400 ms. Using a novel event-related potential
Petersen, Bjørn; Sørensen, Stine Derdau; Pedersen, Ellen Raben;
measures of rehabilitation are important throughout adolescence. Music training may provide a beneficial method of strengthening not only music perception, but also linguistic skills, particularly prosody. The purpose of this study was to examine perception of music and speech and music engagement of...... enjoyment. RESULTS: CI users significantly improved their overall music perception and discrimination of melodic contour and rhythm in particular. No effect of the music training was found on discrimination of emotional prosody or speech. The CI users described levels of music engagement and enjoyment that...... were comparable to the NH reference. Furthermore, in general, the adolescent CI users gave positive ratings of the quality of music through their implant. The CI participants showed great commitment, but found music-making activities more relevant than computer-based training. DISCUSSION: Given the...
Noise-vocoding is a transformation which, when applied to speech, severely reduces spectral resolution and eliminates periodicity, yielding a stimulus that sounds like a harsh whisper (Scott, Blank et al. 2000). This process simulates a cochlear implant, where the activity of many thousands of hair cells in the inner ear is replaced by direct stimulation of the auditory nerve by a small number of tonotopically arranged electrodes. Although a cochlear implant offers a powerful means of restoring some degree of hearing to profoundly deaf individuals, the outcomes for spoken communication are highly variable (Moore and Shannon 2009). Some variability may arise from differences in peripheral representation (e.g., the degree of residual nerve survival), but some may reflect differences in higher-order linguistic processing. To explore this possibility, we used noise-vocoding to examine speech recognition and perceptual learning in normal-hearing listeners tested across several levels of the linguistic hierarchy: segments (consonants and vowels), single words, and sentences. Listeners improved significantly on all tasks across two test sessions. In the first session, individual-differences analyses revealed two independently varying sources of variability: one lexico-semantic in nature and implicating the recognition of words and sentences, and the other an acoustic-phonetic factor associated with words and segments. However, following learning, by the second session there was a more uniform covariance pattern across all stimulus types. A further analysis of phonetic feature recognition gave greater insight into learning-related changes in perception and showed that, surprisingly, participants did not make full use of cues that were preserved in the stimuli (e.g., vowel duration). We discuss these findings in relation to cochlear implantation, and suggest auditory training strategies to maximise speech recognition performance in the absence of
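The abstract describes noise-vocoding only in words. As a rough illustration, here is a minimal sketch of a channel noise vocoder. Each analysis band's slow amplitude envelope is kept and used to modulate noise limited to the same band, which discards periodicity and fine spectral detail. The channel centre frequencies, filter Q, 30 Hz envelope cutoff, and second-order band-pass design are all illustrative assumptions. The abstract does not give the study's vocoder parameters, and practical vocoders usually use steeper filters and per-band level matching.

```python
import math
import random


def bandpass(x, fs, f0, q):
    """Second-order band-pass filter (RBJ Audio EQ Cookbook, 0 dB peak gain)."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0          # b1 is zero for this design
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    y = []
    for xn in x:
        yn = b0 * xn + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def envelope(x, fs, cutoff_hz=30.0):
    """Slow amplitude envelope: full-wave rectification, then a one-pole low-pass."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / fs)
    state, env = 0.0, []
    for xn in x:
        state = (1.0 - a) * abs(xn) + a * state
        env.append(state)
    return env


def noise_vocode(signal, fs, centres_hz=(250, 500, 1000, 2000, 4000), q=1.5, seed=0):
    """Hypothetical helper: keep each band's envelope, replace its fine structure with noise."""
    rng = random.Random(seed)
    noise = [rng.uniform(-1.0, 1.0) for _ in signal]
    out = [0.0] * len(signal)
    for fc in centres_hz:
        env = envelope(bandpass(signal, fs, fc, q), fs)   # analysis-band envelope
        carrier = bandpass(noise, fs, fc, q)              # noise limited to the same band
        for i, (e, c) in enumerate(zip(env, carrier)):
            out[i] += e * c
    return out


# Usage: vocode 100 ms of a 200 Hz tone sampled at 16 kHz.
fs = 16000
tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(fs // 10)]
vocoded = noise_vocode(tone, fs)
```

Fewer, wider channels reduce spectral resolution further. This is the dimension along which noise-vocoded speech is commonly varied to approximate the limited number of implant electrodes.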
Audiovisual quality assessment is one of the major challenges in multimedia communications. Traditionally, algorithm-based (objective) assessment methods have focused primarily on the compression artifacts. However, compression is only one of the numerous factors influencing the perception. In co...
The University of Arizona's Agriculture Department has found that video cassette systems and 8 mm films are excellent audiovisual aids to classroom instruction at the high school level in small gasoline engines. Each system is capable of improving the instructional process for motor skill development. (MW)
Zekveld, A.A.; Kramer, S.E.; Kessens, J.M.; Vlaming, M.S.M.G.; Houtgast, T.
OBJECTIVES: The aim of this study was to evaluate the benefit that listeners obtain from visually presented output from an automatic speech recognition (ASR) system during listening to speech in noise. DESIGN: Auditory-alone and audiovisual speech reception thresholds (SRTs) were measured. The SRT i
Mitterer, H.; Kim, S.; Cho, T.
In connected speech, phonological assimilation to neighboring words can lead to pronunciation variants (e.g., "garden bench" → "gardem bench"). A large body of literature suggests that listeners use the phonetic context to reconstruct the intended word for assimilation types that often lead to incomplete assimilations (e.g., a pronunciation of "garden" that carries cues for both a labial [m] and an alveolar [n]). In the current paper, we show that a similar context effect is observed for an as...
Gilbers, Steven; Fuller, Christina; Gilbers, Dicky; Broersma, Mirjam; Goudbeek, Martijn; Free, Rolien; Başkent, Deniz
In cochlear implants (CIs), acoustic speech cues, especially for pitch, are delivered in a degraded form. This study aims to assess whether, owing to degraded pitch cues, normal-hearing listeners and CI users employ different perceptual strategies to recognize vocal emotions, and, if so, how these strategies differ. Voice actors were recorded pronouncing a nonce word in four different emotions: anger, sadness, joy, and relief. These recordings’ pitch cues were phonetically analyzed. The recordings were...
Van den Bogaert, Tim; Doclo, Simon; Wouters, Jan; Moonen, Marc
Multi-microphone noise reduction algorithms are commonly implemented in modern hearing aids to improve speech intelligibility in noisy environments. The development of these algorithms has mostly focused on monaural systems. The human auditory system is a binaural system which compares and combines the signals received by both ears to perceive and localize a single sound source. Providing two monaural, independently operating, noise reduction systems (a bilateral configuration) to the hearing...
Giraud, Anne-Lise; Kleinschmidt, Andreas; Poeppel, David; Lund, Torben E; Frackowiak, Richard S J; Laufs, Helmut
Across multiple timescales, acoustic regularities of speech match rhythmic properties of both the auditory and motor systems. Syllabic rate corresponds to natural jaw-associated oscillatory rhythms, and phonemic length could reflect endogenous oscillatory auditory cortical properties. Hemispheric...... spontaneous EEG power variations within the gamma range (phonemic rate) correlate best with left auditory cortical synaptic activity, while fluctuations within the theta range correlate best with that in the right. Power fluctuations in both ranges correlate with activity in the mouth premotor region...
Schlueter, Anne; Brand, Thomas; Lemke, Ulrike; Nitzschner, Stefan; Kollmeier, Birger; Holube, Inga
Positive signal-to-noise ratios (SNRs) characterize listening situations most relevant for hearing-impaired listeners in daily life and should therefore be considered when evaluating hearing aid algorithms. For this, a speech-in-noise test was developed and evaluated, in which the background noise is presented at fixed positive SNRs and the speech rate (i.e., the time compression of the speech material) is adaptively adjusted. In total, 29 younger and 12 older normal-hearing, as well as 24 older hearing-impaired listeners took part in repeated measurements. Younger normal-hearing and older hearing-impaired listeners completed one of two adaptive methods which differed in adaptive procedure and step size. Analysis of the measurements with regard to list length and estimation strategy for thresholds resulted in a practical method measuring the time compression for 50% recognition. This method uses time-compression adjustment and step sizes according to Versfeld and Dreschler [(2002). J. Acoust. Soc. Am. 111, 401-408], with sentence scoring, lists of 30 sentences, and a maximum likelihood method for threshold estimation. Evaluation of the procedure showed that older participants obtained higher test-retest reliability than younger participants. Depending on the group of listeners, one or two lists are required for training prior to data collection. PMID:26627804
Helfer, Karen S.; Freyman, Richard L.
A large sentence corpus has been developed for use in speech recognition research. Sentences (n=881, three scoring words per sentence) were developed under 23 topics. In the first phase of development subjects rated each individual scoring word for relatedness to its given topic on a Likert scale. Next, two groups of young, normal-hearing listeners (n=16/group) listened and responded to the recordings of the sentences (spoken by a female talker) presented with one of two types of maskers: steady-state noise (S:N=-13 dB) or two other females speaking random sentences (S:N=-8 dB). Each subject responded to half of the sentences with topic supplied and half with no topic supplied. Data analyses focused on addressing two questions: whether supplementation of topic would be more important in the presence of the speech masker versus the noise masker, and how the degree of relatedness of each key word to the topic influenced the effect of topic on recognition. The data showed little difference in how beneficial the topic was for speech versus noise maskers. Moreover, there was a complex relationship between effect of topic, type of masker, and position of the word in the sentence. [Work supported by NIDCD DC01625.]
Many figurative expressions are fully conventionalized in everyday speech. Regarding the neural basis of figurative language processing, research has predominantly focused on metaphoric expressions in minimal semantic context. It remains unclear to what extent metaphoric expressions during continuous text comprehension activate the same neural networks as isolated metaphors. We therefore investigated the processing of similes (figurative language, e.g., "He smokes like a chimney!") occurring in a short story. Sixteen healthy, male, native German speakers listened to similes that came about naturally in a short story, while blood-oxygenation-level-dependent (BOLD) responses were measured with functional magnetic resonance imaging (fMRI). For the event-related analysis, similes were contrasted with non-figurative control sentences. The stimuli differed with respect to figurativeness, while they were matched for word frequency, number of syllables, plausibility and comprehensibility. Similes contrasted with control sentences resulted in enhanced BOLD responses in the left inferior (IFG) and adjacent middle frontal gyrus. Concrete control sentences as compared to similes activated the bilateral middle temporal gyri as well as the right precuneus and the left middle frontal gyrus. Activation of the left IFG for similes in a short story is consistent with results on single-sentence metaphor processing. The findings strengthen the importance of the left inferior frontal region in the processing of abstract figurative speech during continuous, ecologically valid speech comprehension; the processing of concrete semantic contents is accompanied by a down-regulation of bilateral temporal regions.
Lewkowicz, David J.; Flom, Ross
Binding is key in multisensory perception. This study investigated the audio-visual (A-V) temporal binding window in 4-, 5-, and 6-year-old children (total N = 120). Children watched a person uttering a syllable whose auditory and visual components were either temporally synchronized or desynchronized by 366, 500, or 666 ms. They were asked…
Singh, Arjun (Department of Pathology, Sri Venkateshwara Medical College Hospital and Research Centre, Pondicherry, India)
Purpose: We use different methods to train our undergraduates. The patient-oriented problem-solving (POPS) system is an innovative teaching–learning method that imparts knowledge, enhances intrinsic motivation, promotes self-learning, encourages clinical reasoning, and develops long-lasting memory. The aim of this study was to develop POPS in teaching pathology, assess its effectiveness, and assess students’ preference for POPS over didactic lectures. Method: One hundred fifty second-year MBBS students were divided into two groups, A and B. Group A was taught by POPS while group B was taught by traditional lectures. Pre- and post-test numerical scores of both groups were evaluated and compared. Students then completed a self-structured feedback questionnaire for analysis. Results: The mean (SD) difference in pre- and post-test scores of groups A and B was 15.98 (3.18) and 7.79 (2.52), respectively. The difference between the scores of group A and group B was significant by z-test (z = 16.62, P < 0.0001). Improvement in post-test performance of group A was significantly greater than that of group B, demonstrating the effectiveness of POPS. Students responded that POPS facilitates self-learning, helps in understanding topics, creates interest, and is a scientific approach to teaching. Feedback response on POPS was strong in 57.52% of students, moderate in 35.67%, and negative in only 6.81%, showing that 93.19% of students favored POPS over simple lectures. Conclusion: It is not feasible to enforce the PBL method of teaching throughout the entire curriculum; however, POPS can be incorporated along with audiovisual aids to break the monotony of didactic lectures and as an alternative to PBL. Keywords: medical education, problem-solving exercise, problem-based learning
Ogawa, Akitoshi; Bordier, Cecile; Macaluso, Emiliano
The use of naturalistic stimuli to probe sensory functions in the human brain is gaining increasing interest. Previous imaging studies examined brain activity associated with the processing of cinematographic material using both standard "condition-based" designs, as well as "computational" methods based on the extraction of time-varying features of the stimuli (e.g. motion). Here, we exploited both approaches to investigate the neural correlates of complex visual and auditory spatial signals in cinematography. In the first experiment, the participants watched a piece of a commercial movie presented in four blocked conditions: 3D vision with surround sounds (3D-Surround), 3D with monaural sound (3D-Mono), 2D-Surround, and 2D-Mono. In the second experiment, they watched two different segments of the movie both presented continuously in 3D-Surround. The blocked presentation served for standard condition-based analyses, while all datasets were submitted to computation-based analyses. The latter assessed where activity co-varied with visual disparity signals and the complexity of auditory multi-sources signals. The blocked analyses associated 3D viewing with the activation of the dorsal and lateral occipital cortex and superior parietal lobule, while the surround sounds activated the superior and middle temporal gyri (S/MTG). The computation-based analyses revealed the effects of absolute disparity in dorsal occipital and posterior parietal cortices and of disparity gradients in the posterior middle temporal gyrus plus the inferior frontal gyrus. The complexity of the surround sounds was associated with activity in specific sub-regions of S/MTG, even after accounting for changes of sound intensity. These results demonstrate that the processing of naturalistic audio-visual signals entails an extensive set of visual and auditory areas, and that computation-based analyses can track the contribution of complex spatial aspects characterizing such life-like stimuli. PMID
Jepsen, Morten Løve
A better understanding of how the human auditory system represents and analyzes sounds and how hearing impairment affects such processing is of great interest for researchers in the fields of auditory neuroscience, audiology, and speech communication as well as for applications in hearing-instrument...... was shown that an accurate simulation of cochlear input-output functions, in addition to the audiogram, played a major role in accounting both for sensitivity and supra-threshold processing. Finally, the model was used as a front-end in a framework developed to predict consonant discrimination in a...
Noordenbos, M. W.; Segers, E.; Serniclaes, W.; Mitterer, H.; Verhoeven, L.
There is ample evidence that individuals with dyslexia have a phonological deficit. A growing body of research also suggests that individuals with dyslexia have problems with categorical perception, as evidenced by weaker discrimination of between-category differences and better discrimination of within-category differences compared to average…
Rivers, Kenyatta O.; Perkins, Rosalie A.; Carson, Cecyle P.
Background: Formal training in dealing with death and dying issues is not a standard content area in communication sciences and disorders programmes' curricula. At the same time, it cannot be presumed that pre-professional students' personal background equips them to deal with these issues. Aim: To investigate the perceptions of pre-professional…
Auditory Neuropathy Spectrum Disorder (ANSD) is diagnosed by the presence of outer hair cell function, and absence or severe abnormality of the auditory brainstem response (ABR). Within the spectrum of ANSD, level of severity varies greatly in two domains: hearing thresholds can range from normal levels to a profound hearing loss, and degree of speech perception impairment also varies. The latter gives a meaningful indication of severity in ANSD. As the ABR does not relate to functional perfo...
Strelcyk, Olaf; Dau, Torsten
performance than the normally hearing in terms of frequency selectivity and fine-structure processing, despite normal audiometric thresholds at the test frequencies. However, the binaural fine-structure processing was not found to be particularly vulnerable to interfering noise in these listeners....... consisted of groups with homogeneous, symmetric audiograms. The perceptual listening experiments assessed the intelligibility of full-spectrum as well as low-pass filtered speech in the presence of stationary and fluctuating interferers, the individual's frequency selectivity and the integrity of temporal...... fine-structure processing. The latter was addressed in a binaural and a monaural experiment. In the binaural experiment, the lateralization threshold was measured for low-frequency tones with ongoing interaural phase delays. In the monaural experiment, detection thresholds for low-rate frequency...
Kitamura, Miho S; Watanabe, Katsumi; Kitagawa, Norimichi
It has been shown that positive emotions can facilitate integrative and associative information processing in cognitive functions. The present study examined whether emotions in observers can also enhance perceptual integrative processes. We tested 125 participants in total to reveal the effects of emotional states and traits in observers on the multisensory binding between auditory and visual signals. Participants in Experiment 1 observed two identical visual disks moving toward each other, coinciding, and moving away, presented with a brief sound. We found that for participants with lower depressive tendency, induced happy moods increased the width of the temporal binding window of the sound-induced bounce percept in the stream/bounce display, while no effect was found for the participants with higher depressive tendency. In contrast, no effect of mood was observed for a simple audiovisual simultaneity discrimination task in Experiment 2. These results provide the first empirical evidence of a dependency of multisensory binding upon emotional states and traits, revealing that positive emotions can facilitate the multisensory binding processes at a perceptual level. PMID:26834585
Bürki-Cohen, J; Grosjean, F; Miller, J L
The categorical perception paradigm was used to investigate whether French-English bilinguals categorize a code-switched word as French or English on the basis of its acoustic-phonetic information alone or whether they are influenced by the base-language context in which the word occurs, that is, by the language in which the majority of words are spoken. Subjects identified stimuli from computer-edited series that ranged from an English to a French word as either the English or the French endpoint. The stimuli were preceded by either an English or a French context sentence. In accord with previous studies (Grosjean, 1988), it was found that the base language had a contrastive effect on the perception of a code-switched word when the endpoints of the between-language series were phonetically marked as English and French, respectively. When the endpoints of the series were phonetically unmarked and thus compatible with either language, however, no effect of the base language was found; in particular, we failed to find the assimilative effect that has been observed with other paradigms (Grosjean, 1988; Soares and Grosjean, 1984; Macnamara and Kushnir, 1971). The current results provide confirming evidence that the perception of a code-switched word is influenced by the base-language context in which it occurs and, moreover, that the nature of the effect depends on the acoustic-phonetic characteristics of the code-switched word. In addition, the finding that a contrastive effect occurs across all paradigms used to date, but that an assimilative effect occurs in only some paradigms, suggests that these two context effects may arise at different stages of processing. PMID:2485850
This article asks how different sorts of audio-visual mappings may be perceived. Clearly perceivable cause-and-effect relationships can be problematic if one desires the audience to experience the music. Indeed, perception would favor those sonic qualities that fit previous concepts of causation, subordinating other sonic qualities, which may form the relations between the sounds themselves. The question is: how can an audio-visual mapping produce a sense of causation and simultaneously confound the actual cause-effect relationships? We call this a fungible audio-visual mapping; the present investigation seeks to glean its constitution and aspect. We report a study that draws upon methods from experimental psychology to inform audio-visual instrument design and composition. The participants were shown several audio-visual mapping prototypes and posed quantitative and qualitative questions. These questions concerned their sense of causation and their sense of understanding the cause-effect relationships. The study shows that a fungible mapping requires both synchronized and seemingly non-related components, with sufficient complexity to be confusing. As the specific cause-effect concepts remain inconclusive, the sense of causation embraces the whole.
This work analyzed the perceptual attributes of natural dynamic audiovisual scenes. We presented thirty participants with 19 natural scenes in a similarity categorization task, followed by a semi-structured interview. The scenes were reproduced with an immersive audiovisual display. Natural scene perception has been studied mainly with unimodal settings, which have identified motion as one of the most salient attributes related to visual scenes, and sound intensity along with pitch trajectories related to auditory scenes. However, controlled laboratory experiments with natural multimodal stimuli are still scarce. Our results show that humans pay attention to similar perceptual attributes in natural scenes, and a two-dimensional perceptual map of the stimulus scenes and perceptual attributes was obtained in this work. The exploratory results show the amount of movement, perceived noisiness, and eventfulness of the scene to be the most important perceptual attributes in naturalistically reproduced real-world urban environments. We found the scene gist properties openness and expansion to remain as important factors in scenes with no salient auditory or visual events. We propose that the study of scene perception should move forward to understand better the processes behind multimodal scene processing in real-world environments. We publish our stimulus scenes as spherical video recordings and sound field recordings in a publicly available database.
Liu, Peng; Wang, Zuoying
In this paper we address the problem of audio-visual speech recognition in the framework of the multi-stream hidden Markov model. Stream weight training based on the minimum classification error criterion is discussed for use in large vocabulary continuous speech recognition (LVCSR). We present lattice re-scoring and Viterbi approaches for calculating the loss function of continuous speech. The experimental results show that with clean audio, state-based stream weights trained by a Viterbi approach yield a 36.1% relative reduction in word error rate compared with an audio-only speech recognition system. Further experimental results demonstrate that our audio-visual LVCSR system significantly enhances robustness in noisy environments.
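In a multi-stream HMM, each state's emission score is commonly formed as a weighted sum of per-stream log-likelihoods, log b_j(o_t) = Σ_s λ_{j,s} log b_{j,s}(o_{s,t}). Here λ_{j,s} is the weight of stream s in state j. The minimal sketch below assumes one audio and one visual stream with diagonal Gaussian emissions, and weights that are non-negative and sum to 1 (a common convention, assumed here). The function names, feature values, and weights are hypothetical. The sketch shows only how state-based weights enter the score. It does not implement the minimum-classification-error training, lattice re-scoring, or Viterbi procedures the abstract describes.

```python
import math


def gaussian_logpdf(x, means, variances):
    """Log-density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, means, variances)
    )


def multistream_log_likelihood(stream_logliks, weights):
    """Weighted emission score for one HMM state:
    log b_j(o_t) = sum_s lambda_{j,s} * log b_{j,s}(o_{s,t}).
    Weights are assumed non-negative and summing to 1."""
    if len(stream_logliks) != len(weights):
        raise ValueError("need exactly one weight per stream")
    if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * ll for w, ll in zip(weights, stream_logliks))


# Hypothetical state with one audio and one visual Gaussian (made-up values).
audio_ll = gaussian_logpdf([0.2, -0.1], [0.0, 0.0], [1.0, 1.0])
visual_ll = gaussian_logpdf([1.5], [0.0], [0.5])
clean = multistream_log_likelihood([audio_ll, visual_ll], [0.8, 0.2])
noisy = multistream_log_likelihood([audio_ll, visual_ll], [0.4, 0.6])
```

With weights (1, 0) the score reduces to the audio-only model. Moving weight toward the visual stream, as in the hypothetical `noisy` setting, is one way such a system can rely more on lip information when the audio is degraded. The abstract does not state how the paper's trained weights vary across states or noise conditions.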
Clark, G M
The most recent development, however, presents temporal frequency information as amplitude variations at a constant rate of stimulation. 8. As additional speech frequencies have been encoded as place of stimulation, the mean speech perception scores have continued to increase and are now better than the average scores that severely-profoundly deaf adults and children with some residual hearing obtain with a hearing aid. PMID:8911712
Kim, Min-Beom; Shim, Hyun-Yong; Jin, Sun Hwa; Kang, Soojin; Woo, Jihwan; Han, Jong Chul; Lee, Ji Young; Kim, Martha; Cho, Yang-Sun
Evidence of visual-auditory cross-modal plasticity in deaf individuals has been widely reported. Superior visual abilities of deaf individuals have been shown to result in enhanced reactivity to visual events and/or enhanced peripheral spatial attention. The goal of this study was to investigate the association between visual-auditory cross-modal plasticity and speech perception in post-lingually deafened, adult cochlear implant (CI) users. Post-lingually deafened adults with CIs (N = 14) and a group of normal hearing, adult controls (N = 12) participated in this study. The CI participants were divided into a good performer group (good CI, N = 7) and a poor performer group (poor CI, N = 7) based on word recognition scores. Visual evoked potentials (VEP) were recorded from the temporal and occipital cortex to assess reactivity. Visual field (VF) testing was used to assess spatial attention and Goldmann perimetry measures were analyzed to identify differences across groups in the VF. The association of the amplitude of the P1 VEP response over the right temporal or occipital cortex among three groups (control, good CI, poor CI) was analyzed. In addition, the association between VF by different stimuli and word perception score was evaluated. The P1 VEP amplitude recorded from the right temporal cortex was larger in the group of poorly performing CI users than the group of good performers. The P1 amplitude recorded from electrodes near the occipital cortex was smaller for the poor performing group. P1 VEP amplitude in right temporal lobe was negatively correlated with speech perception outcomes for the CI participants (r = -0.736, P = 0.003). However, P1 VEP amplitude measures recorded from near the occipital cortex had a positive correlation with speech perception outcome in the CI participants (r = 0.775, P = 0.001). In VF analysis, CI users showed narrowed central VF (VF to low intensity stimuli). However, their far peripheral VF (VF to high intensity stimuli) was
Mentovich, Avital; Huq, Aziz; Cerf, Moran
The U.S. Supreme Court has increasingly expanded the scope of constitutional rights granted to corporations and other collective entities. Although this tendency receives widespread public and media attention, little empirical research examines how people ascribe rights, commonly thought to belong to natural persons, to corporations. This article explores this issue in 3 studies focusing on different rights (religious liberty, privacy, and free speech). We examined participants' willingness to grant a given right while manipulating the type of entity at stake (from small businesses, to larger corporations, to for-profit and nonprofit companies), and the identity of the right holder (from employees, to owners, to the company itself as a separate entity). We further examined the role of political ideology in perceptions of rights. Results indicated a significant decline in the degree of recognition of entities' rights (the company itself) in comparison to natural persons' rights (owners and employees). Results also demonstrated an effect of the type of entity at stake: Larger, for-profit businesses were less likely to be viewed as rights holders compared with nonprofit entities. Although both tendencies persisted across the ideological spectrum, ideological differences emerged in the relations between corporate and individual rights: these were positively related among conservatives but negatively related among liberals. Finally, we found that the desire to protect citizens (compared with businesses) underlies individuals' willingness to grant rights to companies. These findings show that people (rather than corporations) are perceived as more appropriate recipients of rights, and can explain public backlash to judicial expansions of corporate rights. PMID:26502001
There is a wide range of acoustic and visual variability across different talkers and different speaking contexts. Listeners with normal hearing accommodate that variability in ways that facilitate efficient perception, but it is not known whether listeners with cochlear implants can do the same. In this study, listeners with normal hearing (NH) and listeners with cochlear implants (CIs) were tested for accommodation to auditory and visual phonetic contexts created by gender-driven speech differences as well as vowel coarticulation and lip rounding in both consonants and vowels. Accommodation was measured as the shifting of perceptual boundaries between /s/ and /ʃ/ sounds in various contexts, as modeled by mixed-effects logistic regression. Owing to the spectral contrasts thought to underlie these context effects, CI listeners were predicted to perform poorly, but showed considerable success. Listeners with cochlear implants not only showed sensitivity to auditory cues to gender, they were also able to use visual cues to gender (i.e., faces) as a supplement or proxy for information in the acoustic domain, in a pattern that was not observed for listeners with normal hearing. Spectrally-degraded stimuli heard by listeners with normal hearing generally did not elicit strong context effects, underscoring the limitations of noise vocoders and/or the importance of experience with electric hearing. Visual cues for consonant lip rounding and vowel lip rounding were perceived in a manner consistent with coarticulation and were generally used more heavily by listeners with CIs. Results suggest that listeners with cochlear implants are able to accommodate various sources of acoustic variability either by attending to appropriate acoustic cues or by inferring them via the visual signal.
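A boundary shift of this kind can be illustrated with a logistic identification function. The sketch below is a simplified, fixed-effects stand-in for the mixed-effects model mentioned above: it assumes one continuum-step predictor plus one binary context code (e.g. 0 and 1 for two talker conditions), and all coefficient values are hypothetical, not the study's estimates. The boundary is the continuum step at which a /ʃ/ response is predicted with probability 0.5, i.e. where the linear predictor equals zero.

```python
import math


def p_sh(step, b0, b_step, b_ctx, context):
    """Predicted probability of a /ʃ/ response (hypothetical fixed-effects logistic model)."""
    z = b0 + b_step * step + b_ctx * context
    return 1.0 / (1.0 + math.exp(-z))


def boundary(b0, b_step, b_ctx, context):
    """Continuum step where p = 0.5, i.e. where the linear predictor is zero."""
    return -(b0 + b_ctx * context) / b_step


# Hypothetical coefficients: the context term moves the boundary by -b_ctx / b_step steps.
print(boundary(-4.0, 1.0, 1.0, 0))  # boundary in context 0
print(boundary(-4.0, 1.0, 1.0, 1))  # boundary in context 1
```

In this parameterization the size of a context effect on the boundary is simply the ratio of the context coefficient to the step coefficient, which is one common way such shifts are summarized.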
Rosana Maria Tristão
Human speech is a highly complex sound whose perceptual processing, production, and relations to language and cognition require an integrated analysis, from the standpoint both of available knowledge and of methodological specificities. This article presents a brief review of the literature on the main acquisitions and development of language in the first year of life of normally developing infants, with emphasis on speech perception. It also analyzes the occurrence of auditory disorders that can alter speech perception, with possible implications for pre-linguistic development. Special attention is given to the development of speech perception and language in infants with Down syndrome. The article analyzes this population's predisposition to audiological problems, its relation to impaired language development, and the tendency, in the first year of life, toward differential patterns of attention to speech.
Bonnard, Damien; Lautissier, Sylvie; Bosset-Audoit, Amélie; Coriat, Géraldine; Beraha, Max; Maunoury, Antoine; Martel, Jacques; Darrouzet, Vincent; Bébéar, Jean-Pierre; Dauman, René
An alternative to bilateral cochlear implantation is offered by the Neurelec Digisonic(®) SP Binaural cochlear implant, which allows stimulation of both cochleae within a single device. The purpose of this prospective study was to compare a group of Neurelec Digisonic(®) SP Binaural implant users (denoted BINAURAL group, n = 7) with a group of bilateral adult cochlear implant users (denoted BILATERAL group, n = 6) in terms of speech perception, sound localization, and self-assessment of health status and hearing disability. Speech perception was assessed using word recognition at 60 dB SPL in quiet and in a 'cocktail party' noise delivered through five loudspeakers in the hemi-sound field facing the patient (signal-to-noise ratio = +10 dB). The sound localization task was to determine the source of a sound stimulus among five speakers positioned between -90° and +90° from midline. Change in health status was assessed using the Glasgow Benefit Inventory and hearing disability was evaluated with the Abbreviated Profile of Hearing Aid Benefit. Speech perception was not statistically different between the two groups, even though there was a trend in favor of the BINAURAL group (mean percent word recognition in the BINAURAL and BILATERAL groups: 70 vs. 56.7% in quiet, 55.7 vs. 43.3% in noise). There was also no significant difference with regard to performance in sound localization and self-assessment of health status and hearing disability. On the basis of the BINAURAL group's performance in hearing tasks involving the detection of interaural differences, implantation with the Neurelec Digisonic(®) SP Binaural implant may be considered to restore effective binaural hearing. Based on these first comparative results, this device seems to provide benefits similar to those of traditional bilateral cochlear implantation, with a new approach to stimulate both auditory nerves. PMID:23548561
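The level arithmetic behind the noise condition can be made explicit. This is a minimal sketch, assuming (the abstract does not state it outright) that speech stayed at 60 dB SPL in noise, so that a +10 dB signal-to-noise ratio puts the combined noise at 50 dB SPL; the per-loudspeaker figure further assumes five equal, mutually incoherent noise sources whose powers add at the listening position. The function names are illustrative only.

```python
import math


def noise_level_db(speech_db_spl, snr_db):
    """SNR in dB is speech level minus noise level, so noise = speech - SNR."""
    return speech_db_spl - snr_db


def power_ratio(snr_db):
    """Convert an SNR in dB to a speech-to-noise power ratio."""
    return 10 ** (snr_db / 10)


def per_speaker_level_db(total_db, n_speakers):
    # Assumption: equal, incoherent sources, so powers add and each source
    # sits 10*log10(n) dB below the combined level.
    return total_db - 10 * math.log10(n_speakers)


total = noise_level_db(60, 10)          # 50 dB SPL combined noise (assumed)
print(total, power_ratio(10))           # speech carries 10x the noise power
print(per_speaker_level_db(total, 5))   # roughly 43 dB SPL per loudspeaker
```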
Desantis, Andrea; Haggard, Patrick
To form a coherent representation of the objects around us, the brain must group the different sensory features composing these objects. Here, we investigated whether actions contribute in this grouping process. In particular, we assessed whether action-outcome learning and prediction contribute to audiovisual temporal binding. Participants were presented with two audiovisual pairs: one pair was triggered by a left action, and the other by a right action. In a later test phase, the audio and visual components of these pairs were presented at different onset times. Participants judged whether they were simultaneous or not. To assess the role of action-outcome prediction on audiovisual simultaneity, each action triggered either the same audiovisual pair as in the learning phase ('predicted' pair), or the pair that had previously been associated with the other action ('unpredicted' pair). We found the time window within which auditory and visual events appeared simultaneous increased for predicted compared to unpredicted pairs. However, no change in audiovisual simultaneity was observed when audiovisual pairs followed visual cues, rather than voluntary actions. This suggests that only action-outcome learning promotes temporal grouping of audio and visual effects. In a second experiment we observed that changes in audiovisual simultaneity depend not only on our ability to predict what outcomes our actions generate, but also on learning the delay between the action and the multisensory outcome. When participants learned that the delay between action and audiovisual pair was variable, the window of audiovisual simultaneity for predicted pairs increased, relative to a fixed action-outcome pair delay. This suggests that participants learn action-based predictions of audiovisual outcome, and adapt their temporal perception of outcome events based on such predictions. PMID:27131076
Mirman, Daniel; McClelland, James L.; Holt, Lori L.; Magnuson, James S.
The effects of lexical context on phonological processing are pervasive and there have been indications that such effects may be modulated by attention. However, attentional modulation in speech processing is neither well documented nor well understood. Experiment 1 demonstrated attentional modulation of lexical facilitation of speech sound…
Li, Qi; Yu, Hongtao; Wu, Yan; Gao, Ning
The integration of multiple sensory inputs is essential for perception of the external world. The spatial factor is a fundamental property of multisensory audiovisual integration. Previous studies of the spatial constraints on bimodal audiovisual integration have mainly focused on the spatial congruity of audiovisual information. However, the effect of spatial reliability within audiovisual information on bimodal audiovisual integration remains unclear. In this study, we used event-related potentials (ERPs) to examine the effect of spatial reliability of task-irrelevant sounds on audiovisual integration. Three relevant ERP components emerged: the first at 140-200 ms over a wide central area, the second at 280-320 ms over the fronto-central area, and a third at 380-440 ms over the parieto-occipital area. Our results demonstrate that ERP amplitudes elicited by audiovisual stimuli with reliable spatial relationships are larger than those elicited by stimuli with inconsistent spatial relationships. In addition, we hypothesized that spatial reliability within an audiovisual stimulus enhances feedback projections to the primary visual cortex from multisensory integration regions. Overall, our findings suggest that the spatial linking of visual and auditory information depends on spatial reliability within an audiovisual stimulus and occurs at a relatively late stage of processing. PMID:27392755
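Component amplitudes in latency windows such as 140-200 ms are often quantified as the mean voltage of an averaged epoch within the window. The sketch below shows that measure only; it is an assumption that the study used mean (rather than peak) amplitude, and the epoch layout (a sampling rate plus the time of the first sample relative to stimulus onset) is a hypothetical convention for illustration.

```python
def window_mean(samples, fs_hz, t0_ms, start_ms, end_ms):
    """Mean amplitude of one averaged epoch between start_ms and end_ms (inclusive).

    samples: voltages of the epoch, one per sample
    fs_hz:   sampling rate in Hz
    t0_ms:   time of samples[0] relative to stimulus onset (e.g. -100 for a 100 ms baseline)
    """
    i0 = round((start_ms - t0_ms) * fs_hz / 1000)
    i1 = round((end_ms - t0_ms) * fs_hz / 1000)
    segment = samples[i0:i1 + 1]
    return sum(segment) / len(segment)


# Synthetic epoch sampled at 1 kHz from -100 ms to 499 ms with a plateau at 140-200 ms.
epoch = [1.0 if 140 <= t <= 200 else 0.0 for t in range(-100, 500)]
print(window_mean(epoch, 1000, -100, 140, 200))
print(window_mean(epoch, 1000, -100, 280, 320))
```

Comparing such window means between stimulus conditions (e.g. reliable vs. inconsistent spatial relationships) is one straightforward way to express the amplitude differences the abstract reports.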
Wilson, Ian; Hashimoto, Yurika
Much crosslinguistic research exists on the production and perception of voice onset time (VOT). However, most research on the perception of VOT uses synthetic stimuli instead of natural speech stimuli. Effects of synthetic speech on the perception of VOT are not known, but more research needs to be done to see if there are differences between perception using synthetic speech and perception using natural speech. This pilot study uses natural speech to investigate perception of Japanese VO...
Hannemann, R; Eulitz, C
Background: How does the brain repair obliterated speech and cope with acoustically ambivalent situations? A widely discussed possibility is to use top-down information to solve the ambiguity problem. In the case of speech, this may lead to a match of bottom-up sensory input with lexical expectations, resulting in resonant states which are reflected in the induced gamma-band activity (GBA). Methods: In the present EEG study, we compared the subjects' pre-attentive GBA responses to ob...
The Chinese audiovisual market is to impose a ban on audiovisual product dealers whose licenses have been revoked for violating the law. This ban will prohibit them from dealing in audiovisual products for ten years. Their names are to be included on a blacklist made known to the public.
A history professor relates his experiences producing and using audio-visual material and warns teachers not to rely on audio-visual aids for classroom presentations. Includes examples of popular audio-visual aids on Canada that communicate unintended, inaccurate, or unclear ideas. Urges teachers to exercise caution in the selection and use of…
Law, Sam-Po; Fung, Roxana; Kung, Carmen
This study investigated a theoretically challenging dissociation between good production and poor perception of tones among neurologically unimpaired native speakers of Cantonese. The dissociation is referred to as the near-merger phenomenon in sociolinguistic studies of sound change. In a passive oddball paradigm, lexical and nonlexical syllables of the T1/T6 and T4/T6 contrasts were presented to elicit the mismatch negativity (MMN) and P3a from two groups of participants, those who could produce and distinguish all tones in the language (Control) and those who could produce all tones but specifically failed to distinguish between T4 and T6 in perception (Dissociation). The presence of MMN to T1/T6 and null response to T4/T6 of lexical syllables in the dissociation group confirmed the near-merger phenomenon. The observation that the control participants exhibited a statistically reliable MMN to lexical syllables of T1/T6, weaker responses to nonlexical syllables of T1/T6 and lexical syllables of T4/T6, and finally null response to nonlexical syllables of T4/T6, suggests the involvement of top-down processing in speech perception. Furthermore, the stronger P3a response of the control group, compared with the dissociation group in the same experimental conditions, may be taken to indicate higher cognitive capability in attention switching, auditory attention or memory in the control participants. This cognitive difference, together with our speculation that constant top-down predictions without complete bottom-up analysis of acoustic signals in speech recognition may reduce one's sensitivity to small acoustic contrasts, accounts for the occurrence of dissociation in some individuals but not others. PMID:23342146
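In an oddball design the MMN is conventionally read from a difference wave: the average response to deviants minus the average response to standards, with a negative deflection indicating that the deviant was registered. The sketch below computes only that difference wave from per-trial epochs; the epoch values are synthetic, and it does not reproduce the study's preprocessing or statistics.

```python
def average(epochs):
    """Point-by-point average of equal-length epochs (lists of voltages)."""
    n = len(epochs)
    return [sum(values) / n for values in zip(*epochs)]


def difference_wave(deviant_epochs, standard_epochs):
    """Deviant-minus-standard waveform, the usual basis for reading off the MMN."""
    deviant = average(deviant_epochs)
    standard = average(standard_epochs)
    return [d - s for d, s in zip(deviant, standard)]


# Synthetic three-sample epochs: deviants carry a negative deflection at the middle sample.
deviants = [[0.0, -2.0, 0.0], [0.0, -4.0, 0.0]]
standards = [[0.0, 1.0, 0.0], [0.0, -1.0, 0.0]]
print(difference_wave(deviants, standards))
```

A "null response" in this framing corresponds to a difference wave that does not depart reliably from zero in the MMN latency range.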
Lüttke, Claudia S; Ekman, Matthias; van Gerven, Marcel A J; de Lange, Floris P
Visual information can alter auditory perception. This is clearly illustrated by the well-known McGurk illusion, where an auditory /aba/ and a visual /aga/ are merged to the percept of 'ada'. It is less clear, however, whether such a change in perception may recalibrate subsequent perception. Here we asked whether the altered auditory perception due to the McGurk illusion affects subsequent auditory perception, i.e. whether this process of fusion may cause a recalibration of the auditory boundaries between phonemes. Participants categorized auditory and audiovisual speech stimuli as /aba/, /ada/ or /aga/ while activity patterns in their auditory cortices were recorded using fMRI. Interestingly, following a McGurk illusion, an auditory /aba/ was more often misperceived as 'ada'. Furthermore, we observed a neural counterpart of this recalibration in the early auditory cortex. When the auditory input /aba/ was perceived as 'ada', activity patterns bore stronger resemblance to activity patterns elicited by /ada/ sounds than when they were correctly perceived as /aba/. Our results suggest that upon experiencing the McGurk illusion, the brain shifts the neural representation of an /aba/ sound towards /ada/, culminating in a recalibration in perception of subsequent auditory input. PMID:27611960
Paraskevopoulos, Evangelos; Kuchenbuch, Anja; Herholz, Sibylle C; Pantev, Christo
Perception of everyday life events relies mostly on multisensory integration. Hence, studying the neural correlates of the integration of multiple senses constitutes an important tool in understanding perception within an ecologically valid framework. The present study used magnetoencephalography in human subjects to identify the neural correlates of an audiovisual incongruency response, which is not generated due to incongruency of the unisensory physical characteristics of the stimulation but from the violation of an abstract congruency rule. The chosen rule ("the higher the pitch of the tone, the higher the position of the circle") was comparable to musical reading. In parallel, plasticity effects due to long-term musical training on this response were investigated by comparing musicians to non-musicians. The applied paradigm was based on an appropriate modification of the multifeatured oddball paradigm incorporating, within one run, deviants based on a multisensory audiovisual incongruent condition and two unisensory mismatch conditions: an auditory and a visual one. Results indicated the presence of an audiovisual incongruency response, generated mainly in frontal regions, an auditory mismatch negativity, and a visual mismatch response. Moreover, results revealed that long-term musical training generates plastic changes in frontal, temporal, and occipital areas that affect this multisensory incongruency response as well as the unisensory auditory and visual mismatch responses. PMID:23238733
Burgess, Stephen R.
Examines the influences of speech perception, oral language ability, emergent literacy, and the home literacy environment on the growth of phonological sensitivity. Finds, overall, the combination of predictors explained a significant proportion of the variance in phonological sensitivity and its growth. Discusses results in terms of their…
Fernández Martínez, Fernando; Lucas Cuesta, Juan Manuel; Barra Chicote, Roberto; Ferreiros López, Javier; Macías Guarasa, Javier
In this paper, we describe a new multi-purpose audio-visual database in the context of speech interfaces for controlling household electronic devices. The database comprises speech and video recordings of 19 speakers interacting with a HIFI audio box by means of a spoken dialogue system. Dialogue management is based on Bayesian Networks and the system is provided with contextual information handling strategies. Each speaker was requested to fulfil different sets of specific goals following pred...
Bayard, Clémence; Colin, Cécile; Leybaert, Jacqueline
Speech perception for both hearing and deaf people involves an integrative process between auditory and lip-reading information. In order to disambiguate information from lips, manual cues from Cued Speech may be added. Cued Speech (CS) is a system of manual aids developed to help deaf people to clearly and completely understand speech visually (Cornett, 1967). Within this system, both labial and manual information, as lone input sources, remain ambiguous. Perceivers, therefore, have to combine both types of information in order to get one coherent percept. In this study, we examined how audio-visual (AV) integration is affected by the presence of manual cues and on which form of information (auditory, labial or manual) the CS receptors primarily rely. To address this issue, we designed a unique experiment that implemented the use of AV McGurk stimuli (audio /pa/ and lip-reading /ka/) which were produced with or without manual cues. The manual cue was congruent with either auditory information, lip information or the expected fusion. Participants were asked to repeat the perceived syllable aloud. Their responses were then classified into four categories: audio (when the response was /pa/), lip-reading (when the response was /ka/), fusion (when the response was /ta/) and other (when the response was something other than /pa/, /ka/ or /ta/). Data were collected from hearing-impaired individuals who were experts in CS (all of whom had either cochlear implants or binaural hearing aids; N = 8), hearing individuals who were experts in CS (N = 14) and hearing individuals who were completely naïve to CS (N = 15). Results confirmed that, like hearing people, deaf people can merge auditory and lip-reading information into a single unified percept. Without manual cues, McGurk stimuli induced the same percentage of fusion responses in both groups. Results also suggest that manual cues can modify the AV integration and that their impact differs between hearing and deaf people
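The four-way response scoring described above maps directly onto a lookup table. This minimal sketch assumes responses have already been transcribed as syllable strings such as "pa"; the helper names are hypothetical. It classifies each repeated syllable and returns the proportion of responses in each category for one condition.

```python
# Scoring scheme from the abstract: audio /pa/, lip-reading /ka/, fusion /ta/, anything else "other".
CATEGORIES = {"pa": "audio", "ka": "lip-reading", "ta": "fusion"}


def classify(response):
    """Map one transcribed syllable to its response category."""
    return CATEGORIES.get(response.strip().lower(), "other")


def proportions(responses):
    """Proportion of responses falling into each of the four categories."""
    counts = {"audio": 0, "lip-reading": 0, "fusion": 0, "other": 0}
    for response in responses:
        counts[classify(response)] += 1
    n = len(responses)
    return {category: count / n for category, count in counts.items()}


print(proportions(["pa", "ta", "TA ", "ba"]))
```

Computing these proportions separately for each manual-cue condition and participant group gives the quantities being compared, such as the rate of fusion responses.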
Van der Burg, Erik; Alais, David; Cass, John
To combine information from different sensory modalities, the brain must deal with considerable temporal uncertainty. In natural environments, an external event may produce simultaneous auditory and visual signals yet they will invariably activate the brain asynchronously due to different propagation speeds for light and sound, and different neural response latencies once the signals reach the receptors. One strategy the brain uses to deal with audiovisual timing variation is to adapt to a prevailing asynchrony to help realign the signals. Here, using psychophysical methods in human subjects, we investigate audiovisual recalibration and show that it takes place extremely rapidly without explicit periods of adaptation. Our results demonstrate that exposure to a single, brief asynchrony is sufficient to produce strong recalibration effects. Recalibration occurs regardless of whether the preceding trial was perceived as synchronous, and regardless of whether a response was required. We propose that this rapid recalibration is a fast-acting sensory effect, rather than a higher-level cognitive process. An account in terms of response bias is unlikely due to a strong asymmetry whereby stimuli with vision leading produce bigger recalibrations than audition leading. A fast-acting recalibration mechanism provides a means for overcoming inevitable audiovisual timing variation and serves to rapidly realign signals at onset to maximize the perceptual benefits of audiovisual integration. PMID:24027264
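Trial-by-trial recalibration of this kind is usually examined by sorting each trial according to the asynchrony of the trial before it and comparing the point of subjective simultaneity (PSS) across the two sets. The sketch below makes several assumptions not taken from the study: positive SOAs mean vision leads, and the PSS is approximated by the mean SOA of trials judged synchronous (a crude centroid that is only sensible when every SOA is presented equally often; fitted psychometric functions are more common). The trial list is synthetic.

```python
def pss_centroid(trials):
    """Mean SOA (ms) of trials judged synchronous; a rough PSS estimate.

    trials: list of (soa_ms, judged_synchronous) tuples.
    """
    sync_soas = [soa for soa, synchronous in trials if synchronous]
    return sum(sync_soas) / len(sync_soas) if sync_soas else None


def split_by_previous(trials):
    """Group each trial by the sign of the previous trial's SOA (positive = vision leading, by assumption)."""
    groups = {"prev_vision_lead": [], "prev_audio_lead": []}
    for previous, current in zip(trials, trials[1:]):
        if previous[0] > 0:
            groups["prev_vision_lead"].append(current)
        elif previous[0] < 0:
            groups["prev_audio_lead"].append(current)
    return groups


trials = [(100, True), (50, True), (-100, False), (-50, True), (100, False), (0, True)]
groups = split_by_previous(trials)
shift = pss_centroid(groups["prev_vision_lead"]) - pss_centroid(groups["prev_audio_lead"])
print(shift)  # a positive shift means the PSS moved toward vision-leading after vision-leading trials
```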
Documentary makers, journalists, news editors, and other media professionals routinely require previously recorded audiovisual material for new productions. For example, a news editor might wish to reuse footage from overseas services for the evening news, or a documentary maker describing the histo
This research aimed to improve the dance-appreciation skills of the students of Primary Teacher Education at Makassar State University, to improve their perception of audio-visual based art appreciation, to increase the students' interest in the audio-visual based art education subject, and to increase the students' responses to the subject. This research was classroom action research using the research design created by Kemmis & McTaggart, which was conducted with 42 students of Prim...
Ferguson, Melanie A.; Hall, Rebecca L.; Riley, Alison; Moore, David R.
Purpose: Parental reports of communication, listening, and behavior in children receiving a clinical diagnosis of specific language impairment (SLI) or auditory processing disorder (APD) were compared with direct tests of intelligence, memory, language, phonology, literacy, and speech intelligibility. The primary aim was to identify whether there…
Wallace, Sarah E.
Team-based learning (TBL), although found to increase student engagement and higher-level thinking, has not been examined in the field of speech-language pathology. The purpose of this study was to examine the effect of integrating TBL into a capstone course in evidence-based practice (EBP). The researcher evaluated 27 students' understanding of…
We present the development of an automatic audiovisual speech recognition system focused on the recognition of commands. The audio was represented using Mel cepstral coefficients and their first two time derivatives. To characterize the video, high-level visual features were tracked automatically throughout the whole sequence. Automatic initialization of the algorithm used color transformations and active contours with gradient vector flow information ("GVF snakes") over the lip region, whereas tracking used similarity measures between neighborhoods and morphological constraints defined in the MPEG-4 standard. First, the design of the automatic speech recognition system using audio information only (ASR) is presented, based on Hidden Markov Models (HMMs) and an isolated-word approach; then, the design of the systems using only video features (VSR) and combined audio and video features (AVSR) is shown. Finally, the results of the three systems are compared on an in-house database in Spanish and French, and the influence of acoustic noise is examined, showing that the AVSR system is more robust than ASR and VSR.
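The "first two time derivatives" of cepstral coefficients are commonly computed with a regression formula over neighbouring frames, d_t = Σ_{n=1..N} n·(c_{t+n} − c_{t−n}) / (2·Σ n²), applied once for deltas and again on the deltas for delta-deltas. The sketch below implements that standard formula with edge frames replicated; it is an assumption that the system described used this exact variant and window (N = 2 here), and the MFCC extraction itself is omitted.

```python
def deltas(frames, N=2):
    """Regression-based time derivatives of per-frame feature vectors.

    frames: list of equal-length feature vectors (e.g. MFCCs, one per frame).
    Frames beyond the sequence edges are replaced by the first/last frame.
    """
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        vec = []
        for k in range(len(frames[0])):
            num = 0.0
            for n in range(1, N + 1):
                nxt = frames[min(t + n, T - 1)][k]
                prv = frames[max(t - n, 0)][k]
                num += n * (nxt - prv)
            vec.append(num / denom)
        out.append(vec)
    return out


def append_deltas(frames, N=2):
    """Static features followed by delta and delta-delta features for each frame."""
    d1 = deltas(frames, N)
    d2 = deltas(d1, N)
    return [f + a + b for f, a, b in zip(frames, d1, d2)]


ramp = [[0.0], [1.0], [2.0], [3.0], [4.0]]  # one coefficient rising linearly over time
print(deltas(ramp))
```

On a linear ramp the interior deltas equal the slope (1.0), while edge replication pulls the boundary values down, which is a known side effect of this padding choice.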
Most, Tova; Harel, Tamar; Shpak, Talma; Luntz, Michal
Purpose: The purpose of the study was to evaluate the contribution of acoustic hearing to the perception of suprasegmental features by adults who use a cochlear implant (CI) and a hearing aid (HA) in opposite ears. Method: 23 adults participated in this study. Perception of suprasegmental features--intonation, syllable stress, and word…
The present study investigated how different Voice Onset Time (VOT) patterns are categorized by native speakers of American English and by Brazilian learners of English. American English and Brazilian Portuguese differ in the voicing patterns of plosive consonants, because the VOT cue plays a different role in distinguishing voiced from voiceless consonant categories in each system. This study contrasted four VOT patterns (Negative VOT, Zero VOT, Positive VOT, and a manipulated pattern named Artificial Zero VOT) in two perceptual tasks (AxB discrimination and identification tests) and examined how the two groups of participants categorized these patterns. The results reinforce the idea that speech perception is multimodal and, therefore, that the interplay of multiple cues must be taken into account when considering phonetic-phonological processes.
Differences can be perceived as gradual and quantitative, as with different shades of gray, or they can be perceived as more abrupt and qualitative, as with different colors. The first is called continuous perception and the second categorical perception. Categorical perception (CP) can be inborn or can be induced by learning. Formerly thought to be peculiar to speech and color perception, CP turns out to be far more general, and may be related to how the neural networks in our brains detect ...
Cecere, Roberto; Gross, Joachim; Thut, Gregor
The ability to integrate auditory and visual information is critical for effective perception and interaction with the environment, and is thought to be abnormal in some clinical populations. Several studies have investigated the time window over which audiovisual events are integrated, also called the temporal binding window, and revealed asymmetries depending on the order of audiovisual input (i.e., the leading sense). When judging audiovisual simultaneity, the binding window appears narrower and non-malleable for auditory-leading stimulus pairs and wider and trainable for visual-leading pairs. Here we specifically examined the level of independence of binding mechanisms when auditory-before-visual vs. visual-before-auditory input is bound. Three groups of healthy participants practiced audiovisual simultaneity detection with feedback, selectively training on auditory-leading stimulus pairs (group 1), visual-leading stimulus pairs (group 2) or both (group 3). Subsequently, we tested for learning transfer (crossover) from trained stimulus pairs to non-trained pairs with opposite audiovisual input. Our data confirmed the known asymmetry in size and trainability for auditory-visual vs. visual-auditory binding windows. More importantly, practicing one type of audiovisual integration (e.g., auditory-visual) did not affect the other type (e.g., visual-auditory), even when that type was trainable by within-condition practice. Together, these results provide crucial evidence that audiovisual temporal binding for auditory-leading vs. visual-leading stimulus pairs is independent, possibly tapping into different circuits for audiovisual integration due to engagement of different multisensory sampling mechanisms depending on the leading sense. Our results have implications for the study of multisensory interactions in healthy participants and in clinical populations with dysfunctional multisensory integration. PMID:27003546
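One simple way to quantify the asymmetry described above is to estimate the binding window separately for each leading sense, taking as each edge the SOA at which the proportion of "simultaneous" responses falls to 0.5. The following is a minimal sketch, assuming linear interpolation between tested SOAs and the convention that negative SOAs mean auditory-leading; studies often fit psychometric functions instead, so this is not the authors' analysis, and the names `window_edges`, `_crossing`, and the 0.5 criterion are assumptions.

```python
from typing import Dict, List, Optional, Tuple


def _crossing(points: List[Tuple[float, float]], criterion: float) -> Optional[float]:
    """First |SOA| (walking outward from 0) at which p falls to the criterion.

    `points` holds (|soa|, p) pairs sorted by increasing |soa|. Returns a
    linearly interpolated |soa|, or None if p never falls that low.
    """
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if p0 >= criterion >= p1 and p0 != p1:
            return x0 + (p0 - criterion) * (x1 - x0) / (p0 - p1)
    return None


def window_edges(p_simult: Dict[float, float],
                 criterion: float = 0.5) -> Tuple[Optional[float], Optional[float]]:
    """Return (auditory-leading width, visual-leading width) in ms.

    Keys are SOAs in ms (negative = auditory first, positive = visual first,
    0 = synchronous); values are proportions of "simultaneous" responses.
    """
    zero = p_simult[0.0]
    aud = [(0.0, zero)] + sorted((-s, p) for s, p in p_simult.items() if s < 0)
    vis = [(0.0, zero)] + sorted((s, p) for s, p in p_simult.items() if s > 0)
    return _crossing(aud, criterion), _crossing(vis, criterion)
```

With made-up proportions that drop faster on the auditory-leading side, the function returns a narrower auditory-leading half-width, which is the kind of asymmetry the training groups were compared on.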
Hausen, Maija; Torppa, Ritva; Salmela, Viljami R; Vainio, Martti; Särkämö, Teppo
Disorders of music and speech perception, known as amusia and aphasia, have traditionally been regarded as dissociated deficits based on studies of brain damaged patients. This has been taken as evidence that music and speech are perceived by largely separate and independent networks in the brain. However, recent studies of congenital amusia have broadened this view by showing that the deficit is associated with problems in perceiving speech prosody, especially intonation and emotional prosody. In the present study the association between the perception of music and speech prosody was investigated with healthy Finnish adults (n = 61) using an on-line music perception test including the Scale subtest of Montreal Battery of Evaluation of Amusia (MBEA) and Off-Beat and Out-of-key tasks as well as a prosodic verbal task that measures the perception of word stress. Regression analyses showed that there was a clear association between prosody perception and music perception, especially in the domain of rhythm perception. This association was evident after controlling for music education, age, pitch perception, visuospatial perception, and working memory. Pitch perception was significantly associated with music perception but not with prosody perception. The association between music perception and visuospatial perception (measured using analogous tasks) was less clear. Overall, the pattern of results indicates that there is a robust link between music and speech perception and that this link can be mediated by rhythmic cues (time and stress). PMID:24032022
Locsei, Gusztav; Pedersen, Julie Hefting; Laugesen, Søren;
This study investigated the role of temporal fine structure (TFS) coding in spatially complex, lateralized listening tasks. Speech reception thresholds (SRTs) were measured in young normal-hearing (NH) and two groups of elderly hearing-impaired (HI) listeners in the presence of speech-shaped noise ... HI listeners, group differences in binaural benefit due to spatial separation of the maskers from the target remained small. Neither the FDT nor the IPDT task showed a clear correlation pattern with the SRTs or with the amount of binaural benefit, respectively. The results suggest that, although HI listeners with normal hearing in the low-frequency range might have elevated SRTs, the binaural benefit they experience due to spatial separation of competing sources can remain similar to that of NH listeners.
Kartushina, Natalia; Hervais-Adelman, Alexis; Frauenfelder, Ulrich Hans; Golestani, Narly
Second-language learners often experience major difficulties in producing non-native speech sounds. This paper introduces a training method that uses a real-time analysis of the acoustic properties of vowels produced by non-native speakers to provide them with immediate, trial-by-trial visual feedback about their articulation alongside that of the same vowels produced by native speakers. The Mahalanobis acoustic distance between non-native productions and target native acoustic spaces was use...
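The Mahalanobis distance used as the feedback measure can be illustrated for the two-dimensional case of first and second formants. The following is a minimal sketch, assuming vowels are represented only by (F1, F2) in Hz and that the native target space is summarized by a sample mean and covariance; the truncated abstract does not specify the acoustic dimensions, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def mean_and_cov(tokens: Sequence[Point]):
    """Sample mean and (n - 1) covariance of native (F1, F2) tokens."""
    n = len(tokens)
    m1 = sum(t[0] for t in tokens) / n
    m2 = sum(t[1] for t in tokens) / n
    s11 = sum((t[0] - m1) ** 2 for t in tokens) / (n - 1)
    s22 = sum((t[1] - m2) ** 2 for t in tokens) / (n - 1)
    s12 = sum((t[0] - m1) * (t[1] - m2) for t in tokens) / (n - 1)
    return (m1, m2), ((s11, s12), (s12, s22))


def mahalanobis_2d(x: Point, mean: Point, cov) -> float:
    """Mahalanobis distance of production x from a native vowel distribution."""
    (a, b), (_, d) = cov            # symmetric covariance [[a, b], [b, d]]
    det = a * d - b * b
    if a <= 0 or det <= 0:
        raise ValueError("covariance must be positive definite")
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form using the closed-form inverse [[d, -b], [-b, a]] / det.
    q = (d * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return math.sqrt(q)
```

Unlike a Euclidean distance in Hz, this scales each deviation by the spread of native productions, so a 40 Hz F2 error counts for less when native F2 values are themselves variable.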
Ma, Ning; Morris, Saffron; Kitterick, Pádraig Thomas
Objectives: This study used vocoder simulations with normal-hearing (NH) listeners to (1) measure their ability to integrate speech information from an NH ear and a simulated cochlear implant (CI), and (2) investigate whether binaural integration is disrupted by a mismatch in the delivery of spectral information between the ears arising from a misalignment in the mapping of frequency to place. Design: Eight NH volunteers participated in the study and listened to sentences embedded in backgrou...
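A frequency-to-place misalignment of the kind this study manipulates is often expressed with Greenwood's frequency-position function for the human cochlea. The following is a minimal sketch, assuming Greenwood's human constants (A = 165.4 Hz, a = 2.1, k = 0.88) and a 35 mm cochlea; the truncated abstract does not say how the mismatch was implemented, and `shifted_carrier` is a hypothetical helper mapping an analysis band's centre frequency to a vocoder carrier delivered a few millimetres more basally.

```python
import math

A, a, k = 165.4, 2.1, 0.88      # Greenwood (1990) constants for the human cochlea
COCHLEA_MM = 35.0               # assumed cochlear length in mm


def place_to_freq(x: float) -> float:
    """Characteristic frequency (Hz) at relative distance x (0..1) from the apex."""
    return A * (10 ** (a * x) - k)


def freq_to_place(f: float) -> float:
    """Inverse of place_to_freq: relative distance from the apex for frequency f."""
    return math.log10(f / A + k) / a


def shifted_carrier(analysis_hz: float, shift_mm: float) -> float:
    """Carrier frequency for a band whose output is shifted shift_mm toward the base."""
    x = freq_to_place(analysis_hz) + shift_mm / COCHLEA_MM
    return place_to_freq(min(x, 1.0))
```

With these constants, a 3 mm basal shift moves a 1 kHz analysis band to a carrier of roughly 1.6 kHz, which illustrates why a misaligned map can distort spectral information between the ears.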
Perrier, Pascal; Fuchs, Susanne
The first section provides a description of the concepts of “motor equivalence” and “degrees of freedom”. It is illustrated with a few examples of motor tasks in general and of speech production tasks in particular. In the second section, the methodology used to investigate experimentally motor equivalence phenomena in speech production is presented. It is mainly based on paradigms that perturb the perception-action loop during on-going speech, either by limiting the...
Moers, Donata; Wagner, Petra
This paper describes work in progress on the adequate modeling of fast speech in unit selection speech synthesis systems, with blind and visually impaired users primarily in mind. First, the main characteristics of fast speech are surveyed. Strategies for fast speech production are then discussed, and requirements are derived concerning the abilities of a speaker recording a fast speech unit selection inventory. The following section deals with a perception ...
Rachel N Denison
Synchrony between events in different senses has long been considered the critical temporal cue for multisensory integration. Here, using rapid streams of auditory and visual events, we demonstrate how humans can use temporal structure (rather than mere temporal coincidence) to detect multisensory relatedness. We find psychophysically that participants can detect matching auditory and visual streams via shared temporal structure for crossmodal lags of up to 200 ms. Performance on this task reproduced features of past findings based on explicit timing judgments but did not show any special advantage for perfectly synchronous streams. Importantly, the complexity of temporal patterns influences sensitivity to correspondence. Stochastic, irregular streams (with richer temporal pattern information) led to higher audio-visual matching sensitivity than predictable, rhythmic streams. Our results reveal that temporal structure and its complexity are key determinants for human detection of audio-visual correspondence. The distinctive emphasis of our new paradigms on temporal patterning could be useful for studying special populations with suspected abnormalities in audio-visual temporal perception and multisensory integration.
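Detecting that two streams share temporal structure at a lag, rather than only when they are simultaneous, can be illustrated by cross-correlating binary event trains over a range of lags. The following is a minimal sketch, assuming events are binned at 10 ms so that lags up to ±20 bins cover the ±200 ms range reported; the binning, the Pearson normalization, and the function names are illustrative choices, not the paper's analysis.

```python
import math
from typing import List, Tuple


def _corr(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length sequences (0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def best_lag(audio: List[int], visual: List[int], max_lag: int = 20) -> Tuple[int, float]:
    """Lag (in bins) with the highest correlation; positive = visual trails audio.

    Both streams are assumed to have the same length and to be binned identically.
    """
    best = (0, float("-inf"))
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, v = audio[: len(audio) - lag], visual[lag:]
        else:
            a, v = audio[-lag:], visual[: len(visual) + lag]
        r = _corr(a, v)
        if r > best[1]:
            best = (lag, r)
    return best
```

Irregular streams make the correct lag stand out sharply, whereas a strictly periodic stream correlates equally well at several lags, which is consistent with the higher sensitivity the study reports for stochastic streams.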
In 1973, the most renowned researchers in Visual Anthropology met at the ninth International Congress of Anthropology and Sociology to discuss the role of film and photography in ethnographic research and to systematize the almost century-old experiences of bringing together description, ethnography, photography and film. Opening the meeting, Dean Margaret Mead enthusiastically defended the use of audiovisual instruments in research. Considering that Anthropology explicitly ...
Altvater-Mackensen, Nicole; Grossmann, Tobias
Infants' language exposure largely involves face-to-face interactions providing acoustic and visual speech cues but also social cues that might foster language learning. Yet, both audiovisual speech information and social information have so far received little attention in research on infants' early language development. Using a preferential…
... your treatment plan may include seeing a speech therapist, a person who is trained to treat speech disorders. How often you have to see the speech therapist will vary; you'll probably start out seeing ...
Franck, Bas A. M.; Dreschler, Wouter A.; Lyzenga, Johannes
In this study we investigated the reliability and convergence characteristics of an adaptive multidirectional pattern search procedure, relative to a nonadaptive multidirectional pattern search procedure. The procedure was designed to optimize three speech-processing strategies. These comprise noise reduction, spectral enhancement, and spectral lift. The search is based on a paired-comparison paradigm, in which subjects evaluated the listening comfort of speech-in-noise fragments. The procedural and nonprocedural factors that influence the reliability and convergence of the procedure are studied using various test conditions. The test conditions combine different tests, initial settings, background noise types, and step size configurations. Seven normal-hearing subjects participated in this study. The results indicate that the reliability of the optimization strategy may benefit from the use of an adaptive step size. Decreasing the step size increases accuracy, while increasing the step size can be beneficial to create clear perceptual differences in the comparisons. The reliability also depends on starting point, stop criterion, step size constraints, background noise, algorithms used, as well as the presence of drifting cues and suboptimal settings. There appears to be a trade-off between reliability and convergence, i.e., when the step size is enlarged the reliability improves, but the convergence deteriorates.
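The idea of a search driven only by paired comparisons, with a step size that adapts, can be sketched with a simple compass (coordinate) search. This is a swapped-in, simpler technique, not the multidirectional pattern search used in the study: each parameter is nudged up or down, the nudge is kept whenever the simulated listener prefers it, and the step is halved when no direction wins. The three parameters standing in for noise reduction, spectral enhancement, and spectral lift, the simulated listener, and all names are hypothetical.

```python
from typing import Callable, List, Sequence

Params = List[float]


def adaptive_pattern_search(prefers: Callable[[Sequence[float], Sequence[float]], bool],
                            start: Sequence[float], step: float = 4.0,
                            min_step: float = 0.5, shrink: float = 0.5,
                            max_comparisons: int = 500) -> Params:
    """Compass-style search using only paired comparisons.

    prefers(candidate, current) returns True if the candidate is judged better.
    The step is multiplied by `shrink` whenever no direction wins, and the search
    stops once it falls below min_step (the stop criterion) or the budget runs out.
    """
    current = list(start)
    comparisons = 0
    while step >= min_step and comparisons < max_comparisons:
        improved = False
        for dim in range(len(current)):
            for sign in (1.0, -1.0):
                cand = list(current)
                cand[dim] += sign * step
                comparisons += 1
                if prefers(cand, current):
                    current, improved = cand, True
                    break
        if not improved:
            step *= shrink
    return current
```

Large initial steps produce clearly audible differences early on, and the shrinking step refines the setting later, mirroring the accuracy versus perceptual-clarity trade-off described in the abstract; a real listener's noisy judgments would call for repeated comparisons that this sketch omits.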
Sadaghiani, Mohammad Hossein
Audio-visual speech recognition and visual speech synthesisers are used as interfaces between humans and machines. Such interactions specifically rely on the analysis and synthesis of both audio and visual information, which humans use for face-to-face communication. Currently, there is no global standard to describe these interactions nor is there a standard mathematical tool to describe lip movements. Furthermore, the visual lip movement for each phoneme is considered in isolation rather th...
Nasir, Sazzad M.; Ostry, David J.
Is plasticity in sensory and motor systems linked? Here, in the context of speech motor learning and perception, we test the idea that sensory function is modified by motor learning and, in particular, that speech motor learning affects a speaker's auditory map. We assessed speech motor learning by using a robotic device that displaced the jaw and selectively altered somatosensory feedback during speech. We found that with practice speakers progressively corrected for the mechanical perturbation a...
Wavelet transforms have distinctive applications in speech signal processing, where problems arise in synthesis, analysis, compression, and classification. This paper evaluates a method that uses wavelets for speech analysis and synthesis, distinguishing between voiced and unvoiced speech and determining pitch, and discusses methods for choosing optimal wavelets for speech compression. Comparative perceptual results, obtained by listening to speech synthesized from both scalar- and vector-quantized wavelet parameters, are reported.
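As a concrete instance of wavelet analysis and synthesis, a single-level Haar transform splits a frame into approximation (low-pass) and detail (high-pass) coefficients and reconstructs it exactly. The following is a minimal sketch; the abstract does not name the wavelet, decomposition depth, or quantizers, so the Haar wavelet and the uniform scalar quantizer here are illustrative assumptions, and vector quantization is not shown.

```python
import math
from typing import List, Tuple

SQRT2 = math.sqrt(2.0)


def haar_dwt(x: List[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar DWT (len(x) must be even)."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_idwt(approx: List[float], detail: List[float]) -> List[float]:
    """Inverse of haar_dwt: rebuild each sample pair from (a, d)."""
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.extend(((a + d) / SQRT2, (a - d) / SQRT2))
    return out


def quantize(coeffs: List[float], step: float) -> List[float]:
    """Uniform scalar quantization: round to the nearest multiple of step."""
    return [step * round(c / step) for c in coeffs]
```

Because the transform is orthonormal, energy is preserved, and quantizing the detail coefficients coarsely bounds each coefficient's error by half the step; compression schemes exploit this by spending fewer bits on coefficients that matter less perceptually.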
The work presents a methodology for the analysis of journalistic audiovisual narratives, an instrument for the critical reading of news content and formats that use audiovisual language and multimedia resources on TV and on the web. It assumes that understanding the dynamic combinations of the elements that make up the audiovisual text contributes to a better perception of the meanings of the news, and that using digital tools critically and creatively can contribute to the practice of citizenship and to the improvement of current journalistic practice, highlighting the importance of training future professionals. The proposed methodology draws on technical references established through dialogue between research in the journalism field itself and contributions from Media Literacy, Televisual Analysis, Cultural Studies, and Discourse Analysis.
A report on the proceedings and ideas expressed at a one-day seminar on "Audio-Visual Equipment--Its Uses and Applications for Teaching and Research in Universities." The seminar was organized by England's National Committee for Audio-Visual Aids in Education in conjunction with the British Universities Film Council.
Cowan, Gloria; Khatchadourian, Desiree
Women are more intolerant of hate speech than men. This study examined relationality measures as mediators of gender differences in the perception of the harm of hate speech and the importance of freedom of speech. Participants were 107 male and 123 female college students. Questionnaires assessed the perceived harm of hate speech, the importance…
Temporal proximity is one of the key factors determining whether events in different modalities are integrated into a unified percept. Sensitivity to audiovisual temporal asynchrony has been studied in adults in great detail. However, how such sensitivity matures during childhood is poorly understood. We examined perception of audiovisual temporal asynchrony in 7- to 8-year-olds, 10- to 11-year-olds, and adults by using a simultaneity judgment task (SJT). Additionally, we evaluated whether nonverbal intelligence, verbal ability, attention skills, or age influenced children's performance. On each trial, participants saw an explosion-shaped figure and heard a 2-kHz pure tone. These occurred at the following stimulus onset asynchronies (SOAs): 0, 100, 200, 300, 400, and 500 ms. In half of all trials, the visual stimulus appeared first (VA condition), and in the other half, the auditory stimulus appeared first (AV condition). Both groups of children were significantly more likely than adults to perceive asynchronous events as synchronous at all SOAs exceeding 100 ms, in both VA and AV conditions. Furthermore, only adults exhibited a significant shortening of reaction time (RT) at long SOAs compared to medium SOAs. Sensitivities to the VA and AV temporal asynchronies showed different developmental trajectories, with 10- to 11-year-olds outperforming 7- to 8-year-olds at the 300- to 500-ms SOAs, but only in the AV condition. Lastly, age was the only predictor of children's performance on the SJT. These results provide an important baseline against which children with developmental disorders associated with impaired audiovisual temporal function (such as autism, specific language impairment, and dyslexia) may be compared. PMID:26569563
Lametti, Daniel R.; Krol, Sonia A.; Shiller, Douglas M.; Ostry, David J.
The perception of speech is notably malleable in adults, yet alterations in perception seem to have little impact on speech production. We hypothesized that speech perceptual training might immediately influence speech motor learning. To test this, we paired a speech perceptual training task with a speech motor learning task. Subjects performed a series of perceptual tests designed to measure and then manipulate the perceptual distinction between the words “head” and “had”. Subjects then prod...
Michel Foucault teaches that all systematic discourse, including discourse that claims to be “neutral” or “a disinterested, objective view of what happens”, is in fact a mechanism for articulating knowledge and, in turn, for forming power. The appearance of new technologies, especially digital ones, in the field of audiovisual production has provoked an avalanche of statements by filmmakers, essays by academics, and predictions by media demiurges.
Ito, Takayuki; Johns, Alexis R.; Ostry, David J.
Purpose: Somatosensory information associated with speech articulatory movements affects the perception of speech sounds and vice versa, suggesting an intimate linkage between speech production and perception systems. However, it is unclear which cortical processes are involved in the interaction between speech sounds and orofacial somatosensory…
Truong, Khiet P.; Leeuwen, van, M.; Neerincx, Mark A; Jong, de, P.
In this paper, we describe emotion recognition experiments carried out on spontaneous affective speech with the aim of comparing the added value of annotating felt emotion versus annotating perceived emotion. Using speech material available in the TNO-GAMING corpus (a corpus containing audiovisual recordings of people playing videogames), speech-based affect recognizers were developed that can predict Arousal and Valence scalar values. Two types of recognizers were developed in parallel: o...
Denda, Yuki; Nishiura, Takanobu; Yamashita, Yoichi
This paper proposes a robust omnidirectional audio-visual (AV) talker localizer for AV applications. The proposed localizer incorporates two innovations. The first is a set of robust omnidirectional audio and visual features: direction of arrival (DOA) estimation using an equilateral triangular microphone array and human position estimation using an omnidirectional video camera extract the AV features. The second is a dynamic fusion of the AV features. A validity criterion, called the audio- or visual-localization counter, validates each audio or visual feature. A reliability criterion, called the speech arriving evaluator, acts as a dynamic weight that removes any dependence on prior statistical properties from the fusion procedure. The proposed localizer can achieve both talker localization during speech activity and user localization during non-speech activity under the same fusion rule. Talker localization experiments were conducted in an actual room to evaluate the effectiveness of the proposed localizer. The results confirmed that the talker localization performance of the proposed AV localizer using the validity and reliability criteria is superior to that of conventional localizers.
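The audio feature, DOA estimation with an equilateral triangular microphone array, can be sketched as a far-field grid search that matches measured time differences of arrival (TDOAs) against those predicted for each candidate azimuth. The following is a minimal sketch, assuming far-field plane-wave propagation, a hypothetical array with 10 cm sides, a speed of sound of 343 m/s, and TDOAs that have already been measured (for example, by cross-correlating microphone signals). It is not the paper's estimator and omits the visual features and the dynamic fusion.

```python
import math
from typing import List, Tuple

C = 343.0  # assumed speed of sound in air, m/s
# Hypothetical equilateral triangle (10 cm sides) centred at the origin.
_R = 0.10 / math.sqrt(3)
MICS: List[Tuple[float, float]] = [
    (_R * math.cos(math.radians(a)), _R * math.sin(math.radians(a))) for a in (90, 210, 330)
]
PAIRS = [(0, 1), (0, 2), (1, 2)]


def predicted_tdoas(azimuth_deg: float) -> List[float]:
    """Far-field TDOAs t_j - t_i (s) for a plane wave arriving from azimuth_deg."""
    ux = math.cos(math.radians(azimuth_deg))
    uy = math.sin(math.radians(azimuth_deg))
    # A microphone further along the source direction hears the wave earlier.
    proj = [(x * ux + y * uy) / C for x, y in MICS]
    return [proj[i] - proj[j] for i, j in PAIRS]


def estimate_azimuth(measured: List[float], resolution_deg: float = 1.0) -> float:
    """Grid search for the azimuth whose predicted TDOAs best fit the measured ones."""
    best_az, best_err = 0.0, float("inf")
    for k in range(int(round(360.0 / resolution_deg))):
        az = k * resolution_deg
        err = sum((m - p) ** 2 for m, p in zip(measured, predicted_tdoas(az)))
        if err < best_err:
            best_az, best_err = az, err
    return best_az
```

Because the three microphones are not collinear, two independent TDOAs fix the azimuth over the full 360 degrees, which is what makes this layout suitable for an omnidirectional localizer.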
Severe to profound prelingual deafness, either congenital or acquired, is estimated to occur in 0.5 to 3 per 1000 live births. It is often associated with early delays in language development, speech perception, and socialization, and results in lower academic achievement. These developmental and behavioral problems are severe because 90% of deaf children are born to normal-hearing parents; they are less severe in children of deaf parents, who share a mode of communication with them. After much research in this field, the first 22-channel cochlear implant surgery was done in 1982. The number of prelingually deafened adults seeking cochlear implants is increasing, as these individuals can derive substantial benefit, although their performance is poorer than that of adults with postlingual deafness. MATERIAL AND METHODS: The present prospective study was conducted in the Department of ENT, Pt. J.N.M. Medical College and Dr. B.R.A.M. Hospital, Raipur (C.G.). The subjects were prelingually deafened individuals undergoing post-cochlear-implant speech therapy in the Department. The study included individuals who underwent cochlear implant surgery in this Department between July 2008 and September 2010 and who were no older than 10 years at the time of surgery. The study was designed as a prospective longitudinal analysis to assess the functioning of patients who underwent cochlear implantation. A total of 37 cochlear implant surgeries were carried out in the Department. Of these, 3 cases were outside the age criteria of the present study and another 2 cases were lost to follow-up. Pre-operatively, detailed information on each subject, including age, sex, address, and contact number, was collected. A general examination followed, with reference to build, nutrition, pulse, blood pressure, oedema, cyanosis, clubbing, and icterus. A systemic examination was also performed. A local examination with special emphasis on the tympanic membrane and any middle ear
Speech perception follows the principle that sounds are not separated from words, and words are not separated from sentences. Besides the three perceptual units of phonetic features, phonemes, and words, the sentence unit also takes part in speech perception. In this process, sentence context influences word recognition both syntactically and semantically. Syntactically, the sentence level exerts top-down feedback on the word level according to syntactic rules, screening the candidate words at the word level by constraining their part of speech and checking their inflectional features. Semantically, the sentence level activates or inhibits candidate words according to semantic constraints.
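The two routes described above can be illustrated with a toy candidate-selection step: a syntactic filter that removes candidates with the wrong part of speech or inflection, followed by semantic weighting that raises or lowers the activation of the survivors. The following is a minimal sketch; the data structure, the multiplicative weighting, and all names are hypothetical illustrations, not a model proposed in the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Candidate:
    word: str
    pos: str            # part of speech, e.g. "noun", "verb"
    inflection: str     # e.g. "sg", "plural", "past"
    activation: float   # bottom-up activation from the word level


def sentence_context(candidates: List[Candidate], expected_pos: str,
                     expected_inflection: Optional[str],
                     semantic_fit: Dict[str, float]) -> List[Candidate]:
    """Syntactic screening followed by semantic activation or inhibition.

    semantic_fit maps a word to a factor > 1 (activation) or < 1 (inhibition);
    words absent from the map keep their activation unchanged.
    """
    # Syntactic route: keep only candidates whose category and inflection fit.
    survivors = [c for c in candidates
                 if c.pos == expected_pos
                 and (expected_inflection is None or c.inflection == expected_inflection)]
    # Semantic route: rescale activation of the remaining candidates.
    rescored = [Candidate(c.word, c.pos, c.inflection,
                          c.activation * semantic_fit.get(c.word, 1.0))
                for c in survivors]
    return sorted(rescored, key=lambda c: c.activation, reverse=True)
```

For the homophones "sail" and "sale", a context requiring a noun removes the verb reading first, and semantic fit then decides between the remaining nouns.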
Chen, Yuchun; Tsao, Feng-Ming; Liu, Huei-Mei
This study used a longitudinal design to examine the development of mismatch responses (MMRs) to Mandarin lexical tones, an index of neural speech discriminative responses, in late talkers and typical controls at 3, 5, and 6 years of age. Lexical tones are phonetic suprasegments that distinguish the lexical meanings of syllables in tonal languages. The 2-year-old late talkers were later divided into persistent language delay and late bloomer groups according to their performance on standardized language tests at 4 years. Results showed that children with persistent language delay demonstrated more positive mismatch responses than the typical controls at 3 years of age. At the age of 5, no group differences were found in the amplitude of MMRs, but the maturation of MMRs could be observed in the change of topography, with a more prominent negative response at frontal sites only in the typical group. Correlations were found between the index of MMRs at 3 years and children's language performance outcome at 6 years. Our results indicate that the development of fine-grained tone representations is delayed in late-talking children between 3 and 5 years and may be one of the underlying mechanisms associated with later language performance. PMID:27061247
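The mismatch response is conventionally computed as a difference wave: the averaged ERP to deviant stimuli minus the averaged ERP to standard stimuli. Its mean amplitude in a latency window is then compared across groups, and a positive value corresponds to the "more positive mismatch responses" reported for the persistent-delay group. The following is a minimal sketch, assuming single-channel epochs that are already baseline-corrected and artefact-free; the sampling rate, epoch start, and latency window are hypothetical parameters, not the study's.

```python
from typing import List, Sequence


def average(epochs: Sequence[Sequence[float]]) -> List[float]:
    """Sample-by-sample mean across equal-length epochs (trials)."""
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]


def mismatch_response(deviant: Sequence[Sequence[float]],
                      standard: Sequence[Sequence[float]]) -> List[float]:
    """Difference wave: average deviant ERP minus average standard ERP."""
    return [d - s for d, s in zip(average(deviant), average(standard))]


def mean_amplitude(wave: Sequence[float], srate_hz: float, t0_ms: float,
                   start_ms: float, end_ms: float) -> float:
    """Mean of `wave` between start_ms and end_ms (inclusive).

    t0_ms is the latency of the first sample relative to stimulus onset.
    """
    idx = [i for i in range(len(wave))
           if start_ms <= t0_ms + 1000.0 * i / srate_hz <= end_ms]
    if not idx:
        raise ValueError("latency window contains no samples")
    return sum(wave[i] for i in idx) / len(idx)
```

Computing the same window at several electrode sites would give the topographic comparison (for example, frontal negativity) described for the 5-year-olds.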
This thesis examines how we perceive an audiovisual narrative, here defined as film, television, and video games, and seeks to establish a descriptive framework for auditory stimuli and their narrative functions in this regard. I initially adopt the viewpoint of cognitive psychology to account for basic information-processing operations. I then discuss audiovisual perception in terms of the effects of sensory integration between the visual and auditory modalities on the construction of meani...
Speech is important in human life; it shapes every individual and enables them to communicate with their surroundings. Communication is disrupted when a person faces a speech and language disorder. An educator must be especially familiar with the stages of development of speech perception and with speech and language disorders, because these disorders affect children as early as the preschool period. In the theoretical part, I used a variety of lit...
Tremblay, Stéphanie; Shiller, Douglas M; Ostry, David J
The hypothesis that speech goals are defined acoustically and maintained by auditory feedback is a central idea in speech production research. An alternative proposal is that speech production is organized in terms of control signals that subserve movements and associated vocal-tract configurations. Indeed, the capacity for intelligible speech by deaf speakers suggests that somatosensory inputs related to movement play a role in speech production, but studies that might have documented a somatosensory component have been equivocal. For example, mechanical perturbations that have altered somatosensory feedback have simultaneously altered acoustics. Hence, any adaptation observed under these conditions may have been a consequence of acoustic change. Here we show that somatosensory information on its own is fundamental to the achievement of speech movements. This demonstration involves a dissociation of somatosensory and auditory feedback during speech production. Over time, subjects correct for the effects of a complex mechanical load that alters jaw movements (and hence somatosensory feedback), but which has no measurable or perceptible effect on acoustic output. The findings indicate that the positions of speech articulators and associated somatosensory inputs constitute a goal of speech movements that is wholly separate from the sounds produced. PMID:12815431
This research aimed to improve the dance-appreciation skills of students in the Primary Teacher Education program of Makassar State University, to improve their perception of audio-visual based art appreciation, to increase their interest in the audio-visual based art education subject, and to increase their responses to the subject. It was classroom action research using the design of Kemmis & McTaggart, conducted with 42 students of Primary Teacher Education at Makassar State University. Data were collected through observation, questionnaires, and interviews, and were analyzed with descriptive qualitative and quantitative techniques. The results were: (1) the students' achievement in audio-visual based dance appreciation improved: pre-cycle 33.33%, cycle I 42.85%, and cycle II 83.33%; (2) the students' perception of audio-visual based dance appreciation improved: cycle I 59.52% and cycle II 71.42%, and their perception of the subject obtained through structured interviews in cycles I and II was 69.83%, in the high category; (3) the students' interest in the art education subject, especially audio-visual based dance appreciation, increased: cycle I 52.38% and cycle II 64.28%, and their interest in the subject obtained through structured interviews was 69.50%, in the high category; (4) the students' response to audio-visual based dance appreciation increased: cycle I 54.76% and cycle II 69.04%, in the good category.
Leone, Dorothy; Levy, Erika S.
Purpose: Much of a child's day is spent listening to speech in the presence of background noise. Although accurate vowel perception is important for listeners' accurate speech perception and comprehension, little is known about children's vowel perception in noise. "Clear speech" is a speech style frequently used by talkers in the…
Ekström, Seth-Reino; Borg, Erik
The masking effect of a piano composition, played at different speeds and in different octaves, on speech-perception thresholds was investigated in 15 normal-hearing and 14 moderately hearing-impaired subjects. Running speech (just follow conversation, JFC) testing and the use of hearing aids increased the everyday validity of the findings. A comparison was made with standard audiometric noises [International Collegium of Rehabilitative Audiology (ICRA) noise and speech spectrum-filtered noise (SPN)]. All masking sounds, music or noise, were presented at the same equivalent sound level (50 dBA). The results showed a significant effect of piano performance speed and octave (P<.01). Low octave and fast tempo had the largest effect; high octave and slow tempo, the smallest. Music had a lower masking effect than did ICRA noise with two or six speakers at normal vocal effort (P<.01) and SPN (P<.05). Subjects with hearing loss had higher masked thresholds than the normal-hearing subjects (P<.01), but there were smaller differences between masking conditions (P<.01). It is pointed out that music offers an interesting opportunity for studying masking under realistic conditions, where spectral and temporal features can be varied independently. The results have implications for composing music with vocal parts, designing acoustic environments, and creating a balance between speech perception and privacy in social settings. PMID:21768731
Hearing impairment, and specifically sensorineural hearing loss, is an increasingly prevalent condition, especially among the ageing population. It occurs primarily as a result of damage to the hair cells that act as sound receptors in the inner ear, and it causes a variety of hearing perception problems, most notably a reduction in speech intelligibility. Accurate diagnosis of hearing impairments is a time-consuming process and is complicated by the reliance on indirect measurements based on patie...
EFL In-service Yemeni Teachers’ Perceptions and Perspectives on the Importance of Teaching Stress and Intonation as Supra-segmental Features of Speech and Sound attributes to the Process of Comprehension: A Survey-Study
Rafiq Al-Shamiry; Ahmed M. S. Alduais
Purposes: To obtain the EFL in-service Yemeni teachers’ perceptions and perspectives on the importance of teaching stress and intonation as supra-segmental features of speech and sound-attributes to the process of comprehension. Methods: 40 EFL teachers who were identified as in-service teachers in both public and private schools at IBB city, Yemen (20 Arts’ graduates and 20 Education’s graduates, both males and females) participated in this survey-study. A researcher-made questionnaire consi...
Keitel, Christian; Müller, Matthias M
Our brain relies on neural mechanisms of selective attention and converging sensory processing to cope efficiently with rich and unceasing multisensory inputs. One prominent assumption holds that audio-visual synchrony can act as a strong attractor for spatial attention. Here, we tested for a similar effect of audio-visual synchrony on feature-selective attention. We presented two superimposed Gabor patches that differed in colour and orientation. On each trial, participants were cued to selectively attend to one of the two patches. Over time, the spatial frequencies of both patches varied sinusoidally at distinct rates (3.14 and 3.63 Hz), giving rise to pulse-like percepts. A simultaneously presented pure tone carried a frequency modulation at the pulse rate of one of the two visual stimuli to introduce audio-visual synchrony. Pulsed stimulation elicited distinct time-locked oscillatory electrophysiological brain responses. These steady-state responses were quantified in the spectral domain to examine individual stimulus processing under conditions of synchronous versus asynchronous tone presentation and when the respective stimuli were attended versus unattended. We found that both attending to the colour of a stimulus and its synchrony with the tone enhanced its processing. Moreover, both gain effects combined linearly for attended in-sync stimuli. Our results suggest that audio-visual synchrony can attract attention to specific stimulus features when stimuli overlap in space. PMID:26226930
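Steady-state responses of this kind are commonly quantified by reading the spectral amplitude at each stimulation rate. The sketch below is a minimal illustration, not the authors' analysis pipeline: it assumes a single-channel epoch, a sampling rate of 256 Hz, and a 100 s window chosen so that both tagging rates (3.14 and 3.63 Hz) complete whole cycles. The signal is synthetic with made-up amplitudes, and `ssr_amplitude` is a hypothetical name.

```python
import cmath
import math


def ssr_amplitude(signal, fs, freq):
    """Single-sided amplitude of `signal` at `freq` Hz, from one DFT bin.

    Assumes the epoch spans an integer number of cycles of every tagging
    frequency, so the two tagged responses do not leak into each other.
    """
    n_samples = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * freq * n / fs)
              for n, x in enumerate(signal))
    return 2 * abs(acc) / n_samples


fs = 256.0        # assumed sampling rate (Hz)
duration = 100.0  # 100 s holds 314 and 363 whole cycles of the two tags
t = [n / fs for n in range(int(fs * duration))]

# Synthetic "EEG": two frequency-tagged responses with illustrative amplitudes.
eeg = [2.0 * math.sin(2 * math.pi * 3.14 * ti)
       + 0.5 * math.sin(2 * math.pi * 3.63 * ti) for ti in t]

amp_314 = ssr_amplitude(eeg, fs, 3.14)
amp_363 = ssr_amplitude(eeg, fs, 3.63)
```

Evaluating single bins rather than a full FFT keeps the example dependency-free; with real recordings one would typically average epochs and compare attended versus unattended conditions at each tag.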
Hausen, Maija; Torppa, Ritva; Salmela, Viljami R.; Vainio, Martti; Särkämö, Teppo
Disorders of music and speech perception, known as amusia and aphasia, have traditionally been regarded as dissociated deficits based on studies of brain-damaged patients. This has been taken as evidence that music and speech are perceived by largely separate and independent networks in the brain. However, recent studies of congenital amusia have broadened this view by showing that the deficit is associated with problems in perceiving speech prosody, especially intonation and emotional prosod...
Julien Meyer; Laure Dentel; Fanny Meunier
In the real world, human speech recognition nearly always involves listening in background noise. The impact of such noise on speech signals and on intelligibility performance increases with the separation of the listener from the speaker. The present behavioral experiment provides an overview of the effects of such acoustic disturbances on speech perception in conditions approaching ecologically valid contexts. We analysed the intelligibility loss in spoken word lists with increasing listene...
tone condition [F(2, 39) = 47.18, p < .001, η² = .55] and group [F(2, 39) = 75.89, p = .017, η² = .14], with T5 eliciting more positive responses than T2, and stronger responses from the [+Per+Pro] than the [+Per-Pro] group. Correlations between the production accuracy of the two rising tones and perceptual measures showed that the averaged production accuracy was negatively correlated with discrimination RT (r = -.502, p = .001), with shorter discrimination RTs associated with higher production accuracy. In addition, production accuracy was positively correlated with the mean amplitude of brain responses to the rise time of T5 (r = .421, p = .006): the larger the response, the higher the production accuracy. In summary, the present study demonstrated that tone perception is highly dynamic and exploits different acoustic cues at different stages of processing, namely rise time at the sensory/perceptual level and pitch features at the cognitive level, as the auditory signal unfolds over time. Moreover, our findings revealed differential sensitivities between individuals with and without distinctive production of the two rising tones, as evidenced by differences in the discrimination latency of the two tones and in the magnitude of the brain response to short rise time. The individual differences found in production are proposed to have a perceptual origin, in that less well-defined phonological representations lead to less distinctive production.
Lewkowicz, David J; Hansen-Tift, Amy M
The mechanisms underlying the acquisition of speech-production ability in human infancy are not well understood. We tracked 4-12-mo-old English-learning infants' and adults' eye gaze while they watched and listened to a female reciting a monologue either in their native (English) or nonnative (Spanish) language. We found that infants shifted their attention from the eyes to the mouth between 4 and 8 mo of age regardless of language and then began a shift back to the eyes at 12 mo in response to native but not nonnative speech. We posit that the first shift enables infants to gain access to redundant audiovisual speech cues that enable them to learn their native speech forms and that the second shift reflects growing native-language expertise that frees them to shift attention to the eyes to gain access to social cues. On this account, 12-mo-old infants do not shift attention to the eyes when exposed to nonnative speech because increasing native-language expertise and perceptual narrowing make it more difficult to process nonnative speech and require them to continue to access redundant audiovisual cues. Overall, the current findings demonstrate that the development of speech production capacity relies on changes in selective audiovisual attention and that this depends critically on early experience. PMID:22307596
Yuanqing Li; Jinyi Long; Biao Huang; Tianyou Yu; Wei Wu; Peijun Li; Fang Fang; Pei Sun
An audiovisual object may contain multiple semantic features, such as the gender and emotional features of the speaker. Feature-selective attention and audiovisual semantic integration are two brain functions involved in the recognition of audiovisual objects. Humans often selectively attend to one or several features while ignoring the other features of an audiovisual object. Meanwhile, the human brain integrates semantic information from the visual and auditory modalities. However, how thes...
This master's thesis analyzes the ethical challenges journalists face in their work, with special regard to codes of conduct and hate speech. On the issue of hate speech, the thesis focuses on hate speech directed at minorities in Turkey. The media market in Turkey is highly regulated by laws and regulations. As a result, several newspapers have been in trouble with the law, which in turn leads to self-censorship in the business. Two media groups o...
This paper attempts a study of speech perception, concentrating on the acoustic-phonetic aspects of the processes that underlie the capacity to identify the phonological structure of speech.
Werkhoven, Peter; Philippi, Tom; van Erp, J.B.F.
It has been shown that multisensory presentation can improve perception, attention, and object memory compared with unisensory presentation. Consequently, we expect that multisensory presentation of landmarks can improve spatial memory and navigation. In this study we tested the effect of visual, au
In summary, TMS is an innovative tool to investigate processing of speech perception and imitation. TMS studies have provided strong evidence that the sensory system is critically involved in mapping sensory input onto motor output and that the motor system plays an important role in speech perception.
王硕; 董瑞娟; Solveig Christina Voss; 钱金宇; 吴燕君; 张华
Objective: To evaluate the speech recognition ability of patients with sensorineural hearing loss after hearing-aid fitting, and to analyze the effects of degree of hearing loss and age on aided speech rehabilitation outcomes. Methods: Thirty subjects with sensorineural hearing loss (13 males and 17 females, aged 26 to 86 years) were recruited. All had bilaterally symmetric hearing loss, with an average pure-tone threshold at 0.5–4 kHz (PTA0.5-4 kHz) of 40 to 75 dB HL. All subjects were fitted unilaterally with Phonak Bolero Q50 behind-the-ear (BTE) hearing aids. The Mandarin Speech Test Materials (MSTMs) software was used to test speech recognition under four conditions: unaided and aided, in quiet and in noise. Results: (1) After hearing-aid fitting, the disyllabic word recognition score in quiet improved by 35.1 ± 19.5% on average, and the sentence recognition score in noise improved by 32.8 ± 22.8% on average; (2) aided speech recognition ability was significantly negatively correlated with the degree of hearing loss; (3) the subjects whose aided benefit was above average all had pure-tone thresholds greater than 50 dB HL, but individual differences were large. Conclusions: Hearing-aid fitting can effectively help patients with sensorineural hearing loss improve their speech recognition ability, but the degree of hearing loss is not the only factor affecting aided outcomes, and the improvement in aided speech recognition varies greatly between individuals.
Ravishankar, C., Hughes Network Systems, Germantown, MD
Speech is the predominant means of communication between human beings, and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage that the speech signal was corrupted by noise, cross-talk, and distortion. Long-haul transmissions, which use repeaters to compensate for the loss in signal strength on transmission links, also increase the associated noise and distortion. Digital transmission, on the other hand, is relatively immune to noise, cross-talk, and distortion, primarily because the digital signal can be faithfully regenerated at each repeater purely on the basis of a binary decision. Hence the end-to-end performance of a digital link becomes essentially independent of the length and operating frequency bands of the link, and from a transmission point of view digital transmission has been the preferred approach because of its higher immunity to noise. The need to carry digital speech also became extremely important from a service provision point of view. Modern requirements have introduced the need for robust, flexible, and secure services that can carry a multitude of signal types (such as voice, data, and video) without a fundamental change in infrastructure. Such a requirement could not easily have been met without the advent of digital transmission systems, which required speech to be coded digitally. The term speech coding often refers to techniques that represent or code speech signals either directly as a waveform or as a set of parameters obtained by analyzing the speech signal. In either case, the codes are transmitted to the distant end, where speech is reconstructed or synthesized from the received set of codes. A more generic term that applies to these techniques and is often used interchangeably with speech coding is voice coding. This term is more generic in the sense that the
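As one concrete illustration of coding speech "directly as a waveform", the sketch below applies continuous μ-law companding (μ = 255) followed by uniform 8-bit quantization. This is an assumption-laden teaching example, not the method of the source text and not bit-exact ITU-T G.711, which uses a segmented piecewise-linear approximation of this curve. The function names are hypothetical, and input samples are assumed to be normalized to [-1, 1].

```python
import math

MU = 255.0  # mu-law constant (the value used in North American/Japanese G.711)


def mu_law_encode(x, bits=8):
    """Compress a sample in [-1, 1] with the continuous mu-law curve,
    then quantize uniformly to a signed integer code."""
    x = max(-1.0, min(1.0, x))  # clip out-of-range input
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    levels = 2 ** (bits - 1) - 1  # 127 for 8 bits
    return round(y * levels)


def mu_law_decode(code, bits=8):
    """Invert the quantization and the mu-law curve."""
    levels = 2 ** (bits - 1) - 1
    y = code / levels
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
```

The logarithmic curve spends more quantization levels on quiet samples, so a round trip at 0.01 loses far less absolute precision than one at 0.9. That is the usual motivation for companding speech rather than quantizing it linearly at the same bit rate.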
Wendt, Dorothea Christine
The primary goal of this thesis is to gain a better insight into any impediments in speech processing that occur due to sensory and cognitive factors. To achieve this, a new audio-visual paradigm based on the analysis of eye-movements is developed here which allows for an online analysis of the speech understanding process with possible applications in the field of audiology. The proposed paradigm is used to investigate the influence of background noise and linguistic complexity on the proces...
Blanca Rodríguez Bravo
The peculiarities of the audio-visual document and the documentary treatment it undergoes in television broadcasting stations are analyzed. Taking into account the particular features of the image that condition its analysis and retrieval, the paper establishes the stages and procedures for representing the audio-visual message with a view to its reuse. Finally, some considerations are offered about the automatic processing of video and the changes introduced by digital television.
Liberman, Alvin M.; Mattingly, Ignatius G.
Discusses the phonetic module that increases the rate of information flow, establishes the parity between sender and receiver, and provides for the natural development of phonetic structures in the individual. Cites evidence and function of this specialization and architectural relations between the two classes of modules. (Author/RT)
Kokaram, Anil; Harte, Naomi; Hines, Andrew
PUBLISHED Export Date: 27 August 2015 This paper presents an objective speech quality model, ViSQOL, the Virtual Speech Quality Objective Listener. It is a signal-based, full-reference, intrusive metric that models human speech quality perception using a spectro-temporal measure of similarity between a reference and a test speech signal. The metric has been particularly designed to be robust for quality issues associated with Voice over IP (VoIP) transmission. This paper describes the a...
Harris, Katherine Safford
Speaking is universally acknowledged as an important human talent, yet as a topic of educated common knowledge, it is peculiarly neglected. Partly, this is a consequence of the relatively recent growth of research on speech perception, production, and development, but also a function of the way that information is sliced up by undergraduate colleges. Although the basic acoustic mechanism of vowel production was known to Helmholtz, the ability to view speech production as a physiological event is evolving even now with such techniques as fMRI. Intensive research on speech perception emerged only in the early 1930s as Fletcher and the engineers at Bell Telephone Laboratories developed the transmission of speech over telephone lines. The study of speech development was revolutionized by the papers of Eimas and his colleagues on speech perception in infants in the 1970s. Dissemination of knowledge in these fields is the responsibility of no single academic discipline. It forms a center for two departments, Linguistics, and Speech and Hearing, but in the former, there is a heavy emphasis on other aspects of language than speech and, in the latter, a focus on clinical practice. For psychologists, it is a rather minor component of a very diverse assembly of topics. I will focus on these three fields in proposing possible remedies.
Ana Tereza de Matos Magalhães
The new technologies of the Freedom® processor were created to improve the processing of incoming acoustic sound, not only for new users but also for previous generations of cochlear implants. OBJECTIVE: To identify the contribution of the Freedom® speech processor technology, for users of the Nucleus22® multichannel cochlear implant, to speech perception performance in silence and in noise and to audiometric thresholds. MATERIALS AND METHODS: The study was a historical cohort with a cross-sectional design. Seventeen patients met the inclusion criteria. Before testing began, the last map in use with the Spectra® was reviewed and optimized, and processor function was checked. Speech tests were presented at 60 dB SPL using recorded material: monosyllables, and open-set sentences in silence and in noise (SNR = 0 dB). Free-field audiometry was performed with both speech processors. Statistical analysis used nonparametric tests. RESULTS: When the contribution of the Freedom® for Nucleus22® patients was analyzed, statistically significant differences were found in all speech perception tests and in all audiometric thresholds. CONCLUSION: The technology contributed to the speech perception performance and audiometric thresholds of Nucleus22® users.
Describes the results of a survey of media service directors at public universities in Ohio to determine the expected longevity of audiovisual equipment. The use of the Delphi technique for estimates is explained, results are compared with an earlier survey done in 1977, and the use of spreadsheet software to calculate depreciation is discussed. (LRW)
Pollock, Sean; Lee, Danny; Keall, Paul; Kim, Taeho
Purpose: The accuracy of motion prediction, utilized to overcome the system latency of motion management radiotherapy systems, is hampered by irregularities present in the patients’ respiratory pattern. Audiovisual (AV) biofeedback has been shown to reduce respiratory irregularities. The aim of this study was to test the hypothesis that AV biofeedback improves the accuracy of motion prediction.
Holloway, Ian D; van Atteveldt, Nienke; Blomert, Leo; Ansari, Daniel
Reading skills are indispensable in modern technological societies. In transparent alphabetic orthographies, such as Dutch, reading skills build on associations between letters and speech sounds (LS pairs). Previously, we showed that the superior temporal cortex (STC) of Dutch readers is sensitive to the congruency of LS pairs. Here, we used functional magnetic resonance imaging to investigate whether a similar congruency sensitivity exists in the STC of readers of the more opaque English orthography, where the relation among LS pairs is less reliable. Eighteen subjects passively perceived congruent and incongruent audiovisual pairs of different levels of transparency in English: letters and speech sounds (LS; irregular), letters and letter names (LN; fairly transparent), and numerals and number names (NN; transparent). In STC, we found congruency effects for NN and LN, but no effects in the predicted direction (congruent > incongruent) for LS pairs. These findings contrast with previous results obtained from Dutch readers. These data indicate that, through education, the STC becomes tuned to the congruency of transparent audiovisual pairs, but suggest a different neural processing of irregular mappings. The orthographic dependency of LS integration underscores cross-linguistic differences in the neural basis of reading and potentially has important implications for dyslexia interventions across languages. PMID:24351976
Detmer, W M; Shiffman, S.; Wyatt, J. C.; Friedman, C P; Lane, C D; Fagan, L. M.
OBJECTIVE: Evaluate the performance of a continuous-speech interface to a decision support system. DESIGN: The authors performed a prospective evaluation of a speech interface that matches unconstrained utterances of physicians with controlled-vocabulary terms from Quick Medical Reference (QMR). The performance of the speech interface was assessed in two stages: in the real-time experiment, physician subjects viewed audiovisual stimuli intended to evoke clinical findings, spoke a description ...
Dzati Athiar Ramli
In this study, we propose a novel approach for a speaker verification system that uses spectrogram images as features and Unconstrained Minimum Average Correlation Energy (UMACE) filters as classifiers. Because speech is a behavioral signal, speech data tend not to be reproduced consistently, owing to changes in speaking rate, health, emotional condition, temperature, and humidity. To overcome this problem, a modified UMACE filter architecture is proposed that performs multi-sample fusion using speech and lipreading data. To identify the best-performing fusion scheme, five multi-sample fusion strategies (maximum, minimum, median, average, and majority vote) are first tested using the speech signal data. The performance of the audio-visual system using the enhanced UMACE filters is then tested: lipreading data are added to the pool of audio samples, and the best fusion scheme found in the previous experiment is used as the multi-sample fusion scheme. The Digit Database was used for performance evaluation, and a performance of up to 99.64% was achieved with the enhanced UMACE filters for the speech-only system, a 6.89% improvement over the baseline approach. Subsequently, the audio-visual system was observed to significantly broaden the PSR score interval between authentic and impostor data and to further improve on the performance of the audio-only system, offering a path toward a robust verification system.
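The five multi-sample fusion rules named above can be sketched as score-level combination followed by a threshold decision. This is an illustrative sketch under assumptions the abstract does not state: each sample in the pool yields one match score (for example a PSR value), a single hypothetical decision threshold applies, and majority vote counts per-sample accept decisions. `fuse_scores` is a hypothetical name, not part of the UMACE implementation.

```python
from statistics import mean, median

# Score-combining rules; majority vote is handled separately because it
# combines per-sample decisions rather than raw scores.
_COMBINERS = {"max": max, "min": min, "median": median, "average": mean}


def fuse_scores(scores, rule, threshold):
    """Fuse per-sample match scores into one accept (True) / reject (False) decision."""
    if not scores:
        raise ValueError("need at least one sample score")
    if rule == "majority":
        votes = sum(s >= threshold for s in scores)  # per-sample accepts
        return votes > len(scores) / 2
    if rule not in _COMBINERS:
        raise ValueError(f"unknown fusion rule: {rule!r}")
    return _COMBINERS[rule](scores) >= threshold
```

For example, with scores `[12.0, 8.0, 15.0]` and a threshold of 10.0, the `"min"` rule rejects while the other four rules accept. The choice of rule therefore trades false rejections against false acceptances, which is why the study compared them empirically before choosing one for the audio-visual system.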
Jassem, W.; Kudzdela, H.; Domagala, P.
A simple algorithm is proposed for automatic phonetic segmentation of the acoustic speech signal on the MERA 303 desk-top minicomputer. The algorithm is verified with Polish linguistic material spoken by two subjects. The proposed algorithm detects approximately 80 percent of the boundaries between enunciated segments correctly, a result no worse than that obtained using more complex methods. Speech recognition programs are discussed as speech perception models, and the nature of categorical perception of human speech sounds is examined.
In human face-to-face interaction, social affects should be distinguished from emotional expressions (which are triggered by innate and involuntary controls of the speaker) by their voluntary control, expressed through audiovisual prosody, and by their important role in the realization of speech acts. They also convey information about the social context and the social relationship between the interlocutors. Prosody is a main vector of social affects, and its cross-language variabili...
Messerlin, Patrick; Cocq, Emmanuel
Under the 1994 Uruguay Round Agreement, only nineteen WTO members have made commitments in audiovisual services in their GATS schedule (see table 7). As illustrated in table 7, these commitments are generally of limited scope and magnitude. Among the large audiovisual producers, only the United States has taken substantial commitments at the various stages of audiovisual production, distribution, and transmission. Although more limited, the commitments by India (the world’s largest film prod...
Huang, Thomas S.; Zeng, Zhihong
Automatic affective expression recognition has attracted increasing attention from researchers in different disciplines; it promises to contribute significantly to a new paradigm for human-computer interaction (affect-sensitive interfaces, socially intelligent environments) and to advance research in affect-related fields, including psychology, psychiatry, and education. Multimodal information integration is a process that enables humans to assess affective states robustly and flexibly. To understand the richness and subtlety of human emotional behavior, a computer should be able to integrate information from multiple sensors. In this paper we introduce our efforts toward machine understanding of audio-visual affective behavior, based on both deliberate and spontaneous displays. Several promising methods are presented for integrating information from the audio and visual modalities. Our experiments show the advantage of audio-visual fusion in affective expression recognition over audio-only or visual-only approaches.
Liberman, A. M.
This report is one of a regular series on the status and progress of studies on the nature of speech, instrumentation for its investigation, and practical applications. Manuscripts cover the following topics: Speech perception and memory coding in relation to reading ability; The use of orthographic structure by deaf adults: Recognition of finger-spelled letters; Exploring the information support for speech; The stream of speech; Using the acoustic signal to make inferences about place and duration of tongue-palate contact; Patterns of human interlimb coordination emerge from the properties of nonlinear limit cycle oscillatory processes: Theory and data; Motor control: Which themes do we orchestrate?; Exploring the nature of motor control in Down's syndrome; Periodicity and auditory memory: A pilot study; Reading skill and language skill: On the role of sign order and morphological structure in memory for American Sign Language sentences; Perception of nasal consonants with special reference to Catalan; and Speech production characteristics of the hearing impaired.
Giménez López, José Luis; Magal Royo, Teresa; García Laborda, Jesús; Dunai Dunai, Larisa
The paper describes how the active methodologies of the new European Higher Education Area were adapted to the new Audiovisual Communication degree, from the perspective of subjects related to interactive communication in Europe. The proposed active methodologies have been experimentally implemented in the new curricular development of the subjects, leading to a teaching adjustment for the professors who currently give lectures and who have been evaluated fo...