Theobald, Barry J.; Harvey, Richard; Cox, Stephen J.; Lewis, Colin; Owen, Gari P.
Accurate lip-reading techniques would be of enormous benefit for agencies involved in counter-terrorism and other law-enforcement areas. Unfortunately, there are very few skilled lip-readers, and it is apparently a difficult skill to transmit, so the area is under-resourced. In this paper we investigate the possibility of making the lip-reading task more amenable to a wider range of operators by enhancing lip movements in video sequences using active appearance models. These are generative, parametric models commonly used to track faces in images and video sequences. The parametric nature of the model allows a face in an image to be encoded in terms of a few tens of parameters, while the generative nature allows faces to be re-synthesised using the parameters. The aim of this study is to determine if exaggerating lip-motions in video sequences by amplifying the parameters of the model improves lip-reading ability. We also present results of lip-reading tests undertaken by experienced (but non-expert) adult subjects who claim to use lip-reading in their speech recognition process. The results, which are comparisons of word error-rates on unprocessed and processed video, are mixed. We find that there appears to be the potential to improve the word error rate but, for the method to improve the intelligibility there is need for more sophisticated tracking and visual modelling. Our technique can also act as an expression or visual gesture amplifier and so has applications to animation and the presentation of information via avatars or synthetic humans.
To explore the relationship between reading and lipreading and to determine whether readers and lipreaders use similar strategies to comprehend verbal messages, 60 female junior and sophomore high school students--30 good and 30 poor readers--were given a filmed lipreading test, a test to measure eye-voice span, a test of cloze ability, and a test of their ability to comprehend printed material presented one word at a time in the absence of an opportunity to regress or scan ahead. The results of this study indicated that (a) there is a significant relationship between reading and lipreading ability; (b) although good readers may be either good or poor lipreaders, poor readers are more likely to be poor than good lipreaders; (c) there are similarities in the strategies used by readers and lipreaders in their approach to comprehending spoken and written material; (d) word-by-word reading of continuous prose appears to be a salient characteristic of both poor reading and poor lipreading ability; and (c) good readers and lipreaders do not engage in word-by-word reading but rather use a combination of visual and linguistic cues to interpret written and spoken messages.
Petridis, Stavros; Wang, Yujiang; Li, Zuwei; Pantic, Maja
Non-frontal lip views contain useful information which can be used to enhance the performance of frontal view lipreading. However, the vast majority of recent lipreading works, including the deep learning approaches which significantly outperform traditional approaches, have focused on frontal mouth
Heikkilä, Jenni; Lonka, Eila; Ahola, Sanna; Meronen, Auli; Tiippana, Kaisa
Purpose: Lipreading and its cognitive correlates were studied in school-age children with typical language development and delayed language development due to specific language impairment (SLI). Method: Forty-two children with typical language development and 20 children with SLI were tested by using a word-level lipreading test and an extensive…
Gardiner, J M; Gathercole, S E; Gregg, V H
In both free and backward recall, it is shown that auditory but not visual recency is greatly disrupted when subjects have to lipread, rather than read, a series of numbers in a distractor task interpolated between list presentation and recall. This selective interference effect extends the generality of a finding reported by Spoehr and Corin (1978) and adds to an accumulating body of evidence that seems inconsistent with acoustic or echoic memory interpretations of the enhanced recency recall typically observed in comparing auditory with visual presentation. Alternative interpretations are briefly considered.
Ruytjens, Liesbet; Albers, Frans; van Dijk, Pim; Wit, Hero; Willemsen, Antoon
In the past, researchers investigated silent lipreading in normal hearing subjects with functional neuroimaging tools and showed how the brain processes visual stimuli that are normally accompanied by an auditory counterpart. Previously, we showed activation differences between males and females in
Aparicio, Mario; Peigneux, Philippe; Charlier, Brigitte; Balériaux, Danielle; Kavec, Martin; Leybaert, Jacqueline
We present here the first neuroimaging data for perception of Cued Speech (CS) by deaf adults who are native users of CS. CS is a visual mode of communicating a spoken language through a set of manual cues which accompany lipreading and disambiguate it. With CS, sublexical units of the oral language are conveyed clearly and completely through the visual modality without requiring hearing. The comparison of neural processing of CS in deaf individuals with processing of audiovisual (AV) speech in normally hearing individuals represents a unique opportunity to explore the similarities and differences in neural processing of an oral language delivered in a visuo-manual vs. an AV modality. The study included deaf adult participants who were early CS users and native hearing users of French who process speech audiovisually. Words were presented in an event-related fMRI design. Three conditions were presented to each group of participants. The deaf participants saw CS words (manual + lipread), words presented as manual cues alone, and words presented to be lipread without manual cues. The hearing group saw AV spoken words, audio-alone and lipread-alone. Three findings are highlighted. First, the middle and superior temporal gyrus (excluding Heschl’s gyrus) and left inferior frontal gyrus pars triangularis constituted a common, amodal neural basis for AV and CS perception. Second, integration was inferred in posterior parts of superior temporal sulcus for audio and lipread information in AV speech, but in the occipito-temporal junction, including MT/V5, for the manual cues and lipreading in CS. Third, the perception of manual cues showed a much greater overlap with the regions activated by CS (manual + lipreading) than lipreading alone did. This supports the notion that manual cues play a larger role than lipreading for CS processing. The present study contributes to a better understanding of the role of manual cues as support of visual speech perception in the framework
Schreitmüller, Stefan; Frenken, Miriam; Bentz, Lüder; Ortmann, Magdalene; Walger, Martin; Meister, Hartmut
Watching a talker's mouth is beneficial for speech reception (SR) in many communication settings, especially in noise and when hearing is impaired. Measures for audiovisual (AV) SR can be valuable in the framework of diagnosing or treating hearing disorders. This study addresses the lack of standardized methods in many languages for assessing lipreading, AV gain, and integration. A new method is validated that supplements a German speech audiometric test with visualizations of the synthetic articulation of an avatar that was used, for it is feasible to lip-sync auditory speech in a highly standardized way. Three hypotheses were formed according to the literature on AV SR that used live or filmed talkers. It was tested whether respective effects could be reproduced with synthetic articulation: (1) cochlear implant (CI) users have a higher visual-only SR than normal-hearing (NH) individuals, and younger individuals obtain higher lipreading scores than older persons. (2) Both CI and NH gain from presenting AV over unimodal (auditory or visual) sentences in noise. (3) Both CI and NH listeners efficiently integrate complementary auditory and visual speech features. In a controlled, cross-sectional study with 14 experienced CI users (mean age 47.4) and 14 NH individuals (mean age 46.3, similar broad age distribution), lipreading, AV gain, and integration of a German matrix sentence test were assessed. Visual speech stimuli were synthesized by the articulation of the Talking Head system "MASSY" (Modular Audiovisual Speech Synthesizer), which displayed standardized articulation with respect to the visibility of German phones. In line with the hypotheses and previous literature, CI users had a higher mean visual-only SR than NH individuals (CI, 38%; NH, 12%; p < 0.001). Age was correlated with lipreading such that within each group, younger individuals obtained higher visual-only scores than older persons (rCI = -0.54; p = 0.046; rNH = -0.78; p < 0.001). Both CI and NH
Ma, Wei Ji; Zhou, Xiang; Ross, Lars A; Foxe, John J; Parra, Lucas C
Watching a speaker's facial movements can dramatically enhance our ability to comprehend words, especially in noisy environments. From a general doctrine of combining information from different sensory modalities (the principle of inverse effectiveness), one would expect that the visual signals would be most effective at the highest levels of auditory noise. In contrast, we find, in accord with a recent paper, that visual information improves performance more at intermediate levels of auditory noise than at the highest levels, and we show that a novel visual stimulus containing only temporal information does the same. We present a Bayesian model of optimal cue integration that can explain these conflicts. In this model, words are regarded as points in a multidimensional space and word recognition is a probabilistic inference process. When the dimensionality of the feature space is low, the Bayesian model predicts inverse effectiveness; when the dimensionality is high, the enhancement is maximal at intermediate auditory noise levels. When the auditory and visual stimuli differ slightly in high noise, the model makes a counterintuitive prediction: as sound quality increases, the proportion of reported words corresponding to the visual stimulus should first increase and then decrease. We confirm this prediction in a behavioral experiment. We conclude that auditory-visual speech perception obeys the same notion of optimality previously observed only for simple multisensory stimuli.
Wei Ji Ma
Full Text Available Watching a speaker's facial movements can dramatically enhance our ability to comprehend words, especially in noisy environments. From a general doctrine of combining information from different sensory modalities (the principle of inverse effectiveness, one would expect that the visual signals would be most effective at the highest levels of auditory noise. In contrast, we find, in accord with a recent paper, that visual information improves performance more at intermediate levels of auditory noise than at the highest levels, and we show that a novel visual stimulus containing only temporal information does the same. We present a Bayesian model of optimal cue integration that can explain these conflicts. In this model, words are regarded as points in a multidimensional space and word recognition is a probabilistic inference process. When the dimensionality of the feature space is low, the Bayesian model predicts inverse effectiveness; when the dimensionality is high, the enhancement is maximal at intermediate auditory noise levels. When the auditory and visual stimuli differ slightly in high noise, the model makes a counterintuitive prediction: as sound quality increases, the proportion of reported words corresponding to the visual stimulus should first increase and then decrease. We confirm this prediction in a behavioral experiment. We conclude that auditory-visual speech perception obeys the same notion of optimality previously observed only for simple multisensory stimuli.
Baart, M.; Stekelenburg, J.J.; Vroomen, J.
Lip-read speech is integrated with heard speech at various neural levels. Here, we investigated the extent to which lip-read induced modulations of the auditory N1 and P2 (measured with EEG) are indicative of speech-specific audiovisual integration, and we explored to what extent the ERPs were
This paper describes evaluation results for a software programme that is intended to be used as a training-aid for lipreading in German. Tests were carried out in schools for hearing-impaired children in Germany which indicate that the ability to lipread increases significantly already after use...... of the software during a short period of time....
Maidment, David W; Macken, Bill; Jones, Dylan M
Functional similarities in verbal memory performance across presentation modalities (written, heard, lipread) are often taken to point to a common underlying representational form upon which the modalities converge. We show here instead that the pattern of performance depends critically on presentation modality and different mechanisms give rise to superficially similar effects across modalities. Lipread recency is underpinned by different mechanisms to auditory recency, and while the effect of an auditory suffix on an auditory list is due to the perceptual grouping of the suffix with the list, the corresponding effect with lipread speech is due to misidentification of the lexical content of the lipread suffix. Further, while a lipread suffix does not disrupt auditory recency, an auditory suffix does disrupt recency for lipread lists. However, this effect is due to attentional capture ensuing from the presentation of an unexpected auditory event, and is evident both with verbal and nonverbal auditory suffixes. These findings add to a growing body of evidence that short-term verbal memory performance is determined by modality-specific perceptual and motor processes, rather than by the storage and manipulation of phonological representations. Copyright © 2013 Elsevier B.V. All rights reserved.
Full Text Available The study identifies some self-regulation strategies used by deaf children in order to make their speech more intelligible. To achieve self-control while speaking, the child with severe hearing loss needs not only a high level of intelligence, but also an effective lip-reading capability and a strong intrinsic motivation. This is the reason why there are many cases of children with a high level of intelligence, but with a mediocre lip-reading capability and others with a lower level of intelligence, but with a good lip-reading capability. These differences also depend on the degree of hearing loss. Among the self-regulation strategies used by the children that achieve an intelligible speech are: the cognitive and meta-cognitive strategies, the motivational strategies etc. These results are important while designing the therapeutic activities, and especially the speech intelligibility factor being crucial in the social integration of those children with hearing impairment.
Tye-Murray, Nancy; Spehar, Brent P; Myerson, Joel; Hale, Sandra; Sommers, Mitchell S
Common-coding theory posits that (1) perceiving an action activates the same representations of motor plans that are activated by actually performing that action, and (2) because of individual differences in the ways that actions are performed, observing recordings of one's own previous behavior activates motor plans to an even greater degree than does observing someone else's behavior. We hypothesized that if observing oneself activates motor plans to a greater degree than does observing others, and if these activated plans contribute to perception, then people should be able to lipread silent video clips of their own previous utterances more accurately than they can lipread video clips of other talkers. As predicted, two groups of participants were able to lipread video clips of themselves, recorded more than two weeks earlier, significantly more accurately than video clips of others. These results suggest that visual input activates speech motor activity that links to word representations in the mental lexicon.
Keetels, M.N.; Schakel, L.; de Bonte, M.; Vroomen, J.
Listeners adjust their phonetic categories to cope with variations in the speech signal (phonetic recalibration). Previous studies have shown that lipread speech (and word knowledge) can adjust the perception of ambiguous speech and can induce phonetic adjustments (Bertelson, Vroomen, & de Gelder in
Findings from speech and hearing tests of older people in South Dakota community senior programs indicate the need for better testing and therapy procedures. Lipreading may be more effective than hearing aids, and factors other than hearing may be involved. Some problems and needs are noted. (MF)
LOWELL, EDGAR L.; AND OTHERS
AUDIOVISUAL PROGRAMS FOR PARENTS OF DEAF CHILDREN WERE DEVELOPED AND EVALUATED. EIGHTEEN SOUND FILMS AND ACCOMPANYING RECORDS PRESENTED INFORMATION ON HEARING, LIPREADING AND SPEECH, AND ATTEMPTED TO CHANGE PARENTAL ATTITUDES TOWARD CHILDREN AND SPOUSES. TWO VERSIONS OF THE FILMS AND RECORDS WERE NARRATED BY (1) "STARS" WHO WERE…
Picou, Erin M; Ricketts, Todd A; Hornsby, Benjamin W Y
To investigate the effect of visual cues on listening effort as well as whether predictive variables such as working memory capacity (WMC) and lipreading ability affect the magnitude of listening effort. Twenty participants with normal hearing were tested using a paired-associates recall task in 2 conditions (quiet and noise) and 2 presentation modalities (audio only [AO] and auditory-visual [AV]). Signal-to-noise ratios were adjusted to provide matched speech recognition across audio-only and AV noise conditions. Also measured were subjective perceptions of listening effort and 2 predictive variables: (a) lipreading ability and (b) WMC. Objective and subjective results indicated that listening effort increased in the presence of noise, but on average the addition of visual cues did not significantly affect the magnitude of listening effort. Although there was substantial individual variability, on average participants who were better lipreaders or had larger WMCs demonstrated reduced listening effort in noise in AV conditions. Overall, the results support the hypothesis that integrating auditory and visual cues requires cognitive resources in some participants. The data indicate that low lipreading ability or low WMC is associated with relatively effortful integration of auditory and visual information in noise.
Ellis, Jason A.
This article is about the deaf education methods debate in the public schools of Toronto, Canada. The author demonstrates how pure oralism (lip-reading and speech instruction to the complete exclusion of sign language) and day school classes for deaf schoolchildren were introduced as a progressive school reform in 1922. Plans for further oralist…
Rietveld-van Wingerden, M.; Westerman, W.E.
During the last two centuries there has been a methodological struggle over teaching the deaf. Do deaf people learn to communicate by means of gestures and signs (the "manual method") or is it important for them to learn speech and lip-reading (the "oral method")? In the second half of the
Rietveld-van Wingerden, M.; Westerman, W.
During the last two centuries there has been a methodological struggle over teaching the deaf. Do deaf people learn to communicate by means of gestures and signs (the "manual method") or is it important for them to learn speech and lip-reading (the "oral method")? In the second half of the
van Laarhoven, Thijs; Keetels, Mirjam; Schakel, Lemmy; Vroomen, Jean
Individuals with developmental dyslexia (DD) may experience, besides reading problems, other speech-related processing deficits. Here, we examined the influence of visual articulatory information (lip-read speech) at various levels of background noise on auditory word recognition in children and adults with DD. We found that children with a…
Williams, Joshua T; Darcy, Isabelle; Newman, Sharlene D
The aim of the present study was to characterize effects of learning a sign language on the processing of a spoken language. Specifically, audiovisual phoneme comprehension was assessed before and after 13 weeks of sign language exposure. L2 ASL learners performed this task in the fMRI scanner. Results indicated that L2 American Sign Language (ASL) learners' behavioral classification of the speech sounds improved with time compared to hearing nonsigners. Results indicated increased activation in the supramarginal gyrus (SMG) after sign language exposure, which suggests concomitant increased phonological processing of speech. A multiple regression analysis indicated that learner's rating on co-sign speech use and lipreading ability was correlated with SMG activation. This pattern of results indicates that the increased use of mouthing and possibly lipreading during sign language acquisition may concurrently improve audiovisual speech processing in budding hearing bimodal bilinguals. Copyright © 2015 Elsevier B.V. All rights reserved.
Lebib, Riadh; Papo, David; Douiri, Abdel; de Bode, Stella; Gillon Dowens, Margaret; Baudonnière, Pierre-Marie
Lipreading reliably improve speech perception during face-to-face conversation. Within the range of good dubbing, however, adults tolerate some audiovisual (AV) discrepancies and lipreading, then, can give rise to confusion. We used event-related brain potentials (ERPs) to study the perceptual strategies governing the intermodal processing of dynamic and bimodal speech stimuli, either congruently dubbed or not. Electrophysiological analyses revealed that non-coherent audiovisual dubbings modulated in amplitude an endogenous ERP component, the N300, we compared to a 'N400-like effect' reflecting the difficulty to integrate these conflicting pieces of information. This result adds further support for the existence of a cerebral system underlying 'integrative processes' lato sensu. Further studies should take advantage of this 'N400-like effect' with AV speech stimuli to open new perspectives in the domain of psycholinguistics.
Bayard, Clémence; Colin, Cécile; Leybaert, Jacqueline
Speech perception for both hearing and deaf people involves an integrative process between auditory and lip-reading information. In order to disambiguate information from lips, manual cues from Cued Speech may be added. Cued Speech (CS) is a system of manual aids developed to help deaf people to clearly and completely understand speech visually (Cornett, 1967). Within this system, both labial and manual information, as lone input sources, remain ambiguous. Perceivers, therefore, have to combi...
Brand, Yves; Senn, Pascal; Kompis, Martin; Dillier, Norbert; Allum, John H. J.
The cochlear implant (CI) is one of the most successful neural prostheses developed to date. It offers artificial hearing to individuals with profound sensorineural hearing loss and with insufficient benefit from conventional hearing aids. The first implants available some 30 years ago provided a limited sensation of sound. The benefit for users of these early systems was mostly a facilitation of lip-reading based communication rather than an understanding of speech. Considerable progress has...
Baart, Martijn; Stekelenburg, Jeroen J; Vroomen, Jean
Lip-read speech is integrated with heard speech at various neural levels. Here, we investigated the extent to which lip-read induced modulations of the auditory N1 and P2 (measured with EEG) are indicative of speech-specific audiovisual integration, and we explored to what extent the ERPs were modulated by phonetic audiovisual congruency. In order to disentangle speech-specific (phonetic) integration from non-speech integration, we used Sine-Wave Speech (SWS) that was perceived as speech by half of the participants (they were in speech-mode), while the other half was in non-speech mode. Results showed that the N1 obtained with audiovisual stimuli peaked earlier than the N1 evoked by auditory-only stimuli. This lip-read induced speeding up of the N1 occurred for listeners in speech and non-speech mode. In contrast, if listeners were in speech-mode, lip-read speech also modulated the auditory P2, but not if listeners were in non-speech mode, thus revealing speech-specific audiovisual binding. Comparing ERPs for phonetically congruent audiovisual stimuli with ERPs for incongruent stimuli revealed an effect of phonetic stimulus congruency that started at ~200 ms after (in)congruence became apparent. Critically, akin to the P2 suppression, congruency effects were only observed if listeners were in speech mode, and not if they were in non-speech mode. Using identical stimuli, we thus confirm that audiovisual binding involves (partially) different neural mechanisms for sound processing in speech and non-speech mode. © 2013 Published by Elsevier Ltd.
Rosenblum, Lawrence D.
Speech perception is inherently multimodal. Visual speech (lip-reading) information is used by all perceivers and readily integrates with auditory speech. Imaging research suggests that the brain treats auditory and visual speech similarly. These findings have led some researchers to consider that speech perception works by extracting amodal information that takes the same form across modalities. From this perspective, speech integration is a property of the input information itself. Amodal s...
Hertrich, Ingo; Dietrich, Susanne; Ackermann, Hermann
In blind people, the visual channel cannot assist face-to-face communication via lipreading or visual prosody. Nevertheless, the visual system may enhance the evaluation of auditory information due to its cross-links to (1) the auditory system, (2) supramodal representations, and (3) frontal action-related areas. Apart from feedback or top-down support of, for example, the processing of spatial or phonological representations, experimental data have shown that the visual system can impact aud...
Picou, Erin M; Ricketts, Todd A; Hornsby, Benjamin W Y
The purpose of this article was to evaluate factors that influence the listening effort experienced when processing speech for people with hearing loss. Specifically, the change in listening effort resulting from introducing hearing aids, visual cues, and background noise was evaluated. An additional exploratory aim was to investigate the possible relationships between the magnitude of listening effort change and individual listeners' working memory capacity, verbal processing speed, or lipreading skill. Twenty-seven participants with bilateral sensorineural hearing loss were fitted with linear behind-the-ear hearing aids and tested using a dual-task paradigm designed to evaluate listening effort. The primary task was monosyllable word recognition and the secondary task was a visual reaction time task. The test conditions varied by hearing aids (unaided, aided), visual cues (auditory-only, auditory-visual), and background noise (present, absent). For all participants, the signal to noise ratio was set individually so that speech recognition performance in noise was approximately 60% in both the auditory-only and auditory-visual conditions. In addition to measures of listening effort, working memory capacity, verbal processing speed, and lipreading ability were measured using the Automated Operational Span Task, a Lexical Decision Task, and the Revised Shortened Utley Lipreading Test, respectively. In general, the effects measured using the objective measure of listening effort were small (~10 msec). Results indicated that background noise increased listening effort, and hearing aids reduced listening effort, while visual cues did not influence listening effort. With regard to the individual variables, verbal processing speed was negatively correlated with hearing aid benefit for listening effort; faster processors were less likely to derive benefit. Working memory capacity, verbal processing speed, and lipreading ability were related to benefit from visual cues. No
Thong, Jiun Fong; Sung, John K K; Wong, Terence K C; Tong, Michael C F
To describe our experience and outcomes of auditory brainstem implantation (ABI) in Chinese patients with Neurofibromatosis Type II (NF2). Retrospective case review. Tertiary referral center. Patients with NF2 who received ABIs. Between 1997 and 2014, eight patients with NF2 received 9 ABIs after translabyrinthine removal of their vestibular schwannomas. One patient did not have auditory response using the ABI after activation. Environmental sounds could be differentiated by six (75%) patients after 6 months of ABI use (mean score 46% [range 28-60%]), and by five (63%) patients after 1 year (mean score 57% [range 36-76%]) and 2 years of ABI use (mean score 48% [range 24-76%]). Closed-set word identification was possible in four (50%) patients after 6 months (mean score 39% [range 12-72%]), 1 year (mean score 68% [range 48-92%]), and 2 years of ABI use (mean score 62% [range 28-100%]). No patient demonstrated open-set sentence recognition in quiet in the ABI-only condition. However, the use of ABI together with lip-reading conferred an improvement over lip-reading alone in open-set sentence recognition scores in two (25%) patients after 6 months of ABI use (mean improvement 46%), and five (63%) patients after 1 year (mean improvement 25%) and 2 years of ABI use (mean improvement 28%). At 2 years postoperatively, three (38%) patients remained ABI users. This is the only published study to date examining ABI outcomes in Cantonese-speaking Chinese NF2 patients and the data seems to show poorer outcomes compared with English-speaking and other nontonal language-speaking NF2 patients. Environmental sound awareness and lip-reading enhancement are the main benefits observed in our patients. More work is needed to improve auditory implant speech-processing strategies for tonal languages and these advancements may yield better speech perception outcomes in the future.
Eberhardt, Silvio P; Auer, Edward T; Bernstein, Lynne E
In a series of studies we have been investigating how multisensory training affects unisensory perceptual learning with speech stimuli. Previously, we reported that audiovisual (AV) training with speech stimuli can promote auditory-only (AO) perceptual learning in normal-hearing adults but can impede learning in congenitally deaf adults with late-acquired cochlear implants. Here, impeder and promoter effects were sought in normal-hearing adults who participated in lipreading training. In Experiment 1, visual-only (VO) training on paired associations between CVCVC nonsense word videos and nonsense pictures demonstrated that VO words could be learned to a high level of accuracy even by poor lipreaders. In Experiment 2, visual-auditory (VA) training in the same paradigm but with the addition of synchronous vocoded acoustic speech impeded VO learning of the stimuli in the paired-associates paradigm. In Experiment 3, the vocoded AO stimuli were shown to be less informative than the VO speech. Experiment 4 combined vibrotactile speech stimuli with the visual stimuli during training. Vibrotactile stimuli were shown to promote visual perceptual learning. In Experiment 5, no-training controls were used to show that training with visual speech carried over to consonant identification of untrained CVCVC stimuli but not to lipreading words in sentences. Across this and previous studies, multisensory training effects depended on the functional relationship between pathways engaged during training. Two principles are proposed to account for stimulus effects: (1) Stimuli presented to the trainee's primary perceptual pathway will impede learning by a lower-rank pathway. (2) Stimuli presented to the trainee's lower rank perceptual pathway will promote learning by a higher-rank pathway. The mechanisms supporting these principles are discussed in light of multisensory reverse hierarchy theory (RHT).
Ryumin, D.; Karpov, A. A.
In this article, we propose a new method for parametric representation of human's lips region. The functional diagram of the method is described and implementation details with the explanation of its key stages and features are given. The results of automatic detection of the regions of interest are illustrated. A speed of the method work using several computers with different performances is reported. This universal method allows applying parametrical representation of the speaker's lipsfor the tasks of biometrics, computer vision, machine learning, and automatic recognition of face, elements of sign languages, and audio-visual speech, including lip-reading.
Strand, Julia F
A widely agreed-upon feature of spoken word recognition is that multiple lexical candidates in memory are simultaneously activated in parallel when a listener hears a word, and that those candidates compete for recognition (Luce, Goldinger, Auer, & Vitevitch, Perception 62:615-625, 2000; Luce & Pisoni, Ear and Hearing 19:1-36, 1998; McClelland & Elman, Cognitive Psychology 18:1-86, 1986). Because the presence of those competitors influences word recognition, much research has sought to quantify the processes of lexical competition. Metrics that quantify lexical competition continuously are more effective predictors of auditory and visual (lipread) spoken word recognition than are the categorical metrics traditionally used (Feld & Sommers, Speech Communication 53:220-228, 2011; Strand & Sommers, Journal of the Acoustical Society of America 130:1663-1672, 2011). A limitation of the continuous metrics is that they are somewhat computationally cumbersome and require access to existing speech databases. This article describes the Phi-square Lexical Competition Database (Phi-Lex): an online, searchable database that provides access to multiple metrics of auditory and visual (lipread) lexical competition for English words, available at www.juliastrand.com/phi-lex .
Aniansson, G.; Peterson, Y.
Speech intelligibility (PB words) in traffic-like noise was investigated in a laboratory situation simulating three common listening situations, indoors at 1 and 4 m and outdoors at 1 m. The maximum noise levels still permitting 75% intelligibility of PB words in these three listening situations were also defined. A total of 269 persons were examined. Forty-six had normal hearing, 90 a presbycusis-type hearing loss, 95 a noise-induced hearing loss and 38 a conductive hearing loss. In the indoor situation the majority of the groups with impaired hearing retained good speech intelligibility in 40 dB(A) masking noise. Lowering the noise level to less than 40 dB(A) resulted in a minor, usually insignificant, improvement in speech intelligibility. Listeners with normal hearing maintained good speech intelligibility in the outdoor listening situation at noise levels up to 60 dB(A), without lip-reading (i.e., using non-auditory information). For groups with impaired hearing due to age and/or noise, representing 8% of the population in Sweden, the noise level outdoors had to be lowered to less than 50 dB(A), in order to achieve good speech intelligibility at 1 m without lip-reading.
Hashimoto, Masahiro; Kumashiro, Masaharu
The purpose of this study was to investigate the limitations of lip-reading advantages for Japanese young adults by desynchronizing visual and auditory information in speech. In the experiment, audio-visual speech stimuli were presented under the six test conditions: audio-alone, and audio-visually with either 0, 60, 120, 240 or 480 ms of audio delay. The stimuli were the video recordings of a face of a female Japanese speaking long and short Japanese sentences. The intelligibility of the audio-visual stimuli was measured as a function of audio delays in sixteen untrained young subjects. Speech intelligibility under the audio-delay condition of less than 120 ms was significantly better than that under the audio-alone condition. On the other hand, the delay of 120 ms corresponded to the mean mora duration measured for the audio stimuli. The results implied that audio delays of up to 120 ms would not disrupt lip-reading advantage, because visual and auditory information in speech seemed to be integrated on a syllabic time scale. Potential applications of this research include noisy workplace in which a worker must extract relevant speech from all the other competing noises.
JOSE MIGUEL MESTRE
Full Text Available PERVALE-S was developed to assess the ability of DP to perceive both social and basic emotions. PERVALE-S presents different sets of visual images of a real deaf person expressing both basic and social emotions, according to the normative standard of emotional expressions in Spanish sign language. Emotional expression stimuli were presented at two different levels of intensity (1: low; and 2: high because DP do not distinguish the same range of frequency adverbs as hearing people (HP do. Then, participants had to click on the more suitable emotional expression. PERVALE-S contains video instructions of a sign language interpreter to improve DP’s understanding about how to use the software. DP had to watch the videos before answering the items. To test PERVALE-S, a sample of 56 individuals was recruited (18 signers, 8 lip-readers, and 30 hearing people. Participants also responded to a personality test (HSPQ adapted and a fluid intelligence measure (RAPM. Moreover, four teachers from deaf center rated all deaf participants. Results: there were no significant differences between DP and HP in performance in PERVALE-S. Confusion matrices revealed that embarrassment, envy, and jealousy were worse perceived by participants (DP and HP. There were not significant differences of emotional perception performance among lip-readings, signers, and hearings. Regarding emotional perception ability (EPA, basic emotion performance was positively related to consciousness, and negatively with tension. Social emotion performance was positively related to age and fluid intelligence, and negatively related to dominance. When an adapted instrument for assessing EPA is developed without language implications, the performance among DP and HP are closer. This instrument could have experimental interest in order of eliminating language influences in EPA.
Mestre, José M.; Larrán, Cristina; Herrero, Joaquín; Guil, Rocío; de la Torre, Gabriel G.
A poorly understood aspect of deaf people (DP) is how their emotional information is processed. Verbal ability is key to improve emotional knowledge in people. Nevertheless, DP are unable to distinguish intonation, intensity, and the rhythm of language due to lack of hearing. Some DP have acquired both lip-reading abilities and sign language, but others have developed only sign language. PERVALE-S was developed to assess the ability of DP to perceive both social and basic emotions. PERVALE-S presents different sets of visual images of a real deaf person expressing both basic and social emotions, according to the normative standard of emotional expressions in Spanish Sign Language. Emotional expression stimuli were presented at two different levels of intensity (1: low; and 2: high) because DP do not distinguish an object in the same way as hearing people (HP) do. Then, participants had to click on the more suitable emotional expression. PERVALE-S contains video instructions (given by a sign language interpreter) to improve DP’s understanding about how to use the software. DP had to watch the videos before answering the items. To test PERVALE-S, a sample of 56 individuals was recruited (18 signers, 8 lip-readers, and 30 HP). Participants also performed a personality test (High School Personality Questionnaire adapted) and a fluid intelligence (Gf) measure (RAPM). Moreover, all deaf participants were rated by four teachers for the deaf. Results: there were no significant differences between deaf and HP in performance in PERVALE-S. Confusion matrices revealed that embarrassment, envy, and jealousy were worse perceived. Age was just related to social-emotional tasks (but not in basic emotional tasks). Emotional perception ability was related mainly to warmth and consciousness, but negatively related to tension. Meanwhile, Gf was related to only social-emotional tasks. There were no gender differences. PMID:26300828
Spence, C; Ranson, J; Driver, J
In three experiments, we investigated whether the ease with which distracting sounds can be ignored depends on their distance from fixation and from attended visual events. In the first experiment, participants shadowed an auditory stream of words presented behind their heads, while simultaneously fixating visual lip-read information consistent with the relevant auditory stream, or meaningless "chewing" lip movements. An irrelevant auditory stream of words, which participants had to ignore, was presented either from the same side as the fixated visual stream or from the opposite side. Selective shadowing was less accurate in the former condition, implying that distracting sounds are harder to ignore when fixated. Furthermore, the impairment when fixating toward distractor sounds was greater when speaking lips were fixated than when chewing lips were fixated, suggesting that people find it particularly difficult to ignore sounds at locations that are actively attended for visual lipreading rather than merely passively fixated. Experiments 2 and 3 tested whether these results are specific to cross-modal links in speech perception by replacing the visual lip movements with a rapidly changing stream of meaningless visual shapes. The auditory task was again shadowing, but the active visual task was now monitoring for a specific visual shape at one location. A decrement in shadowing was again observed when participants passively fixated toward the irrelevant auditory stream. This decrement was larger when participants performed a difficult active visual task there versus fixating, but not for a less demanding visual task versus fixation. The implications for cross-modal links in spatial attention are discussed.
Wu, Yu-Hsiang; Stangl, Elizabeth; Pang, Carol; Zhang, Xuyang
Little is known regarding the acoustic features of a stimulus used by listeners to determine the acceptable noise level (ANL). Features suggested by previous research include speech intelligibility (noise is unacceptable when it degrades speech intelligibility to a certain degree; the intelligibility hypothesis) and loudness (noise is unacceptable when the speech-to-noise loudness ratio is poorer than a certain level; the loudness hypothesis). The purpose of the study was to investigate if speech intelligibility or loudness is the criterion feature that determines ANL. To achieve this, test conditions were chosen so that the intelligibility and loudness hypotheses would predict different results. In Experiment 1, the effect of audiovisual (AV) and binaural listening on ANL was investigated; in Experiment 2, the effect of interaural correlation (ρ) on ANL was examined. A single-blinded, repeated-measures design was used. Thirty-two and twenty-five younger adults with normal hearing participated in Experiments 1 and 2, respectively. In Experiment 1, both ANL and speech recognition performance were measured using the AV version of the Connected Speech Test (CST) in three conditions: AV-binaural, auditory only (AO)-binaural, and AO-monaural. Lipreading skill was assessed using the Utley lipreading test. In Experiment 2, ANL and speech recognition performance were measured using the Hearing in Noise Test (HINT) in three binaural conditions, wherein the interaural correlation of noise was varied: ρ = 1 (N(o)S(o) [a listening condition wherein both speech and noise signals are identical across two ears]), -1 (NπS(o) [a listening condition wherein speech signals are identical across two ears whereas the noise signals of two ears are 180 degrees out of phase]), and 0 (N(u)S(o) [a listening condition wherein speech signals are identical across two ears whereas noise signals are uncorrelated across ears]). The results were compared to the predictions made based on the
Huyse, Aurélie; Berthommier, Frédéric; Leybaert, Jacqueline
The aim of the present study was to examine audiovisual speech integration in cochlear-implanted children and in normally hearing children exposed to degraded auditory stimuli. Previous studies have shown that speech perception in cochlear-implanted users is biased toward the visual modality when audition and vision provide conflicting information. Our main question was whether an experimentally designed degradation of the visual speech cue would increase the importance of audition in the response pattern. The impact of auditory proficiency was also investigated. A group of 31 children with cochlear implants and a group of 31 normally hearing children matched for chronological age were recruited. All children with cochlear implants had profound congenital deafness and had used their implants for at least 2 years. Participants had to perform an /aCa/ consonant-identification task in which stimuli were presented randomly in three conditions: auditory only, visual only, and audiovisual (congruent and incongruent McGurk stimuli). In half of the experiment, the visual speech cue was normal; in the other half (visual reduction) a degraded visual signal was presented, aimed at preventing lipreading of good quality. The normally hearing children received a spectrally reduced speech signal (simulating the input delivered by the cochlear implant). First, performance in visual-only and in congruent audiovisual modalities were decreased, showing that the visual reduction technique used here was efficient at degrading lipreading. Second, in the incongruent audiovisual trials, visual reduction led to a major increase in the number of auditory based responses in both groups. Differences between proficient and nonproficient children were found in both groups, with nonproficient children's responses being more visual and less auditory than those of proficient children. Further analysis revealed that differences between visually clear and visually reduced conditions and between
Pendergrass, Kathy M; Nemeth, Lynne; Newman, Susan D; Jenkins, Carolyn M; Jones, Elaine G
Nurse practitioners (NPs), as well as all healthcare clinicians, have a legal and ethical responsibility to provide health care for deaf American Sign Language (ASL) users equal to that of other patients, including effective communication, autonomy, and confidentiality. However, very little is known about the feasibility to provide equitable health care. The purpose of this study was to examine NP perceptions of barriers and facilitators in providing health care for deaf ASL users. Semistructured interviews in a qualitative design using a socio-ecological model (SEM). Barriers were identified at all levels of the SEM. NPs preferred interpreters to facilitate the visit, but were unaware of their role in assuring effective communication is achieved. A professional sign language interpreter was considered a last resort when all other means of communication failed. Gesturing, note-writing, lip-reading, and use of a familial interpreter were all considered facilitators. Interventions are needed at all levels of the SEM. Resources are needed to provide awareness of deaf communication issues and legal requirements for caring for deaf signers for practicing and student NPs. Protocols need to be developed and present in all healthcare facilities for hiring interpreters as well as quick access to contact information for these interpreters. ©2017 American Association of Nurse Practitioners.
Strand, Julia F; Sommers, Mitchell S
Much research has explored how spoken word recognition is influenced by the architecture and dynamics of the mental lexicon (e.g., Luce and Pisoni, 1998; McClelland and Elman, 1986). A more recent question is whether the processes underlying word recognition are unique to the auditory domain, or whether visually perceived (lipread) speech may also be sensitive to the structure of the mental lexicon (Auer, 2002; Mattys, Bernstein, and Auer, 2002). The current research was designed to test the hypothesis that both aurally and visually perceived spoken words are isolated in the mental lexicon as a function of their modality-specific perceptual similarity to other words. Lexical competition (the extent to which perceptually similar words influence recognition of a stimulus word) was quantified using metrics that are well-established in the literature, as well as a statistical method for calculating perceptual confusability based on the phi-square statistic. Both auditory and visual spoken word recognition were influenced by modality-specific lexical competition as well as stimulus word frequency. These findings extend the scope of activation-competition models of spoken word recognition and reinforce the hypothesis (Auer, 2002; Mattys et al., 2002) that perceptual and cognitive properties underlying spoken word recognition are not specific to the auditory domain. In addition, the results support the use of the phi-square statistic as a better predictor of lexical competition than metrics currently used in models of spoken word recognition. © 2011 Acoustical Society of America
Brand, Yves; Senn, Pascal; Kompis, Martin; Dillier, Norbert; Allum, John H J
The cochlear implant (CI) is one of the most successful neural prostheses developed to date. It offers artificial hearing to individuals with profound sensorineural hearing loss and with insufficient benefit from conventional hearing aids. The first implants available some 30 years ago provided a limited sensation of sound. The benefit for users of these early systems was mostly a facilitation of lip-reading based communication rather than an understanding of speech. Considerable progress has been made since then. Modern, multichannel implant systems feature complex speech processing strategies, high stimulation rates and multiple sites of stimulation in the cochlea. Equipped with such a state-of-the-art system, the majority of recipients today can communicate orally without visual cues and can even use the telephone. The impact of CIs on deaf individuals and on the deaf community has thus been exceptional. To date, more than 300,000 patients worldwide have received CIs. In Switzerland, the first implantation was performed in 1977 and, as of 2012, over 2,000 systems have been implanted with a current rate of around 150 CIs per year. The primary purpose of this article is to provide a contemporary overview of cochlear implantation, emphasising the situation in Switzerland.
Laccourreye, L; Ettienne, V; Prang, I; Couloigner, V; Garabedian, E-N; Loundon, N
To analyze speech in children with profound hearing loss following congenital cytomegalovirus (cCMV) infection with cochlear implantation (CI) before the age of 3 years. In a cohort of 15 children with profound hearing loss, speech perception, production and intelligibility were assessed before and 3 years after CI; variables impacting results were explored. Post-CI, median word recognition was 74% on closed-list and 48% on open-list testing; 80% of children acquired speech production; and 60% were intelligible for all listeners or listeners attentive to lip-reading and/or aware of the child's hearing loss. Univariate analysis identified 3 variables (mean post-CI hearing threshold, bilateral vestibular areflexia, and brain abnormality on MRI) with significant negative impact on the development of speech perception, production and intelligibility. CI showed positive impact on hearing and speech in children with post-cCMV profound hearing loss. Our study demonstrated the key role of maximizing post-CI hearing gain. A few children had insufficient progress, especially in case of bilateral vestibular areflexia and/or brain abnormality on MRI. This led us to suggest that balance rehabilitation and speech therapy should be intensified in such cases. Copyright © 2015 Elsevier Masson SAS. All rights reserved.
Takahashi, Shohei; Ohya, Jun
Since long time ago, speech recognition has been researched, though it does not work well in noisy places such as in the car or in the train. In addition, people with hearing-impaired or difficulties in hearing cannot receive benefits from speech recognition. To recognize the speech automatically, visual information is also important. People understand speeches from not only audio information, but also visual information such as temporal changes in the lip shape. A vision based speech recognition method could work well in noisy places, and could be useful also for people with hearing disabilities. In this paper, we propose an automatic lip-reading method for recognizing the speech by using multimodal visual information without using any audio information such as speech recognition. First, the ASM (Active Shape Model) is used to track and detect the face and lip in a video sequence. Second, the shape, optical flow and spatial frequencies of the lip features are extracted from the lip detected by ASM. Next, the extracted multimodal features are ordered chronologically so that Support Vector Machine is performed in order to learn and classify the spoken words. Experiments for classifying several words show promising results of this proposed method.
The Cochlear implant or 'Bionic ear' is a device that enables people who do not get sufficient benefit from a hearing aid to communicate with the hearing world. The Cochlear implant is not an amplifier, but a device that electrically stimulates the auditory nerve in a way that crudely mimics normal hearing, thus providing a hearing percept. Many recipients are able to understand running speech without the help of lipreading. Cochlear implants have reached a stage of maturity where there are now 170 000 recipients implanted worldwide. The commercial development of these devices has occurred over the last 30 years. This development has been multidisciplinary, including audiologists, engineers, both mechanical and electrical, histologists, materials scientists, physiologists, surgeons and speech pathologists. This paper will trace the development of the device we have today, from the engineering perspective. The special challenges of designing an active device that will work in the human body for a lifetime will be outlined. These challenges include biocompatibility, extreme reliability, safety, patient fitting and surgical issues. It is emphasized that the successful development of a neural prosthesis requires the partnership of academia and industry.
Stekelenburg, Jeroen J; Maes, Jan Pieter; Van Gool, Arthur R; Sitskoorn, Margriet; Vroomen, Jean
In many natural audiovisual events (e.g., the sight of a face articulating the syllable /ba/), the visual signal precedes the sound and thus allows observers to predict the onset and the content of the sound. In healthy adults, the N1 component of the event-related brain potential (ERP), reflecting neural activity associated with basic sound processing, is suppressed if a sound is accompanied by a video that reliably predicts sound onset. If the sound does not match the content of the video (e.g., hearing /ba/ while lipreading /fu/), the later occurring P2 component is affected. Here, we examined whether these visual information sources affect auditory processing in patients with schizophrenia. The electroencephalography (EEG) was recorded in 18 patients with schizophrenia and compared with that of 18 healthy volunteers. As stimuli we used video recordings of natural actions in which visual information preceded and predicted the onset of the sound that was either congruent or incongruent with the video. For the healthy control group, visual information reduced the auditory-evoked N1 if compared to a sound-only condition, and stimulus-congruency affected the P2. This reduction in N1 was absent in patients with schizophrenia, and the congruency effect on the P2 was diminished. Distributed source estimations revealed deficits in the network subserving audiovisual integration in patients with schizophrenia. The results show a deficit in multisensory processing in patients with schizophrenia and suggest that multisensory integration dysfunction may be an important and, to date, under-researched aspect of schizophrenia. Copyright © 2013. Published by Elsevier B.V.
Ujiie, Yuta; Asai, Tomohisa; Wakabayashi, Akio
The McGurk effect, which denotes the influence of visual information on audiovisual speech perception, is less frequently observed in individuals with autism spectrum disorder (ASD) compared to those without it; the reason for this remains unclear. Several studies have suggested that facial configuration context might play a role in this difference. More specifically, people with ASD show a local processing bias for faces-that is, they process global face information to a lesser extent. This study examined the role of facial configuration context in the McGurk effect in 46 healthy students. Adopting an analogue approach using the Autism-Spectrum Quotient (AQ), we sought to determine whether this facial configuration context is crucial to previously observed reductions in the McGurk effect in people with ASD. Lip-reading and audiovisual syllable identification tasks were assessed via presentation of upright normal, inverted normal, upright Thatcher-type, and inverted Thatcher-type faces. When the Thatcher-type face was presented, perceivers were found to be sensitive to the misoriented facial characteristics, causing them to perceive a weaker McGurk effect than when the normal face was presented (this is known as the McThatcher effect). Additionally, the McGurk effect was weaker in individuals with high AQ scores than in those with low AQ scores in the incongruent audiovisual condition, regardless of their ability to read lips or process facial configuration contexts. Our findings, therefore, do not support the assumption that individuals with ASD show a weaker McGurk effect due to a difficulty in processing facial configuration context.
Full Text Available The auditory midbrain implant (AMI consists of a single shank array (20 sites for stimulation along the tonotopic axis of the central nucleus of the inferior colliculus (ICC and has been safely implanted in deaf patients who cannot benefit from a cochlear implant (CI. The AMI improves lip-reading abilities and environmental awareness in the implanted patients. However, the AMI cannot achieve the high levels of speech perception possible with the CI. It appears the AMI can transmit sufficient spectral cues but with limited temporal cues required for speech understanding. Currently, the AMI uses a CI-based strategy, which was originally designed to stimulate each frequency region along the cochlea with amplitude-modulated pulse trains matching the envelope of the bandpass-filtered sound components. However, it is unclear if this type of stimulation with only a single site within each frequency lamina of the ICC can elicit sufficient temporal cues for speech perception. At least speech understanding in quiet is still possible with envelope cues as low as 50 Hz. Therefore, we investigated how ICC neurons follow the bandpass-filtered envelope structure of natural stimuli in ketamine-anesthetized guinea pigs. We identified a subset of ICC neurons that could closely follow the envelope structure (up to ~100 Hz of a diverse set of species-specific calls, which was revealed by using a peripheral ear model to estimate the true bandpass-filtered envelopes observed by the brain. Although previous studies have suggested a complex neural transformation from the auditory nerve to the ICC, our data suggest that the brain maintains a robust temporal code in a subset of ICC neurons matching the envelope structure of natural stimuli. Clinically, these findings suggest that a CI-based strategy may still be effective for the AMI if the appropriate neurons are entrained to the envelope of the acoustic stimulus and can transmit sufficient temporal cues to higher
Gardiner, J M
In short-term memory, the tendency for the last few (recency) items from a verbal sequence to be increasingly well recalled is more pronounced if the items are spoken rather than written. This auditory recency advantage has been quite generally attributed to echoic memory, on the grounds that in the auditory, but not the visual, mode, sensory memory persists just long enough to supplement recall of the most recent items. This view no longer seems tenable. There are now several studies showing that an auditory recency advantage occurs not only in long-term memory, but under conditions in which it cannot possibly be attributed to echoic memory. Also, similar recency phenomena have been discovered in short-term memory when the items are lip-read, or presented in sign-language, rather than heard. This article provides a partial review of these studies, taking a broad theoretical position from which these particular recency phenomena are approached as possible exceptions, to a general theory according to which recency is due to temporal distinctiveness. Much of the fresh evidence reviewed is of a somewhat preliminary nature and it is as yet unexplained by any theory of memory. The need for additional, converging experimental tests is obvious; so too is the need for further theoretical development. Several alternative theoretical resolutions are mentioned, including the possibility that enhanced recency may reflect movement, from sequentially occurring stimulus features, and the suggestion that it may be associated with the primary linguistic mode of the individuals concerned. But special weight is attached to the conjecture that all these recency phenomena might be accounted for in terms of distinctiveness or discriminability. On this view, the enhanced recency effects observed with certain modes, including the auditory mode, are attributed to items possessing greater temporal discriminability in those modes.
Yahata, Izumi; Kanno, Akitake; Hidaka, Hiroshi; Sakamoto, Shuichi; Nakasato, Nobukazu; Kawashima, Ryuta; Katori, Yukio
The effects of visual speech (the moving image of the speaker’s face uttering speech sound) on early auditory evoked fields (AEFs) were examined using a helmet-shaped magnetoencephalography system in 12 healthy volunteers (9 males, mean age 35.5 years). AEFs (N100m) in response to the monosyllabic sound /be/ were recorded and analyzed under three different visual stimulus conditions, the moving image of the same speaker’s face uttering /be/ (congruent visual stimuli) or uttering /ge/ (incongruent visual stimuli), and visual noise (still image processed from speaker’s face using a strong Gaussian filter: control condition). On average, latency of N100m was significantly shortened in the bilateral hemispheres for both congruent and incongruent auditory/visual (A/V) stimuli, compared to the control A/V condition. However, the degree of N100m shortening was not significantly different between the congruent and incongruent A/V conditions, despite the significant differences in psychophysical responses between these two A/V conditions. Moreover, analysis of the magnitudes of these visual effects on AEFs in individuals showed that the lip-reading effects on AEFs tended to be well correlated between the two different audio-visual conditions (congruent vs. incongruent visual stimuli) in the bilateral hemispheres but were not significantly correlated between right and left hemisphere. On the other hand, no significant correlation was observed between the magnitudes of visual speech effects and psychophysical responses. These results may indicate that the auditory-visual interaction observed on the N100m is a fundamental process which does not depend on the congruency of the visual information. PMID:28141836
Stropahl, Maren; Debener, Stefan
There is clear evidence for cross-modal cortical reorganization in the auditory system of post-lingually deafened cochlear implant (CI) users. A recent report suggests that moderate sensori-neural hearing loss is already sufficient to initiate corresponding cortical changes. To what extend these changes are deprivation-induced or related to sensory recovery is still debated. Moreover, the influence of cross-modal reorganization on CI benefit is also still unclear. While reorganization during deafness may impede speech recovery, reorganization also has beneficial influences on face recognition and lip-reading. As CI users were observed to show differences in multisensory integration, the question arises if cross-modal reorganization is related to audio-visual integration skills. The current electroencephalography study investigated cortical reorganization in experienced post-lingually deafened CI users ( n = 18), untreated mild to moderately hearing impaired individuals (n = 18) and normal hearing controls ( n = 17). Cross-modal activation of the auditory cortex by means of EEG source localization in response to human faces and audio-visual integration, quantified with the McGurk illusion, were measured. CI users revealed stronger cross-modal activations compared to age-matched normal hearing individuals. Furthermore, CI users showed a relationship between cross-modal activation and audio-visual integration strength. This may further support a beneficial relationship between cross-modal activation and daily-life communication skills that may not be fully captured by laboratory-based speech perception tests. Interestingly, hearing impaired individuals showed behavioral and neurophysiological results that were numerically between the other two groups, and they showed a moderate relationship between cross-modal activation and the degree of hearing loss. This further supports the notion that auditory deprivation evokes a reorganization of the auditory system
Full Text Available There is clear evidence for cross-modal cortical reorganization in the auditory system of post-lingually deafened cochlear implant (CI users. A recent report suggests that moderate sensori-neural hearing loss is already sufficient to initiate corresponding cortical changes. To what extend these changes are deprivation-induced or related to sensory recovery is still debated. Moreover, the influence of cross-modal reorganization on CI benefit is also still unclear. While reorganization during deafness may impede speech recovery, reorganization also has beneficial influences on face recognition and lip-reading. As CI users were observed to show differences in multisensory integration, the question arises if cross-modal reorganization is related to audio-visual integration skills. The current electroencephalography study investigated cortical reorganization in experienced post-lingually deafened CI users (n = 18, untreated mild to moderately hearing impaired individuals (n = 18 and normal hearing controls (n = 17. Cross-modal activation of the auditory cortex by means of EEG source localization in response to human faces and audio-visual integration, quantified with the McGurk illusion, were measured. CI users revealed stronger cross-modal activations compared to age-matched normal hearing individuals. Furthermore, CI users showed a relationship between cross-modal activation and audio-visual integration strength. This may further support a beneficial relationship between cross-modal activation and daily-life communication skills that may not be fully captured by laboratory-based speech perception tests. Interestingly, hearing impaired individuals showed behavioral and neurophysiological results that were numerically between the other two groups, and they showed a moderate relationship between cross-modal activation and the degree of hearing loss. This further supports the notion that auditory deprivation evokes a reorganization of the
Full Text Available The aim of the text is to point out the developmental tendencies in the research methodology of special education and rehabilitation worldwide and in our country and to emphasize the importance of methodological training of students in special education and rehabilitation at the Faculty of Philosophy in Skopje.The achieved scientific knowledge through research is the fundamental pre-condition for development of special education and rehabilitation theory and practice. The results of the scientific work sometimes cause small, insignificant changes, but, at times, they make radical changes. Thank to the scientific researches and knowledge, certain prejudices were rejected. For example, in the sixth decade of the last century there was a strong prejudice that mentally retarded children should be segregated from the society as aggressive and unfriendly ones or the deaf children should not learn sign language because they would not be motivated to learn lip-reading and would hardly adapt. Piaget and his colleagues from Geneva institute were the pioneers in researching this field and they imposed their belief that handicapped children were not handicapped in each field and they had potentials that could be developed and improved by systematic and organized work. It is important to initiate further researches in the field of special education and rehabilitation, as well as a critical analysis of realized researches. Further development of the scientific research in special education and rehabilitation should be a base for education policy on people with disabilities and development of institutional and non-institutional treatment of this population.
Kim, Heejung; Hahm, Jarang; Lee, Hyekyoung; Kang, Eunjoo; Kang, Hyejin; Lee, Dong Soo
The human brain naturally integrates audiovisual information to improve speech perception. However, in noisy environments, understanding speech is difficult and may require much effort. Although the brain network is supposed to be engaged in speech perception, it is unclear how speech-related brain regions are connected during natural bimodal audiovisual or unimodal speech perception with counterpart irrelevant noise. To investigate the topological changes of speech-related brain networks at all possible thresholds, we used a persistent homological framework through hierarchical clustering, such as single linkage distance, to analyze the connected component of the functional network during speech perception using functional magnetic resonance imaging. For speech perception, bimodal (audio-visual speech cue) or unimodal speech cues with counterpart irrelevant noise (auditory white-noise or visual gum-chewing) were delivered to 15 subjects. In terms of positive relationship, similar connected components were observed in bimodal and unimodal speech conditions during filtration. However, during speech perception by congruent audiovisual stimuli, the tighter couplings of left anterior temporal gyrus-anterior insula component and right premotor-visual components were observed than auditory or visual speech cue conditions, respectively. Interestingly, visual speech is perceived under white noise by tight negative coupling in the left inferior frontal region-right anterior cingulate, left anterior insula, and bilateral visual regions, including right middle temporal gyrus, right fusiform components. In conclusion, the speech brain network is tightly positively or negatively connected, and can reflect efficient or effortful processes during natural audiovisual integration or lip-reading, respectively, in speech perception.
The auditory cortex communicates with the frontal lobe via the middle temporal gyrus (auditory ventral stream; AVS) or the inferior parietal lobule (auditory dorsal stream; ADS). Whereas the AVS is ascribed only with sound recognition, the ADS is ascribed with sound localization, voice detection, prosodic perception/production, lip-speech integration, phoneme discrimination, articulation, repetition, phonological long-term memory and working memory. Previously, I interpreted the juxtaposition of sound localization, voice detection, audio-visual integration and prosodic analysis, as evidence that the behavioral precursor to human speech is the exchange of contact calls in non-human primates. Herein, I interpret the remaining ADS functions as evidence of additional stages in language evolution. According to this model, the role of the ADS in vocal control enabled early Homo (Hominans) to name objects using monosyllabic calls, and allowed children to learn their parents' calls by imitating their lip movements. Initially, the calls were forgotten quickly but gradually were remembered for longer periods. Once the representations of the calls became permanent, mimicry was limited to infancy, and older individuals encoded in the ADS a lexicon for the names of objects (phonological lexicon). Consequently, sound recognition in the AVS was sufficient for activating the phonological representations in the ADS and mimicry became independent of lip-reading. Later, by developing inhibitory connections between acoustic-syllabic representations in the AVS and phonological representations of subsequent syllables in the ADS, Hominans became capable of concatenating the monosyllabic calls for repeating polysyllabic words (i.e., developed working memory). Finally, due to strengthening of connections between phonological representations in the ADS, Hominans became capable of encoding several syllables as a single representation (chunking). Consequently, Hominans began vocalizing and
Wu, Chao; Zheng, Yingjun; Li, Juanhua; Zhang, Bei; Li, Ruikeng; Wu, Haibo; She, Shenglin; Liu, Sha; Peng, Hongjun; Ning, Yuping; Li, Liang
Under a "cocktail-party" listening condition with multiple-people talking, compared to healthy people, people with schizophrenia benefit less from the use of visual-speech (lipreading) priming (VSP) cues to improve speech recognition. The neural mechanisms underlying the unmasking effect of VSP remain unknown. This study investigated the brain substrates underlying the unmasking effect of VSP in healthy listeners and the schizophrenia-induced changes in the brain substrates. Using functional magnetic resonance imaging, brain activation and functional connectivity for the contrasts of the VSP listening condition vs. the visual non-speech priming (VNSP) condition were examined in 16 healthy listeners (27.4 ± 8.6 years old, 9 females and 7 males) and 22 listeners with schizophrenia (29.0 ± 8.1 years old, 8 females and 14 males). The results showed that in healthy listeners, but not listeners with schizophrenia, the VSP-induced activation (against the VNSP condition) of the left posterior inferior temporal gyrus (pITG) was significantly correlated with the VSP-induced improvement in target-speech recognition against speech masking. Compared to healthy listeners, listeners with schizophrenia showed significantly lower VSP-induced activation of the left pITG and reduced functional connectivity of the left pITG with the bilateral Rolandic operculum, bilateral STG, and left insular. Thus, the left pITG and its functional connectivity may be the brain substrates related to the unmasking effect of VSP, assumedly through enhancing both the processing of target visual-speech signals and the inhibition of masking-speech signals. In people with schizophrenia, the reduced unmasking effect of VSP on speech recognition may be associated with a schizophrenia-related reduction of VSP-induced activation and functional connectivity of the left pITG.
Full Text Available In blind people, the visual channel cannot assist face-to-face communication via lipreading or visual prosody. Nevertheless, the visual system may enhance the evaluation of auditory information due to its cross-links to (1 the auditory system, (2 supramodal representations, and (3 frontal action-related areas. Apart from feedback or top-down support of, for example, the processing of spatial or phonological representations, experimental data have shown that the visual system can impact auditory perception at more basic computational stages such as temporal resolution. For example, blind as compared to sighted subjects are more resistant against backward masking, and this ability appears to be associated with activity in visual cortex. Regarding the comprehension of continuous speech, blind subjects can learn to use accelerated text-to-speech systems for "reading" texts at ultra-fast speaking rates (> 16 syllables/s, exceeding by far the normal range of 6 syllables/s. An fMRI study has shown that this ability, among other brain regions, significantly covaries with BOLD responses in bilateral pulvinar, right visual cortex, and left supplementary motor area. Furthermore, magnetoencephalographic (MEG measurements revealed a particular component in right occipital cortex phase-locked to the syllable onsets of accelerated speech. In sighted people, the "bottleneck" for understanding time-compressed speech seems related to a demand for buffering phonological material and is, presumably, linked to frontal brain structures. On the other hand, the neurophysiological correlates of functions overcoming this bottleneck, seem to depend upon early visual cortex activity. The present Hypothesis and Theory paper outlines a model that aims at binding these data together, based on early cross-modal pathways that are already known from various audiovisual experiments considering cross-modal adjustments in space, time, and object recognition.
Full Text Available Children with hearing impairment (HI show disorders in syntax and morphology. The question is whether and how these disorders are connected to problems in the auditory domain. The aim of this paper is to examine whether moderate to severe hearing loss at a young age affects the ability of German-speaking orally trained children to understand and produce sentences. We focused on sentence structures that are derived by syntactic movement, which have been identified as a sensitive marker for syntactic impairment in other languages and in other populations with syntactic impairment. Therefore, our study tested subject and object relatives, subject and object Wh-questions, passive sentences, and topicalized sentences, as well as sentences with verb movement to second sentential position. We tested 19 HI children aged 9;5–13;6 and compared their performance with hearing children using comprehension tasks of sentence-picture matching and sentence repetition tasks. For the comprehension tasks, we included HI children who passed an auditory discrimination task; for the sentence repetition tasks, we selected children who passed a screening task of simple sentence repetition without lip-reading; this made sure that they could perceive the words in the tests, so that we could test their grammatical abilities. The results clearly showed that most of the participants with HI had considerable difficulties in the comprehension and repetition of sentences with syntactic movement: they had significant difficulties understanding object relatives, Wh-questions, and topicalized sentences, and in the repetition of object who and which questions and subject relatives, as well as in sentences with verb movement to second sentential position. Repetition of passives was only problematic for some children. Object relatives were still difficult at this age for both HI and hearing children. An additional important outcome of the study is that not all sentence structures
Full Text Available What does the visual cortex of the blind do during Braille reading? This process involves converting simple tactile information into meaningful patterns that have lexical and semantic properties. The perceptual processing of Braille might be mediated by the somatosensory system, whereas visual letter identity is accomplished within the visual system in sighted people. Recent advances in functional neuroimaging techniques have enabled exploration of the neural substrates of Braille reading (Sadato et al. 1996, 1998, 2002, Cohen et al. 1997, 1999. The primary visual cortex of early-onset blind subjects is functionally relevant to Braille reading, suggesting that the brain shows remarkable plasticity that potentially permits the additional processing of tactile information in the visual cortical areas. Similar cross-modal plasticity is observed by the auditory deprivation: Sign language activates the auditory cortex of deaf subjects (Neville et al. 1999, Nishimura et al. 1999, Sadato et al. 2004. Cross-modal activation can be seen in the sighted and hearing subjects. For example, the tactile shape discrimination of two dimensional (2D shapes (Mah-Jong tiles activated the visual cortex by expert players (Saito et al. 2006, and the lip-reading (visual phonetics (Sadato et al. 2004 or key touch reading by pianists (Hasegawa et al. 2004 activates the auditory cortex of hearing subjects. Thus the cross-modal plasticity by sensory deprivation and cross-modal integration through the learning may share their neural substrates. To clarify the distribution of the neural substrates and their dynamics during cross-modal association learning within several hours, we conducted audio-visual paired association learning of delayed-matching-to-sample type tasks (Tanabe et al. 2005. Each trial consisted of the successive presentation of a pair of stimuli. Subjects had to find pre-defined audio-visual or visuo-visual pairs in a trial and error manner with feedback in
Davidson, Lisa S; Geers, Ann E; Blamey, Peter J; Tobey, Emily A; Brenner, Christine A
Picture Vocabulary Test achieved asymptote at similar ages, around 10 to 11 yrs. On average, children receiving CIs between 2 and 5 yrs of age exhibited significant improvement on tests of speech perception, lipreading, speech production, and language skills measured between primary grades and adolescence. Evidence suggests that improvement in speech perception scores with age reflects increased spoken language level up to a language age of about 10 yrs. Speech perception performance significantly decreased with softer stimulus intensity level and with introduction of background noise. Upgrades to newer speech processing strategies and greater use of frequency-modulated systems may be beneficial for ameliorating performance under these demanding listening conditions.
Full Text Available Magdalena Lachowska, Agnieszka Pastuszka, Paulina Glinka, Kazimierz Niemczyk Department of Otolaryngology, Hearing Implant Center, Medical University of Warsaw, Warsaw, Poland Purpose: To assess the benefits of cochlear implantation in the elderly. Patients and methods: A retrospective analysis of 31 postlingually deafened elderly (≥60 years of age with unilateral cochlear implants was conducted. Audiological testing included preoperative and postoperative pure-tone audiometry and a monosyllabic word recognition test presented from recorded material in free field. Speech perception tests included Ling's six sound test (sound detection, discrimination, and identification, syllable discrimination, and monosyllabic and multisyllabic word recognition (open set without lip-reading. Everyday life benefits from cochlear implantation were also evaluated. Results: The mean age at the time of cochlear implantation was 72.4 years old. The mean post-implantation follow-up time was 2.34 years. All patients significantly improved their audiological and speech understanding performances. The preoperative mean pure-tone average threshold for 500 Hz, 1,000 Hz, 2,000 Hz, and 4,000 Hz was 110.17 dB HL. Before cochlear implantation, all patients scored 0% on the monosyllabic word recognition test in free field at 70 dB SPL intensity level. The postoperative pure-tone average was 37.14 dB HL (the best mean threshold was 17.50 dB HL, the worst was 58.75 dB HL. After the surgery, mean monosyllabic word recognition reached 47.25%. Speech perception tests showed statistically significant improvement in speech recognition. Conclusion: The results of this study showed that cochlear implantation is indeed a successful treatment for improving speech recognition and offers a great help in everyday life to deafened elderly patients. Therefore, they can be good candidates for cochlear implantation and their age alone should not be a relevant or excluding factor when choosing