This book reviews research towards perceptual quality dimensions of synthetic speech, compares these findings with the state of the art, and derives a set of five universal perceptual quality dimensions for TTS signals. They are: (i) naturalness of voice, (ii) prosodic quality, (iii) fluency and intelligibility, (iv) absence of disturbances, and (v) calmness. Moreover, a test protocol for the efficient indentification of those dimensions in a listening test is introduced. Furthermore, several factors influencing these dimensions are examined. In addition, different techniques for the instrumental quality assessment of TTS signals are introduced, reviewed and tested. Finally, the requirements for the integration of an instrumental quality measure into a concatenative TTS system are examined.
Richardson, Sunil; Seelan, Nikkie S; Selvaraj, Dhivakar; Khandeparker, Rakshit V; Gnanamony, Sangeetha
To assess speech outcomes after anterior maxillary distraction (AMD) in patients with cleft-related maxillary hypoplasia. Fifty-eight patients at least 10 years old with cleft-related maxillary hypoplasia were included in this study irrespective of gender, type of cleft lip and palate, and amount of required advancement. AMD was carried out in all patients using a tooth-borne palatal distractor by a single oral and maxillofacial surgeon. Perceptual speech assessment was performed by 2 speech language pathologists preoperatively, before placement of the distractor device, and 6 months postoperatively using the scoring system of Perkins et al (Plast Reconstr Surg 116:72, 2005); the system evaluates velopharyngeal insufficiency (VPI), resonance, nasal air emission, articulation errors, and intelligibility. The data obtained were tabulated and subjected to statistical analysis using Wilcoxon signed rank test. A P value less than .05 was considered significant. Eight patients were lost to follow-up. At 6-month follow-up, improvements of 62% (n = 31), 64% (n = 32), 50% (n = 25), 68% (n = 34), and 70% (n = 35) in VPI, resonance, nasal air emission, articulation, and intelligibility, respectively, were observed, with worsening of all parameters in 1 patient (2%). The results for all tested parameters were highly significant (P ≤ .001). AMD offers a substantial improvement in speech for all 5 parameters of perceptual speech assessment. Copyright © 2016 The American Association of Oral and Maxillofacial Surgeons. Published by Elsevier Inc. All rights reserved.
Full Text Available BackgroundMaxillary hypoplasia refers to a deficiency in the growth of the maxilla commonly seen in patients with a repaired cleft palate. Those who develop maxillary hypoplasia can be offered a repositioning of the maxilla to a functional and esthetic position. Velopharyngeal dysfunction is one of the important problems affecting speech after maxillary advancement surgery. The aim of this study was to investigate the impact of maxillary advancement on repaired cleft palate patients without preoperative deterioration in speech compared with non-cleft palate patients.MethodsEighteen patients underwent Le Fort I osteotomy between 2005 and 2011. One patient was excluded due to preoperative deterioration in speech. Eight repaired cleft palate patients belonged to group A, and 9 non-cleft palate patients belonged to group B. Speech assessments were performed preoperatively and postoperatively by using a speech screening protocol that consisted of a list of single words designed by Ok-Ran Jung. Wilcoxon signed rank test was used to determine if there were significant differences between the preoperative and postoperative outcomes in each group A and B. And Mann-Whitney U test was used to determine if there were significant differences in the change of score between groups A and B.ResultsNo patients had any noticeable change in speech production on perceptual assessment after maxillary advancement in our study. Furthermore, there were no significant differences between groups A and B.ConclusionsRepaired cleft palate patients without preoperative velopharyngeal dysfunction would not have greater risk of deterioration of velopharyngeal function after maxillary advancement compared to non-cleft palate patients.
Kim, Seok-Kwun; Kim, Ju-Chan; Moon, Ju-Bong; Lee, Keun-Cheol
Maxillary hypoplasia refers to a deficiency in the growth of the maxilla commonly seen in patients with a repaired cleft palate. Those who develop maxillary hypoplasia can be offered a repositioning of the maxilla to a functional and esthetic position. Velopharyngeal dysfunction is one of the important problems affecting speech after maxillary advancement surgery. The aim of this study was to investigate the impact of maxillary advancement on repaired cleft palate patients without preoperative deterioration in speech compared with non-cleft palate patients. Eighteen patients underwent Le Fort I osteotomy between 2005 and 2011. One patient was excluded due to preoperative deterioration in speech. Eight repaired cleft palate patients belonged to group A, and 9 non-cleft palate patients belonged to group B. Speech assessments were performed preoperatively and postoperatively by using a speech screening protocol that consisted of a list of single words designed by Ok-Ran Jung. Wilcoxon signed rank test was used to determine if there were significant differences between the preoperative and postoperative outcomes in each group A and B. And Mann-Whitney U test was used to determine if there were significant differences in the change of score between groups A and B. No patients had any noticeable change in speech production on perceptual assessment after maxillary advancement in our study. Furthermore, there were no significant differences between groups A and B. Repaired cleft palate patients without preoperative velopharyngeal dysfunction would not have greater risk of deterioration of velopharyngeal function after maxillary advancement compared to non-cleft palate patients.
Lohmander, Anette; Olsson, Maria
This review of 88 articles in three international journals was undertaken for the purpose of investigating the methodology for perceptual speech assessment in patients with cleft palate. The articles were published between 1980 and 2000 in the Cleft Palate-Craniofacial Journal, the International Journal of Language and Communication Disorders, and Folia Phoniatrica et Logopaedica. The majority of articles (76) were published in the Cleft Palate-Craniofacial Journal, with an increase in articles during the 1990s and 2000. Information about measures or variables was clearly given in all articles. However, the review raises several major concerns regarding method for collection and documentation of data and method for measurement. The most distressing findings were the use of a cross-sectional design in studies of few patients with large age ranges and different types of clefts, the use of highly variable speech samples, and the lack of information about listeners and on reliability. It is hoped that ongoing national and international collaborative efforts to standardize procedures for collection and analysis of perceptual data will help to eliminate such concerns and thus make comparison of published results possible in the future.
Lalonde, Kaylah; Holt, Rachael Frush
This study used the auditory evaluation framework [Erber (1982). Auditory Training (Alexander Graham Bell Association, Washington, DC)] to characterize the influence of visual speech on audiovisual (AV) speech perception in adults and children at multiple levels of perceptual processing. Six- to eight-year-old children and adults completed auditory and AV speech perception tasks at three levels of perceptual processing (detection, discrimination, and recognition). The tasks differed in the level of perceptual processing required to complete them. Adults and children demonstrated visual speech influence at all levels of perceptual processing. Whereas children demonstrated the same visual speech influence at each level of perceptual processing, adults demonstrated greater visual speech influence on tasks requiring higher levels of perceptual processing. These results support previous research demonstrating multiple mechanisms of AV speech processing (general perceptual and speech-specific mechanisms) with independent maturational time courses. The results suggest that adults rely on both general perceptual mechanisms that apply to all levels of perceptual processing and speech-specific mechanisms that apply when making phonetic decisions and/or accessing the lexicon. Six- to eight-year-old children seem to rely only on general perceptual mechanisms across levels. As expected, developmental differences in AV benefit on this and other recognition tasks likely reflect immature speech-specific mechanisms and phonetic processing in children.
Lametti, D.R.; Oostwoud Wijdenes, L.; Bonaiuto, J.; Bestmann, S.; Rothwell, J.C.
Neuroimaging studies suggest that the cerebellum might play a role in both speech perception and speech perceptual learning. However, it remains unclear what this role is: does the cerebellum directly contribute to the perceptual decision? Or does it contribute to the timing of perceptual decisions?
Pilling, Michael; Thomas, Sharon
Two experiments investigate the effectiveness of audiovisual (AV) speech cues (cues derived from both seeing and hearing a talker speak) in facilitating perceptual learning of spectrally distorted speech. Speech was distorted through an eight channel noise-vocoder which shifted the spectral envelope of the speech signal to simulate the properties…
Lalonde, Kaylah; Holt, Rachael Frush
This study used the auditory evaluation framework [Erber (1982). Auditory Training (Alexander Graham Bell Association, Washington, DC)] to characterize the influence of visual speech on audiovisual (AV) speech perception in adults and children at multiple levels of perceptual processing. Six- to eight-year-old children and adults completed auditory and AV speech perception tasks at three levels of perceptual processing (detection, discrimination, and recognition). The tasks differed in the le...
Vozňák, Miroslav; Rozhon, Jan
The paper deals with a speech quality monitoring tool which we have developed in accordance with PESQ (Perceptual Evaluation of Speech Quality) and is automatically running and calculating the MOS (Mean Opinion Score). Results are stored into database and used in a research project investigating how meteorological conditions influence the speech quality in a GSM network. The meteorological station, which is located in our university campus provides information about a temperature,...
Dromey, Christopher; Hunter, Elise; Nissen, Shawn L.
Purpose: This study used perceptual and acoustic measures to examine the time course of speech adaptation after the attachment of electromagnetic sensor coils to the tongue, lips, and jaw. Method: Twenty native English speakers read aloud stimulus sentences before the attachment of the sensors, immediately after attachment, and again 5, 10, 15,…
Berisha, Visar; Liss, Julie
Bandwidth extension of speech is used in the International Telecommunication Union G.729.1 standard in which the narrowband bitstream is combined with quantized high-band parameters. Although this system produces high-quality wideband speech, the additional bits used to represent the high band can be further reduced. In addition to the algorithm used in the G.729.1 standard, bandwidth extension methods based on spectrum prediction have also been proposed. Although these algorithms do not require additional bits, they perform poorly when the correlation between the low and the high band is weak. In this book, two wideband speech coding algorithms that rely on bandwidth extension are developed. The algorithms operate as wrappers around existing narrowband compression schemes. More specifically, in these algorithms, the low band is encoded using an existing toll-quality narrowband system, whereas the high band is generated using the proposed extension techniques. The first method relies only on transmitted high-...
Eisner, Frank; Melinger, Alissa; Weber, Andrea
The perception of speech sounds can be re-tuned through a mechanism of lexically driven perceptual learning after exposure to instances of atypical speech production. This study asked whether this re-tuning is sensitive to the position of the atypical sound within the word. We investigated perceptual learning using English voiced stop consonants, which are commonly devoiced in word-final position by Dutch learners of English. After exposure to a Dutch learner’s productions of devoiced stops in word-final position (but not in any other positions), British English (BE) listeners showed evidence of perceptual learning in a subsequent cross-modal priming task, where auditory primes with devoiced final stops (e.g., “seed”, pronounced [si:th]), facilitated recognition of visual targets with voiced final stops (e.g., SEED). In Experiment 1, this learning effect generalized to test pairs where the critical contrast was in word-initial position, e.g., auditory primes such as “town” facilitated recognition of visual targets like DOWN. Control listeners, who had not heard any stops by the speaker during exposure, showed no learning effects. The generalization to word-initial position did not occur when participants had also heard correctly voiced, word-initial stops during exposure (Experiment 2), and when the speaker was a native BE speaker who mimicked the word-final devoicing (Experiment 3). The readiness of the perceptual system to generalize a previously learned adjustment to other positions within the word thus appears to be modulated by distributional properties of the speech input, as well as by the perceived sociophonetic characteristics of the speaker. The results suggest that the transfer of pre-lexical perceptual adjustments that occur through lexically driven learning can be affected by a combination of acoustic, phonological, and sociophonetic factors. PMID:23554598
Nagaraj, Naveen K; Magimairaj, Beula M
The role of working memory (WM) capacity and lexical knowledge in perceptual restoration (PR) of missing speech was investigated using the interrupted speech perception paradigm. Speech identification ability, which indexed PR, was measured using low-context sentences periodically interrupted at 1.5 Hz. PR was measured for silent gated, low-frequency speech noise filled, and low-frequency fine-structure and envelope filled interrupted conditions. WM capacity was measured using verbal and visuospatial span tasks. Lexical knowledge was assessed using both receptive vocabulary and meaning from context tests. Results showed that PR was better for speech noise filled condition than other conditions tested. Both receptive vocabulary and verbal WM capacity explained unique variance in PR for the speech noise filled condition, but were unrelated to performance in the silent gated condition. It was only receptive vocabulary that uniquely predicted PR for fine-structure and envelope filled conditions. These findings suggest that the contribution of lexical knowledge and verbal WM during PR depends crucially on the information content that replaced the silent intervals. When perceptual continuity was partially restored by filler speech noise, both lexical knowledge and verbal WM capacity facilitated PR. Importantly, for fine-structure and envelope filled interrupted conditions, lexical knowledge was crucial for PR.
Mattys, Sven L; Palmer, Shekeila D
Performing a secondary task while listening to speech has a detrimental effect on speech processing, but the locus of the disruption within the speech system is poorly understood. Recent research has shown that cognitive load imposed by a concurrent visual task increases dependency on lexical knowledge during speech processing, but it does not affect lexical activation per se. This suggests that "lexical drift" under cognitive load occurs either as a post-lexical bias at the decisional level or as a secondary consequence of reduced perceptual sensitivity. This study aimed to adjudicate between these alternatives using a forced-choice task that required listeners to identify noise-degraded spoken words with or without the addition of a concurrent visual task. Adding cognitive load increased the likelihood that listeners would select a word acoustically similar to the target even though its frequency was lower than that of the target. Thus, there was no evidence that cognitive load led to a high-frequency response bias. Rather, cognitive load seems to disrupt sublexical encoding, possibly by impairing perceptual acuity at the auditory periphery.
Borrie, Stephanie A
This study investigated the influence of visual speech information on perceptual processing of neurologically degraded speech. Fifty listeners identified spastic dysarthric speech under both audio (A) and audiovisual (AV) conditions. Condition comparisons revealed that the addition of visual speech information enhanced processing of the neurologically degraded input in terms of (a) acuity (percent phonemes correct) of vowels and consonants and (b) recognition (percent words correct) of predictive and nonpredictive phrases. Listeners exploited stress-based segmentation strategies more readily in AV conditions, suggesting that the perceptual benefit associated with adding visual speech information to the auditory signal-the AV advantage-has both segmental and suprasegmental origins. Results also revealed that the magnitude of the AV advantage can be predicted, to some degree, by the extent to which an individual utilizes syllabic stress cues to inform word recognition in AV conditions. Findings inform the development of a listener-specific model of speech perception that applies to processing of dysarthric speech in everyday communication contexts.
Whitton, Jonathon P; Hancock, Kenneth E; Shannon, Jeffrey M; Polley, Daniel B
Sensory and motor skills can be improved with training, but learning is often restricted to practice stimuli. As an exception, training on closed-loop (CL) sensorimotor interfaces, such as action video games and musical instruments, can impart a broad spectrum of perceptual benefits. Here we ask whether computerized CL auditory training can enhance speech understanding in levels of background noise that approximate a crowded restaurant. Elderly hearing-impaired subjects trained for 8 weeks on a CL game that, like a musical instrument, challenged them to monitor subtle deviations between predicted and actual auditory feedback as they moved their fingertip through a virtual soundscape. We performed our study as a randomized, double-blind, placebo-controlled trial by training other subjects in an auditory working-memory (WM) task. Subjects in both groups improved at their respective auditory tasks and reported comparable expectations for improved speech processing, thereby controlling for placebo effects. Whereas speech intelligibility was unchanged after WM training, subjects in the CL training group could correctly identify 25% more words in spoken sentences or digit sequences presented in high levels of background noise. Numerically, CL audiomotor training provided more than three times the benefit of our subjects' hearing aids for speech processing in noisy listening conditions. Gains in speech intelligibility could be predicted from gameplay accuracy and baseline inhibitory control. However, benefits did not persist in the absence of continuing practice. These studies employ stringent clinical standards to demonstrate that perceptual learning on a computerized audio game can transfer to "real-world" communication challenges. Copyright © 2017 Elsevier Ltd. All rights reserved.
Foundations of Voice and Speech Quality Perception starts out with the fundamental question of: "How do listeners perceive voice and speech quality and how can these processes be modeled?" Any quantitative answers require measurements. This is natural for physical quantities but harder to imagine for perceptual measurands. This book approaches the problem by actually identifying major perceptual dimensions of voice and speech quality perception, defining units wherever possible and offering paradigms to position these dimensions into a structural skeleton of perceptual speech and voice quality. The emphasis is placed on voice and speech quality assessment of systems in artificial scenarios. Many scientific fields are involved. This book bridges the gap between two quite diverse fields, engineering and humanities, and establishes the new research area of Voice and Speech Quality Perception.
Richtsmeier, Peter T; Goffman, Lisa
What cognitive mechanisms account for the trajectory of speech sound development, in particular, gradually increasing accuracy during childhood? An intriguing potential contributor is statistical learning, a type of learning that has been studied frequently in infant perception but less often in child speech production. To assess the relevance of statistical learning to developing speech accuracy, we carried out a statistical learning experiment with four- and five-year-olds in which statistical learning was examined over one week. Children were familiarized with and tested on word-medial consonant sequences in novel words. There was only modest evidence for statistical learning, primarily in the first few productions of the first session. This initial learning effect nevertheless aligns with previous statistical learning research. Furthermore, the overall learning effect was similar to an estimate of weekly accuracy growth based on normative studies. The results implicate other important factors in speech sound development, particularly learning via production. Copyright © 2017 Elsevier Inc. All rights reserved.
Shriberg, Lawrence D.; Fourakis, Marios; Hall, Sheryl D.; Karlsson, Heather B.; Lohmeier, Heather L.; McSweeny, Jane L.; Potter, Nancy L.; Scheer-Cohen, Alison R.; Strand, Edythe A.; Tilkens, Christie M.; Wilson, David L.
A companion paper describes three extensions to a classification system for paediatric speech sound disorders termed the Speech Disorders Classification System (SDCS). The SDCS uses perceptual and acoustic data reduction methods to obtain information on a speaker's speech, prosody, and voice. The present paper provides reliability estimates for…
Xie, Xin; Theodore, Rachel M; Myers, Emily B
The literature on perceptual learning for speech shows that listeners use lexical information to disambiguate phonetically ambiguous speech sounds and that they maintain this new mapping for later recognition of ambiguous sounds for a given talker. Evidence for this kind of perceptual reorganization has focused on phonetic category boundary shifts. Here, we asked whether listeners adjust both category boundaries and internal category structure in rapid adaptation to foreign accents. We investigated the perceptual learning of Mandarin-accented productions of word-final voiced stops in English. After exposure to a Mandarin speaker's productions, native-English listeners' adaptation to the talker was tested in 3 ways: a cross-modal priming task to assess spoken word recognition (Experiment 1), a category identification task to assess shifts in the phonetic boundary (Experiment 2), and a goodness rating task to assess internal category structure (Experiment 3). Following exposure, both category boundary and internal category structure were adjusted; moreover, these prelexical changes facilitated subsequent word recognition. Together, the results demonstrate that listeners' sensitivity to acoustic-phonetic detail in the accented input promoted a dynamic, comprehensive reorganization of their perceptual response as a consequence of exposure to the accented input. We suggest that an examination of internal category structure is important for a complete account of the mechanisms of perceptual learning. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Tzeng, Christina Y; Alexander, Jessica E D; Sidaras, Sabrina K; Nygaard, Lynne C
Foreign-accented speech contains multiple sources of variation that listeners learn to accommodate. Extending previous findings showing that exposure to high-variation training facilitates perceptual learning of accented speech, the current study examines to what extent the structure of training materials affects learning. During training, native adult speakers of American English transcribed sentences spoken in English by native Spanish-speaking adults. In Experiment 1, training stimuli were blocked by speaker, sentence, or randomized with respect to speaker and sentence (Variable training). At test, listeners transcribed novel English sentences produced by unfamiliar Spanish-accented speakers. Listeners' transcription accuracy was highest in the Variable condition, suggesting that varying both speaker identity and sentence across training trials enabled listeners to generalize their learning to novel speakers and linguistic content. Experiment 2 assessed the extent to which ordering of training tokens by a single factor, speaker intelligibility, would facilitate speaker-independent accent learning, finding that listeners' test performance did not reliably differ from that in the no-training control condition. Overall, these results suggest that the structure of training exposure, specifically trial-to-trial variation on both speaker's voice and linguistic content, facilitates learning of the systematic properties of accented speech. The current findings suggest a crucial role of training structure in optimizing perceptual learning. Beyond characterizing the types of variation listeners encode in their representations of spoken utterances, theories of spoken language processing should incorporate the role of training structure in learning lawful variation in speech. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
WEIJianqiang; DULimin; YANZhaoli; ZENGHui
In this paper, a Kalman filter-based speech enhancement algorithm with some improvements of previous work is presented. A new technique based on spectral subtraction is used for separation speech and noise characteristics from noisy speech and for the computation of speech and noise Autoregressive (AR) parameters. In order to obtain a Kalman filter output with high audible quality, a perceptual post-filter is placed at the output of the Kalman filter to smooth the enhanced speech spectra.Extensive experiments indicate that this newly proposed method works well.
Banks, Briony; Gowen, Emma; Munro, Kevin J; Adank, Patti
Perceptual adaptation allows humans to recognize different varieties of accented speech. We investigated whether perceptual adaptation to accented speech is facilitated if listeners can see a speaker's facial and mouth movements. In Study 1, participants listened to sentences in a novel accent and underwent a period of training with audiovisual or audio-only speech cues, presented in quiet or in background noise. A control group also underwent training with visual-only (speech-reading) cues. We observed no significant difference in perceptual adaptation between any of the groups. To address a number of remaining questions, we carried out a second study using a different accent, speaker and experimental design, in which participants listened to sentences in a non-native (Japanese) accent with audiovisual or audio-only cues, without separate training. Participants' eye gaze was recorded to verify that they looked at the speaker's face during audiovisual trials. Recognition accuracy was significantly better for audiovisual than for audio-only stimuli; however, no statistical difference in perceptual adaptation was observed between the two modalities. Furthermore, Bayesian analysis suggested that the data supported the null hypothesis. Our results suggest that although the availability of visual speech cues may be immediately beneficial for recognition of unfamiliar accented speech in noise, it does not improve perceptual adaptation.
Butts, Sydney C; Truong, Alan; Forde, Christina; Stefanov, Dimitre G; Marrinan, Eileen
To assess the ability of otolaryngology residents to rate the hypernasal resonance of patients with velopharyngeal dysfunction. We hypothesize that experience (postgraduate year [PGY] level) and training will result in improved ratings of speech samples. Prospective cohort study. Otolaryngology training programs at 2 academic medical centers. Thirty otolaryngology residents (PGY 1-5) were enrolled in the study. All residents rated 30 speech samples at 2 separate times. Half the residents completed a training module between the rating exercises, with the other half serving as a control group. Percentage agreement with the expert rating of each speech sample and intrarater reliability were calculated for each resident. Analysis of covariance was used to model accuracy at session 2. The median percentage agreement at session 1 was 53.3% for all residents. At the second session, the median scores were 53.3% for the control group and 60% for the training group, but this difference was not statistically significant. Intrarater reliability was moderate for both groups. Residents were more accurate in their ratings of normal and severely hypernasal speech. There was no correlation between rating accuracy and PGY level. Score at session 1 positively correlated with score at session 2. Perceptual training of otolaryngology residents has the potential to improve their ratings of hypernasal speech. Length of time in residency may not be best predictor of perceptual skill. Training modalities incorporating practice with hypernasal speech samples could improve rater skills and should be studied more extensively. © American Academy of Otolaryngology—Head and Neck Surgery Foundation 2016.
Thordis Marisa Neger
Full Text Available Within a few sentences, listeners learn to understand severely degraded speech such as noise-vocoded speech. However, individuals vary in the amount of such perceptual learning and it is unclear what underlies these differences. The present study investigates whether perceptual learning in speech relates to statistical learning, as sensitivity to probabilistic information may aid identification of relevant cues in novel speech input. If statistical learning and perceptual learning (partly draw on the same general mechanisms, then statistical learning in a non-auditory modality using non-linguistic sequences should predict adaptation to degraded speech.In the present study, 73 older adults (aged over 60 years and 60 younger adults (aged between 18 and 30 years performed a visual artificial grammar learning task and were presented with sixty meaningful noise-vocoded sentences in an auditory recall task. Within age groups, sentence recognition performance over exposure was analyzed as a function of statistical learning performance, and other variables that may predict learning (i.e., hearing, vocabulary, attention switching control, working memory and processing speed. Younger and older adults showed similar amounts of perceptual learning, but only younger adults showed significant statistical learning. In older adults, improvement in understanding noise-vocoded speech was constrained by age. In younger adults, amount of adaptation was associated with lexical knowledge and with statistical learning ability. Thus, individual differences in general cognitive abilities explain listeners' variability in adapting to noise-vocoded speech. Results suggest that perceptual and statistical learning share mechanisms of implicit regularity detection, but that the ability to detect statistical regularities is impaired in older adults if visual sequences are presented quickly.
Saija, Jefta D; Akyürek, Elkan G; Andringa, Tjeerd C; Başkent, Deniz
Cognitive skills, such as processing speed, memory functioning, and the ability to divide attention, are known to diminish with aging. The present study shows that, despite these changes, older adults can successfully compensate for degradations in speech perception. Critically, the older participants of this study were not pre-selected for high performance on cognitive tasks, but only screened for normal hearing. We measured the compensation for speech degradation using phonemic restoration, where intelligibility of degraded speech is enhanced using top-down repair mechanisms. Linguistic knowledge, Gestalt principles of perception, and expectations based on situational and linguistic context are used to effectively fill in the inaudible masked speech portions. A positive compensation effect was previously observed only with young normal hearing people, but not with older hearing-impaired populations, leaving the question whether the lack of compensation was due to aging or due to age-related hearing problems. Older participants in the present study showed poorer intelligibility of degraded speech than the younger group, as expected from previous reports of aging effects. However, in conditions that induce top-down restoration, a robust compensation was observed. Speech perception by the older group was enhanced, and the enhancement effect was similar to that observed with the younger group. This effect was even stronger with slowed-down speech, which gives more time for cognitive processing. Based on previous research, the likely explanations for these observations are that older adults can overcome age-related cognitive deterioration by relying on linguistic skills and vocabulary that they have accumulated over their lifetime. Alternatively, or simultaneously, they may use different cerebral activation patterns or exert more mental effort. This positive finding on top-down restoration skills by the older individuals suggests that new cognitive training methods
Saija, Jefta D; Akyürek, Elkan G; Andringa, Tjeerd C; Başkent, Deniz
Cognitive skills, such as processing speed, memory functioning, and the ability to divide attention, are known to diminish with aging. The present study shows that, despite these changes, older adults can successfully compensate for degradations in speech perception. Critically, the older
Hargrove, Patricia M.; Pittelko, Stephen; Fillingane, Evan; Rustman, Emily; Lund, Bonnie
The purpose of this research was to compare selected speech and paralinguistic skills of speakers with Williams syndrome (WS) and typically developing peers and to demonstrate the feasibility of providing preexisting databases to students to facilitate graduate research. In a series of three studies, conversational samples of 12 adolescents with…
Gilbert, Annie C; Boucher, Victor J; Jemel, Boutheina
We examined how perceptual chunks of varying size in utterances can influence immediate memory of heard items (monosyllabic words). Using behavioral measures and event-related potentials (N400) we evaluated the quality of the memory trace for targets taken from perceived temporal groups (TGs) of three and four items. Variations in the amplitude of the N400 showed a better memory trace for items presented in TGs of three compared to those in groups of four. Analyses of behavioral responses along with P300 components also revealed effects of chunk position in the utterance. This is the first study to measure the online effects of perceptual chunks on the memory trace of spoken items. Taken together, the N400 and P300 responses demonstrate that the perceptual chunking of speech facilitates information buffering and a processing on a chunk-by-chunk basis.
Bernstein, Lynne E; Auer, Edward T; Eberhardt, Silvio P; Jiang, Jintao
Speech perception under audiovisual (AV) conditions is well known to confer benefits to perception such as increased speed and accuracy. Here, we investigated how AV training might benefit or impede auditory perceptual learning of speech degraded by vocoding. In Experiments 1 and 3, participants learned paired associations between vocoded spoken nonsense words and nonsense pictures. In Experiment 1, paired-associates (PA) AV training of one group of participants was compared with audio-only (AO) training of another group. When tested under AO conditions, the AV-trained group was significantly more accurate than the AO-trained group. In addition, pre- and post-training AO forced-choice consonant identification with untrained nonsense words showed that AV-trained participants had learned significantly more than AO participants. The pattern of results pointed to their having learned at the level of the auditory phonetic features of the vocoded stimuli. Experiment 2, a no-training control with testing and re-testing on the AO consonant identification, showed that the controls were as accurate as the AO-trained participants in Experiment 1 but less accurate than the AV-trained participants. In Experiment 3, PA training alternated AV and AO conditions on a list-by-list basis within participants, and training was to criterion (92% correct). PA training with AO stimuli was reliably more effective than training with AV stimuli. We explain these discrepant results in terms of the so-called "reverse hierarchy theory" of perceptual learning and in terms of the diverse multisensory and unisensory processing resources available to speech perception. We propose that early AV speech integration can potentially impede auditory perceptual learning; but visual top-down access to relevant auditory features can promote auditory perceptual learning.
Yang, Wu-xia; Feng, Jie; Huang, Wan-ting; Zhang, Cheng-xiang; Nan, Yun
Congenital amusia is a musical disorder that mainly affects pitch perception. Among Mandarin speakers, some amusics also have difficulties in processing lexical tones (tone agnosics). To examine to what extent these perceptual deficits may be related to pitch production impairments in music and Mandarin speech, eight amusics, eight tone agnosics, and 12 age- and IQ-matched normal native Mandarin speakers were asked to imitate music note sequences and Mandarin words of comparable lengths. The results indicated that both the amusics and tone agnosics underperformed the controls on musical pitch production. However, tone agnosics performed no worse than the amusics, suggesting that lexical tone perception deficits may not aggravate musical pitch production difficulties. Moreover, these three groups were all able to imitate lexical tones with perfect intelligibility. Taken together, the current study shows that perceptual musical pitch and lexical tone deficits might coexist with musical pitch production difficulties. But at the same time these perceptual pitch deficits might not affect lexical tone production or the intelligibility of the speech words that were produced. The perception-production relationship for pitch among individuals with perceptual pitch deficits may be, therefore, domain-dependent. PMID:24474944
Full Text Available Congenital amusia is a musical disorder that mainly affects pitch perception. Among Mandarin speakers, some amusics also have difficulties in processing lexical tones (tone agnosics. To examine to what extent these perceptual deficits may be related to pitch production impairments in music and Mandarin speech, 8 amusics, 8 tone agnosics, and 12 age- and IQ-matched normal native Mandarin speakers were asked to imitate music note sequences and Mandarin words of comparable lengths. The results indicated that both the amusics and tone agnosics underperformed the controls on musical pitch production. However, tone agnosics performed no worse than the amusics, suggesting that lexical tone perception deficits may not aggravate musical pitch production difficulties. Moreover, these three groups were all able to imitate lexical tones with perfect intelligibility. Taken together, the current study shows that perceptual musical pitch and lexical tone deficits might coexist with musical pitch production difficulties. But at the same time these perceptual pitch deficits might not affect lexical tone production or the intelligibility of the speech words that were produced. The perception-production relationship for pitch among individuals with perceptual pitch deficits may be, therefore, domain-dependent.
Stilp, Christian E; Assgari, Ashley A
Speech perception is heavily influenced by surrounding sounds. When spectral properties differ between earlier (context) and later (target) sounds, this can produce spectral contrast effects (SCEs) that bias perception of later sounds. For example, when context sounds have more energy in low-F 1 frequency regions, listeners report more high-F 1 responses to a target vowel, and vice versa. SCEs have been reported using various approaches for a wide range of stimuli, but most often, large spectral peaks were added to the context to bias speech categorization. This obscures the lower limit of perceptual sensitivity to spectral properties of earlier sounds, i.e., when SCEs begin to bias speech categorization. Listeners categorized vowels (/ɪ/-/ɛ/, Experiment 1) or consonants (/d/-/g/, Experiment 2) following a context sentence with little spectral amplification (+1 to +4 dB) in frequency regions known to produce SCEs. In both experiments, +3 and +4 dB amplification in key frequency regions of the context produced SCEs, but lesser amplification was insufficient to bias performance. This establishes a lower limit of perceptual sensitivity where spectral differences across sounds can bias subsequent speech categorization. These results are consistent with proposed adaptation-based mechanisms that potentially underlie SCEs in auditory perception. Recent sounds can change what speech sounds we hear later. This can occur when the average frequency composition of earlier sounds differs from that of later sounds, biasing how they are perceived. These "spectral contrast effects" are widely observed when sounds' frequency compositions differ substantially. We reveal the lower limit of these effects, as +3 dB amplification of key frequency regions in earlier sounds was enough to bias categorization of the following vowel or consonant sound. Speech categorization being biased by very small spectral differences across sounds suggests that spectral contrast effects occur
Repp, B H; Mann, V A
The perceptual dependence of stop consonants on preceding fricatives [Mann and Repp, J. Acoust. Soc. Am. 69, 548--558 (1981)] was further investigated in two experiments employing both natural and synthetic speech. These experiments consistently replicated our original finding that listeners, report velar stops following [s]. In addition, our data confirmed earlier reports that natural fricative noises (excerpted from utterances of [st alpha], [sk alpha], [(formula: see text)k alpha]) contain cues to the following stop consonants; this was revealed in subjects' identifications of stops from isolated fricative noises and from stimuli consisting of these noises followed by synthetic CV portions drawn from a [t alpha]--[k alpha] continuum. However, these cues in the noise portion could not account for the contextual effect of fricative identity ([formula: see text] versus [sp) on stop perception (more "k" responses following [s]). Rather, this effect seems to be related to a coarticulatory influence of a preceding fricative on stop production; Subjects' responses to excised natural CV portions (with bursts and aspiration removed) were biased towards a relatively more forward place of stop articulation when the CVs had originally been preceded by [s]; and the identification of a preceding ambiguous fricative was biased in the direction of the original fricative context in which a given CV portion had been produced. These findings support an articulatory explanation for the effect of preceding fricatives on stop consonant perception.
Full Text Available It is commonly acknowledged that word or phoneme intelligibility is an important criterion in the assessment of the communication efficiency of a pathological speaker. People have therefore put a lot of effort in the design of perceptual intelligibility rating tests. These tests usually have the drawback that they employ unnatural speech material (e.g., nonsense words and that they cannot fully exclude errors due to listener bias. Therefore, there is a growing interest in the application of objective automatic speech recognition technology to automate the intelligibility assessment. Current research is headed towards the design of automated methods which can be shown to produce ratings that correspond well with those emerging from a well-designed and well-performed perceptual test. In this paper, a novel methodology that is built on previous work (Middag et al., 2008 is presented. It utilizes phonological features, automatic speech alignment based on acoustic models that were trained on normal speech, context-dependent speaker feature extraction, and intelligibility prediction based on a small model that can be trained on pathological speech samples. The experimental evaluation of the new system reveals that the root mean squared error of the discrepancies between perceived and computed intelligibilities can be as low as 8 on a scale of 0 to 100.
Somanath, Keerthan; Mau, Ted
(1) To develop an automated algorithm to analyze electroglottographic (EGG) signal in continuous dysphonic speech, and (2) to identify EGG waveform parameters that correlate with the auditory-perceptual quality of strain in the speech of patients with adductor spasmodic dysphonia (ADSD). Software development with application in a prospective controlled study. EGG was recorded from 12 normal speakers and 12 subjects with ADSD reading excerpts from the Rainbow Passage. Data were processed by a new algorithm developed with the specific goal of analyzing continuous dysphonic speech. The contact quotient, pulse width, a new parameter peak skew, and various contact closing slope quotient and contact opening slope quotient measures were extracted. EGG parameters were compared between normal and ADSD speech. Within the ADSD group, intra-subject comparison was also made between perceptually strained syllables and unstrained syllables. The opening slope quotient SO7525 distinguished strained syllables from unstrained syllables in continuous speech within individual subjects with ADSD. The standard deviations, but not the means, of contact quotient, EGGW50, peak skew, and SO7525 were different between normal and ADSD speakers. The strain-stress pattern in continuous speech can be visualized as color gradients based on the variation of EGG parameter values. EGG parameters may provide a within-subject measure of vocal strain and serve as a marker for treatment response. The addition of EGG to multidimensional assessment may lead to improved characterization of the voice disturbance in ADSD. Copyright Â© 2016 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
Maryn, Youri; Kim, Hyung-Tae; Kim, Jaeock
The purpose of this study was to explore the criterion-related concurrent validity of two standardized auditory-perceptual rating protocols and the Acoustic Voice Quality Index (AVQI) for measuring dysphonia severity in Korean speech. Sixty native Korean subjects with various voice disorders were asked to sustain the vowel [a:] and to read aloud the Korean text "Walk." A 3-second midvowel portion of the sustained vowel and two sentences (with 25 syllables) were edited, concatenated, and analyzed according to methods described elsewhere. From 56 participants, both continuous speech and sustained vowel recordings had sufficiently high signal-to-noise ratios (35.5 dB and 37 dB on average, respectively) and were therefore subjected to further dysphonia severity analysis with (1) "G" or Grade from the GRBAS protocol, (2) "OS" or Overall Severity from the Consensus Auditory-Perceptual Evaluation of Voice protocol, and (3) AVQI. First, high correlations were found between G and OS (rS = 0.955 for sustained vowels; rS = 0.965 for continuous speech). Second, the AVQI showed a strong correlation with G (rS = 0.911) as well as OS (rP = 0.924). These findings are in agreement with similar studies dealing with continuous speech in other languages. The present study highlights the criterion-related concurrent validity of these methods in Korean speech. Furthermore, it supports the cross-linguistic robustness of the AVQI as a valid and objective marker of overall dysphonia severity. Copyright © 2016 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
Miller, Nick; Nath, Uma; Noble, Emma; Burn, David
To determine if perceptual speech measures distinguish people with Parkinson's disease (PD), multiple system atrophy with predominant parkinsonism (MSA-P) and progressive supranuclear palsy (PSP). Speech-language therapists blind to patient characteristics employed clinical rating scales to evaluate speech/voice in 24 people with clinically diagnosed PD, 17 with PSP and 9 with MSA-P, matched for disease duration (mean 4.9 years, standard deviation 2.2). No consistent intergroup differences appeared on specific speech/voice variables. People with PD were significantly less impaired on overall speech/voice severity. Analyses by severity suggested further investigation around laryngeal, resonance and fluency changes may characterize individual groups. MSA-P and PSP compared with PD were distinguished by severity of speech/voice deterioration, but individual speech/voice parameters failed to consistently differentiate groups.
Sergent, Marie T.; Sedlacek, William E.
Describes perceptual mapping, a newly developed method for assessing perceptions of campus environments. Describes evaluation of a student union by students using this method. Discusses the advantages and disadvantages of this perceptual mapping method for assessing college environments. (Author/ABL)
Brons, Inge; Houben, Rolph; Dreschler, Wouter A
Time-frequency masking is a method for noise reduction that is based on the time-frequency representation of a speech in noise signal. Depending on the estimated signal-to-noise ratio (SNR), each time-frequency unit is either attenuated or not. A special type of a time-frequency mask is the ideal binary mask (IBM), which has access to the real SNR (ideal). The IBM either retains or removes each time-frequency unit (binary mask). The IBM provides large improvements in speech intelligibility and is a valuable tool for investigating how different factors influence intelligibility. This study extends the standard outcome measure (speech intelligibility) with additional perceptual measures relevant for noise reduction: listening effort, noise annoyance, speech naturalness, and overall preference. Four types of time-frequency masking were evaluated: the original IBM, a tempered version of the IBM (called ITM) which applies limited and non-binary attenuation, and non-ideal masking (also tempered) with two different types of noise-estimation algorithms. The results from ideal masking imply that there is a trade-off between intelligibility and sound quality, which depends on the attenuation strength. Additionally, the results for non-ideal masking suggest that subjective measures can show effects of noise reduction even if noise reduction does not lead to differences in intelligibility.
Full Text Available Recent studies suggest that multisensory integration is enhanced in older adults but it is not known whether this enhancement is solely driven by perceptual processes or affected by cognitive processes. Using the ‘McGurk illusion’, in Experiment 1 we found that audio-visual integration of incongruent audio-visual words was higher in older adults than in younger adults, although the recognition of either audio- or visual-only presented words was the same across groups. In Experiment 2 we tested recall of sentences within which an incongruent audio-visual speech word was embedded. The overall semantic meaning of the sentence was compatible with either one of the unisensory components of the target word and/or with the illusory percept. Older participants recalled more illusory audio-visual words in sentences than younger adults, however, there was no differential effect of word compatibility on recall for the two groups. Our findings suggest that the relatively high susceptibility to the audio-visual speech illusion in older participants is due more to perceptual than cognitive processing.
Loebach, Jeremy L; Pisoni, David B; Svirsky, Mario A
The effect of feedback and materials on perceptual learning was examined in listeners with normal hearing who were exposed to cochlear implant simulations. Generalization was most robust when feedback paired the spectrally degraded sentences with their written transcriptions, promoting mapping between the degraded signal and its acoustic-phonetic representation. Transfer-appropriate processing theory suggests that such feedback was most successful because the original learning conditions were reinstated at testing: Performance was facilitated when both training and testing contained degraded stimuli. In addition, the effect of semantic context on generalization was assessed by training listeners on meaningful or anomalous sentences. Training with anomalous sentences was as effective as that with meaningful sentences, suggesting that listeners were encouraged to use acoustic-phonetic information to identify speech than to make predictions from semantic context.
Eberhardt, Silvio P; Auer, Edward T; Bernstein, Lynne E
In a series of studies we have been investigating how multisensory training affects unisensory perceptual learning with speech stimuli. Previously, we reported that audiovisual (AV) training with speech stimuli can promote auditory-only (AO) perceptual learning in normal-hearing adults but can impede learning in congenitally deaf adults with late-acquired cochlear implants. Here, impeder and promoter effects were sought in normal-hearing adults who participated in lipreading training. In Experiment 1, visual-only (VO) training on paired associations between CVCVC nonsense word videos and nonsense pictures demonstrated that VO words could be learned to a high level of accuracy even by poor lipreaders. In Experiment 2, visual-auditory (VA) training in the same paradigm but with the addition of synchronous vocoded acoustic speech impeded VO learning of the stimuli in the paired-associates paradigm. In Experiment 3, the vocoded AO stimuli were shown to be less informative than the VO speech. Experiment 4 combined vibrotactile speech stimuli with the visual stimuli during training. Vibrotactile stimuli were shown to promote visual perceptual learning. In Experiment 5, no-training controls were used to show that training with visual speech carried over to consonant identification of untrained CVCVC stimuli but not to lipreading words in sentences. Across this and previous studies, multisensory training effects depended on the functional relationship between pathways engaged during training. Two principles are proposed to account for stimulus effects: (1) Stimuli presented to the trainee's primary perceptual pathway will impede learning by a lower-rank pathway. (2) Stimuli presented to the trainee's lower rank perceptual pathway will promote learning by a higher-rank pathway. The mechanisms supporting these principles are discussed in light of multisensory reverse hierarchy theory (RHT).
David L Woods
Full Text Available Hearing aids (HAs only partially restore the ability of older hearing impaired (OHI listeners to understand speech in noise, due in large part to persistent deficits in consonant identification. Here, we investigated whether adaptive perceptual training would improve consonant-identification in noise in sixteen aided OHI listeners who underwent 40 hours of computer-based training in their homes. Listeners identified 20 onset and 20 coda consonants in 9,600 consonant-vowel-consonant (CVC syllables containing different vowels (/ɑ/, /i/, or /u/ and spoken by four different talkers. Consonants were presented at three consonant-specific signal-to-noise ratios (SNRs spanning a 12 dB range. Noise levels were adjusted over training sessions based on d' measures. Listeners were tested before and after training to measure (1 changes in consonant-identification thresholds using syllables spoken by familiar and unfamiliar talkers, and (2 sentence reception thresholds (SeRTs using two different sentence tests. Consonant-identification thresholds improved gradually during training. Laboratory tests of d' thresholds showed an average improvement of 9.1 dB, with 94% of listeners showing statistically significant training benefit. Training normalized consonant confusions and improved the thresholds of some consonants into the normal range. Benefits were equivalent for onset and coda consonants, syllables containing different vowels, and syllables presented at different SNRs. Greater training benefits were found for hard-to-identify consonants and for consonants spoken by familiar than unfamiliar talkers. SeRTs, tested with simple sentences, showed less elevation than consonant-identification thresholds prior to training and failed to show significant training benefit, although SeRT improvements did correlate with improvements in consonant thresholds. We argue that the lack of SeRT improvement reflects the dominant role of top-down semantic processing in
De Smet, Hyo Jung; Catsman-Berrevoets, Coriene; Aarsen, Femke; Verhoeven, Jo; Mariën, Peter; Paquier, Philippe F
Mutism and Subsequent Dysarthria (MSD) and the Posterior Fossa Syndrome (PFS) have become well-recognized clinical entities which may develop after resection of cerebellar tumours. However, speech characteristics following a period of mutism have not been documented in much detail. This study carried out a perceptual speech analysis in 24 children and adolescents (of whom 12 became mute in the immediate postoperative phase) 1-12.2 years after cerebellar tumour resection. The most prominent speech deficits in this study were distorted vowels, slow rate, voice tremor, and monopitch. Factors influencing long-term speech disturbances are presence or absence of postoperative PFS, the localisation of the surgical lesion and the type of adjuvant treatment. Long-term speech deficits may be present up to 12 years post-surgery. The speech deficits found in children and adolescents with cerebellar lesions following cerebellar tumour surgery do not necessarily resemble adult speech characteristics of ataxic dysarthria. Copyright © 2012 European Paediatric Neurology Society. Published by Elsevier Ltd. All rights reserved.
This work combines interdisciplinary knowledge and experience from research fields of psychology, linguistics, audio-processing, machine learning, and computer science. The work systematically explores a novel research topic devoted to automated modeling of personality expression from speech. For this aim, it introduces a novel personality assessment questionnaire and presents the results of extensive labeling sessions to annotate the speech data with personality assessments. It provides estimates of the Big 5 personality traits, i.e. openness, conscientiousness, extroversion, agreeableness, and neuroticism. Based on a database built on the questionnaire, the book presents models to tell apart different personality types or classes from speech automatically.
.... To aid in the assessment of various commercially available speech recognition systems, several aircraft speech databases have been developed at the Air Force Research Laboratory's Human Effectiveness Directorate...
Nittrouer, Susan; Lowenstein, Joanna H.
Developmental dyslexia is a condition in which children encounter difficulty learning to read in spite of adequate instruction. Although considerable effort has been expended trying to identify the source of the problem, no single solution has been agreed upon. The current study explored a new hypothesis, that developmental dyslexia may be due to faulty perceptual organization of linguistically relevant sensory input. To test that idea, sentence-length speech signals were processed to create either sine-wave or noise-vocoded analogs. Seventy children between 8 and 11 years of age, with and without dyslexia participated. Children with dyslexia were selected to have phonological awareness deficits, although those without such deficits were retained in the study. The processed sentences were presented for recognition, and measures of reading, phonological awareness, and expressive vocabulary were collected. Results showed that children with dyslexia, regardless of phonological subtype, had poorer recognition scores than children without dyslexia for both kinds of degraded sentences. Older children with dyslexia recognized the sine-wave sentences better than younger children with dyslexia, but no such effect of age was found for the vocoded materials. Recognition scores were used as predictor variables in regression analyses with reading, phonological awareness, and vocabulary measures used as dependent variables. Scores for both sorts of sentence materials were strong predictors of performance on all three dependent measures when all children were included, but only performance for the sine-wave materials explained significant proportions of variance when only children with dyslexia were included. Finally, matching young, typical readers with older children with dyslexia on reading abilities did not mitigate the group difference in recognition of vocoded sentences. Conclusions were that children with dyslexia have difficulty organizing linguistically relevant sensory
Van Niekerk, DR
Full Text Available With the increasing prominence and maturity of corpus-based techniques for speech synthesis, the process of system development has in some ways been simplified considerably. However, the dependence on sufficient amounts of relevant speech data...
Huang, Y. B.; Fan, M. H.; Zhang, Q. Y.
With constant progress in modern speech communication technologies, the speech data is prone to be attacked by the noise or maliciously tampered. In order to make the speech perception hash algorithm has strong robustness and high efficiency, this paper put forward a speech perception hash algorithm based on the tensor decomposition and multi features is proposed. This algorithm analyses the speech perception feature acquires each speech component wavelet packet decomposition. LPCC, LSP and ISP feature of each speech component are extracted to constitute the speech feature tensor. Speech authentication is done by generating the hash values through feature matrix quantification which use mid-value. Experimental results showing that the proposed algorithm is robust for content to maintain operations compared with similar algorithms. It is able to resist the attack of the common background noise. Also, the algorithm is highly efficiency in terms of arithmetic, and is able to meet the real-time requirements of speech communication and complete the speech authentication quickly.
Mowlaee, Pejman; Saeidi, Rahim; Christensen, Mads Græsbøll
Previous studies on performance evaluation of single-channel speech separation (SCSS) algorithms mostly focused on automatic speech recognition (ASR) accuracy as their performance measure. Assessing the separated signals by different metrics other than this has the benefit that the results...... are expected to carry on to other applications beyond ASR. In this paper, in addition to conventional speech quality metrics (PESQ and SNRloss), we also evaluate the separation systems output using different source separation metrics: blind source separation evaluation (BSS EVAL) and perceptual evaluation...... that PESQ and PEASS quality metrics predict well the subjective quality of separated signals obtained by the separation systems. From the results it is observed that the short-time objective intelligibility (STOI) measure predict the speech intelligibility results....
Demir, Özlem Ece; So, Wing-Chee; Özyürek, Asli; Goldin-Meadow, Susan
Speakers choose a particular expression based on many factors, including availability of the referent in the perceptual context. We examined whether, when expressing referents, monolingual English- and Turkish-speaking children: (1) are sensitive to perceptual context, (2) express this sensitivity in language-specific ways, and (3) use co-speech gestures to specify referents that are underspecified. We also explored the mechanisms underlying children’s sensitivity to perceptual context. Children described short vignettes to an experimenter under two conditions: The characters in the vignettes were present in the perceptual context (perceptual context); the characters were absent (no perceptual context). Children routinely used nouns in the no perceptual context condition, but shifted to pronouns (English-speaking children) or omitted arguments (Turkish-speaking children) in the perceptual context condition. Turkish-speaking children used underspecified referents more frequently than English-speaking children in the perceptual context condition; however, they compensated for the difference by using gesture to specify the forms. Gesture thus gives children learning structurally different languages a way to achieve comparable levels of specification while at the same time adhering to the referential expressions dictated by their language. PMID:22904588
Sussman, Joan E.; Tjaden, Kris
Purpose: The primary purpose of this study was to compare percent correct word and sentence intelligibility scores for individuals with multiple sclerosis (MS) and Parkinson's disease (PD) with scaled estimates of speech severity obtained for a reading passage. Method: Speech samples for 78 talkers were judged, including 30 speakers with MS, 16…
Deroost, Natacha; Coomans, Daphné
We examined the role of sequence awareness in a pure perceptual sequence learning design. Participants had to react to the target's colour that changed according to a perceptual sequence. By varying the mapping of the target's colour onto the response keys, motor responses changed randomly. The effect of sequence awareness on perceptual sequence learning was determined by manipulating the learning instructions (explicit versus implicit) and assessing the amount of sequence awareness after the experiment. In the explicit instruction condition (n = 15), participants were instructed to intentionally search for the colour sequence, whereas in the implicit instruction condition (n = 15), they were left uninformed about the sequenced nature of the task. Sequence awareness after the sequence learning task was tested by means of a questionnaire and the process-dissociation-procedure. The results showed that the instruction manipulation had no effect on the amount of perceptual sequence learning. Based on their report to have actively applied their sequence knowledge during the experiment, participants were subsequently regrouped in a sequence strategy group (n = 14, of which 4 participants from the implicit instruction condition and 10 participants from the explicit instruction condition) and a no-sequence strategy group (n = 16, of which 11 participants from the implicit instruction condition and 5 participants from the explicit instruction condition). Only participants of the sequence strategy group showed reliable perceptual sequence learning and sequence awareness. These results indicate that perceptual sequence learning depends upon the continuous employment of strategic cognitive control processes on sequence knowledge. Sequence awareness is suggested to be a necessary but not sufficient condition for perceptual learning to take place. Copyright © 2018 Elsevier B.V. All rights reserved.
Eskelund, Kasper; Dau, Torsten
Speech perception integrates signal from ear and eye. This is witnessed by a wide range of audiovisual integration effects, such as ventriloquism and the McGurk illusion. Some behavioral evidence suggest that audiovisual integration of specific aspects is special for speech perception. However, our...... knowledge of such bimodal integration would be strengthened if the phenomena could be investigated by objective, neutrally based methods. One key question of the present work is if perceptual processing of audiovisual speech can be gauged with a specific signature of neurophysiological activity...... on the auditory speech percept? In two experiments, which both combine behavioral and neurophysiological measures, an uncovering of the relation between perception of faces and of audiovisual integration is attempted. Behavioral findings suggest a strong effect of face perception, whereas the MMN results are less...
Aichert, Ingrid; Späth, Mona; Ziegler, Wolfram
Several factors are known to influence speech accuracy in patients with apraxia of speech (AOS), e.g., syllable structure or word length. However, the impact of word stress has largely been neglected so far. More generally, the role of prosodic information at the phonetic encoding stage of speech production often remains unconsidered in models of speech production. This study aimed to investigate the influence of word stress on error production in AOS. Two-syllabic words with stress on the first (trochees) vs. the second syllable (iambs) were compared in 14 patients with AOS, three of them exhibiting pure AOS, and in a control group of six normal speakers. The patients produced significantly more errors on iambic than on trochaic words. A most prominent metrical effect was obtained for segmental errors. Acoustic analyses of word durations revealed a disproportionate advantage of the trochaic meter in the patients relative to the healthy controls. The results indicate that German apraxic speakers are sensitive to metrical information. It is assumed that metrical patterns function as prosodic frames for articulation planning, and that the regular metrical pattern in German, the trochaic form, has a facilitating effect on word production in patients with AOS. Copyright © 2016 Elsevier Ltd. All rights reserved.
Mauszycki, Shannon C.; Wambaugh, Julie L.; Cameron, Rosalea M.
Purpose: Early apraxia of speech (AOS) research has characterized errors as being variable, resulting in a number of different error types being produced on repeated productions of the same stimuli. Conversely, recent research has uncovered greater consistency in errors, but there are limited data examining sound errors over time (more than one…
Jerger, Susan; Damian, Markus F.; Mills, Candice; Bartlett, James; Tye-Murray, Nancy; Abdi, Herve
Purpose: To examine whether semantic access by speech requires attention in children. Method: Children ("N" = 200) named pictures and ignored distractors on a cross-modal (distractors: auditory-no face) or multimodal (distractors: auditory-static face and audiovisual- dynamic face) picture word task. The cross-modal task had a low load,…
Lansford, Kaitlin L; Borrie, Stephanie A; Bystricky, Lukas
It has been documented in laboratory settings that familiarizing listeners with dysarthric speech improves intelligibility of that speech. If these findings can be replicated in real-world settings, the ability to improve communicative function by focusing on communication partners has major implications for extending clinical practice in dysarthria rehabilitation. An important step toward development of a listener-targeted treatment approach requires establishment of its ecological validity. To this end, the present study leveraged the mechanism of crowdsourcing to determine whether perceptual-training benefits achieved by listeners in the laboratory could be elicited in an at-home computer-based scenario. Perceptual-training data (i.e., intelligibility scores from a posttraining transcription task) were collected from listeners in 2 settings-the laboratory and the crowdsourcing website Amazon Mechanical Turk. Consistent with previous findings, results revealed a main effect of training condition (training vs. control) on intelligibility scores. There was, however, no effect of training setting (Mechanical Turk vs. laboratory). Thus, the perceptual benefit achieved via Mechanical Turk was comparable to that achieved in the laboratory. This study provides evidence regarding the ecological validity of perceptual-training paradigms designed to improve intelligibility of dysarthric speech, thereby supporting their continued advancement as a listener-targeted treatment option.
Strand, Edythe A.; McCauley, Rebecca J.; Weigand, Stephen D.; Stoeckel, Ruth E.; Baas, Becky S.
Purpose: In this article, the authors report reliability and validity evidence for the Dynamic Evaluation of Motor Speech Skill (DEMSS), a new test that uses dynamic assessment to aid in the differential diagnosis of childhood apraxia of speech (CAS). Method: Participants were 81 children between 36 and 79 months of age who were referred to the…
Howe, Tsu-Hsin; Chen, Hao-Ling; Lee, Candy Chieh; Chen, Ying-Dar; Wang, Tien-Ni
Visual perceptual motor skills have been proposed as underlying courses of handwriting difficulties. However, there is no evaluation tool currently available to assess these skills comprehensively and to serve as a sensitive measure. The purpose of this study was to validate the Computerized Perceptual Motor Skills Assessment (CPMSA), a newly developed evaluation tool for children in early elementary grades. Its test-retest reliability, concurrent validity, discriminant validity, and responsiveness were examined in 43 typically developing children and 26 children with handwriting difficulty. The CPMSA demonstrated excellent reliability across all subtests with intra-class correlation coefficients (ICCs)≥0.80. Significant moderate correlations between the domains of the CPMSA and corresponding gold standards including Beery VMI, the TVPS-3, and the eye-hand coordination subtest of the DTVP-2 demonstrated good concurrent validity. In addition, the CPMSA showed evidence of discriminant validity in samples of children with and without handwriting difficulty. This article provides evidence in support of the CPMSA. The CPMSA is a reliable, valid, and promising measure of visual perceptual motor skills for children in early elementary grades. Directions for future study and improvements to the assessment are discussed. Copyright © 2017. Published by Elsevier Ltd.
Gubiani, Marileda Barichello; Pagliarin, Karina Carlesso; Keske-Soares, Marcia
This study systematically reviews the literature on the main tools used to evaluate childhood apraxia of speech (CAS). The search strategy includes Scopus, PubMed, and Embase databases. Empirical studies that used tools for assessing CAS were selected. Articles were selected by two independent researchers. The search retrieved 695 articles, out of which 12 were included in the study. Five tools were identified: Verbal Motor Production Assessment for Children, Dynamic Evaluation of Motor Speech Skill, The Orofacial Praxis Test, Kaufman Speech Praxis Test for Children, and Madison Speech Assessment Protocol. There are few instruments available for CAS assessment and most of them are intended to assess praxis and/or orofacial movements, sequences of orofacial movements, articulation of syllables and phonemes, spontaneous speech, and prosody. There are some tests for assessment and diagnosis of CAS. However, few studies on this topic have been conducted at the national level, as well as protocols to assess and assist in an accurate diagnosis.
Willadsen, Elisabeth; Henningsson, Gunilla
. Finally, the influence of different languages on some aspects of language acquisition in young children with cleft palate is presented and discussed. Until recently, not much has been written about cross linguistic perspectives when dealing with cleft palate speech. Most literature about assessment......This chapter deals with cross linguistic perspectives that need to be taken into account when comparing speech assessment and speech outcome obtained from cleft palate speakers of different languages. Firstly, an overview of consonants and vowels vulnerable to the cleft condition is presented. Then......, consequences for assessment of cleft palate speech by native versus non-native speakers of a language are discussed, as well as the use of phonemic versus phonetic transcription in cross linguistic studies. Specific recommendations for the construction of speech samples in cross linguistic studies are given...
Nemr, Kátia; Amar, Ali; Abrahão, Marcio; Leite, Grazielle Capatto de Almeida; Köhle, Juliana; Santos, Alexandra de O; Correa, Luiz Artur Costa
As a result of technology evolution and development, methods of voice evaluation have changed both in medical and speech and language pathology practice. To relate the results of perceptual evaluation, acoustic analysis and medical evaluation in the diagnosis of vocal and/or laryngeal affections of the population with vocal complaint. Clinical prospective. 29 people that attended vocal health protection campaign were evaluated. They were submitted to perceptual evaluation (AFPA), acoustic analysis (AA), indirect laryngoscopy (LI) and telelaryngoscopy (TL). Correlations between medical and speech language pathology evaluation methods were established, verifying possible statistical signification with the application of Fischer Exact Test. There were statistically significant results in the correlation between AFPA and LI, AFPA and TL, LI and TL. This research study conducted in a vocal health protection campaign presented correlations between speech language pathology evaluation and perceptual evaluation and clinical evaluation, as well as between vocal affection and/or laryngeal medical exams.
Automatic Speech Signal Analysis for Clinical Diagnosis and Assessment of Speech Disorders provides a survey of methods designed to aid clinicians in the diagnosis and monitoring of speech disorders such as dysarthria and dyspraxia, with an emphasis on the signal processing techniques, statistical validity of the results presented in the literature, and the appropriateness of methods that do not require specialized equipment, rigorously controlled recording procedures or highly skilled personnel to interpret results. Such techniques offer the promise of a simple and cost-effective, yet objective, assessment of a range of medical conditions, which would be of great value to clinicians. The ideal scenario would begin with the collection of examples of the clients’ speech, either over the phone or using portable recording devices operated by non-specialist nursing staff. The recordings could then be analyzed initially to aid diagnosis of conditions, and subsequently to monitor the clients’ progress and res...
Garcia, Julian Martinez-Villalba; Jeong, Cheol-Ho; Brunskog, Jonas
This study proposes a numerical and experimental framework for evaluating the perceptual aspect of the diffuse field condition with intended final use in music auditoria. Multiple Impulse Responses are simulated based on the time domain Poisson process with increasing reflection density. Different...
Full Text Available Automatic detection of voice pathologies enables non-invasive, low cost and objective assessments of the presence of disorders, as well as accelerating and improving the process of diagnosis and clinical treatment given to patients. In this work, a vector made up of 28 acoustic parameters is evaluated using principal component analysis (PCA, kernel principal component analysis (kPCA and an auto-associative neural network (NLPCA in four kinds of pathology detection (hyperfunctional dysphonia, functional dysphonia, laryngitis, vocal cord paralysis using the a, i and u vowels, spoken at a high, low and normal pitch. The results indicate that the kPCA and NLPCA methods can be considered a step towards pathology detection of the vocal folds. The results show that such an approach provides acceptable results for this purpose, with the best efficiency levels of around 100%. The study brings the most commonly used approaches to speech signal processing together and leads to a comparison of the machine learning methods determining the health status of the patient
William L Schuerman
Full Text Available In different tasks involving action perception, performance has been found to be facilitated when the presented stimuli were produced by the participants themselves rather than by another participant. These results suggest that the same mental representations are accessed during both production and perception. However, with regard to spoken word perception, evidence also suggests that listeners' representations for speech reflect the input from their surrounding linguistic community rather than their own idiosyncratic productions. Furthermore, speech perception is heavily influenced by indexical cues that may lead listeners to frame their interpretations of incoming speech signals with regard to speaker identity. In order to determine whether word recognition evinces similar self-advantages as found in action perception, it was necessary to eliminate indexical cues from the speech signal. We therefore asked participants to identify noise-vocoded versions of Dutch words that were based on either their own recordings or those of a statistically average speaker. The majority of participants were more accurate for the average speaker than for themselves, even after taking into account differences in intelligibility. These results suggest that the speech representations accessed during perception of noise-vocoded speech are more reflective of the input of the speech community, and hence that speech perception is not necessarily based on representations of one's own speech.
Nielsen, Jens Bo
Reliable methods for assessing speech intelligibility are essential within hearing research, audiology, and related areas. Such methods can be used for obtaining a better understanding of how speech intelligibility is affected by, e.g., various environmental factors or different types of hearing...... impairment. In this thesis, two sentence-based tests for speech intelligibility in Danish were developed. The first test is the Conversational Language Understanding Evaluation (CLUE), which is based on the principles of the original American-English Hearing in Noise Test (HINT). The second test...... is a modified version of CLUE where the speech material and the scoring rules have been reconsidered. An extensive validation of the modified test was conducted with both normal-hearing and hearing-impaired listeners. The validation showed that the test produces reliable results for both groups of listeners...
Kaplanis, Neofytos; Bech, Søren; Sakari, Tervo
This paper reports the design and implementation of a method to perceptually assess the acoustical prop- erties of a car cabin and the subsequent sound reproduction properties of automotive audio systems. Here, we combine Spatial Decomposition Method and Rapid Sensory Analysis techniques. The for......This paper reports the design and implementation of a method to perceptually assess the acoustical prop- erties of a car cabin and the subsequent sound reproduction properties of automotive audio systems. Here, we combine Spatial Decomposition Method and Rapid Sensory Analysis techniques...
Pope, Diana S; Miller-Klein, Erik T
Hospitals have complex soundscapes that create challenges to patient care. Extraneous noise and high reverberation rates impair speech intelligibility, which leads to raised voices. In an unintended spiral, the increasing noise may result in diminished speech privacy, as people speak loudly to be heard over the din. The products available to improve hospital soundscapes include construction materials that absorb sound (acoustic ceiling tiles, carpet, wall insulation) and reduce reverberation rates. Enhanced privacy curtains are now available and offer potential for a relatively simple way to improve speech privacy and speech intelligibility by absorbing sound at the hospital patient's bedside. Acoustic assessments were performed over 2 days on two nursing units with a similar design in the same hospital. One unit was built with the 1970s' standard hospital construction and the other was newly refurbished (2013) with sound-absorbing features. In addition, we determined the effect of an enhanced privacy curtain versus standard privacy curtains using acoustic measures of speech privacy and speech intelligibility indexes. Privacy curtains provided auditory protection for the patients. In general, that protection was increased by the use of enhanced privacy curtains. On an average, the enhanced curtain improved sound absorption from 20% to 30%; however, there was considerable variability, depending on the configuration of the rooms tested. Enhanced privacy curtains provide measureable improvement to the acoustics of patient rooms but cannot overcome larger acoustic design issues. To shorten reverberation time, additional absorption, and compact and more fragmented nursing unit floor plate shapes should be considered.
Diana S Pope
Full Text Available Hospitals have complex soundscapes that create challenges to patient care. Extraneous noise and high reverberation rates impair speech intelligibility, which leads to raised voices. In an unintended spiral, the increasing noise may result in diminished speech privacy, as people speak loudly to be heard over the din. The products available to improve hospital soundscapes include construction materials that absorb sound (acoustic ceiling tiles, carpet, wall insulation and reduce reverberation rates. Enhanced privacy curtains are now available and offer potential for a relatively simple way to improve speech privacy and speech intelligibility by absorbing sound at the hospital patient′s bedside. Acoustic assessments were performed over 2 days on two nursing units with a similar design in the same hospital. One unit was built with the 1970s′ standard hospital construction and the other was newly refurbished (2013 with sound-absorbing features. In addition, we determined the effect of an enhanced privacy curtain versus standard privacy curtains using acoustic measures of speech privacy and speech intelligibility indexes. Privacy curtains provided auditory protection for the patients. In general, that protection was increased by the use of enhanced privacy curtains. On an average, the enhanced curtain improved sound absorption from 20% to 30%; however, there was considerable variability, depending on the configuration of the rooms tested. Enhanced privacy curtains provide measureable improvement to the acoustics of patient rooms but cannot overcome larger acoustic design issues. To shorten reverberation time, additional absorption, and compact and more fragmented nursing unit floor plate shapes should be considered.
Pope, Diana S.; Miller-Klein, Erik T.
Hospitals have complex soundscapes that create challenges to patient care. Extraneous noise and high reverberation rates impair speech intelligibility, which leads to raised voices. In an unintended spiral, the increasing noise may result in diminished speech privacy, as people speak loudly to be heard over the din. The products available to improve hospital soundscapes include construction materials that absorb sound (acoustic ceiling tiles, carpet, wall insulation) and reduce reverberation rates. Enhanced privacy curtains are now available and offer potential for a relatively simple way to improve speech privacy and speech intelligibility by absorbing sound at the hospital patient's bedside. Acoustic assessments were performed over 2 days on two nursing units with a similar design in the same hospital. One unit was built with the 1970s’ standard hospital construction and the other was newly refurbished (2013) with sound-absorbing features. In addition, we determined the effect of an enhanced privacy curtain versus standard privacy curtains using acoustic measures of speech privacy and speech intelligibility indexes. Privacy curtains provided auditory protection for the patients. In general, that protection was increased by the use of enhanced privacy curtains. On an average, the enhanced curtain improved sound absorption from 20% to 30%; however, there was considerable variability, depending on the configuration of the rooms tested. Enhanced privacy curtains provide measureable improvement to the acoustics of patient rooms but cannot overcome larger acoustic design issues. To shorten reverberation time, additional absorption, and compact and more fragmented nursing unit floor plate shapes should be considered. PMID:26780959
De Lamo White, Caroline; Jin, Lixian
British society is multicultural and multilingual, thus for many children English is not their main or only language. Speech and language therapists are required to assess accurately the speech and language skills of bilingual children if they are suspected of having a disorder. Cultural and linguistic diversity means that a more complex assessment procedure is needed and research suggests that bilingual children are at risk of misdiagnosis. Clinicians have identified a lack of suitable assessment instruments for use with this client group. This paper highlights the challenges of assessing bilingual children and reviews available speech and language assessment procedures and approaches for use with this client group. It evaluates different approaches for assessing bilingual children to identify approaches that may be more appropriate for carrying out assessments effectively. This review discusses and evaluates the efficacy of norm-referenced standardized measures, criterion-referenced measures, language-processing measures, dynamic assessment and a sociocultural approach. When all named procedures and approaches are compared, the sociocultural approach appears to hold the most promise for accurate assessment of bilingual children. Research suggests that language-processing measures are not effective indicators for identifying speech and language disorders in bilingual children, but further research is warranted. The sociocultural approach encompasses some of the other approaches discussed, including norm-referenced measures, criterion-referenced measures and dynamic assessment. The sociocultural approach enables the clinician to interpret results in the light of the child's linguistic and cultural background. In addition, combining approaches mitigates the weaknesses inherent in each approach. © 2011 Royal College of Speech and Language Therapists.
Tolli, Michela, E-mail: email@example.com [Department of Architecture and Design (DiAP), Sapienza University of Rome, Via Gramsci 53, 00197 Rome (Italy); Recanatesi, Fabio, E-mail: firstname.lastname@example.org [Department of Agriculture, Forests, Nature and Energy (D.A.F.N.E.), Tuscia University, Via S. Camillo de Lellis, 01100 Viterbo (Italy); Piccinno, Matteo; Leone, Antonio [Department of Agriculture, Forests, Nature and Energy (D.A.F.N.E.), Tuscia University, Via S. Camillo de Lellis, 01100 Viterbo (Italy)
The main aim of this paper is to explore how perceptual and aesthetic impact analyses are considered in Environmental Impact Assessment (EIA), with specific reference to Italian renewable energy projects. To investigate this topic, the paper starts by establishing which factors are linked with perceptual and aesthetic impacts and why it is important to analyze these aspects, which are also related to legislative provisions and procedures in Europe and in Italy. In particular the paper refers to renewable energy projects because environmental policies are encouraging more and more investment in this kind of primary resource. The growing interest in this type of energy is leading to the realization of projects which change the governance of territories, with inevitable effects on the landscape from the aesthetic and perceptual points of view. Legislative references to EIA, including the latest directive regarding this topic show the importance of integrating the assessment of environmental and perceptual impacts, thus there is a need to improve EIA methodological approaches to this purpose. This paper proposes a profile of aesthetic and perceptual impact analysis in EIA for renewable energy projects in Italy, and concludes with recommendations as to how this kind of analysis could be improved. - Highlights: • We analyze 29 EIA Reports of Italian renewable energy projects. • We examine esthetic and perceptual aspects present in Italian EIA reports. • We identified inconsistency in use of methods for esthetic and perceptual aspects. • Local populations are rarely included as stakeholders in EIAs. • A shared understanding of perceptual and esthetic issues in EIA proceedings is required.
Tolli, Michela; Recanatesi, Fabio; Piccinno, Matteo; Leone, Antonio
The main aim of this paper is to explore how perceptual and aesthetic impact analyses are considered in Environmental Impact Assessment (EIA), with specific reference to Italian renewable energy projects. To investigate this topic, the paper starts by establishing which factors are linked with perceptual and aesthetic impacts and why it is important to analyze these aspects, which are also related to legislative provisions and procedures in Europe and in Italy. In particular the paper refers to renewable energy projects because environmental policies are encouraging more and more investment in this kind of primary resource. The growing interest in this type of energy is leading to the realization of projects which change the governance of territories, with inevitable effects on the landscape from the aesthetic and perceptual points of view. Legislative references to EIA, including the latest directive regarding this topic show the importance of integrating the assessment of environmental and perceptual impacts, thus there is a need to improve EIA methodological approaches to this purpose. This paper proposes a profile of aesthetic and perceptual impact analysis in EIA for renewable energy projects in Italy, and concludes with recommendations as to how this kind of analysis could be improved. - Highlights: • We analyze 29 EIA Reports of Italian renewable energy projects. • We examine esthetic and perceptual aspects present in Italian EIA reports. • We identified inconsistency in use of methods for esthetic and perceptual aspects. • Local populations are rarely included as stakeholders in EIAs. • A shared understanding of perceptual and esthetic issues in EIA proceedings is required.
Wang , Junle; Gautier , Josselin; Bosc , Emilie; Li , Jing; Ricordel , Vincent
62; Livrable D6,1 du projet ANR PERSEE; Ce rapport a été réalisé dans le cadre du projet ANR PERSEE (n° ANR-09-BLAN-0170). Exactement il correspond au livrable D6.1 du projet. Son titre : Perceptual Assessment : Definition of the scenarios
Loebach, Jeremy L.; Pisoni, David B.; Svirsky, Mario A.
The effect of feedback and materials on perceptual learning was examined in listeners with normal hearing who were exposed to cochlear implant simulations. Generalization was most robust when feedback paired the spectrally degraded sentences with their written transcriptions, promoting mapping between the degraded signal and its acoustic-phonetic…
Winn, Matthew B; Won, Jong Ho; Moon, Il Joon
This study was conducted to measure auditory perception by cochlear implant users in the spectral and temporal domains, using tests of either categorization (using speech-based cues) or discrimination (using conventional psychoacoustic tests). The authors hypothesized that traditional nonlinguistic tests assessing spectral and temporal auditory resolution would correspond to speech-based measures assessing specific aspects of phonetic categorization assumed to depend on spectral and temporal auditory resolution. The authors further hypothesized that speech-based categorization performance would ultimately be a superior predictor of speech recognition performance, because of the fundamental nature of speech recognition as categorization. Nineteen cochlear implant listeners and 10 listeners with normal hearing participated in a suite of tasks that included spectral ripple discrimination, temporal modulation detection, and syllable categorization, which was split into a spectral cue-based task (targeting the /ba/-/da/ contrast) and a timing cue-based task (targeting the /b/-/p/ and /d/-/t/ contrasts). Speech sounds were manipulated to contain specific spectral or temporal modulations (formant transitions or voice onset time, respectively) that could be categorized. Categorization responses were quantified using logistic regression to assess perceptual sensitivity to acoustic phonetic cues. Word recognition testing was also conducted for cochlear implant listeners. Cochlear implant users were generally less successful at utilizing both spectral and temporal cues for categorization compared with listeners with normal hearing. For the cochlear implant listener group, spectral ripple discrimination was significantly correlated with the categorization of formant transitions; both were correlated with better word recognition. Temporal modulation detection using 100- and 10-Hz-modulated noise was not correlated either with the cochlear implant subjects' categorization of
Hill, Anne Jane; Theodoros, Deborah; Russell, Trevor; Ward, Elizabeth
Background: Telerehabilitation is the remote delivery of rehabilitation services via information technology and telecommunication systems. There have been a number of studies that have used videoconferencing to assess speech and language skills in people with acquired neurogenic communication disorders. However, few studies have focused on cases…
Tamplin, Jeanette; Brazzale, Danny J; Pretto, Jeffrey J; Ruehland, Warren R; Buttifant, Mary; Brown, Douglas J; Berlowitz, David J
To explore how respiratory impairment after cervical spinal cord injury affects vocal function, and to explore muscle recruitment strategies used during vocal tasks after quadriplegia. It was hypothesized that to achieve the increased respiratory support required for singing and loud speech, people with quadriplegia use different patterns of muscle recruitment and control strategies compared with control subjects without spinal cord injury. Matched, parallel-group design. Large university-affiliated public hospital. Consenting participants with motor-complete C5-7 quadriplegia (n=6) and able-bodied age-matched controls (n=6) were assessed on physiologic and voice measures during vocal tasks. Not applicable. Standard respiratory function testing, surface electromyographic activity from accessory respiratory muscles, sound pressure levels during vocal tasks, the Voice Handicap Index, and the Perceptual Voice Profile. The group with quadriplegia had a reduced lung capacity (vital capacity, 71% vs 102% of predicted; P=.028), more perceived voice problems (Voice Handicap Index score, 22.5 vs 6.5; P=.046), and greater recruitment of accessory respiratory muscles during both loud and soft volumes (P=.028) than the able-bodied controls. The group with quadriplegia also demonstrated higher accessory muscle activation in changing from soft to loud speech (P=.028). People with quadriplegia have impaired vocal ability and use different muscle recruitment strategies during speech than the able-bodied. These findings will enable us to target specific measurements of respiratory physiology for assessing functional improvements in response to formal therapeutic singing training. Copyright © 2011 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Jafari, Narges; Salehi, Abolfazl; Izadi, Farzad; Talebian Moghadam, Saeed; Ebadi, Abbas; Dabirmoghadam, Payman; Faham, Maryam; Shahbazi, Mehdi
Muscle tension dysphonia (MTD) is a functional dysphonia, which appears with an excessive tension in the intrinsic and extrinsic laryngeal musculatures. MTD can affect voice quality and quality of life. The purpose of the present study was to assess the effectiveness of vocal function exercises (VFEs) on perceptual and self-assessment ratings in a group of 15 subjects with MTD. The study comprised 15 subjects with MTD (8 men and 7 women, mean age 39.8 years, standard deviation 10.6, age range 24-62 years). All participants were native Persian speakers who underwent a 6-week course of VFEs. The Voice Handicap Index (VHI) (the self-assessment scale) and Grade, Roughness, Breathiness, Asthenia, Strain (GRBAS) scale (perceptual rating of voice quality) were used to compare pre- and post-VFEs. GRBAS data of patients before and after VFEs were compared using Wilcoxon signed-rank test, and VHI data of patients pre- and post-VFEs were compared using Student paired t test. These perceptual parameters showed a statistically significant improvement in subjects with MTD after voice therapy (significant at P self-assessment ratings measurements (with the VHI). As a result, the data provide evidence regarding the efficacy of VFEs in the treatment of patients with MTD. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
Papakonstantinou, Alexandra; Strelcyk, Olaf; Dau, Torsten
This study investigates behavioural and objective measures of temporal auditory processing and their relation to the ability to understand speech in noise. The experiments were carried out on a homogeneous group of seven hearing-impaired listeners with normal sensitivity at low frequencies (up to 1...... kHz) and steeply sloping hearing losses above 1 kHz. For comparison, data were also collected for five normalhearing listeners. Temporal processing was addressed at low frequencies by means of psychoacoustical frequency discrimination, binaural masked detection and amplitude modulation (AM......) detection. In addition, auditory brainstem responses (ABRs) to clicks and broadband rising chirps were recorded. Furthermore, speech reception thresholds (SRTs) were determined for Danish sentences in speechshaped noise. The main findings were: (1) SRTs were neither correlated with hearing sensitivity...
Uhler, Kristin M; Baca, Rosalinda; Dudas, Emily; Fredrickson, Tammy
Speech perception measures have long been considered an integral piece of the audiological assessment battery. Currently, a prelinguistic, standardized measure of speech perception is missing in the clinical assessment battery for infants and young toddlers. Such a measure would allow systematic assessment of speech perception abilities of infants as well as the potential to investigate the impact early identification of hearing loss and early fitting of amplification have on the auditory pathways. To investigate the impact of sensation level (SL) on the ability of infants with normal hearing (NH) to discriminate /a-i/ and /ba-da/ and to determine if performance on the two contrasts are significantly different in predicting the discrimination criterion. The design was based on a survival analysis model for event occurrence and a repeated measures logistic model for binary outcomes. The outcome for survival analysis was the minimum SL for criterion and the outcome for the logistic regression model was the presence/absence of achieving the criterion. Criterion achievement was designated when an infant's proportion correct score was >0.75 on the discrimination performance task. Twenty-two infants with NH sensitivity participated in this study. There were 9 males and 13 females, aged 6-14 mo. Testing took place over two to three sessions. The first session consisted of a hearing test, threshold assessment of the two speech sounds (/a/ and /i/), and if time and attention allowed, visual reinforcement infant speech discrimination (VRISD). The second session consisted of VRISD assessment for the two test contrasts (/a-i/ and /ba-da/). The presentation level started at 50 dBA. If the infant was unable to successfully achieve criterion (>0.75) at 50 dBA, the presentation level was increased to 70 dBA followed by 60 dBA. Data examination included an event analysis, which provided the probability of criterion distribution across SL. The second stage of the analysis was a
Jennifer K. Bizley
Full Text Available With increasing numbers of children and adults receiving bilateral cochlear implants, there is an urgent need for assessment tools that enable testing of binaural hearing abilities. Current test batteries are either limited in scope or are of an impractical duration for routine testing. Here, we report a behavioral test that enables combined testing of speech identification and spatial discrimination in noise. In this task, multitalker babble was presented from all speakers, and pairs of speech tokens were sequentially presented from two adjacent speakers. Listeners were required to identify both words from a closed set of four possibilities and to determine whether the second token was presented to the left or right of the first. In Experiment 1, normal-hearing adult listeners were tested at 15° intervals throughout the frontal hemifield. Listeners showed highest spatial discrimination performance in and around the frontal midline, with a decline at more eccentric locations. In contrast, speech identification abilities were least accurate near the midline and showed an improvement in performance at more lateral locations. In Experiment 2, normal-hearing listeners were assessed using a restricted range of speaker locations designed to match those found in clinical testing environments. Here, speakers were separated by 15° around the midline and 30° at more lateral locations. This resulted in a similar pattern of behavioral results as in Experiment 1. We conclude, this test offers the potential to assess both spatial discrimination and the ability to use spatial information for unmasking in clinical populations.
Developmental apraxia of speech (DAS) in children is a speech disorder, supposed to have a neurological origin, which is commonly considered to result from particular deficits in speech processing (i.e., phonological planning, motor programming). However, the label DAS has often been used as
Karthikeyan, Ramasamy; Sainarayanan, Gopalakrishnan; Deepa, Subramaniam Nachimuthu
Since usage of digital video is wide spread nowadays, quality considerations have become essential, and industry demand for video quality measurement is rising. This proposal provides a method of perceptual quality assessment in H.264 standard encoder using objective modeling. For this purpose, quality impairments are calculated and a model is developed to compute the perceptual video quality metric based on no reference method. Because of the shuttle difference between the original video and the encoded video the quality of the encoded picture gets degraded, this quality difference is introduced by the encoding process like Intra and Inter prediction. The proposed model takes into account of the artifacts introduced by these spatial and temporal activities in the hybrid block based coding methods and an objective modeling of these artifacts into subjective quality estimation is proposed. The proposed model calculates the objective quality metric using subjective impairments; blockiness, blur and jerkiness compared to the existing bitrate only calculation defined in the ITU G 1070 model. The accuracy of the proposed perceptual video quality metrics is compared against popular full reference objective methods as defined by VQEG.
Elizabeth A. Dinsdale
Full Text Available Integrating information from a range of community members in environmental management provides a more complete assessment of the problem and a diversification of management options, but is difficult to achieve. To investigate the relationship between different environmental interpretations, I compared three distinct measures of anchor damage on coral reefs: ecological measures, perceptual meanings, and subjective health judgments. The ecological measures identified an increase in the number of overturned corals and a reduction in coral cover, the perceptual meanings identified a loss of visual quality, and the health judgments identified a reduction in the health of the coral reef sites associated with high levels of anchoring. Combining the perceptual meanings and health judgments identified that the judgment of environmental health was a key feature that both scientific and lay participants used to describe the environment. Some participants in the survey were familiar with the coral reef environment, and others were not. However, they provided consistent judgment of a healthy coral reef, suggesting that these judgments were not linked to present-day experiences. By combining subjective judgments and ecological measures, the point at which the environment is deemed to lose visual quality was identified; for these coral reefs, if the level of damage rose above 10.3% and the cover of branching corals dropped below 17.1%, the reefs were described as unhealthy. Therefore, by combining the information, a management agency can involve the community in identifying when remedial action is required or when management policies are effectively maintaining a healthy ecosystem.
Full Text Available This paper presents a sociolinguistic assessment of the Darwāzi speech varieties (including Tangshewi based on data collected during a survey conducted between August 31st and September 19th 2008 in the Darwāz area. The research was carried out under the auspices of the International Assistance Mission, a Non-Governmental Organization working in Afghanistan. The goal was to determine whether Dari, one of the two national languages, is adequate to be used in literature and primary school education, or whether the Darwāzi people would benefit from language development, including literature development and primary school education in the vernacular.
Johnson, Cheryl DeConde
Emphasis on classroom listening has gained importance for all children and especially for those with hearing loss and special listening needs. The rationale can be supported from trends in educational placements, the Response to Intervention initiative, student performance and accountability, the role of audition in reading, and improvement in hearing technologies. Speech-language pathologists have an instrumental role advocating for the accommodations that are necessary for effective listening for these children in school. To identify individual listening needs and make relevant recommendations for accommodations, a classroom listening assessment is suggested. Components of the classroom listening assessment include observation, behavioral assessment, self-assessment, and classroom acoustics measurements. Together, with a strong rationale, the results can be used to implement a plan that results in effective classroom listening for these children. Thieme Medical Publishers 333 Seventh Avenue, New York, NY 10001, USA.
Jørgensen, Søren; Cubick, Jens; Dau, Torsten
In the development process of modern telecommunication systems, such as mobile phones, it is common practice to use computer models to objectively evaluate the transmission quality of the system, instead of time-consuming perceptual listening tests. Such models have typically focused on the quality...... of the transmitted speech, while little or no attention has been provided to speech intelligibility. The present study investigated to what extent three state-of-the art speech intelligibility models could predict the intelligibility of noisy speech transmitted through mobile phones. Sentences from the Danish...... Dantale II speech material were mixed with three different kinds of background noise, transmitted through three different mobile phones, and recorded at the receiver via a local network simulator. The speech intelligibility of the transmitted sentences was assessed by six normal-hearing listeners...
This research originated from the need for a speech and language therapy assessment in te reo Maori for a particular child who attended a Maori immersion unit. A Speech and Language Therapy te reo assessment had already been developed but it needed to be revised and normative data collected. Discussions and assessments were carried out in a…
Tatiana D Dubovitskaya
Full Text Available Socio-perceptual attitude is defined as a predisposition of the subjects of communication to perceive, assess, and act in relation to each other in a certain way. The functions of social-perceptual attitudes are social adaptation (utilitarian function, cognitive function, expression and evaluation, and psychological protection. Author’s technique of diagnostics of social-perceptual attitude of a person in relation to other people is presented in the article. The inventory uses the proverbs of the peoples of the world as test material. The results of psychometric testing indicate the reliability and validity of the technique.High scores on the technique results indicate the individual’s characteristics such as willingness to trust, to help, and to notice positive experiences; the ability to see the positive potential; faith in people’s ability to develop and achieve better results; emotional acceptance, benevolence, and empathy.The average scores indicate the subject’s desire for close and trusting relationships, cooperation, sincerity; willingness to understand another person; desire to take into account individual psychological features of other people; contradictions with others are either absent or are resolved constructively.Low scores indicate that, in relation to others, the subject is characterized by the following: suspicion, anticipation of negative attitude to themselves, willingness to see the negative manifestations in the behavior of others, ignoring of their successes and achievements; emotional rejection, criticism, irony, malice; accusations against the others used to justify their own negative actions (aggression against said others.The author’s technique can be used in solution of the problem of psycho-diagnostics and therapy in order to optimize communication and interpersonal relations: achieve mutual understanding and cooperation, develop a constructive mutually beneficial solution, overcome the desire to criticize
Clausen, Marit Carolin; Fox-Boyer, Annette
The identification of speech sounds disorders is an important everyday task for speech and language therapists (SLTs) working with children. Therefore, assessment tools are needed that are able to correctly identify and diagnose a child with a suspected speech disorder and furthermore, that provide...... of the existing speech assessments in Denmark showed that none of the materials fulfilled current recommendations identified in research literature. Therefore, the aim of this paper is to describe the evaluation of a newly constructed instrument for assessing the speech development and disorders of Danish...... with suspected speech disorder (Clausen and Fox-Boyer, in prep). The results indicated that the instrument showed strong inter-examiner reliability for both populations as well as a high content and diagnostic validity. Hence, the study showed that the LogoFoVa can be regarded as a reliable and valid tool...
Cardoso-Leite, Pedro; Waszak, Florian
A briefly flashed target stimulus can become "invisible" when immediately followed by a mask-a phenomenon known as backward masking, which constitutes a major tool in the cognitive sciences. One form of backward masking is termed metacontrast masking. It is generally assumed that in metacontrast masking, the mask suppresses activity on which the conscious perception of the target relies. This assumption biases conclusions when masking is used as a tool-for example, to study the independence between perceptual detection and motor reaction. This is because other models can account for reduced perceptual performance without requiring suppression mechanisms. In this study, we used signal detection theory to test the suppression model against an alternative view of metacontrast masking, referred to as the summation model. This model claims that target- and mask-related activations fuse and that the difficulty in detecting the target results from the difficulty to discriminate this fused response from the response produced by the mask alone. Our data support this alternative view. This study is not a thorough investigation of metacontrast masking. Instead, we wanted to point out that when a different model is used to account for the reduced perceptual performance in metacontrast masking, there is no need to postulate a dissociation between perceptual and motor responses to account for the data. Metacontrast masking, as implemented in the Fehrer-Raab situation, therefore is not a valid method to assess perceptual-motor dissociations.
Chen, Zhipeng; Li, Jingyuan; Ren, Qingyi; Ge, Pingjiang
The objective of this study was to examine the perceptual structure and acoustic characteristics of speech of patients with adductor spasmodic dysphonia (ADSD) in Mandarin. Case-Control Study MATERIALS AND METHODS: For the estimation of dysphonia level, perceptual and acoustic analysis were used for patients with ADSD (N = 20) and the control group (N = 20) that are Mandarin-Chinese speakers. For both subgroups, a sustained vowel and connected speech samples were obtained. The difference of perceptual and acoustic parameters between the two subgroups was assessed and analyzed. For acoustic assessment, the percentage of phonatory breaks (PBs) of connected reading and the percentage of aperiodic segments and frequency shifts (FS) of vowel and reading in patients with ADSD were significantly worse than controls, the mean harmonics-to-noise ratio and the fundamental frequency standard deviation of vowel as well. For perceptual evaluation, the rating of speech and vowel in patients with ADSD are significantly higher than controls. The percentage of aberrant acoustic events (PB, frequency shift, and aperiodic segment) and the fundamental frequency standard deviation and mean harmonics-to-noise ratio were significantly correlated with the perceptual rating in the vowel and reading productions. The perceptual and acoustic parameters of connected vowel and reading in patients with ADSD are worse than those in normal controls, and could validly and reliably estimate dysphonia of ADSD in Mandarin-speaking Chinese. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
Patel, Rupal; Connaghan, Kathryn; Franco, Diana; Edsall, Erika; Forgit, Dory; Olsen, Laura; Ramage, Lianna; Tyler, Emily; Russell, Scott
Purpose: A review of the salient characteristics of motor speech disorders and common assessment protocols revealed the need for a novel reading passage tailored specifically to differentiate between and among the dysarthrias (DYSs) and apraxia of speech (AOS). Method: "The Caterpillar" passage was designed to provide a contemporary, easily read,…
Gillam, Sandra Laing; Ford, Mikenzi Bentley
The current study was designed to examine the relationships between performance on a nonverbal phoneme deletion task administered in a dynamic assessment format with performance on measures of phoneme deletion, word-level reading, and speech sound production that required verbal responses for school-age children with speech sound disorders (SSDs).…
Objective: To present the methodology for speech assessment in the Scandcleft project and discuss issues from a pilot study. Design: Description of methodology and blinded test for speech assessment. Speech samples and instructions for data collection and analysis for comparisons of speech outcomes...... across five included languages were developed and tested. Participants and Materials: Randomly selected video recordings of 10 5-year-old children from each language (n = 50) were included in the project. Speech material consisted of test consonants in single words, connected speech, and syllable chains......-sum and the overall rating of VPC was 78%. Conclusions: Pooling data of speakers of different languages in the same trial and comparing speech outcome across trials seems possible if the assessment of speech concerns consonants and is confined to speech units that are phonetically similar across languages. Agreed...
Mcleod, Sharynne; Baker, Elise
A survey of 231 Australian speech-language pathologists (SLPs) was undertaken to describe practices regarding assessment, analysis, target selection, intervention, and service delivery for children with speech sound disorders (SSD). The participants typically worked in private practice, education, or community health settings and 67.6% had a waiting list for services. For each child, most of the SLPs spent 10-40 min in pre-assessment activities, 30-60 min undertaking face-to-face assessments, and 30-60 min completing paperwork after assessments. During an assessment SLPs typically conducted a parent interview, single-word speech sampling, collected a connected speech sample, and used informal tests. They also determined children's stimulability and estimated intelligibility. With multilingual children, informal assessment procedures and English-only tests were commonly used and SLPs relied on family members or interpreters to assist. Common analysis techniques included determination of phonological processes, substitutions-omissions-distortions-additions (SODA), and phonetic inventory. Participants placed high priority on selecting target sounds that were stimulable, early developing, and in error across all word positions and 60.3% felt very confident or confident selecting an appropriate intervention approach. Eight intervention approaches were frequently used: auditory discrimination, minimal pairs, cued articulation, phonological awareness, traditional articulation therapy, auditory bombardment, Nuffield Centre Dyspraxia Programme, and core vocabulary. Children typically received individual therapy with an SLP in a clinic setting. Parents often observed and participated in sessions and SLPs typically included siblings and grandparents in intervention sessions. Parent training and home programs were more frequently used than the group therapy. Two-thirds kept up-to-date by reading journal articles monthly or every 6 months. There were many similarities with
K. Bettens; Anke Luyten; F. Wuyts; M. de Bodt; K. van Lierde; Y. Maryn
PURPOSE: The Nasality Severity Index 2.0 (NSI 2.0) forms a new, multiparametric approach in the identification of hypernasality. The present study aimed to investigate the correlation between the NSI 2.0 scores and the perceptual assessment of hypernasality. METHOD: Speech samples of 35 patients,
Seyyedeh Maryam khoddami
Full Text Available Background and Aim: Vocal abuse and misuse are the most frequent causes of voice disorders. Consequently some therapy is needed to stop or modify such behaviors. This research was performed to study the effectiveness of vocal hygiene program on perceptual signs of voice in people with dysphonia.Methods: A Vocal hygiene program was performed to 8 adults with dysphonia for 6 weeks. At first, Consensus Auditory- Perceptual Evaluation of Voice was used to assess perceptual signs. Then the program was delivered, Individuals were followed in second and forth weeks visits. In the last session, perceptual assessment was performed and individuals’ opinions were collected. Perceptual findings were compared before and after the therapy.Results: After the program, mean score of perceptual assessment decreased. Mean score of every perceptual sign revealed significant difference before and after the therapy (p≤0.0001. «Loudness» had maximum score and coordination between speech and respiration indicated minimum score. All participants confirmed efficiency of the therapy.Conclusion: The vocal hygiene program improves all perceptual signs of voice although not equally. This deduction is confirmed by both clinician-based and patient-based assessments. As a result, vocal hygiene program is necessary for a comprehensive voice therapy but is not solely effective to resolve all voice problems.
Erdener, Dogu; Burnham, Denis
Despite the body of research on auditory-visual speech perception in infants and schoolchildren, development in the early childhood period remains relatively uncharted. In this study, English-speaking children between three and four years of age were investigated for: (i) the development of visual speech perception--lip-reading and visual…
Haderlein, Tino; Döllinger, Michael; Matoušek, Václav; Nöth, Elmar
Automatic voice assessment is often performed using sustained vowels. In contrast, speech analysis of read-out texts can be applied to voice and speech assessment. Automatic speech recognition and prosodic analysis were used to find regression formulae between automatic and perceptual assessment of four voice and four speech criteria. The regression was trained with 21 men and 62 women (average age 49.2 years) and tested with another set of 24 men and 49 women (48.3 years), all suffering from chronic hoarseness. They read the text 'Der Nordwind und die Sonne' ('The North Wind and the Sun'). Five voice and speech therapists evaluated the data on 5-point Likert scales. Ten prosodic and recognition accuracy measures (features) were identified which describe all the examined criteria. Inter-rater correlation within the expert group was between r = 0.63 for the criterion 'match of breath and sense units' and r = 0.87 for the overall voice quality. Human-machine correlation was between r = 0.40 for the match of breath and sense units and r = 0.82 for intelligibility. The perceptual ratings of different criteria were highly correlated with each other. Likewise, the feature sets modeling the criteria were very similar. The automatic method is suitable for assessing chronic hoarseness in general and for subgroups of functional and organic dysphonia. In its current version, it is almost as reliable as a randomly picked rater from a group of voice and speech therapists.
Jain, Harsha; Rao, Dayashankara; Sharma, Shailender; Gupta, Saurabh
Treatment of the cleft palate has evolved over a long period of time. Various techniques of cleft palate repair that are practiced today are the results of principles learned through many years of modifications. The challenge in the art of modern palatoplasty is no longer successful closure of the cleft palate but an optimal speech outcome without compromising maxillofacial growth. Throughout these periods of evolution in the treatment of cleft palate, the effectiveness of various treatment protocols has been challenged by controversies concerning speech and maxillofacial growth. In this article we have evaluated the results of Pinto's modification of Wardill-Kilner palatoplasty without radical dissection of the levator veli palitini muscle on speech and post-op fistula in two different age groups in 20 patients. Preoperative and 6-month postoperative speech assessment values indicated that two-layer palatoplasty (modified Wardill-Kilner V-Y pushback technique) without an intravelar veloplasty technique was good for speech.
Breuls, M; Sell, D; Manders, E; Boulet, E; Vander Poorten, V
This paper presents an assessment protocol for the evaluation and description of speech, resonance and myofunctional characteristics commonly associated with cleft palate and/or velopharyngeal dysfunction. The protocol is partly based on the GOS.SP.ASS'98 and adapted to Flemish. It focuses on the relevant aspects of cleft type speech necessary to facilitate assessment, adequate diagnosis and management planning in a multi-disciplinary setting of cleft team care.
McMurray, Bob; Jongman, Allard
Most theories of categorization emphasize how continuous perceptual information is mapped to categories. However, equally important are the informational assumptions of a model, the type of information subserving this mapping. This is crucial in speech perception where the signal is variable and context dependent. This study assessed the…
Hearnshaw, Stephanie; Baker, Elise; Munro, Natalie
To investigate whether Australian-English speaking children with and without speech sound disorder (SSD) differ in their overall speech perception accuracy. Additionally, to investigate differences in the perception of specific phonemes and the association between speech perception and speech production skills. Twenty-five Australian-English speaking children aged 48-60 months participated in this study. The SSD group included 12 children and the typically developing (TD) group included 13 children. Children completed routine speech and language assessments in addition to an experimental Australian-English lexical and phonetic judgement task based on Rvachew's Speech Assessment and Interactive Learning System (SAILS) program (Rvachew, 2009). This task included eight words across four word-initial phonemes-/k, ɹ, ʃ, s/. Children with SSD showed significantly poorer perceptual accuracy on the lexical and phonetic judgement task compared with TD peers. The phonemes /ɹ/ and /s/ were most frequently perceived in error across both groups. Additionally, the phoneme /ɹ/ was most commonly produced in error. There was also a positive correlation between overall speech perception and speech production scores. Children with SSD perceived speech less accurately than their typically developing peers. The findings suggest that an Australian-English variation of a lexical and phonetic judgement task similar to the SAILS program is promising and worthy of a larger scale study. Copyright © 2017 Elsevier Inc. All rights reserved.
Seitz, Aaron R
Perceptual learning refers to how experience can change the way we perceive sights, sounds, smells, tastes, and touch. Examples abound: music training improves our ability to discern tones; experience with food and wines can refine our pallet (and unfortunately more quickly empty our wallet), and with years of training radiologists learn to save lives by discerning subtle details of images that escape the notice of untrained viewers. We often take perceptual learning for granted, but it has a profound impact on how we perceive the world. In this Primer, I will explain how perceptual learning is transformative in guiding our perceptual processes, how research into perceptual learning provides insight into fundamental mechanisms of learning and brain processes, and how knowledge of perceptual learning can be used to develop more effective training approaches for those requiring expert perceptual skills or those in need of perceptual rehabilitation (such as individuals with poor vision). I will make a case that perceptual learning is ubiquitous, scientifically interesting, and has substantial practical utility to us all. Copyright © 2017. Published by Elsevier Ltd.
Psillas, George; Psifidis, Anestis; Antoniadou-Hitoglou, Magda; Kouloulas, Athanasios
The purpose of this study was to detect any underlying hearing loss among the healthy pre-school children with speech delay. 76 children, aged from 1 to 5 years, underwent a thorough audiological examination consisting of tympanometry, free field testing, otoacoustic emission recordings and auditory brainstem responses (ABRs). If hearing was normal, then they were evaluated by a child neurologist-psychiatrist. According to our findings, the children were classified into 3 groups; those with normal hearing levels (group I, 52 children, 68.4%), sensorineural hearing loss (group II, 22 children, 28.9%) and conductive hearing loss (group III, 2 children, 2.6%). In group I, speech delay was attributed to pervasive developmental disorder (PDD), which represents high-functioning autistic children (37 cases). Other causes were specific language impairment (SLI)-expressive (3 cases), bilingualism (2 cases), and unknown etiology (10 cases). More than half (59%) of the children diagnosed with PDD evidenced significant language impairment limited to more than two words. Children with SLI-expressive and bilingualism used a maximum of two words. In group II, 13 children suffered from profound hearing loss in both ears, 3 from severe, 3 had profound hearing loss in one ear and severe in the other, 2 from moderate, and 1 had moderate in one ear and severe in the other. No child had mild sensorineural hearing loss. The children with profound hearing loss in at least one ear had total language impairment using no word at all (10 cases), or a maximum of two words (6 cases). When hearing loss was moderate to severe, then the speech vocabulary was confined to several words (more than two words-6 cases). Only two children suffering from conductive hearing loss both presented with complete lack of speech. A great number of healthy pre-school children with speech delay were found to have normal hearing. In this case, the otolaryngologist should be aware of the possible underlying clinical
Sommers, Mitchell S; Tye-Murray, Nancy; Barcroft, Joe; Spehar, Brent P
There has been considerable interest in measuring the perceptual effort required to understand speech, as well as to identify factors that might reduce such effort. In the current study, we investigated whether, in addition to improving speech intelligibility, auditory training also could reduce perceptual or listening effort. Perceptual effort was assessed using a modified version of the n-back memory task in which participants heard lists of words presented without background noise and were asked to continually update their memory of the three most recently presented words. Perceptual effort was indexed by memory for items in the three-back position immediately before, immediately after, and 3 months after participants completed the Computerized Learning Exercises for Aural Rehabilitation (clEAR), a 12-session computerized auditory training program. Immediate posttraining measures of perceptual effort indicated that participants could remember approximately one additional word compared to pretraining. Moreover, some training gains were retained at the 3-month follow-up, as indicated by significantly greater recall for the three-back item at the 3-month measurement than at pretest. There was a small but significant correlation between gains in intelligibility and gains in perceptual effort. The findings are discussed within the framework of a limited-capacity speech perception system.
Steel, Joanne; Ferguson, Alison; Spencer, Elizabeth; Togher, Leanne
To investigate speech pathologists' current practice with adults who are in post-traumatic amnesia (PTA). Speech pathologists with experience of adults in PTA were invited to take part in an online survey through Australian professional email/internet-based interest groups. Forty-five speech pathologists responded to the online survey. The majority of respondents (78%) reported using informal, observational assessment methods commencing at initial contact with people in PTA or when patients' level of alertness allowed and initiating formal assessment on emergence from PTA. Seven respondents (19%) reported undertaking no assessment during PTA. Clinicians described using a range of techniques to monitor cognitive-communication during PTA, including static, dynamic, functional and impairment-based methods. The study confirmed that speech pathologists have a key role in the multidisciplinary team caring for the person in PTA, especially with family education and facilitating interactions with the rehabilitation team and family. Decision-making around timing and means of assessment of cognitive-communication during PTA appeared primarily reliant on speech pathologists' professional experience and the culture of their workplace. The findings support the need for further research into the nature of cognitive-communication disorder and resolution over this period.
Munson, Benjamin; Bjorum, Elissa M.; Windsor, Jennifer
This study examined whether accuracy in producing linguistic stress reliably distinguished between five children with suspected developmental apraxia of speech (sDAS) and five children with phonological disorder (PD). No group differences in the production of stress were found; however, listeners judged that nonword repetitions of the children…
Rusz, Jan; Hlavnička, Jan; Tykalová, Tereza; Bušková, Jitka; Ulmanová, Olga; Růžička, Evžen; Šonka, Karel
Patients with idiopathic rapid eye movement sleep behaviour disorder (RBD) are at substantial risk for developing Parkinson's disease (PD) or related neurodegenerative disorders. Speech is an important indicator of motor function and movement coordination, and therefore may be an extremely sensitive early marker of changes due to prodromal neurodegeneration. Speech data were acquired from 16 RBD subjects and 16 age- and sex-matched healthy control subjects. Objective acoustic assessment of 15 speech dimensions representing various phonatory, articulatory, and prosodic deviations was performed. Statistical models were applied to characterise speech disorders in RBD and to estimate sensitivity and specificity in differentiating between RBD and control subjects. Some form of speech impairment was revealed in 88% of RBD subjects. Articulatory deficits were the most prominent findings in RBD. In comparison to controls, the RBD group showed significant alterations in irregular alternating motion rates (p = 0.009) and articulatory decay (p = 0.01). The combination of four distinctive speech dimensions, including aperiodicity, irregular alternating motion rates, articulatory decay, and dysfluency, led to 96% sensitivity and 79% specificity in discriminating between RBD and control subjects. Speech impairment was significantly more pronounced in RBD subjects with the motor score of the Unified Parkinson's Disease Rating Scale greater than 4 points when compared to other RBD individuals. Simple quantitative speech motor measures may be suitable for the reliable detection of prodromal neurodegeneration in subjects with RBD, and therefore may provide important outcomes for future therapy trials. Copyright © 2015 Elsevier B.V. All rights reserved.
Jacoby, Nori; Ahissar, Merav
In the 1980s to 1990s, studies of perceptual learning focused on the specificity of training to basic visual attributes such as retinal position and orientation. These studies were considered scientifically innovative since they suggested the existence of plasticity in the early stimulus-specific sensory cortex. Twenty years later, perceptual training has gradually shifted to potential applications, and research tends to be devoted to showing transfer. In this paper we analyze two key methodological issues related to the interpretation of transfer. The first has to do with the absence of a control group or the sole use of a test-retest group in traditional perceptual training studies. The second deals with claims of transfer based on the correlation between improvement on the trained and transfer tasks. We analyze examples from the general intelligence literature dealing with the impact on general intelligence of training on a working memory task. The re-analyses show that the reports of a significantly larger transfer of the trained group over the test-retest group fail to replicate when transfer is compared to an actively trained group. Furthermore, the correlations reported in this literature between gains on the trained and transfer tasks can be replicated even when no transfer is assumed.
McAuliffe, Megan J.; Kerr, Sarah E.; Gibson, Elizabeth M. R.; Anderson, Tim; LaShell, Patrick J.
Purpose: To determine how increased vocal loudness and reduced speech rate affect listeners' cognitive-perceptual processing of hypokinetic dysarthric speech associated with Parkinson's disease. Method: Fifty-one healthy listener participants completed a speech perception experiment. Listeners repeated phrases produced by 5 individuals…
The quality of a telecommunication voice service is largely inftuenced by the quality of the transmission system. Nevertheless, the analysis, synthesis and prediction of quality should take into account its multidimensional aspects. Quality can be regarded as a point where the perceived characteristics and the desired or expected ones meet. A schematic is presented which classifies different entities which contribute to the quality of a service, taking into account conversational, user as weIl as service related contributions. Starting from this concept, perceptively relevant constituents of speech communication quality are identified. The perceptive factors result from ele ments of the transmission configuration. A simulation model is developed and implemented which allows the most relevant parameters of traditional trans mission configurations to be manipulated, in real time and for the conversation situation. Inputs into the simulation are instrumentally measurable quality elements commonly used in tra...
Full Text Available Background and Aims: Dysarthria affects linguistic domains such as respiration, phonation, articulation, resonance and prosody due to upper motor neuron, lower motor neuron, cerebellar or extrapyramidal tract lesions. Although Bengali is one of the major languages globally, dysarthric Bengali speech has not been subjected to neurolinguistic analysis. We attempted such an analysis with the goal of identifying the speech defects in native Bengali speakers in various types of dysarthria encountered in neurological disorders. Settings and Design: A cross-sectional observational study was conducted with 66 dysarthric subjects, predominantly middle-aged males, attending the Neuromedicine OPD of a tertiary care teaching hospital in Kolkata. Materials and Methods: After neurological examination, an instrument comprising commonly used Bengali words and a text block covering all Bengali vowels and consonants were used to carry out perceptual analysis of dysarthric speech. From recorded speech, 24 parameters pertaining to five linguistic domains were assessed. The Kruskal-Wallis analysis of variance, Chi-square test and Fisher′s exact test were used for analysis. Results: The dysarthria types were spastic (15 subjects, flaccid (10, mixed (12, hypokinetic (12, hyperkinetic (9 and ataxic (8. Of the 24 parameters assessed, 15 were found to occur in one or more types with a prevalence of at least 25%. Imprecise consonant was the most frequently occurring defect in most dysarthrias. The spectrum of defects in each type was identified. Some parameters were capable of distinguishing between types. Conclusions: This perceptual analysis has defined linguistic defects likely to be encountered in dysarthric Bengali speech in neurological disorders. The speech distortion can be described and distinguished by a limited number of parameters. This may be of importance to the speech therapist and neurologist in planning rehabilitation and further management.
Stenneken, Prisca; Egetemeir, Johanna; Schulte-Körne, Gerd; Müller, Hermann J; Schneider, Werner X; Finke, Kathrin
The cognitive causes as well as the neurological and genetic basis of developmental dyslexia, a complex disorder of written language acquisition, are intensely discussed with regard to multiple-deficit models. Accumulating evidence has revealed dyslexics' impairments in a variety of tasks requiring visual attention. The heterogeneity of these experimental results, however, points to the need for measures that are sufficiently sensitive to differentiate between impaired and preserved attentional components within a unified framework. This first parameter-based group study of attentional components in developmental dyslexia addresses potentially altered attentional components that have recently been associated with parietal dysfunctions in dyslexia. We aimed to isolate the general attentional resources that might underlie reduced span performance, i.e., either a deficient working memory storage capacity, or a slowing in visual perceptual processing speed, or both. Furthermore, by analysing attentional selectivity in dyslexia, we addressed a potential lateralized abnormality of visual attention, i.e., a previously suggested rightward spatial deviation compared to normal readers. We investigated a group of high-achieving young adults with persisting dyslexia and matched normal readers in an experimental whole report and a partial report of briefly presented letter arrays. Possible deviations in the parametric values of the dyslexic compared to the control group were taken as markers for the underlying deficit. The dyslexic group showed a striking reduction in perceptual processing speed (by 26% compared to controls) while their working memory storage capacity was in the normal range. In addition, a spatial deviation of attentional weighting compared to the control group was confirmed in dyslexic readers, which was larger in participants with a more severe dyslexic disorder. In general, the present study supports the relevance of perceptual processing speed in disorders
Hurkmans, Joost; Jonkers, Roel; Boonstra, Anne M.; Stewart, Roy E.; Reinders-Messelink, Heleen A.
Background: The number of reliable and valid instruments to measure the effects of therapy in apraxia of speech (AoS) is limited. Aims: To evaluate the newly developed Modified Diadochokinesis Test (MDT), which is a task to assess the effects of rate and rhythm therapies for AoS in a multiple baseline across behaviours design. Methods: The…
Hurkmans, Joost; Jonkers, Roel; Boonstra, Anne M.; Stewart, Roy E.; Reinders-Messelink, Heleen A.
Background: The number of reliable and valid instruments to measure the effects of therapy in apraxia of speech (AoS) is limited. Aims: To evaluate the newly developed Modified Diadochokinesis Test (MDT), which is a task to assess the effects of rate and rhythm therapies for AoS in a multiple
Ertmer, David J.
Background: Newborn hearing screening, early intervention programs, and advancements in cochlear implant and hearing aid technology have greatly increased opportunities for children with hearing loss to become intelligible talkers. Optimizing speech intelligibility requires that progress be monitored closely. Although direct assessment of…
Constantinescu, Gabriella; Theodoros, Deborah; Russell, Trevor; Ward, Elizabeth; Wilson, Stephen; Wootton, Richard
Background: Patients with Parkinson's disease face numerous access barriers to speech pathology services for appropriate assessment and treatment. Telerehabilitation is a possible solution to this problem, whereby rehabilitation services may be delivered to the patient at a distance, via telecommunication and information technologies. A number of…
McLeod, Sharynne; Verdon, Sarah
The aim of this tutorial is to support speech-language pathologists (SLPs) undertaking assessments of multilingual children with suspected speech sound disorders, particularly children who speak languages that are not shared with their SLP. The tutorial was written by the International Expert Panel on Multilingual Children's Speech, which comprises 46 researchers (SLPs, linguists, phoneticians, and speech scientists) who have worked in 43 countries and used 27 languages in professional practice. Seventeen panel members met for a 1-day workshop to identify key points for inclusion in the tutorial, 26 panel members contributed to writing this tutorial, and 34 members contributed to revising this tutorial online (some members contributed to more than 1 task). This tutorial draws on international research evidence and professional expertise to provide a comprehensive overview of working with multilingual children with suspected speech sound disorders. This overview addresses referral, case history, assessment, analysis, diagnosis, and goal setting and the SLP's cultural competence and preparation for working with interpreters and multicultural support workers and dealing with organizational and government barriers to and facilitators of culturally competent practice. The issues raised in this tutorial are applied in a hypothetical case study of an English-speaking SLP's assessment of a multilingual Cantonese- and English-speaking 4-year-old boy. Resources are listed throughout the tutorial.
Marcelo de Gouveia Sahad
Full Text Available Through a transversal epidemiological study, conducted with 333 Brazilian children, males (157 and females (176, aged 3 to 6 years old, enrolled in a public preschool, this study aimed to evaluate the prevalence of the different types of vertical interincisal trespass (VIT and the relationship between these occlusal aspects and anterior lisping and/or anterior tongue thrust in the articulation of the lingua-alveolar phonemes /t/, /d/, /n/ and /l/. All children involved were submitted to a VIT examination and to a speech evaluation. Statistical significance was analyzed through the Qui-square test, at a significance level of 0.05 (95% confidence limit. The quantitative analysis of the data demonstrated the following prevalences: 1 - the different types of VIT: 48.3% for normal overbite (NO, 22.5% for deep overbite (DO, 9.3% for edge to edge (ETE and 19.8% for open bite (OB; 2 - interdental lisping in relation to the different types of VIT: 42% for NO, 12.5% for DO, 12.5% for ETE, 32.9% for OB; and 3 - children with anterior tongue thrust in the articulation of lingua-alveolar phonemes in relation to the different types of VIT: 42.1% for NO, 14% for DO, 10.5% for ETE, 33.3% for OB. The results demonstrated that there was a significant relationship between open bite and anterior lisping and/or anterior tongue thrust in the articulation of the lingua-alveolar phonemes /t/, /d/, /n/ and /l/; and that there was a significant relationship between deep overbite and the absence of anterior lisping and anterior tongue thrust in the articulation of the lingua-alveolar phonemes.
Strait, Dana L.; Parbery-Clark, Alexandra; Hittner, Emily; Kraus, Nina
For children, learning often occurs in the presence of background noise. As such, there is growing desire to improve a child's access to a target signal in noise. Given adult musicians' perceptual and neural speech-in-noise enhancements, we asked whether similar effects are present in musically-trained children. We assessed the perception and…
Zielinski, S.; Rumsey, F.; Bech, Søren
attempting to “bridge the gap” between the quality assessment methods used in various disciplines are indicated. Prospective challenges faced by researchers in the unification process are outlined. They include development of unified scales, defining unified anchors, integration of objective models......The paper addresses the need to develop unified methods for subjective and objective quality assessment across speech, audio, picture, and multimedia applications. Commonalities and differences between the currently used standards are overviewed. Examples of the already undertaken research...
Lagerberg, Tove B.; Johnels, Jakob Åsberg; Hartelius, Lena; Persson, Christina
Background: The assessment of intelligibility is an essential part of establishing the severity of a speech disorder. The intelligibility of a speaker is affected by a number of different variables relating, "inter alia," to the speech material, the listener and the listener task. Aims: To explore the impact of the number of…
McAllister, Sue; Lincoln, Michelle; Ferguson, Alison; McAllister, Lindy
Workplace-based learning is a critical component of professional preparation in speech pathology. A validated assessment of this learning is seen to be 'the gold standard', but it is difficult to develop because of design and validation issues. These issues include the role and nature of judgement in assessment, challenges in measuring quality, and the relationship between assessment and learning. Valid assessment of workplace-based performance needs to capture the development of competence over time and account for both occupation specific and generic competencies. This paper reviews important conceptual issues in the design of valid and reliable workplace-based assessments of competence including assessment content, process, impact on learning, measurement issues, and validation strategies. It then goes on to share what has been learned about quality assessment and validation of a workplace-based performance assessment using competency-based ratings. The outcomes of a four-year national development and validation of an assessment tool are described. A literature review of issues in conceptualizing, designing, and validating workplace-based assessments was conducted. Key factors to consider in the design of a new tool were identified and built into the cycle of design, trialling, and data analysis in the validation stages of the development process. This paper provides an accessible overview of factors to consider in the design and validation of workplace-based assessment tools. It presents strategies used in the development and national validation of a tool COMPASS, used in an every speech pathology programme in Australia, New Zealand, and Singapore. The paper also describes Rasch analysis, a model-based statistical approach which is useful for establishing validity and reliability of assessment tools. Through careful attention to conceptual and design issues in the development and trialling of workplace-based assessments, it has been possible to develop the
Casper, Maureen A.; Raphael, Lawrence J.; Harris, Katherine S.; Geibel, Jennifer M.
Persons with cerebellar ataxia exhibit changes in physical coordination and speech and voice production. Previously, these alterations of speech and voice production were described primarily via perceptual coordinates. In this study, the spatial-temporal properties of syllable production were examined in 12 speakers, six of whom were healthy…
of the terminology used in the multiparameter Danish Dysphonia Assessment (DDA) approach into the five-parameter GRBAS system. Methods. Voice samples illustrating type and grade of the voice qualities included in DDA were rated by five speech language pathologists using the GRBAS system with the aim of estimating...... terms and antagonists, reflecting muscular hypo- and hyperfunction. Key Words: Auditory-perceptual voice analysis–Dysphonia–GRBAS–Listening test–Voice ratings....
Aggelopoulos, Nikolaos C
Perceptual inference refers to the ability to infer sensory stimuli from predictions that result from internal neural representations built through prior experience. Methods of Bayesian statistical inference and decision theory model cognition adequately by using error sensing either in guiding action or in "generative" models that predict the sensory information. In this framework, perception can be seen as a process qualitatively distinct from sensation, a process of information evaluation using previously acquired and stored representations (memories) that is guided by sensory feedback. The stored representations can be utilised as internal models of sensory stimuli enabling long term associations, for example in operant conditioning. Evidence for perceptual inference is contributed by such phenomena as the cortical co-localisation of object perception with object memory, the response invariance in the responses of some neurons to variations in the stimulus, as well as from situations in which perception can be dissociated from sensation. In the context of perceptual inference, sensory areas of the cerebral cortex that have been facilitated by a priming signal may be regarded as comparators in a closed feedback loop, similar to the better known motor reflexes in the sensorimotor system. The adult cerebral cortex can be regarded as similar to a servomechanism, in using sensory feedback to correct internal models, producing predictions of the outside world on the basis of past experience. Copyright © 2015 Elsevier Ltd. All rights reserved.
Williams, Corinne J; McLeod, Sharynne
Within predominantly English-speaking countries such as the US, UK, Canada, New Zealand, and Australia, there are a significant number of people who speak languages other than English. This study aimed to examine Australian speech-language pathologists' (SLPs) perspectives and experiences of multilingualism, including their assessment and intervention practices, and service delivery methods when working with children who speak languages other than English. A questionnaire was completed by 128 SLPs who attended an SLP seminar about cultural and linguistic diversity. Approximately one half of the SLPs (48.4%) reported that they had at least minimal competence in a language(s) other than English; but only 12 (9.4%) reported that they were proficient in another language. The SLPs spoke a total of 28 languages other than English, the most common being French, Italian, German, Spanish, Mandarin, and Auslan (Australian sign language). Participants reported that they had, in the past 12 months, worked with a mean of 59.2 (range 1-100) children from multilingual backgrounds. These children were reported to speak between two and five languages each; the most common being: Vietnamese, Arabic, Cantonese, Mandarin, Australian Indigenous languages, Tagalog, Greek, and other Chinese languages. There was limited overlap between the languages spoken by the SLPs and the children on the SLPs' caseloads. Many of the SLPs assessed children's speech (50.5%) and/or language (34.2%) without assistance from others (including interpreters). English was the primary language used during assessments and intervention. The majority of SLPs always used informal speech (76.7%) and language (78.2%) assessments and, if standardized tests were used, typically they were in English. The SLPs sought additional information about the children's languages and cultural backgrounds, but indicated that they had limited resources to discriminate between speech and language difference vs disorder.
Full Text Available In patients suffering from head and neck cancer, speech intelligibility is often restricted. For assessment and outcome measurements, automatic speech recognition systems have previously been shown to be appropriate for objective and quick evaluation of intelligibility. In this study we investigate the applicability of the method to speech disorders caused by head and neck cancer. Intelligibility was quantified by speech recognition on recordings of a standard text read by 41 German laryngectomized patients with cancer of the larynx or hypopharynx and 49 German patients who had suffered from oral cancer. The speech recognition provides the percentage of correctly recognized words of a sequence, that is, the word recognition rate. Automatic evaluation was compared to perceptual ratings by a panel of experts and to an age-matched control group. Both patient groups showed significantly lower word recognition rates than the control group. Automatic speech recognition yielded word recognition rates which complied with experts' evaluation of intelligibility on a significant level. Automatic speech recognition serves as a good means with low effort to objectify and quantify the most important aspect of pathologic speech—the intelligibility. The system was successfully applied to voice and speech disorders.
Murdoch, B E; Pitt, G; Theodoros, D G; Ward, E C
The efficacy of traditional and physiological biofeedback methods for modifying abnormal speech breathing patterns was investigated in a child with persistent dysarthria following severe traumatic brain injury (TBI). An A-B-A-B single-subject experimental research design was utilized to provide the subject with two exclusive periods of therapy for speech breathing, based on traditional therapy techniques and physiological biofeedback methods, respectively. Traditional therapy techniques included establishing optimal posture for speech breathing, explanation of the movement of the respiratory muscles, and a hierarchy of non-speech and speech tasks focusing on establishing an appropriate level of sub-glottal air pressure, and improving the subject's control of inhalation and exhalation. The biofeedback phase of therapy utilized variable inductance plethysmography (or Respitrace) to provide real-time, continuous visual biofeedback of ribcage circumference during breathing. As in traditional therapy, a hierarchy of non-speech and speech tasks were devised to improve the subject's control of his respiratory pattern. Throughout the project, the subject's respiratory support for speech was assessed both instrumentally and perceptually. Instrumental assessment included kinematic and spirometric measures, and perceptual assessment included the Frenchay Dysarthria Assessment, Assessment of Intelligibility of Dysarthric Speech, and analysis of a speech sample. The results of the study demonstrated that real-time continuous visual biofeedback techniques for modifying speech breathing patterns were not only effective, but superior to the traditional therapy techniques for modifying abnormal speech breathing patterns in a child with persistent dysarthria following severe TBI. These results show that physiological biofeedback techniques are potentially useful clinical tools for the remediation of speech breathing impairment in the paediatric dysarthric population.
Bele, Irene Velsvik
This study focuses on speaking voice quality in male teachers (n = 35) and male actors (n = 36), who represent untrained and trained voice users, because we wanted to investigate normal and supranormal voices. In this study, both substantial and methodologic aspects were considered. It includes a method for perceptual voice evaluation, and a basic issue was rater reliability. A listening group of 10 listeners, 7 experienced speech-language therapists, and 3 speech-language therapist students evaluated the voices by 15 vocal characteristics using VA scales. Two sets of voice signals were investigated: text reading (2 loudness levels) and sustained vowel (3 levels). The results indicated a high interrater reliability for most perceptual characteristics. Connected speech was evaluated more reliably, especially at the normal level, but both types of voice signals were evaluated reliably, although the reliability for connected speech was somewhat higher than for vowels. Experienced listeners tended to be more consistent in their ratings than did the student raters. Some vocal characteristics achieved acceptable reliability even with a smaller panel of listeners. The perceptual characteristics grouped in 4 factors reflected perceptual dimensions.
Iliadou, Vasiliki Vivian; Chermak, Gail D; Bamiou, Doris-Eva
According to the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition, diagnosis of speech sound disorder (SSD) requires a determination that it is not the result of other congenital or acquired conditions, including hearing loss or neurological conditions that may present with similar symptomatology. To examine peripheral and central auditory function for the purpose of determining whether a peripheral or central auditory disorder was an underlying factor or contributed to the child's SSD. Central auditory processing disorder clinic pediatric case reports. Three clinical cases are reviewed of children with diagnosed SSD who were referred for audiological evaluation by their speech-language pathologists as a result of slower than expected progress in therapy. Audiological testing revealed auditory deficits involving peripheral auditory function or the central auditory nervous system. These cases demonstrate the importance of increasing awareness among professionals of the need to fully evaluate the auditory system to identify auditory deficits that could contribute to a patient's speech sound (phonological) disorder. Audiological assessment in cases of suspected SSD should not be limited to pure-tone audiometry given its limitations in revealing the full range of peripheral and central auditory deficits, deficits which can compromise treatment of SSD. American Academy of Audiology.
Best, Virginia; Keidser, Gitte; Freeston, Katrina; Buchholz, Jörg M
Many listeners with hearing loss report particular difficulties with multitalker communication situations, but these difficulties are not well predicted using current clinical and laboratory assessment tools. The overall aim of this work is to create new speech tests that capture key aspects of multitalker communication situations and ultimately provide better predictions of real-world communication abilities and the effect of hearing aids. A test of ongoing speech comprehension introduced previously was extended to include naturalistic conversations between multiple talkers as targets, and a reverberant background environment containing competing conversations. In this article, we describe the development of this test and present a validation study. Thirty listeners with normal hearing participated in this study. Speech comprehension was measured for one-, two-, and three-talker passages at three different signal-to-noise ratios (SNRs), and working memory ability was measured using the reading span test. Analyses were conducted to examine passage equivalence, learning effects, and test-retest reliability, and to characterize the effects of number of talkers and SNR. Although we observed differences in difficulty across passages, it was possible to group the passages into four equivalent sets. Using this grouping, we achieved good test-retest reliability and observed no significant learning effects. Comprehension performance was sensitive to the SNR but did not decrease as the number of talkers increased. Individual performance showed associations with age and reading span score. This new dynamic speech comprehension test appears to be valid and suitable for experimental purposes. Further work will explore its utility as a tool for predicting real-world communication ability and hearing aid benefit. American Academy of Audiology.
Lametti, Daniel R.; Rochet-Capellan, Amélie; Neufeld, Emily; Shiller, Douglas M.
Recent studies of human speech motor learning suggest that learning is accompanied by changes in auditory perception. But what drives the perceptual change? Is it a consequence of changes in the motor system? Or is it a result of sensory inflow during learning? Here, subjects participated in a speech motor-learning task involving adaptation to altered auditory feedback and they were subsequently tested for perceptual change. In two separate experiments, involving two different auditory perceptual continua, we show that changes in the speech motor system that accompany learning drive changes in auditory speech perception. Specifically, we obtained changes in speech perception when adaptation to altered auditory feedback led to speech production that fell into the phonetic range of the speech perceptual tests. However, a similar change in perception was not observed when the auditory feedback that subjects' received during learning fell into the phonetic range of the perceptual tests. This indicates that the central motor outflow associated with vocal sensorimotor adaptation drives changes to the perceptual classification of speech sounds. PMID:25080594
Aldous, Kerryn; Tolmie, Rhiannon; Worrall, Linda; Ferguson, Alison
Speech-language pathologists' scope of practice is currently unclear in relation to their contribution to the multi-disciplinary assessment of decision-making capacity for clients with aphasia and related neurogenic communication disorders. The primary aim of the current research study was to investigate the common practices of speech-language pathologists involved in assessments of decision-making capacity. The study was completed through the use of an online survey. There were 51 of 59 respondents who indicated involvement in evaluations of decision-making. Involvement in this kind of assessment was most commonly reported by speech-language pathologists working in inpatient acute and rehabilitation settings. Respondents reported using a variety of formal and informal assessment methods in their contributions to capacity assessment. Discussion with multidisciplinary team members was reported to have the greatest influence on their recommendations. Speech-language pathologists reported that they were dissatisfied with current protocols for capacity assessments in their workplace and indicated they would benefit from further education and training in this area. The findings of this study are discussed in light of their implications for speech-language pathology practice.
in cross-language and second language speech perception research: The mapping issue (the perceptual relationship of sounds of the native and the nonnative language in the mind of the native listener and the L2 learner), the perceptual and learning difficulty/ease issue (how this relationship may or may...... not cause perceptual and learning difficulty), and the plasticity issue (whether and how experience with the nonnative language affects the perceptual organization of speech sounds in the mind of L2 learners). One important general conclusion from this research is that perceptual learning is possible at all...
C. R. C. Cruz et al.
"Biotecnological War" board game, a conceptual and perceptual assessment tool for biotechnology and protein chemistry teaching for undergraduate students in biological sciences and related areas. It is a proposal initially conceived as an alternative complementary tool for biochemistry teaching of proteins and peptides, challenging students, aiming to review concepts transmitted in classroom, stimulating diverse student’s abilities, such as their creativity, competitiveness and resource manag...
Full Text Available Obstructive sleep apnea (OSA is a common sleep disorder characterized by recurring breathing pauses during sleep caused by a blockage of the upper airway (UA. OSA is generally diagnosed through a costly procedure requiring an overnight stay of the patient at the hospital. This has led to proposing less costly procedures based on the analysis of patients’ facial images and voice recordings to help in OSA detection and severity assessment. In this paper we investigate the use of both image and speech processing to estimate the apnea-hypopnea index, AHI (which describes the severity of the condition, over a population of 285 male Spanish subjects suspected to suffer from OSA and referred to a Sleep Disorders Unit. Photographs and voice recordings were collected in a supervised but not highly controlled way trying to test a scenario close to an OSA assessment application running on a mobile device (i.e., smartphones or tablets. Spectral information in speech utterances is modeled by a state-of-the-art low-dimensional acoustic representation, called i-vector. A set of local craniofacial features related to OSA are extracted from images after detecting facial landmarks using Active Appearance Models (AAMs. Support vector regression (SVR is applied on facial features and i-vectors to estimate the AHI.
Haley, Katarina L.; Jacks, Adam; de Riesthal, Michael; Abou-Khalil, Rima; Roth, Heidi L.
Purpose: We explored the reliability and validity of 2 quantitative approaches to document presence and severity of speech properties associated with apraxia of speech (AOS). Method: A motor speech evaluation was administered to 39 individuals with aphasia. Audio-recordings of the evaluation were presented to 3 experienced clinicians to determine…
Wijngaarden, S.J. van; Steeneken, H.J.M.; Houtgast, T.
To deal with the effects of nonnative speech communication on speech intelligibility, one must know the magnitude of these effects. To measure this magnitude, suitable test methods must be available. Many of the methods used in cross-language speech communication research are not very suitable for
Hickok, Gregory; Rogalsky, Corianne; Matchin, William; Basilakos, Alexandra; Cai, Julia; Pillay, Sara; Ferrill, Michelle; Mickelsen, Soren; Anderson, Steven W; Love, Tracy; Binder, Jeffrey; Fridriksson, Julius
Auditory and visual speech information are often strongly integrated resulting in perceptual enhancements for audiovisual (AV) speech over audio alone and sometimes yielding compelling illusory fusion percepts when AV cues are mismatched, the McGurk-MacDonald effect. Previous research has identified three candidate regions thought to be critical for AV speech integration: the posterior superior temporal sulcus (STS), early auditory cortex, and the posterior inferior frontal gyrus. We assess the causal involvement of these regions (and others) in the first large-scale (N = 100) lesion-based study of AV speech integration. Two primary findings emerged. First, behavioral performance and lesion maps for AV enhancement and illusory fusion measures indicate that classic metrics of AV speech integration are not necessarily measuring the same process. Second, lesions involving superior temporal auditory, lateral occipital visual, and multisensory zones in the STS are the most disruptive to AV speech integration. Further, when AV speech integration fails, the nature of the failure-auditory vs visual capture-can be predicted from the location of the lesions. These findings show that AV speech processing is supported by unimodal auditory and visual cortices as well as multimodal regions such as the STS at their boundary. Motor related frontal regions do not appear to play a role in AV speech integration. Copyright © 2018 Elsevier Ltd. All rights reserved.
Full Text Available In most of the world, people have regular exposure to multiple accents. Therefore, learning to quickly process accented speech is a prerequisite to successful communication. In this paper, we examine work on the perception of accented speech across the lifespan, from early infancy to late adulthood. Unfamiliar accents initially impair linguistic processing by infants, children, younger adults, and older adults, but listeners of all ages come to adapt to accented speech. Emergent research also goes beyond these perceptual abilities, by assessing links with production and the relative contributions of linguistic knowledge and general cognitive skills. We conclude by underlining points of convergence across ages, and the gaps left to face in future work.
Mahmoud Y Abu El-ella
Uvulopalatopharyngoplasty (UPPP) is a commonly used surgical technique for oropharyngeal reconstruction in patients with obstructive sleep apnea (OSA). This procedure can be done either through the classic or the laser-assisted uvulopalatopharyngoplasty (LAUP) technique. The purpose of this study was to evaluate the effect of classic UPPP and LAUP on acoustics of voice and speech nasalance, and to compare the effect of each operation on these two domains. Patients and The study included 27 patients with a mean age of 46 years. All patients were diagnosed with OSA based on polysomnographic examination. Patients were divided into two groups according to the type of surgical procedure. Fifteen patients underwent classic UPPP, whereas 12 patients were subjected to LAUP. A full assessment was done for all patients preoperatively and postoperatively, including auditory perceptual assessment (APA) of voice and speech, objective assessment using acoustic voice analysis and nasometry. Auditory perceptual assessment of speech and voice, acoustic analysis of voice and nasometric analysis of speech did not show statistically significant differences between the preoperative and postoperative evaluations in either group (P>.05).The results of this study demonstrated that in patients with OSA, the surgical technique, whether classic UPPP or LAUP, does not have significant effects on the patients' voice quality or their speech outcomes (Author).
Reiter, R; Brosch, S
Demographic data, subjective und objective voice analysis as well as self-assessment of voice quality from applicants for a school of speech therapists were investigated. Demographic data from 116 applicants were collected and their voice quality assessed by three independent judges. An objective evaluation was done by maximum phonation time, average fundamental frequency, dynamic range and percent of jitter and shimmer by means of Goettinger Hoarseness diagram. Self-assessment of voice quality was done by "voice handicap index questionnaire". The twenty successful applicants had a physiological voice in 95 %, they were all musical and had university entrance qualifications. Subjective voice assessment showed in 16 % of the applicants a hoarse voice. In this subgroup an unphysiological vocal use was observed in 72 % and a reduced articulation in 45 %. The objective voice parameters did not show a significant difference between the 3 groups. Self-assessment of the voice was inconspicuous in all applicants. Applicants with general qualification for university entrance, musicality and a physiological voice were more likely to be successful. There were main differences between self assessment of voice and quantitative analysis or subjective assessment by three independent judges.
McAuliffe, Megan J; Kerr, Sarah E; Gibson, Elizabeth M R; Anderson, Tim; LaShell, Patrick J
To determine how increased vocal loudness and reduced speech rate affect listeners' cognitive-perceptual processing of hypokinetic dysarthric speech associated with Parkinson's disease. Fifty-one healthy listener participants completed a speech perception experiment. Listeners repeated phrases produced by 5 individuals with dysarthria across habitual, loud, and slow speaking modes. Listeners were allocated to habitual ( n = 17), loud ( n = 17), or slow ( n = 17) experimental conditions. Transcripts derived from the phrase repetition task were coded for overall accuracy (i.e., intelligibility), and perceptual error analyses examined how these conditions affected listeners' phonemic mapping (i.e., syllable resemblance) and lexical segmentation (i.e., lexical boundary error analysis). Both speech conditions provided obvious perceptual benefits to listeners. Overall, transcript accuracy was highest in the slow condition. In the loud condition, however, improvement was evidenced across the experiment. An error analysis suggested that listeners in the loud condition prioritized acoustic-phonetic cues in their attempts to resolve the degraded signal, whereas those in the slow condition appeared to preferentially weight lexical stress cues. Increased loudness and reduced rate exhibited differential effects on listeners' perceptual processing of dysarthric speech. The current study highlights the insights that may be gained from a cognitive-perceptual approach.
Lalonde, Kaylah; Holt, Rachael Frush
Purpose: This study explored visual speech influence in preschoolers using 3 developmentally appropriate tasks that vary in perceptual difficulty and task demands. They also examined developmental differences in the ability to use visually salient speech cues and visual phonological knowledge. Method: Twelve adults and 27 typically developing 3-…
Full Text Available There is increasing evidence to show that indicators other than socio-cognitive abilities might predict communicative function in Autism Spectrum Disorders (ASD. A potential area of research is the development of speech motor function in toddlers. Utilizing a novel measure called ‘articulatory features’, we assess the abilities of toddlers to produce sounds at different timescales as a metric of their speech motor skills. In the current study, we examined 1 whether speech motor function differed between toddlers with ASD, developmental delay, and typical development; and 2 whether differences in speech motor function are correlated with standard measures of language in toddlers with ASD. Our results revealed significant differences between a subgroup of the ASD population with poor verbal skills, and the other groups for the articulatory features associated with the shortest time scale, namely place of articulation, (p<0.05. We also found significant correlations between articulatory features and language and motor ability as assessed by the Mullen and the Vineland scales for the ASD group. Our findings suggest that articulatory features may be an additional measure of speech motor function that could potentially be useful as an early risk indicator of ASD.
Full Text Available Restoring natural speech in paralyzed and aphasic people could be achieved using a Brain-Computer Interface (BCI controlling a speech synthesizer in real-time. To reach this goal, a prerequisite is to develop a speech synthesizer producing intelligible speech in real-time with a reasonable number of control parameters. We present here an articulatory-based speech synthesizer that can be controlled in real-time for future BCI applications. This synthesizer converts movements of the main speech articulators (tongue, jaw, velum, and lips into intelligible speech. The articulatory-to-acoustic mapping is performed using a deep neural network (DNN trained on electromagnetic articulography (EMA data recorded on a reference speaker synchronously with the produced speech signal. This DNN is then used in both offline and online modes to map the position of sensors glued on different speech articulators into acoustic parameters that are further converted into an audio signal using a vocoder. In offline mode, highly intelligible speech could be obtained as assessed by perceptual evaluation performed by 12 listeners. Then, to anticipate future BCI applications, we further assessed the real-time control of the synthesizer by both the reference speaker and new speakers, in a closed-loop paradigm using EMA data recorded in real time. A short calibration period was used to compensate for differences in sensor positions and articulatory differences between new speakers and the reference speaker. We found that real-time synthesis of vowels and consonants was possible with good intelligibility. In conclusion, these results open to future speech BCI applications using such articulatory-based speech synthesizer.
Murray, Elizabeth; McCabe, Patricia; Heard, Robert; Ballard, Kirrie J.
Purpose: The gold standard for diagnosing childhood apraxia of speech (CAS) is expert judgment of perceptual features. The aim of this study was to identify a set of objective measures that differentiate CAS from other speech disorders. Method: Seventy-two children (4-12 years of age) diagnosed with suspected CAS by community speech-language…
May, Tobias; Dau, Torsten
Recent studies on computational speech segregation reported improved speech intelligibility in noise when estimating and applying an ideal binary mask with supervised learning algorithms. However, an important requirement for such systems in technical applications is their robustness to acoustic...... associated with perceptual attributes in speech segregation. The results could help establish a framework for a systematic evaluation of future segregation systems....
Eckert, Mark A.; Matthews, Lois J.; Dubno, Judy R.
Purpose: Even older adults with relatively mild hearing loss report hearing handicap, suggesting that hearing handicap is not completely explained by reduced speech audibility. Method: We examined the extent to which self-assessed ratings of hearing handicap using the Hearing Handicap Inventory for the Elderly (HHIE; Ventry & Weinstein, 1982)…
Machiel Zwarts; Johanna Kalf; Bastiaan Bloem; George Borm; Marten Munneke; Bert de Swart
To report on the development and psychometric evaluation of the Radboud Oral Motor Inventory for Parkinson's Disease (ROMP), a newly developed patient-rated assessment of speech, swallowing, and saliva control in patients with Parkinson's disease (PD). To evaluate reproducibility, 60 patients
Schmetz, Emilie; Rousselle, Laurence; Ballaz, Cécile; Detraux, Jean-Jacques; Barisnikov, Koviljka
This study aims to examine the different levels of visual perceptual object recognition (early, intermediate, and late) defined in Humphreys and Riddoch's model as well as basic visual spatial processing in children using a new test battery (BEVPS). It focuses on the age sensitivity, internal coherence, theoretical validity, and convergent validity of this battery. French-speaking, typically developing children (n = 179; 5 to 14 years) were assessed using 15 new computerized subtests. After selecting the most age-sensitive tasks though ceiling effect and correlation analyses, an exploratory factorial analysis was run with the 12 remaining subtests to examine the BEVPS' theoretical validity. Three separate factors were identified for the assessment of the stimuli's basic features (F1, four subtests), view-dependent and -independent object representations (F2, six subtests), and basic visual spatial processing (F3, two subtests). Convergent validity analyses revealed positive correlations between F1 and F2 and the Beery-VMI visual perception subtest, while no such correlations were found for F3. Children's performances progressed until the age of 9-10 years in F1 and in view-independent representations (F2), and until 11-12 years in view-dependent representations (F2). However, no progression with age was observed in F3. Moreover, the selected subtests, present good-to-excellent internal consistency, which indicates that they provide reliable measures for the assessment of visual perceptual processing abilities in children.
Shriberg, Lawrence D.; Paul, Rhea; Black, Lois M.; van Santen, Jan P.
In a sample of 46 children aged 4 to 7 years with Autism Spectrum Disorder (ASD) and intelligible speech, there was no statistical support for the hypothesis of concomitant Childhood Apraxia of Speech (CAS). Perceptual and acoustic measures of participants’ speech, prosody, and voice were compared with data from 40 typically-developing children, 13 preschool children with Speech Delay, and 15 participants aged 5 to 49 years with CAS in neurogenetic disorders. Speech Delay and Speech Errors, r...
Konig, Alexandra; Satt, Aharon; Sorin, Alex; Hoory, Ran; Derreumaux, Alexandre; David, Renaud; Robert, Phillippe H
Various types of dementia and Mild Cognitive Impairment (MCI) are manifested as irregularities in human speech and language, which have proven to be strong predictors for the disease presence and progress ion. Therefore, automatic speech analytics provided by a mobile application may be a useful tool in providing additional indicators for assessment and detection of early stage dementia and MCI. 165 participants (subjects with subjective cognitive impairment (SCI), MCI patients, Alzheimer's disease (AD) and mixed dementia (MD) patients) were recorded with a mobile application while performing several short vocal cognitive tasks during a regular consultation. These tasks included verbal fluency, picture description, counting down and a free speech task. The voice recordings were processed in two steps: in the first step, vocal markers were extracted using speech signal processing techniques; in the second, the vocal markers were tested to assess their 'power' to distinguish between SCI, MCI, AD and MD. The second step included training automatic classifiers for detecting MCI and AD, based on machine learning methods, and testing the detection accuracy. The fluency and free speech tasks obtain the highest accuracy rates of classifying AD vs. MD vs. MCI vs. SCI. Using the data, we demonstrated classification accuracy as follows: SCI vs. AD = 92% accuracy; SCI vs. MD = 92% accuracy; SCI vs. MCI = 86% accuracy and MCI vs. AD = 86%. Our results indicate the potential value of vocal analytics and the use of a mobile application for accurate automatic differentiation between SCI, MCI and AD. This tool can provide the clinician with meaningful information for assessment and monitoring of people with MCI and AD based on a non-invasive, simple and low-cost method. Copyright© Bentham Science Publishers; For any queries, please email at email@example.com.
Schalling, Ellika; Hartelius, Lena
Spinocerebellar ataxias (SCAs) are a heterogeneous group of autosomal dominant cerebellar ataxias clinically characterized by progressive ataxia, dysarthria and a range of other concomitant neurological symptoms. Only a few studies include detailed characterization of speech symptoms in SCA. Speech symptoms in SCA resemble ataxic dysarthria but symptoms related to phonation may be more prominent. One study to date has shown an association between differences in speech and voice symptoms related to genotype. More studies of speech and voice phenotypes are motivated, to possibly aid in clinical diagnosis. In addition, instrumental speech analysis has been demonstrated to be a reliable measure that may be used to monitor disease progression or therapy outcomes in possible future pharmacological treatments. Intervention by speech and language pathologists should go beyond assessment. Clinical guidelines for management of speech, communication and swallowing need to be developed for individuals with progressive cerebellar ataxia. Copyright © 2013 Elsevier Inc. All rights reserved.
Guddattu, Vasudeva; Krishna, Y.
The speech produced by human vocal tract is a complex acoustic signal, with diverse applications in phonetics, speech synthesis, automatic speech recognition, speaker identification, communication aids, speech pathology, speech perception, machine translation, hearing research, rehabilitation and assessment of communication disorders and many…
Bettens, Kim; De Bodt, Marc; Maryn, Youri; Luyten, Anke; Wuyts, Floris L; Van Lierde, Kristiane M
The Nasality Severity Index 2.0 (NSI 2.0) forms a new, multiparametric approach in the identification of hypernasality. The present study aimed to investigate the correlation between the NSI 2.0 scores and the perceptual assessment of hypernasality. Speech samples of 35 patients, representing a range of nasality from normal to severely hypernasal, were rated by four expert speech-language pathologists using visual analogue scaling (VAS) judging the degree of hypernasality, audible nasal airflow (ANA) and speech intelligibility. Inter- and intra-listener reliability was verified using intraclass correlation coefficients. Correlations between NSI 2.0 scores and its parameters (i.e. nasalance score of an oral text and vowel /u/, voice low tone to high tone ratio of the vowel /i/) and the degree of hypernasality were determined using Pearson correlation coefficients. Multiple linear regression analysis was used to investigate the possible influence of ANA and speech intelligibility on the NSI 2.0 scores. Overall good to excellent inter- and intra-listener reliability was found for the perceptual ratings. A moderate, but significant negative correlation between NSI 2.0 scores and perceived hypernasality (r=-0.64) was found, in which a more negative NSI 2.0 score indicates the presence of more severe hypernasality. No significant influence of ANA or intelligibility on the NSI 2.0 was observed based on the regression analysis. Because the NSI 2.0 correlates significantly with perceived hypernasality, it provides an easy-to-interpret severity score of hypernasality which will facilitate the evaluation of therapy outcomes, communication to the patient and other clinicians, and decisions for treatment planning, based on a multiparametric approach. However, research is still necessary to further explore the instrumental correlates of perceived hypernasality. The reader will be able to (1) describe and discuss current issues and influencing variables regarding perceptual
... Staying Safe Videos for Educators Search English Español Speech Problems KidsHealth / For Teens / Speech Problems What's in ... a person's ability to speak clearly. Some Common Speech and Language Disorders Stuttering is a problem that ...
Bhuskute, Aditi; Skirko, Jonathan R; Roth, Christina; Bayoumi, Ahmed; Durbin-Johnson, Blythe; Tollefson, Travis T
Patients with cleft palate and other causes of velopharyngeal insufficiency (VPI) suffer adverse effects on social interactions and communication. Measurement of these patient-reported outcomes is needed to help guide surgical and nonsurgical care. To further validate the VPI Effects on Life Outcomes (VELO) instrument, measure the change in quality of life (QOL) after speech surgery, and test the association of change in speech with change in QOL. Prospective descriptive cohort including children and young adults undergoing speech surgery for VPI in a tertiary academic center. Participants completed the validated VELO instrument before and after surgical treatment. The main outcome measures were preoperative and postoperative VELO scores and the perceptual speech assessment of speech intelligibility. The VELO scores are divided into subscale domains. Changes in VELO after surgery were analyzed using linear regression models. VELO scores were analyzed as a function of speech intelligibility adjusting for age and cleft type. The correlation between speech intelligibility rating and VELO scores was estimated using the polyserial correlation. Twenty-nine patients (13 males and 16 females) were included. Mean (SD) age was 7.9 (4.1) years (range, 4-20 years). Pharyngeal flap was used in 14 (48%) cases, Furlow palatoplasty in 12 (41%), and sphincter pharyngoplasty in 1 (3%). The mean (SD) preoperative speech intelligibility rating was 1.71 (1.08), which decreased postoperatively to 0.79 (0.93) in 24 patients who completed protocol (P Speech Intelligibility was correlated with preoperative and postoperative total VELO score (P speech intelligibility. Speech surgery improves VPI-specific quality of life. We confirmed validation in a population of untreated patients with VPI and included pharyngeal flap surgery, which had not previously been included in validation studies. The VELO instrument provides patient-specific outcomes, which allows a broader understanding of the
Zheng, Yingjun; Wu, Chao; Li, Juanhua; Li, Ruikeng; Peng, Hongjun; She, Shenglin; Ning, Yuping; Li, Liang
Speech recognition under noisy "cocktail-party" environments involves multiple perceptual/cognitive processes, including target detection, selective attention, irrelevant signal inhibition, sensory/working memory, and speech production. Compared to health listeners, people with schizophrenia are more vulnerable to masking stimuli and perform worse in speech recognition under speech-on-speech masking conditions. Although the schizophrenia-related speech-recognition impairment under "cocktail-party" conditions is associated with deficits of various perceptual/cognitive processes, it is crucial to know whether the brain substrates critically underlying speech detection against informational speech masking are impaired in people with schizophrenia. Using functional magnetic resonance imaging (fMRI), this study investigated differences between people with schizophrenia (n = 19, mean age = 33 ± 10 years) and their matched healthy controls (n = 15, mean age = 30 ± 9 years) in intra-network functional connectivity (FC) specifically associated with target-speech detection under speech-on-speech-masking conditions. The target-speech detection performance under the speech-on-speech-masking condition in participants with schizophrenia was significantly worse than that in matched healthy participants (healthy controls). Moreover, in healthy controls, but not participants with schizophrenia, the strength of intra-network FC within the bilateral caudate was positively correlated with the speech-detection performance under the speech-masking conditions. Compared to controls, patients showed altered spatial activity pattern and decreased intra-network FC in the caudate. In people with schizophrenia, the declined speech-detection performance under speech-on-speech masking conditions is associated with reduced intra-caudate functional connectivity, which normally contributes to detecting target speech against speech masking via its functions of suppressing masking-speech signals.
Carbonell, Kathy M.
One of the lasting concerns in audiology is the unexplained individual differences in speech perception performance even for individuals with similar audiograms. One proposal is that there are cognitive/perceptual individual differences underlying this vulnerability and that these differences are present in normal hearing (NH) individuals but do not reveal themselves in studies that use clear speech produced in quiet (because of a ceiling effect). However, previous studies have failed to uncover cognitive/perceptual variables that explain much of the variance in NH performance on more challenging degraded speech tasks. This lack of strong correlations may be due to either examining the wrong measures (e.g., working memory capacity) or to there being no reliable differences in degraded speech performance in NH listeners (i.e., variability in performance is due to measurement noise). The proposed project has 3 aims; the first, is to establish whether there are reliable individual differences in degraded speech performance for NH listeners that are sustained both across degradation types (speech in noise, compressed speech, noise-vocoded speech) and across multiple testing sessions. The second aim is to establish whether there are reliable differences in NH listeners' ability to adapt their phonetic categories based on short-term statistics both across tasks and across sessions; and finally, to determine whether performance on degraded speech perception tasks are correlated with performance on phonetic adaptability tasks, thus establishing a possible explanatory variable for individual differences in speech perception for NH and hearing impaired listeners.
The E-model brings a modern approach to the computation of estimated quality, allowing for easy implementation. One of its advantages is that it can be applied in real time. The method is based on a mathematical computation model evaluating transmission path impairments influencing speech signal, especially delays and packet losses. These parameters, common in an IP network, can affect speech quality dramatically. The paper deals with a proposal for a simplified E-model and its pr...
Stark, Brielle C; Geva, Sharon; Warburton, Elizabeth A
Relatively preserved inner speech alongside poor overt speech has been documented in some persons with aphasia (PWA), but the relationship of overt speech with inner speech is still largely unclear, as few studies have directly investigated these factors. The present study investigates the relationship of relatively preserved inner speech in aphasia with selected measures of language and cognition. Thirty-eight persons with chronic aphasia (27 men, 11 women; average age 64.53 ± 13.29 years, time since stroke 8-111 months) were classified as having relatively preserved inner and overt speech (n = 21), relatively preserved inner speech with poor overt speech (n = 8), or not classified due to insufficient measurements of inner and/or overt speech (n = 9). Inner speech scores (by group) were correlated with selected measures of language and cognition from the Comprehensive Aphasia Test (Swinburn, Porter, & Al, 2004). The group with poor overt speech showed a significant relationship of inner speech with overt naming (r = .95, p speech and language and cognition factors were not significant for the group with relatively good overt speech. As in previous research, we show that relatively preserved inner speech is found alongside otherwise severe production deficits in PWA. PWA with poor overt speech may rely more on preserved inner speech for overt picture naming (perhaps due to shared resources with verbal working memory) and for written picture description (perhaps due to reliance on inner speech due to perceived task difficulty). Assessments of inner speech may be useful as a standard component of aphasia screening, and therapy focused on improving and using inner speech may prove clinically worthwhile. https://doi.org/10.23641/asha.5303542.
Millena Maria Ramalho Matta Vieira
Full Text Available The CD-ROM "Voice Assessment: Speech-Language Pathology and Audiology & Medicine" was developed as a teaching tool for people interested in the production of the spoken or sung human voice. Its content comprises several subjects concerning the anatomy and physiology of spoken and sung voice. A careful assessment becomes necessary in order to ensure the effectiveness of teaching and learning educational materials, whether related to education or health, within the proposal of education mediated by technology. OBJECTIVE: This study aimed to evaluate the efficacy of the Virtual Man Project's CD-ROM "Voice Assessment: Speech-Language Pathology and Audiology & Medicine", as a self-learning material, in two different populations: Speech-Language Pathology and Audiology students and Lyrical Singing students. The participants were instructed to study the CD-ROM during 1 month and answer two questionnaires: one before and another one after studying the CD-ROM. The quantitative results were compared statistically by the Student's t-test at a significance level of 5%. RESULTS: Seventeen out of the 28 students who completed the study, were Speech-Language Pathology and Audiology students, while 11 were Lyrical Singing students (dropout rate of 44%. Comparison of the answers to the questionnaires before and after studying the CD-ROM showed a statistically significant increase of the scores for the questionnaire applied after studying the CD-ROM for both Speech-Language Pathology and Audiology and Lyrical Singing students, with p<0.001 and p<0.004, respectively. There was also a statistically significant difference in all topics of this questionnaire for both groups of students. CONCLUSION: The results concerning the evaluation of the Speech-Language Pathology and Audiology and Lyrical Singing students' knowledge before and after learning from the CD-ROM allowed concluding that the participants made significant improvement in their knowledge of the proposed
Strelcyk, Olaf; Dau, Torsten
Hearing-impaired people often experience great difficulty with speech communication when background noise is present, even if reduced audibility has been compensated for. Other impairment factors must be involved. In order to minimize confounding effects, the subjects participating in this study...... consisted of groups with homogeneous, symmetric audiograms. The perceptual listening experiments assessed the intelligibility of full-spectrum as well as low-pass filtered speech in the presence of stationary and fluctuating interferers, the individual's frequency selectivity and the integrity of temporal...... modulation were obtained. In addition, these binaural and monaural thresholds were measured in a stationary background noise in order to assess the persistence of the fine-structure processing to interfering noise. Apart from elevated speech reception thresholds, the hearing impaired listeners showed poorer...
Studer-Eichenberger, Esther; Studer-Eichenberger, Felix; Koenig, Thomas
The objectives of the present study were to investigate temporal/spectral sound-feature processing in preschool children (4 to 7 years old) with peripheral hearing loss compared with age-matched controls. The results verified the presence of statistical learning, which was diminished in children with hearing impairments (HIs), and elucidated possible perceptual mediators of speech production. Perception and production of the syllables /ba/, /da/, /ta/, and /na/ were recorded in 13 children with normal hearing and 13 children with HI. Perception was assessed physiologically through event-related potentials (ERPs) recorded by EEG in a multifeature mismatch negativity paradigm and behaviorally through a discrimination task. Temporal and spectral features of the ERPs during speech perception were analyzed, and speech production was quantitatively evaluated using speech motor maximum performance tasks. Proximal to stimulus onset, children with HI displayed a difference in map topography, indicating diminished statistical learning. In later ERP components, children with HI exhibited reduced amplitudes in the N2 and early parts of the late disciminative negativity components specifically, which are associated with temporal and spectral control mechanisms. Abnormalities of speech perception were only subtly reflected in speech production, as the lone difference found in speech production studies was a mild delay in regulating speech intensity. In addition to previously reported deficits of sound-feature discriminations, the present study results reflect diminished statistical learning in children with HI, which plays an early and important, but so far neglected, role in phonological processing. Furthermore, the lack of corresponding behavioral abnormalities in speech production implies that impaired perceptual capacities do not necessarily translate into productive deficits.
Selouani, Sid-Ahmed; Sidi Yakoub, Mohammed; O'Shaughnessy, Douglas
Assistive speech-enabled systems are proposed to help both French and English speaking persons with various speech disorders. The proposed assistive systems use automatic speech recognition (ASR) and speech synthesis in order to enhance the quality of communication. These systems aim at improving the intelligibility of pathologic speech making it as natural as possible and close to the original voice of the speaker. The resynthesized utterances use new basic units, a new concatenating algorithm and a grafting technique to correct the poorly pronounced phonemes. The ASR responses are uttered by the new speech synthesis system in order to convey an intelligible message to listeners. Experiments involving four American speakers with severe dysarthria and two Acadian French speakers with sound substitution disorders (SSDs) are carried out to demonstrate the efficiency of the proposed methods. An improvement of the Perceptual Evaluation of the Speech Quality (PESQ) value of 5% and more than 20% is achieved by the speech synthesis systems that deal with SSD and dysarthria, respectively.
Russell, Ginny; Miller, Laura L; Ford, Tamsin; Golding, Jean
Retrospective recall about children's symptoms is used to establish early developmental patterns in clinical practice and is also utilised in child psychopathology research. Some studies have indicated that the accuracy of retrospective recall is influenced by life events. Our hypothesis was that an intervention: speech and language therapy, would adversely affect the accuracy of parent recall of early concerns about their child's speech and language development. Mothers (n = 5,390) reported on their child's speech development (child male to female ratio = 50:50) when their children were aged 18 or 30 months, and also reported on these early concerns retrospectively, 10 years later, when their children were 13 years old. Overall reliability of retrospective recall was good, 86 % of respondents accurately recalling their earlier concerns. As hypothesised, however, the speech and language intervention was strongly associated with inaccurate retrospective recall about concerns in the early years (Relative Risk Ratio = 19.03; 95 % CI:14.78-24.48). Attendance at speech therapy was associated with increased recall of concerns that were not reported at the time. The study suggests caution is required when interpreting retrospective reports of abnormal child development as recall may be influenced by intervening events.
Power, Maxine; Laasch, Hans-Ulrich; Kasthuri, Ram S.; Nicholson, David A.; Hamdy, Shaheen
Videofluoroscopy (VF) is the 'gold standard' assessment for oropharyngeal dysphagia and radiographers are beginning to direct this examination independently, yet little is known about the roles and responsibilities of the core professions of radiology and speech and language therapy and their practice in this examination. Aim: To evaluate VF practice and identify the roles and responsibilities of radiology and speech and language therapy personnel. Materials and methods: A questionnaire was developed and distributed to speech and language therapists (SALT) and radiologists via national special interest networks. Information regarding protocols, test materials, supervision, radiation protection and training was obtained. Results: One hundred and thirteen questionnaires were completed, 83% of respondents had more than 5 years service. Most were carrying out VF on an 'ad hoc' basis with only 32% participating in more than 6 assessments per month. There was no consensus on protocol and 41% chose to thicken barium solutions by adding more barium sulphate powder, potentially predisposing patients to complications. Over 50% of SALTs had received one day post-graduate training in VF, whereas, only one radiologist had specific VF training. Conclusion: Despite its importance in determining the feeding route for patients, VF is carried out infrequently by most clinicians and protocols vary widely. Moreover, intra- and inter-disciplinary training and supervision is minimal. More work is needed to develop standard guidelines, to improve the quality of the examination and its reproducibility
Rogalsky, Corianne; Love, Tracy; Driscoll, David; Anderson, Steven W.; Hickok, Gregory
The discovery of mirror neurons in macaque has led to a resurrection of motor theories of speech perception. Although the majority of lesion and functional imaging studies have associated perception with the temporal lobes, it has also been proposed that the ‘human mirror system’, which prominently includes Broca’s area, is the neurophysiological substrate of speech perception. Although numerous studies have demonstrated a tight link between sensory and motor speech processes, few have directly assessed the critical prediction of mirror neuron theories of speech perception, namely that damage to the human mirror system should cause severe deficits in speech perception. The present study measured speech perception abilities of patients with lesions involving motor regions in the left posterior frontal lobe and/or inferior parietal lobule (i.e., the proposed human ‘mirror system’). Performance was at or near ceiling in patients with fronto-parietal lesions. It is only when the lesion encroaches on auditory regions in the temporal lobe that perceptual deficits are evident. This suggests that ‘mirror system’ damage does not disrupt speech perception, but rather that auditory systems are the primary substrate for speech perception. PMID:21207313
Jerry D. Gibson
Full Text Available Speech compression is a key technology underlying digital cellular communications, VoIP, voicemail, and voice response systems. We trace the evolution of speech coding based on the linear prediction model, highlight the key milestones in speech coding, and outline the structures of the most important speech coding standards. Current challenges, future research directions, fundamental limits on performance, and the critical open problem of speech coding for emergency first responders are all discussed.
Vogel, Adam P; Poole, Matthew L; Pemberton, Hugh; Caverlé, Marja W J; Boonstra, Frederique M C; Low, Essie; Darby, David; Brodtmann, Amy
To provide a comprehensive description of motor speech function in behavioral variant frontotemporal dementia (bvFTD). Forty-eight individuals (24 bvFTD and 24 age- and sex-matched healthy controls) provided speech samples. These varied in complexity and thus cognitive demand. Their language was assessed using the Progressive Aphasia Language Scale and verbal fluency tasks. Speech was analyzed perceptually to describe the nature of deficits and acoustically to quantify differences between patients with bvFTD and healthy controls. Cortical thickness and subcortical volume derived from MRI scans were correlated with speech outcomes in patients with bvFTD. Speech of affected individuals was significantly different from that of healthy controls. The speech signature of patients with bvFTD is characterized by a reduced rate (75%) and accuracy (65%) on alternating syllable production tasks, and prosodic deficits including reduced speech rate (45%), prolonged intervals (54%), and use of short phrases (41%). Groups differed on acoustic measures derived from the reading, unprepared monologue, and diadochokinetic tasks but not the days of the week or sustained vowel tasks. Variability of silence length was associated with cortical thickness of the inferior frontal gyrus and insula and speech rate with the precentral gyrus. One in 8 patients presented with moderate speech timing deficits with a further two-thirds rated as mild or subclinical. Subtle but measurable deficits in prosody are common in bvFTD and should be considered during disease management. Language function correlated with speech timing measures derived from the unprepared monologue only. © 2017 American Academy of Neurology.
Pedersen, Søren Nygaard
The research presented in this PhD thesis has focused on a perceptual approach to robust design. The results of the research and the original contribution to knowledge is a preliminary framework for understanding, positioning, and applying perceptual robust design. Product quality is a topic...... been presented. Therefore, this study set out to contribute to the understanding and application of perceptual robust design. To achieve this, a state-of-the-art and current practice review was performed. From the review two main research problems were identified. Firstly, a lack of tools...... for perceptual robustness was found to overlap with the optimum for functional robustness and at most approximately 2.2% out of the 14.74% could be ascribed solely to the perceptual robustness optimisation. In conclusion, the thesis have offered a new perspective on robust design by merging robust design...
Pries, Lotta-Katrin; Guloksuz, Sinan; Menne-Lothmann, Claudia; Decoster, Jeroen; van Winkel, Ruud; Collip, Dina; Delespaul, Philippe; De Hert, Marc; Derom, Catherine; Thiery, Evert; Jacobs, Nele; Wichers, Marieke; Simons, Claudia J P; Rutten, Bart P F; van Os, Jim
An association between white noise speech illusion and psychotic symptoms has been reported in patients and their relatives. This supports the theory that bottom-up and top-down perceptual processes are involved in the mechanisms underlying perceptual abnormalities. However, findings in nonclinical populations have been conflicting. The aim of this study was to examine the association between white noise speech illusion and subclinical expression of psychotic symptoms in a nonclinical sample. Findings were compared to previous results to investigate potential methodology dependent differences. In a general population adolescent and young adult twin sample (n = 704), the association between white noise speech illusion and subclinical psychotic experiences, using the Structured Interview for Schizotypy-Revised (SIS-R) and the Community Assessment of Psychic Experiences (CAPE), was analyzed using multilevel logistic regression analyses. Perception of any white noise speech illusion was not associated with either positive or negative schizotypy in the general population twin sample, using the method by Galdos et al. (2011) (positive: ORadjusted: 0.82, 95% CI: 0.6-1.12, p = 0.217; negative: ORadjusted: 0.75, 95% CI: 0.56-1.02, p = 0.065) and the method by Catalan et al. (2014) (positive: ORadjusted: 1.11, 95% CI: 0.79-1.57, p = 0.557). No association was found between CAPE scores and speech illusion (ORadjusted: 1.25, 95% CI: 0.88-1.79, p = 0.220). For the Catalan et al. (2014) but not the Galdos et al. (2011) method, a negative association was apparent between positive schizotypy and speech illusion with positive or negative affective valence (ORadjusted: 0.44, 95% CI: 0.24-0.81, p = 0.008). Contrary to findings in clinical populations, white noise speech illusion may not be associated with psychosis proneness in nonclinical populations.
Nguyen, Duong Duy; Kenny, Dianna T
Muscle tension dysphonia (MTD) is a hyperfunctional voice disorder commonly seen in professional voice users. To date, published acoustic studies of this disorder have mainly focused on nontonal language speakers, and no publication has documented its impact on lexical tone characteristics. In this study, we examined whether and how this voice disorder affected acoustically and perceptually the characteristics of tones in Vietnamese teachers. Voice data were obtained from 42 Vietnamese female primary school teachers diagnosed with MTD and 30 vocally healthy teachers. Tonal data were analyzed using Computerized Speech Lab (CSL-4300B) and Speech Analyzer. Parameters analyzed included the two most important acoustic cues in Vietnamese tones, that is, tonal fundamental frequency (F(0)) and laryngealization. Tonal F(0) was assessed using a factorial analysis of variance with group and career durations as independent variables. Tonal samples were also perceptually assessed by a panel of native speakers of the same dialect. The results showed that MTD lowered tonal F(0) in high tones and tones with extensive fundamental frequency variation. There was also a significant main effect for career duration; in MTD group, tonal F(0) was lower in teachers with longer career duration. The teachers with MTD showed different patterns of laryngealization compared with the control group. Tone perception was poorer for tones with extensive fundamental frequency variation and without a typical phonation type. The results in this group of teachers supported our hypothesis that MTD impairs lexical tone phonation.
There is a lack of agreement on the features used to differentiate Childhood Apraxia of Speech (CAS) from Phonological Disorders (PD). One criterion which has gained consensus is lexical inconsistency of speech (ASHA, 2007); however, no accepted measure of this feature has been defined. Although lexical assessment provides information about consistency of an item across repeated trials, it may not capture the magnitude of inconsistency within an item. In contrast, segmental analysis provides more extensive information about consistency of phoneme usage across multiple contexts and word-positions. The current research compared segmental and lexical inconsistency metrics in preschool-aged children with PD, CAS, and typical development (TD) to determine how inconsistency varies with age in typical and disordered speakers, and whether CAS and PD were differentiated equally well by both assessment levels. Whereas lexical and segmental analyses may be influenced by listener characteristics or speaker intelligibility, the acoustic signal is less vulnerable to these factors. In addition, the acoustic signal may reveal information which is not evident in the perceptual signal. A second focus of the current research was motivated by Blumstein et al.'s (1980) classic study on voice onset time (VOT) in adults with acquired apraxia of speech (AOS) which demonstrated a motor impairment underlying AOS. In the current study, VOT analyses were conducted to determine the relationship between age and group with the voicing distribution for bilabial and alveolar plosives. Findings revealed that 3-year-olds evidenced significantly higher inconsistency than 5-year-olds; segmental inconsistency approached 0% in 5-year-olds with TD, whereas it persisted in children with PD and CAS suggesting that for child in this age-range, inconsistency is a feature of speech disorder rather than typical development (Holm et al., 2007). Likewise, whereas segmental and lexical inconsistency were
Hossain, Mohammad E.; Jassim, Wissam A.; Zilany, Muhammad S. A.
Sensorineural hearing loss occurs due to damage to the inner and outer hair cells of the peripheral auditory system. Hearing loss can cause decreases in audibility, dynamic range, frequency and temporal resolution of the auditory system, and all of these effects are known to affect speech intelligibility. In this study, a new reference-free speech intelligibility metric is proposed using 2-D neurograms constructed from the output of a computational model of the auditory periphery. The responses of the auditory-nerve fibers with a wide range of characteristic frequencies were simulated to construct neurograms. The features of the neurograms were extracted using third-order statistics referred to as bispectrum. The phase coupling of neurogram bispectrum provides a unique insight for the presence (or deficit) of supra-threshold nonlinearities beyond audibility for listeners with normal hearing (or hearing loss). The speech intelligibility scores predicted by the proposed method were compared to the behavioral scores for listeners with normal hearing and hearing loss both in quiet and under noisy background conditions. The results were also compared to the performance of some existing methods. The predicted results showed a good fit with a small error suggesting that the subjective scores can be estimated reliably using the proposed neural-response-based metric. The proposed metric also had a wide dynamic range, and the predicted scores were well-separated as a function of hearing loss. The proposed metric successfully captures the effects of hearing loss and supra-threshold nonlinearities on speech intelligibility. This metric could be applied to evaluate the performance of various speech-processing algorithms designed for hearing aids and cochlear implants. PMID:26967160
The VDE system developed had the capability of recognizing up to 248 separate words in syntactic structures. 4 The two systems described are isolated...AND SPEAKER RECOGNITION by M.J.Hunt 5 ASSESSMENT OF SPEECH SYSTEMS ’ ..- * . by R.K.Moore 6 A SURVEY OF CURRENT EQUIPMENT AND RESEARCH’ by J.S.Bridle...TECHNOLOGY IN NAVY TRAINING SYSTEMS by R.Breaux, M.Blind and R.Lynchard 10 9 I-I GENERAL REVIEW OF MILITARY APPLICATIONS OF VOICE PROCESSING DR. BRUNO
The field of motor speech disorders in Greek is substantially underresearched. Additionally, acoustic studies on lexical stress in dysarthria are generally very rare (Kim et al. 2010). This dissertation examined the acoustic and perceptual effects of Greek dysarthria focusing on lexical stress. Additional possibly deviant speech characteristics were acoustically analyzed. Data from three dysarthric participants and matched controls was analyzed using a case study design. The analysis of lexical stress was based on data drawn from a single word repetition task that included pairs of disyllabic words differentiated by stress location. This data was acoustically analyzed in terms of the use of the acoustic cues for Greek stress. The ability of the dysarthric participants to signal stress in single words was further assessed in a stress identification task carried out by 14 naive Greek listeners. Overall, the acoustic and perceptual data indicated that, although all three dysarthric speakers presented with some difficulty in the patterning of stressed and unstressed syllables, each had different underlying problems that gave rise to quite distinct patterns of deviant speech characteristics. The atypical use of lexical stress cues in Anna's data obscured the prominence relations of stressed and unstressed syllables to the extent that the position of lexical stress was usually not perceptually transparent. Chris and Maria on the other hand, did not have marked difficulties signaling lexical stress location, although listeners were not 100% successful in the stress identification task. For the most part, Chris' atypical phonation patterns and Maria's very slow rate of speech did not interfere with lexical stress signaling. The acoustic analysis of the lexical stress cues was generally in agreement with the participants' performance in the stress identification task. Interestingly, in all three dysarthric participants, but more so in Anna, targets stressed on the 1st
Choi, Lark Kwon; You, Jaehee; Bovik, Alan Conrad
We propose a referenceless perceptual fog density prediction model based on natural scene statistics (NSS) and fog aware statistical features. The proposed model, called Fog Aware Density Evaluator (FADE), predicts the visibility of a foggy scene from a single image without reference to a corresponding fog-free image, without dependence on salient objects in a scene, without side geographical camera information, without estimating a depth-dependent transmission map, and without training on human-rated judgments. FADE only makes use of measurable deviations from statistical regularities observed in natural foggy and fog-free images. Fog aware statistical features that define the perceptual fog density index derive from a space domain NSS model and the observed characteristics of foggy images. FADE not only predicts perceptual fog density for the entire image, but also provides a local fog density index for each patch. The predicted fog density using FADE correlates well with human judgments of fog density taken in a subjective study on a large foggy image database. As applications, FADE not only accurately assesses the performance of defogging algorithms designed to enhance the visibility of foggy images, but also is well suited for image defogging. A new FADE-based referenceless perceptual image defogging, dubbed DEnsity of Fog Assessment-based DEfogger (DEFADE) achieves better results for darker, denser foggy images as well as on standard foggy images than the state of the art defogging methods. A software release of FADE and DEFADE is available online for public use: http://live.ece.utexas.edu/research/fog/index.html.
Rählmann, Sebastian; Meis, Markus; Schulte, Michael; Kießling, Jürgen; Walger, Martin; Meister, Hartmut
Model-based hearing aid development considers the assessment of speech recognition using a master hearing aid (MHA). It is known that aided speech recognition in noise is related to cognitive factors such as working memory capacity (WMC). This relationship might be mediated by hearing aid experience (HAE). The aim of this study was to examine the relationship of WMC and speech recognition with a MHA for listeners with different HAE. Using the MHA, unaided and aided 80% speech recognition thresholds in noise were determined. Individual WMC capacity was assed using the Verbal Learning and Memory Test (VLMT) and the Reading Span Test (RST). Forty-nine hearing aid users with mild to moderate sensorineural hearing loss divided into three groups differing in HAE. Whereas unaided speech recognition did not show a significant relationship with WMC, a significant correlation could be observed between WMC and aided speech recognition. However, this only applied to listeners with HAE of up to approximately three years, and a consistent weakening of the correlation could be observed with more experience. Speech recognition scores obtained in acute experiments with an MHA are less influenced by individual cognitive capacity when experienced HA users are taken into account.
Strait, Dana L; Parbery-Clark, Alexandra; Hittner, Emily; Kraus, Nina
For children, learning often occurs in the presence of background noise. As such, there is growing desire to improve a child's access to a target signal in noise. Given adult musicians' perceptual and neural speech-in-noise enhancements, we asked whether similar effects are present in musically-trained children. We assessed the perception and subcortical processing of speech in noise and related cognitive abilities in musician and nonmusician children that were matched for a variety of overarching factors. Outcomes reveal that musicians' advantages for processing speech in noise are present during pivotal developmental years. Supported by correlations between auditory working memory and attention and auditory brainstem response properties, we propose that musicians' perceptual and neural enhancements are driven in a top-down manner by strengthened cognitive abilities with training. Our results may be considered by professionals involved in the remediation of language-based learning deficits, which are often characterized by poor speech perception in noise. Copyright © 2012 Elsevier Inc. All rights reserved.
Mekonnen, Abebayehu Messele
This article presents a case study of speech production in a 14-year-old Amharic-speaking boy. The boy had developed secondary macroglossia, related to a disturbance of growth hormones, following a history of normal speech development. Perceptual analysis combined with acoustic analysis and static palatography is used to investigate the specific…
The general topic addressed by this dissertation is that of bilingualism, and more specifically, the topic of bilingual acquisition of speech sounds. The central question in this study is the following: does bilingualism affect children’s perceptual development of speech sounds? The term bilingual
Full Text Available Speech recognition is about what is being said, irrespective of who is saying. Speech recognition is a growing field. Major progress is taking place on the technology of automatic speech recognition (ASR. Still, there are lots of barriers in this field in terms of recognition rate, background noise, speaker variability, speaking rate, accent etc. Speech recognition rate mainly depends on the selection of features and feature extraction methods. This paper outlines the feature extraction techniques for speaker dependent speech recognition for isolated words. A brief survey of different feature extraction techniques like Mel-Frequency Cepstral Coefficients (MFCC, Linear Predictive Coding Coefficients (LPCC, Perceptual Linear Prediction (PLP, Relative Spectra Perceptual linear Predictive (RASTA-PLP analysis are presented and evaluation is done. Speech recognition has various applications from daily use to commercial use. We have made a speaker dependent system and this system can be useful in many areas like controlling a patient vehicle using simple commands.
Borges, Ana Filipa Teixeira; Giraud, Anne Lise; Mansvelder, Huibert D.; Linkenkaer-Hansen, Klaus
Speech comprehension is preserved up to a threefold acceleration, but deteriorates rapidly at higher speeds. Current models posit that perceptual resilience to accelerated speech is limited by the brain’s ability to parse speech into syllabic units using δ/θ oscillations. Here, we investigated
Shriberg, Lawrence D.; Paul, Rhea; Black, Lois M.; van Santen, Jan P.
In a sample of 46 children aged 4-7 years with Autism Spectrum Disorder (ASD) and intelligible speech, there was no statistical support for the hypothesis of concomitant Childhood Apraxia of Speech (CAS). Perceptual and acoustic measures of participants' speech, prosody, and voice were compared with data from 40 typically-developing children, 13…
Hasse Jørgensen, Stina
About Speech Matters - Katarina Gregos, the Greek curator's exhibition at the Danish Pavillion, the Venice Biannual 2011.......About Speech Matters - Katarina Gregos, the Greek curator's exhibition at the Danish Pavillion, the Venice Biannual 2011....
.... Attention may affect the perceived clarity of visual displays and improve performance. In this project, a powerful external noise method was developed to identify and characterize the effect of attention on perceptual performance in visual tasks...
.... Attention may affect the perceived clarity of visual displays and improve performance. In this project, a powerful external noise method was developed to identify and characterize the effect of attention on perceptual performance in visual tasks...
Consumer Guide Speech to Speech Relay Service Speech-to-Speech (STS) is one form of Telecommunications Relay Service (TRS). TRS is a service that allows persons with hearing and speech disabilities ...
Tomala, Karel; Voznak, Miroslav; Partila, Pavol; Rezac, Filip; Safarik, Jakub
The paper discusses the use of Discrete Wavelet Transform (DWT) and Stationary Wavelet Transform (SWT) wavelet in removing noise from voice samples and evaluation of its impact on speech quality. One significant part of Quality of Service (QoS) in communication technology is the speech quality assessment. However, this part is seriously overlooked as telecommunication providers often focus on increasing network capacity, expansion of services offered and their enforcement in the market. Among the fundamental factors affecting the transmission properties of the communication chain is noise, either at the transmitter or the receiver side. A wavelet transform (WT) is a modern tool for signal processing. One of the most significant areas in which wavelet transforms are used is applications designed to suppress noise in signals. To remove noise from the voice sample in our experiment, we used the reference segment of the voice which was distorted by Gaussian white noise. An evaluation of the impact on speech quality was carried out by an intrusive objective algorithm Perceptual Evaluation of Speech Quality (PESQ). DWT and SWT transformation was applied to voice samples that were devalued by Gaussian white noise. Afterwards, we determined the effectiveness of DWT and SWT by means of objective algorithm PESQ. The decisive criterion for determining the quality of a voice sample once the noise had been removed was Mean Opinion Score (MOS) which we obtained in PESQ. The contribution of this work lies in the evaluation of efficiency of wavelet transformation to suppress noise in voice samples.
Davis, Matthew H.; Coleman, Martin R.; Absalom, Anthony R.; Rodd, Jennifer M.; Johnsrude, Ingrid S.; Matta, Basil F.; Owen, Adrian M.; Menon, David K.
We used functional MRI and the anesthetic agent propofol to assess the relationship among neural responses to speech, successful comprehension, and conscious awareness. Volunteers were scanned while listening to sentences containing ambiguous words, matched sentences without ambiguous words, and signal-correlated noise (SCN). During three scanning sessions, participants were nonsedated (awake), lightly sedated (a slowed response to conversation), and deeply sedated (no conversational response, rousable by loud command). Bilateral temporal-lobe responses for sentences compared with signal-correlated noise were observed at all three levels of sedation, although prefrontal and premotor responses to speech were absent at the deepest level of sedation. Additional inferior frontal and posterior temporal responses to ambiguous sentences provide a neural correlate of semantic processes critical for comprehending sentences containing ambiguous words. However, this additional response was absent during light sedation, suggesting a marked impairment of sentence comprehension. A significant decline in postscan recognition memory for sentences also suggests that sedation impaired encoding of sentences into memory, with left inferior frontal and temporal lobe responses during light sedation predicting subsequent recognition memory. These findings suggest a graded degradation of cognitive function in response to sedation such that “higher-level” semantic and mnemonic processes can be impaired at relatively low levels of sedation, whereas perceptual processing of speech remains resilient even during deep sedation. These results have important implications for understanding the relationship between speech comprehension and awareness in the healthy brain in patients receiving sedation and in patients with disorders of consciousness. PMID:17938125
Abdel-Aziz, Mosaad; Khalifa, Badawy; Shawky, Ahmed; Rashed, Mohammed; Naguib, Nader; Abdel-Hameed, Asmaa
Adenoid hypertrophy may play a role in velopharyngeal closure especially in patients with palatal abnormality; adenoidectomy may lead to velopharyngeal insufficiency and hyper nasal speech. Patients with cleft palate even after repair should not undergo adenoidectomy unless absolutely needed, and in such situations, conservative or partial adenoidectomy is performed to avoid the occurrence of velopharyngeal insufficiency. Trans-oral endoscopic adenoidectomy enables the surgeon to inspect the velopharyngeal valve during the procedure. The aim of this study was to assess the effect of transoral endoscopic partial adenoidectomy on the speech of children with repaired cleft palate. Twenty children with repaired cleft palate underwent transoral endoscopic partial adenoidectomy to relieve their airway obstruction. The procedure was completely visualized with the use of a 70° 4mm nasal endoscope; the upper part of the adenoid was removed using adenoid curette and St. Claire Thompson forceps, while the lower part was retained to maintain the velopharyngeal competence. Preoperative and postoperative evaluation of speech was performed, subjectively by auditory perceptual assessment, and objectively by nasometric assessment. Speech was not adversely affected after surgery. The difference between preoperative and postoperative auditory perceptual assessment and nasalance scores for nasal and oral sentences was insignificant (p=0.231, 0.442, 0.118 respectively). Transoral endoscopic partial adenoidectomy is a safe method; it does not worsen the speech of repaired cleft palate patients. It enables the surgeon to strictly inspect the velopharyngeal valve during the procedure with better determination of the adenoidal part that may contribute in velopharyngeal closure. Copyright © 2015 Associação Brasileira de Otorrinolaringologia e Cirurgia Cérvico-Facial. Published by Elsevier Editora Ltda. All rights reserved.
... Health Info » Voice, Speech, and Language Apraxia of Speech On this page: What is apraxia of speech? ... about apraxia of speech? What is apraxia of speech? Apraxia of speech (AOS)—also known as acquired ...
Borggreven, PA; Verdonck-de Leeuw, [No Value; Langendijk, JA; Doornaert, P; Koster, MN; de Bree, R; Leemans, R
Background. The aim of the study was to analyze speech outcome for patients with advanced oral/oropharyngeal cancer treated with reconstructive surgery and adjuvant radiotherapy. Methods. Speech tests (communicative suitability, intelligibility, articulation, nasality, and consonant errors) were
Normand, Alice; Autin, Frédérique; Croizet, Jean-Claude
Perceptual load has been found to be a powerful bottom-up determinant of distractibility, with high perceptual load preventing distraction by any irrelevant information. However, when under evaluative pressure, individuals exert top-down attentional control by giving greater weight to task-relevant features, making them more distractible from task-relevant distractors. One study tested whether the top-down modulation of attention under evaluative pressure overcomes the beneficial bottom-up effect of high perceptual load on distraction. Using a response-competition task, we replicated previous findings that high levels of perceptual load suppress task-relevant distractor response interference, but only for participants in a control condition. Participants under evaluative pressure (i.e., who believed their intelligence was assessed) showed interference from task-relevant distractor at all levels of perceptual load. This research challenges the assumptions of the perceptual load theory and sheds light on a neglected determinant of distractibility: the self-relevance of the performance situation in which attentional control is solicited.
This CD is multimedia presentation of programme safety upgrading of Bohunice V1 NPP. This chapter consist of introductory commentary and 4 introductory speeches (video records): (1) Introductory speech of Vincent Pillar, Board chairman and director general of Slovak electric, Plc. (SE); (2) Introductory speech of Stefan Schmidt, director of SE - Bohunice Nuclear power plants; (3) Introductory speech of Jan Korec, Board chairman and director general of VUJE Trnava, Inc. - Engineering, Design and Research Organisation, Trnava; Introductory speech of Dietrich Kuschel, Senior vice-president of FRAMATOME ANP Project and Engineering
Full Text Available Introduction: Speech recognition in adverse listening conditions becomes more difficult as we age, particularly for individuals with age-related hearing loss (ARHL. Whether these difficulties can be eased with training remains debated, because it is not clear whether the outcomes are sufficiently general to be of use outside of the training context. The aim of the current study was to compare training-induced learning and generalization between normal-hearing older adults and those with ARHL.Methods: 56 listeners (60-72 y/o, 35 participants with ARHL and 21 normal hearing adults participated in the study. The study design was a cross over design with three groups (immediate-training, delayed-training and no-training group. Trained participants received 13 sessions of home-based auditory training over the course of 4 weeks. Three adverse listening conditions were targeted: (1 Speech-in-noise (2 time compressed speech and (3 competing speakers, and the outcomes of training were compared between normal and ARHL groups. Pre- and post-test sessions were completed by all participants. Outcome measures included tests on all of the trained conditions as well as on a series of untrained conditions designed to assess the transfer of learning to other speech and non-speech conditions. Results: Significant improvements on all trained conditions were observed in both ARHL and normal-hearing groups over the course of training. Normal hearing participants learned more than participants with ARHL in the speech-in-noise condition, but showed similar patterns of learning in the other conditions. Greater pre- to post-test changes were observed in trained than in untrained listeners on all trained conditions. In addition, the ability of trained listeners from the ARHL group to discriminate minimally different pseudowords in noise also improved with training. Conclusions: ARHL did not preclude auditory perceptual learning but there was little generalization to
Frtusova, Jana B; Phillips, Natalie A
This study examined the effect of auditory-visual (AV) speech stimuli on working memory in older adults with poorer-hearing (PH) in comparison to age- and education-matched older adults with better hearing (BH). Participants completed a working memory n-back task (0- to 2-back) in which sequences of digits were presented in visual-only (i.e., speech-reading), auditory-only (A-only), and AV conditions. Auditory event-related potentials (ERP) were collected to assess the relationship between perceptual and working memory processing. The behavioral results showed that both groups were faster in the AV condition in comparison to the unisensory conditions. The ERP data showed perceptual facilitation in the AV condition, in the form of reduced amplitudes and latencies of the auditory N1 and/or P1 components, in the PH group. Furthermore, a working memory ERP component, the P3, peaked earlier for both groups in the AV condition compared to the A-only condition. In general, the PH group showed a more robust AV benefit; however, the BH group showed a dose-response relationship between perceptual facilitation and working memory improvement, especially for facilitation of processing speed. Two measures, reaction time and P3 amplitude, suggested that the presence of visual speech cues may have helped the PH group to counteract the demanding auditory processing, to the level that no group differences were evident during the AV modality despite lower performance during the A-only condition. Overall, this study provides support for the theory of an integrated perceptual-cognitive system. The practical significance of these findings is also discussed.
Jana B. Frtusova
Full Text Available This study examined the effect of auditory-visual (AV speech stimuli on working memory in hearing impaired participants (HIP in comparison to age- and education-matched normal elderly controls (NEC. Participants completed a working memory n-back task (0- to 2-back in which sequences of digits were presented in visual-only (i.e., speech-reading, auditory-only (A-only, and AV conditions. Auditory event-related potentials (ERP were collected to assess the relationship between perceptual and working memory processing. The behavioural results showed that both groups were faster in the AV condition in comparison to the unisensory conditions. The ERP data showed perceptual facilitation in the AV condition, in the form of reduced amplitudes and latencies of the auditory N1 and/or P1 components, in the HIP group. Furthermore, a working memory ERP component, the P3, peaked earlier for both groups in the AV condition compared to the A-only condition. In general, the HIP group showed a more robust AV benefit; however, the NECs showed a dose-response relationship between perceptual facilitation and working memory improvement, especially for facilitation of processing speed. Two measures, reaction time and P3 amplitude, suggested that the presence of visual speech cues may have helped the HIP to counteract the demanding auditory processing, to the level that no group differences were evident during the AV modality despite lower performance during the A-only condition. Overall, this study provides support for the theory of an integrated perceptual-cognitive system. The practical significance of these findings is also discussed.
Grandison, Alexandra; Sowden, Paul T; Drivonikou, Vicky G; Notman, Leslie A; Alexander, Iona; Davies, Ian R L
Perceptual learning involves an improvement in perceptual judgment with practice, which is often specific to stimulus or task factors. Perceptual learning has been shown on a range of visual tasks but very little research has explored chromatic perceptual learning. Here, we use two low level perceptual threshold tasks and a supra-threshold target detection task to assess chromatic perceptual learning and category effects. Experiment 1 investigates whether chromatic thresholds reduce as a result of training and at what level of analysis learning effects occur. Experiment 2 explores the effect of category training on chromatic thresholds, whether training of this nature is category specific and whether it can induce categorical responding. Experiment 3 investigates the effect of category training on a higher level, lateralized target detection task, previously found to be sensitive to category effects. The findings indicate that performance on a perceptual threshold task improves following training but improvements do not transfer across retinal location or hue. Therefore, chromatic perceptual learning is category specific and can occur at relatively early stages of visual analysis. Additionally, category training does not induce category effects on a low level perceptual threshold task, as indicated by comparable discrimination thresholds at the newly learned hue boundary and adjacent test points. However, category training does induce emerging category effects on a supra-threshold target detection task. Whilst chromatic perceptual learning is possible, learnt category effects appear to be a product of left hemisphere processing, and may require the input of higher level linguistic coding processes in order to manifest.
Ullrich, Dieter; Ullrich, Katja; Marten, Magret
Background: In Lower Saxony, Germany, pre-school children with language- and speech-deficits have the opportunity to access kindergartens with integrated language-/speech therapy prior to attending primary school, both regular or with integrated speech therapy. It is unknown whether these early childhood education treatments are helpful and…
Bruderer, Alison G; Danielson, D Kyle; Kandhadai, Padmapriya; Werker, Janet F
The influence of speech production on speech perception is well established in adults. However, because adults have a long history of both perceiving and producing speech, the extent to which the perception-production linkage is due to experience is unknown. We addressed this issue by asking whether articulatory configurations can influence infants' speech perception performance. To eliminate influences from specific linguistic experience, we studied preverbal, 6-mo-old infants and tested the discrimination of a nonnative, and hence never-before-experienced, speech sound distinction. In three experimental studies, we used teething toys to control the position and movement of the tongue tip while the infants listened to the speech sounds. Using ultrasound imaging technology, we verified that the teething toys consistently and effectively constrained the movement and positioning of infants' tongues. With a looking-time procedure, we found that temporarily restraining infants' articulators impeded their discrimination of a nonnative consonant contrast but only when the relevant articulator was selectively restrained to prevent the movements associated with producing those sounds. Our results provide striking evidence that even before infants speak their first words and without specific listening experience, sensorimotor information from the articulators influences speech perception. These results transform theories of speech perception by suggesting that even at the initial stages of development, oral-motor movements influence speech sound discrimination. Moreover, an experimentally induced "impairment" in articulator movement can compromise speech perception performance, raising the question of whether long-term oral-motor impairments may impact perceptual development.
Full Text Available Human sensory systems allow individuals to see, hear, touch, and interact with the surrounding physical environment. Understanding human perception and its limit enables us to better exploit the psychophysics of human perceptual systems to design more efficient, adaptive algorithms and develop perceptually-inspired computational models. In this talk, I will survey some of recent efforts on perceptually-inspired computing with applications to crowd simulation and multimodal interaction. In particular, I will present data-driven personality modeling based on the results of user studies, example-guided physics-based sound synthesis using auditory perception, as well as perceptually-inspired simplification for multimodal interaction. These perceptually guided principles can be used to accelerating multi-modal interaction and visual computing, thereby creating more natural human-computer interaction and providing more immersive experiences. I will also present their use in interactive applications for entertainment, such as video games, computer animation, and shared social experience. I will conclude by discussing possible future research directions.
Cavanaugh, Lisa A; MacInnis, Deborah J; Weiss, Allen M
Individuals often describe objects in their world in terms of perceptual dimensions that span a variety of modalities; the visual (e.g., brightness: dark-bright), the auditory (e.g., loudness: quiet-loud), the gustatory (e.g., taste: sour-sweet), the tactile (e.g., hardness: soft vs. hard) and the kinaesthetic (e.g., speed: slow-fast). We ask whether individuals use perceptual dimensions to differentiate emotions from one another. Participants in two studies (one where respondents reported on abstract emotion concepts and a second where they reported on specific emotion episodes) rated the extent to which features anchoring 29 perceptual dimensions (e.g., temperature, texture and taste) are associated with 8 emotions (anger, fear, sadness, guilt, contentment, gratitude, pride and excitement). Results revealed that in both studies perceptual dimensions differentiate positive from negative emotions and high arousal from low arousal emotions. They also differentiate among emotions that are similar in arousal and valence (e.g., high arousal negative emotions such as anger and fear). Specific features that anchor particular perceptual dimensions (e.g., hot vs. cold) are also differentially associated with emotions.
Asakawa, Kaori; Tanaka, Akihiro; Imai, Hisato
We investigated whether audiovisual synchrony perception for speech could change after observation of the audiovisual temporal mismatch. Previous studies have revealed that audiovisual synchrony perception is re-calibrated after exposure to a constant timing difference between auditory and visual signals in non-speech. In the present study, we examined whether this audiovisual temporal recalibration occurs at the perceptual level even for speech (monosyllables). In Experiment 1, participants performed an audiovisual simultaneity judgment task (i.e., a direct measurement of the audiovisual synchrony perception) in terms of the speech signal after observation of the speech stimuli which had a constant audiovisual lag. The results showed that the “simultaneous” responses (i.e., proportion of responses for which participants judged the auditory and visual stimuli to be synchronous) at least partly depended on exposure lag. In Experiment 2, we adopted the McGurk identification task (i.e., an indirect measurement of the audiovisual synchrony perception) to exclude the possibility that this modulation of synchrony perception was solely attributable to the response strategy using stimuli identical to those of Experiment 1. The characteristics of the McGurk effect reported by participants depended on exposure lag. Thus, it was shown that audiovisual synchrony perception for speech could be modulated following exposure to constant lag both in direct and indirect measurement. Our results suggest that temporal recalibration occurs not only in non-speech signals but also in monosyllabic speech at the perceptual level.
... speech intelligibility. Speech intelligibility for signals generated by an acoustic microphone, a throat microphone, and the two microphones together was assessed using the Modified Rhyme Test (MRT...
Yellamsetty, Anusha; Bidelman, Gavin M
Parsing simultaneous speech requires listeners use pitch-guided segregation which can be affected by the signal-to-noise ratio (SNR) in the auditory scene. The interaction of these two cues may occur at multiple levels within the cortex. The aims of the current study were to assess the correspondence between oscillatory brain rhythms and determine how listeners exploit pitch and SNR cues to successfully segregate concurrent speech. We recorded electrical brain activity while participants heard double-vowel stimuli whose fundamental frequencies (F0s) differed by zero or four semitones (STs) presented in either clean or noise-degraded (+5 dB SNR) conditions. We found that behavioral identification was more accurate for vowel mixtures with larger pitch separations but F0 benefit interacted with noise. Time-frequency analysis decomposed the EEG into different spectrotemporal frequency bands. Low-frequency (θ, β) responses were elevated when speech did not contain pitch cues (0ST > 4ST) or was noisy, suggesting a correlate of increased listening effort and/or memory demands. Contrastively, γ power increments were observed for changes in both pitch (0ST > 4ST) and SNR (clean > noise), suggesting high-frequency bands carry information related to acoustic features and the quality of speech representations. Brain-behavior associations corroborated these effects; modulations in low-frequency rhythms predicted the speed of listeners' perceptual decisions with higher bands predicting identification accuracy. Results are consistent with the notion that neural oscillations reflect both automatic (pre-perceptual) and controlled (post-perceptual) mechanisms of speech processing that are largely divisible into high- and low-frequency bands of human brain rhythms. Copyright © 2018 Elsevier B.V. All rights reserved.
Denise Botelho Knopp
Full Text Available A atrofia de múltiplos sistemas (AMS é caracterizada pela presença de sinais parkinsonianos, cerebelares, autonômicos e piramidais, em várias combinações. O aparecimento de disartria e disfagia no primeiro ano de manifestação de parkinsonismo, sugere o diagnóstico de AMS. O objetivo deste estudo foi o de caracterizar do ponto de vista fonoaudiológico os distúrbios da fala e da voz dos pacientes com AMS. Foram selecionados cinco pacientes, com idade média de 51,2 anos e com diagnóstico provável de AMS. Cada paciente foi submetido a avaliação neurológica e fonoaudiológica. Esta última foi composta dos seguintes itens: anamnese; avaliação miofuncional e avaliação perceptivo-auditiva da fala. Os sintomas de fala e voz apareceram 1,1 ano após o início dos sintomas motores e a disartrofonia apresentada por todos os pacientes foi a do tipo mista, mesclando os componentes hipocinético, atáxico e espástico, com predomínio do primeiro. Nossos achados são diferentes daqueles comumente vistos em pacientes com a doença de Parkinson, onde o componente hipocinético é o único achado. Os dados levantados indicam que a avaliação fonoaudiológica é importante no diagnóstico diferencial e no planejamento terapêutico da AMS.Multiple system atrophy (MSA is characterized by parkinsonian, cerebellar and pyramidal features along with autonomic dysfunction in different combinations. Onset of dysarthria during the first year of the manifestation of a parkinsonian syndrome suggests the diagnosis of MSA. The aim of this study was to characterize the voice and the speech of patients with MSA. We studied five MSA patients with a mean age of 51.2 years. Each patient was submitted to a neurological and a specific speech and voice assessment. The latter consisted of the following: clinical interview, myofunctional examination, and perceptual speech evaluation. Speech and voice complaints occurred at an average time of 1.1 year after the
Lu, Lingxi; Bao, Xiaohan; Chen, Jing; Qu, Tianshu; Wu, Xihong; Li, Liang
Under a noisy "cocktail-party" listening condition with multiple people talking, listeners can use various perceptual/cognitive unmasking cues to improve recognition of the target speech against informational speech-on-speech masking. One potential unmasking cue is the emotion expressed in a speech voice, by means of certain acoustical features. However, it was unclear whether emotionally conditioning a target-speech voice that has none of the typical acoustical features of emotions (i.e., an emotionally neutral voice) can be used by listeners for enhancing target-speech recognition under speech-on-speech masking conditions. In this study we examined the recognition of target speech against a two-talker speech masker both before and after the emotionally neutral target voice was paired with a loud female screaming sound that has a marked negative emotional valence. The results showed that recognition of the target speech (especially the first keyword in a target sentence) was significantly improved by emotionally conditioning the target speaker's voice. Moreover, the emotional unmasking effect was independent of the unmasking effect of the perceived spatial separation between the target speech and the masker. Also, (skin conductance) electrodermal responses became stronger after emotional learning when the target speech and masker were perceptually co-located, suggesting an increase of listening efforts when the target speech was informationally masked. These results indicate that emotionally conditioning the target speaker's voice does not change the acoustical parameters of the target-speech stimuli, but the emotionally conditioned vocal features can be used as cues for unmasking target speech.
Semeraro, Hannah D; Rowan, Daniel; van Besouw, Rachel M; Allsopp, Adrian A
The studies described in this article outline the design and development of a British English version of the coordinate response measure (CRM) speech-in-noise (SiN) test. Our interest in the CRM is as a SiN test with high face validity for occupational auditory fitness for duty (AFFD) assessment. Study 1 used the method of constant stimuli to measure and adjust the psychometric functions of each target word, producing a speech corpus with equal intelligibility. After ensuring all the target words had similar intelligibility, for Studies 2 and 3, the CRM was presented in an adaptive procedure in stationary speech-spectrum noise to measure speech reception thresholds and evaluate the test-retest reliability of the CRM SiN test. Studies 1 (n = 20) and 2 (n = 30) were completed by normal-hearing civilians. Study 3 (n = 22) was completed by hearing impaired military personnel. The results display good test-retest reliability (95% confidence interval (CI) hearing impairment. The British English CRM using stationary speech-spectrum noise is a "ready to use" SiN test, suitable for investigation as an AFFD assessment tool for military personnel.
Heinen, Esther; Birkholz, Peter; Willmes, Klaus; Neuschaefer-Rube, Christiane
To explore possible effects of tongue piercing on perceived speech quality. Using a quasi-experimental design, we analyzed the effect of tongue piercing on speech in a perception experiment. Samples of spontaneous speech and read speech were recorded from 20 long-term pierced and 20 non-pierced individuals (10 males, 10 females each). The individuals having a tongue piercing were recorded with attached and removed piercing. The audio samples were blindly rated by 26 female and 20 male laypersons and by 5 female speech-language pathologists with regard to perceived speech quality along 5 dimensions: speech clarity, speech rate, prosody, rhythm and fluency. We found no statistically significant differences for any of the speech quality dimensions between the pierced and non-pierced individuals, neither for the read nor for the spontaneous speech. In addition, neither length nor position of piercing had a significant effect on speech quality. The removal of tongue piercings had no effects on speech performance either. Rating differences between laypersons and speech-language pathologists were not dependent on the presence of a tongue piercing. People are able to perfectly adapt their articulation to long-term tongue piercings such that their speech quality is not perceptually affected.
Rämö, Jussi; Christensen, Lasse; Bech, Søren
This paper focuses on validating a perceptual distraction model, which aims to predict user's perceived distraction caused by audio-on-audio interference. Originally, the distraction model was trained with music targets and interferers using a simple loudspeaker setup, consisting of only two...... sound zones within the sound-zone system. Thus, validating the model using a different sound-zone system with both speech-on-music and music-on-speech stimuli sets. The results show that the model performance is equally good in both zones, i.e., with both speech- on-music and music-on-speech stimuli...
Jerger, Susan; Damian, Markus F; McAlpine, Rachel P; Abdi, Hervé
Understanding spoken language is an audiovisual event that depends critically on the ability to discriminate and identify phonemes yet we have little evidence about the role of early auditory experience and visual speech on the development of these fundamental perceptual skills. Objectives of this research were to determine 1) how visual speech influences phoneme discrimination and identification; 2) whether visual speech influences these two processes in a like manner, such that discrimination predicts identification; and 3) how the degree of hearing loss affects this relationship. Such evidence is crucial for developing effective intervention strategies to mitigate the effects of hearing loss on language development. Participants were 58 children with early-onset sensorineural hearing loss (CHL, 53% girls, M = 9;4 yrs) and 58 children with normal hearing (CNH, 53% girls, M = 9;4 yrs). Test items were consonant-vowel (CV) syllables and nonwords with intact visual speech coupled to non-intact auditory speech (excised onsets) as, for example, an intact consonant/rhyme in the visual track (Baa or Baz) coupled to non-intact onset/rhyme in the auditory track (/-B/aa or/-B/az). The items started with an easy-to-speechread/B/or difficult-to-speechread/G/onset and were presented in the auditory (static face) vs. audiovisual (dynamic face) modes. We assessed discrimination for intact vs. non-intact different pairs (e.g., Baa:/-B/aa). We predicted that visual speech would cause the non-intact onset to be perceived as intact and would therefore generate more same-as opposed to different-responses in the audiovisual than auditory mode. We assessed identification by repetition of nonwords with non-intact onsets (e.g.,/-B/az). We predicted that visual speech would cause the non-intact onset to be perceived as intact and would therefore generate more Baz-as opposed to az- responses in the audiovisual than auditory mode. Performance in the audiovisual mode showed more same
Jerger, Susan; Damian, Markus F.; McAlpine, Rachel P.; Abdi, Hervé
Objectives Understanding spoken language is an audiovisual event that depends critically on the ability to discriminate and identify phonemes yet we have little evidence about the role of early auditory experience and visual speech on the development of these fundamental perceptual skills. Objectives of this research were to determine 1) how visual speech influences phoneme discrimination and identification; 2) whether visual speech influences these two processes in a like manner, such that discrimination predicts identification; and 3) how the degree of hearing loss affects this relationship. Such evidence is crucial for developing effective intervention strategies to mitigate the effects of hearing loss on language development. Methods Participants were 58 children with early-onset sensorineural hearing loss (CHL, 53% girls, M = 9;4 yrs) and 58 children with normal hearing (CNH, 53% girls, M = 9;4 yrs). Test items were consonant-vowel (CV) syllables and nonwords with intact visual speech coupled to non-intact auditory speech (excised onsets) as, for example, an intact consonant/rhyme in the visual track (Baa or Baz) coupled to non-intact onset/rhyme in the auditory track (/–B/aa or /–B/az). The items started with an easy-to-speechread /B/ or difficult-to-speechread /G/ onset and were presented in the auditory (static face) vs. audiovisual (dynamic face) modes. We assessed discrimination for intact vs. non-intact different pairs (e.g., Baa:/–B/aa). We predicted that visual speech would cause the non-intact onset to be perceived as intact and would therefore generate more same—as opposed to different—responses in the audiovisual than auditory mode. We assessed identification by repetition of nonwords with non-intact onsets (e.g., /–B/az). We predicted that visual speech would cause the non-intact onset to be perceived as intact and would therefore generate more Baz—as opposed to az— responses in the audiovisual than auditory mode. Results
Dosher, Barbara; Lu, Zhong-Lin
Visual perceptual learning through practice or training can significantly improve performance on visual tasks. Originally seen as a manifestation of plasticity in the primary visual cortex, perceptual learning is more readily understood as improvements in the function of brain networks that integrate processes, including sensory representations, decision, attention, and reward, and balance plasticity with system stability. This review considers the primary phenomena of perceptual learning, theories of perceptual learning, and perceptual learning's effect on signal and noise in visual processing and decision. Models, especially computational models, play a key role in behavioral and physiological investigations of the mechanisms of perceptual learning and for understanding, predicting, and optimizing human perceptual processes, learning, and performance. Performance improvements resulting from reweighting or readout of sensory inputs to decision provide a strong theoretical framework for interpreting perceptual learning and transfer that may prove useful in optimizing learning in real-world applications.
Söderlund, Göran B W; Jobs, Elisabeth Nilsson
The most common neuropsychiatric condition in the in children is attention deficit hyperactivity disorder (ADHD), affecting ∼6-9% of the population. ADHD is distinguished by inattention and hyperactive, impulsive behaviors as well as poor performance in various cognitive tasks often leading to failures at school. Sensory and perceptual dysfunctions have also been noticed. Prior research has mainly focused on limitations in executive functioning where differences are often explained by deficits in pre-frontal cortex activation. Less notice has been given to sensory perception and subcortical functioning in ADHD. Recent research has shown that children with ADHD diagnosis have a deviant auditory brain stem response compared to healthy controls. The aim of the present study was to investigate if the speech recognition threshold differs between attentive and children with ADHD symptoms in two environmental sound conditions, with and without external noise. Previous research has namely shown that children with attention deficits can benefit from white noise exposure during cognitive tasks and here we investigate if noise benefit is present during an auditory perceptual task. For this purpose we used a modified Hagerman's speech recognition test where children with and without attention deficits performed a binaural speech recognition task to assess the speech recognition threshold in no noise and noise conditions (65 dB). Results showed that the inattentive group displayed a higher speech recognition threshold than typically developed children and that the difference in speech recognition threshold disappeared when exposed to noise at supra threshold level. From this we conclude that inattention can partly be explained by sensory perceptual limitations that can possibly be ameliorated through noise exposure.
Göran B W Söderlund
Full Text Available The most common neuropsychiatric condition in the in children is attention deficit hyperactivity disorder (ADHD, affecting approximately 6-9 % of the population. ADHD is distinguished by inattention and hyperactive, impulsive behaviors as well as poor performance in various cognitive tasks often leading to failures at school. Sensory and perceptual dysfunctions have also been noticed. Prior research has mainly focused on limitations in executive functioning where differences are often explained by deficits in pre-frontal cortex activation. Less notice has been given to sensory perception and subcortical functioning in ADHD. Recent research has shown that children with ADHD diagnosis have a deviant auditory brain stem response compared to healthy controls. The aim of the present study was to investigate if the speech recognition threshold differs between attentive and children with ADHD symptoms in two environmental sound conditions, with and without external noise. Previous research has namely shown that children with attention deficits can benefit from white noise exposure during cognitive tasks and here we investigate if noise benefit is present during an auditory perceptual task. For this purpose we used a modified Hagerman’s speech recognition test where children with and without attention deficits performed a binaural speech recognition task to assess the speech recognition threshold in no noise and noise conditions (65 dB. Results showed that the inattentive group displayed a higher speech recognition threshold than typically developed children (TDC and that the difference in speech recognition threshold disappeared when exposed to noise at supra threshold level. From this we conclude that inattention can partly be explained by sensory perceptual limitations that can possibly be ameliorated through noise exposure.
Revision of the Competency Standards for Occupational Therapy Driver Assessors: An overview of the evidence for the inclusion of cognitive and perceptual assessments within fitness-to-drive evaluations.
Fields, Sally M; Unsworth, Carolyn A
Determination of fitness-to-drive after illness or injury is a complex process typically requiring a comprehensive driving assessment, including off-road and on-road assessment components. The competency standards for occupational therapy driver assessors (Victoria, Australia) define the requirements for performance of a comprehensive driving assessment, and we are currently revising these. Assessment of cognitive and perceptual skills forms an important part of the off-road assessment. The aim of this systematic review of systematic reviews (known as an overview) is to identify what evidence exists for including assessment of cognitive and perceptual skills within fitness-to-drive evaluations to inform revision of the competency standards. Five electronic databases (MEDLINE, CINAHL, PsycINFO, The Cochrane Library, OT Seeker) were systematically searched. Systematic review articles were appraised by two authors for eligibility. Methodological quality was independently assessed using the AMSTAR tool. Narrative analysis was conducted to summarise the content of eligible reviews. A total of 1228 results were retrieved. Fourteen reviews met the inclusion criteria. Reviews indicated that the components of cognition and perception most frequently identified as being predictive of fitness-to-drive were executive function (n = 13), processing speed (n = 12), visuospatial skills, attention, memory and mental flexibility (n = 11). Components less indicative were perception, concentration (n = 10), praxis (n = 9), language (n = 7) and neglect (n = 6). This overview of systematic reviews supports the inclusion of assessment of a range of cognitive and perceptual skills as key elements in a comprehensive driver assessment and therefore should be included in the revised competency standards for occupational therapy driver assessors. © 2017 Occupational Therapy Australia.
Webster, Michael A.; Yasuda, Maiko; Haber, Sara; Leonard, Deanne; Ballardini, Nicole
We used adaptation to examine the relationship between perceptual norms--the stimuli observers describe as psychologically neutral, and response norms--the stimulus levels that leave visual sensitivity in a neutral or balanced state. Adapting to stimuli on opposite sides of a neutral point (e.g. redder or greener than white) biases appearance in opposite ways. Thus the adapting stimulus can be titrated to find the unique adapting level that does not bias appearance. We compared these response norms to subjectively defined neutral points both within the same observer (at different retinal eccentricities) and between observers. These comparisons were made for visual judgments of color, image focus, and human faces, stimuli that are very different and may depend on very different levels of processing, yet which share the property that for each there is a well defined and perceptually salient norm. In each case the adaptation aftereffects were consistent with an underlying sensitivity basis for the perceptual norm. Specifically, response norms were similar to and thus covaried with the perceptual norm, and under common adaptation differences between subjectively defined norms were reduced. These results are consistent with models of norm-based codes and suggest that these codes underlie an important link between visual coding and visual experience.
Wilson, Donald A.; Fletcher, Max L.; Sullivan, Regina M.
Olfactory perceptual learning is a relatively long-term, learned increase in perceptual acuity, and has been described in both humans and animals. Data from recent electrophysiological studies have indicated that olfactory perceptual learning may be correlated with changes in odorant receptive fields of neurons in the olfactory bulb and piriform…
van Dantzig, Saskia; Pecher, Diane; Zeelenberg, Rene; Barsalou, Lawrence W.
According to the Perceptual Symbols Theory of cognition (Barsalou, 1999), modality-specific simulations underlie the representation of concepts. A strong prediction of this view is that perceptual processing affects conceptual processing. In this study, participants performed a perceptual detection task and a conceptual property-verification task…
Pisoni, David B.
This paper reviews some of the major evidence and arguments currently available to support the view that human speech perception may require the use of specialized neural mechanisms for perceptual analysis. Experiments using synthetically produced speech signals with adults are briefly summarized and extensions of these results to infants and other organisms are reviewed with an emphasis towards detailing those aspects of speech perception that may require some need for specialized species-specific processors. Finally, some comments on the role of early experience in perceptual development are provided as an attempt to identify promising areas of new research in speech perception. PMID:399200
Logan, J S; Greene, B G; Pisoni, D B
This paper reports the results of an investigation that employed the modified rhyme test (MRT) to measure the segmental intelligibility of synthetic speech generated automatically by rule. Synthetic speech produced by ten text-to-speech systems was studied and compared to natural speech. A variation of the standard MRT was also used to study the effects of response set size on perceptual confusions. Results indicated that the segmental intelligibility scores formed a continuum. Several systems displayed very high levels of performance that were close to or equal to scores obtained with natural speech; other systems displayed substantially worse performance compared to natural speech. The overall performance of the best system, DECtalk--Paul, was equivalent to the data obtained with natural speech for consonants in syllable-initial position. The findings from this study are discussed in terms of the use of a set of standardized procedures for measuring intelligibility of synthetic speech under controlled laboratory conditions. Recent work investigating the perception of synthetic speech under more severe conditions in which greater demands are made on the listener's processing resources is also considered. The wide range of intelligibility scores obtained in the present study demonstrates important differences in perception and suggests that not all synthetic speech is perceptually equivalent to the listener.
Logan, John S.; Greene, Beth G.; Pisoni, David B.
This paper reports the results of an investigation that employed the modified rhyme test (MRT) to measure the segmental intelligibility of synthetic speech generated automatically by rule. Synthetic speech produced by ten text-to-speech systems was studied and compared to natural speech. A variation of the standard MRT was also used to study the effects of response set size on perceptual confusions. Results indicated that the segmental intelligibility scores formed a continuum. Several systems displayed very high levels of performance that were close to or equal to scores obtained with natural speech; other systems displayed substantially worse performance compared to natural speech. The overall performance of the best system, DECtalk—Paul, was equivalent to the data obtained with natural speech for consonants in syllable-initial position. The findings from this study are discussed in terms of the use of a set of standardized procedures for measuring intelligibility of synthetic speech under controlled laboratory conditions. Recent work investigating the perception of synthetic speech under more severe conditions in which greater demands are made on the listener’s processing resources is also considered. The wide range of intelligibility scores obtained in the present study demonstrates important differences in perception and suggests that not all synthetic speech is perceptually equivalent to the listener. PMID:2527884
Zapata, Julián; Kirkedal, Andreas Søeborg
In this paper, we report on a two-part experiment aiming to assess and compare the performance of two types of automatic speech recognition (ASR) systems on two different computational platforms when used to augment dictation workflows. The experiment was performed with a sample of speakers...
C. R. C. Cruz et al.
Full Text Available "Biotecnological War" board game, a conceptual and perceptual assessment tool for biotechnology and protein chemistry teaching for undergraduate students in biological sciences and related areas. It is a proposal initially conceived as an alternative complementary tool for biochemistry teaching of proteins and peptides, challenging students, aiming to review concepts transmitted in classroom, stimulating diverse student’s abilities, such as their creativity, competitiveness and resource management. OBJECTIVES. Correlate biochemistry importance of proteins and peptides with the development of new products. MATERIAL AND METHODS. Firstly, theoretical-practical classes were given with seminars to be presented by the groups, including topics that will be addressed in game. Groups of 5 students, with previously viewed themes drawn a goal to be achieved. There are two drawn goals variations: Academic or Commercial. Board is divided into provinces, which must be bought with an initial resource to complete the goal. Before the beginning each group will have 15 minutes to plan their actions. The objective is to develop the entire objective drawn with appropriate methodology, having at least 1 territory in each province. RESULTS. This game proved to be an excellent tool for complementary evaluation of students, which stimulated teamwork and a strong competitive spirit within classroom, which allowed to analyze students' perception regarding the protein subject and team work. On the other hand, for teacher and students participating in compulsory traineeship program this game demonstrated new ways to approach complex subjects in biochemistry using creativity with the development of new activities such as this board game. CONCLUSION: Overall, students had a good impression of “Biotecnological war” game since it helped to secure and administer the protein and peptides biochemical subject in a competitive and team work way.
Shi, R; Werker, J F; Morgan, J L
In our study newborn infants were presented with lists of lexical and grammatical words prepared from natural maternal speech. The results show that newborns are able to categorically discriminate these sets of words based on a constellation of perceptual cues that distinguish them. This general ability to detect and categorically discriminate sets of words on the basis of multiple acoustic and phonological cues may provide a perceptual base that can help older infants bootstrap into the acquisition of grammatical categories and syntactic structure.
Ylinen, Sari; Uther, Maria; Latvala, Antti; Vepsalainen, Sara; Iverson, Paul; Akahane-Yamada, Reiko; Naatanen, Risto
Foreign-language learning is a prime example of a task that entails perceptual learning. The correct comprehension of foreign-language speech requires the correct recognition of speech sounds. The most difficult speech-sound contrasts for foreign-language learners often are the ones that have multiple phonetic cues, especially if the cues are…
Mitchel, Aaron D; Gerfen, Chip; Weiss, Daniel J
One challenge for speech perception is between-speaker variability in the acoustic parameters of speech. For example, the same phoneme (e.g. the vowel in "cat") may have substantially different acoustic properties when produced by two different speakers and yet the listener must be able to interpret these disparate stimuli as equivalent. Perceptual tuning, the use of contextual information to adjust phonemic representations, may be one mechanism that helps listeners overcome obstacles they face due to this variability during speech perception. Here we test whether visual contextual cues to speaker identity may facilitate the formation and maintenance of distributional representations for individual speakers, allowing listeners to adjust phoneme boundaries in a speaker-specific manner. We familiarized participants to an audiovisual continuum between /aba/ and /ada/. During familiarization, the "b-face" mouthed /aba/ when an ambiguous token was played, while the "D-face" mouthed /ada/. At test, the same ambiguous token was more likely to be identified as /aba/ when paired with a stilled image of the "b-face" than with an image of the "D-face." This was not the case in the control condition when the two faces were paired equally with the ambiguous token. Together, these results suggest that listeners may form speaker-specific phonemic representations using facial identity cues.
Ravishankar, C., Hughes Network Systems, Germantown, MD
Speech is the predominant means of communication between human beings and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained to be the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage of speech signal getting corrupted by noise, cross-talk and distortion Long haul transmissions which use repeaters to compensate for the loss in signal strength on transmission links also increase the associated noise and distortion. On the other hand digital transmission is relatively immune to noise, cross-talk and distortion primarily because of the capability to faithfully regenerate digital signal at each repeater purely based on a binary decision. Hence end-to-end performance of the digital link essentially becomes independent of the length and operating frequency bands of the link Hence from a transmission point of view digital transmission has been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service provision point of view as well. Modem requirements have introduced the need for robust, flexible and secure services that can carry a multitude of signal types (such as voice, data and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term Speech Coding is often referred to techniques that represent or code speech signals either directly as a waveform or as a set of parameters by analyzing the speech signal. In either case, the codes are transmitted to the distant end where speech is reconstructed or synthesized using the received set of codes. A more generic term that is applicable to these techniques that is often interchangeably used with speech coding is the term voice coding. This term is more generic in the sense that the
Schreitmüller, Stefan; Frenken, Miriam; Bentz, Lüder; Ortmann, Magdalene; Walger, Martin; Meister, Hartmut
Watching a talker's mouth is beneficial for speech reception (SR) in many communication settings, especially in noise and when hearing is impaired. Measures for audiovisual (AV) SR can be valuable in the framework of diagnosing or treating hearing disorders. This study addresses the lack of standardized methods in many languages for assessing lipreading, AV gain, and integration. A new method is validated that supplements a German speech audiometric test with visualizations of the synthetic articulation of an avatar that was used, for it is feasible to lip-sync auditory speech in a highly standardized way. Three hypotheses were formed according to the literature on AV SR that used live or filmed talkers. It was tested whether respective effects could be reproduced with synthetic articulation: (1) cochlear implant (CI) users have a higher visual-only SR than normal-hearing (NH) individuals, and younger individuals obtain higher lipreading scores than older persons. (2) Both CI and NH gain from presenting AV over unimodal (auditory or visual) sentences in noise. (3) Both CI and NH listeners efficiently integrate complementary auditory and visual speech features. In a controlled, cross-sectional study with 14 experienced CI users (mean age 47.4) and 14 NH individuals (mean age 46.3, similar broad age distribution), lipreading, AV gain, and integration of a German matrix sentence test were assessed. Visual speech stimuli were synthesized by the articulation of the Talking Head system "MASSY" (Modular Audiovisual Speech Synthesizer), which displayed standardized articulation with respect to the visibility of German phones. In line with the hypotheses and previous literature, CI users had a higher mean visual-only SR than NH individuals (CI, 38%; NH, 12%; p < 0.001). Age was correlated with lipreading such that within each group, younger individuals obtained higher visual-only scores than older persons (rCI = -0.54; p = 0.046; rNH = -0.78; p < 0.001). Both CI and NH
Full Text Available People perceive the same situation described in direct speech (e.g., John said, “I like the food at this restaurant” as more vivid and perceptually engaging than described in indirect speech (e.g., John said that he likes the food at the restaurant. So, if direct speech enhances the perception of vividness relative to indirect speech, what are the effects of using indirect speech? In four experiments, we examined whether the use of direct and indirect speech influences the comprehender’s memory for the identity of the speaker. Participants read a direct or an indirect speech version of a story and then addressed statements to one of the four protagonists of the story in a memory task. We found better source memory at the level of protagonist gender after indirect than direct speech (Exp. 1–3. When the story was rewritten to make the protagonists more distinctive, we also found an effect of speech type on source memory at the level of the individual, with better memory after indirect than direct speech (Exp. 3–4. Memory for the content of the story, however, was not influenced by speech type (Exp. 4. While previous research showed that direct speech may enhance memory for how something was said, we conclude that indirect speech enhances memory for who said what.
Steenbergen, Peter; Buitenweg, Jan R; Trojan, Jörg; Veltink, Peter H
Various studies have shown subjects to mislocalize cutaneous stimuli in an idiosyncratic manner. Spatial properties of individual localization behavior can be represented in the form of perceptual maps. Individual differences in these maps may reflect properties of internal body representations, and perceptual maps may therefore be a useful method for studying these representations. For this to be the case, individual perceptual maps need to be reproducible, which has not yet been demonstrated. We assessed the reproducibility of localizations measured twice on subsequent days. Ten subjects participated in the experiments. Non-painful electrocutaneous stimuli were applied at seven sites on the lower arm. Subjects localized the stimuli on a photograph of their own arm, which was presented on a tablet screen overlaying the real arm. Reproducibility was assessed by calculating intraclass correlation coefficients (ICC) for the mean localizations of each electrode site and the slope and offset of regression models of the localizations, which represent scaling and displacement of perceptual maps relative to the stimulated sites. The ICCs of the mean localizations ranged from 0.68 to 0.93; the ICCs of the regression parameters were 0.88 for the intercept and 0.92 for the slope. These results indicate a high degree of reproducibility. We conclude that localization patterns of non-painful electrocutaneous stimuli on the arm are reproducible on subsequent days. Reproducibility is a necessary property of perceptual maps for these to reflect properties of a subject's internal body representations. Perceptual maps are therefore a promising method for studying body representations.
Müller-Deile, J; Kortmann, T; Hoppe, U; Hessel, H; Morsnowski, A
The aim of this multicenter clinical field study was to assess the benefits of the new Freedom 24 sound processor for cochlear implant (CI) users implanted with the Nucleus 24 cochlear implant system. The study included 48 postlingually profoundly deaf experienced CI users who demonstrated speech comprehension performance with their current speech processor on the Oldenburg sentence test (OLSA) in quiet conditions of at least 80% correct scores and who were able to perform adaptive speech threshold testing using the OLSA in noisy conditions. Following baseline measures of speech comprehension performance with their current speech processor, subjects were upgraded to the Freedom 24 speech processor. After a take-home trial period of at least 2 weeks, subject performance was evaluated by measuring the speech reception threshold with the Freiburg multisyllabic word test and speech intelligibility with the Freiburg monosyllabic word test at 50 dB and 70 dB in the sound field. The results demonstrated highly significant benefits for speech comprehension with the new speech processor. Significant benefits for speech comprehension were also demonstrated with the new speech processor when tested in competing background noise.In contrast, use of the Abbreviated Profile of Hearing Aid Benefit (APHAB) did not prove to be a suitably sensitive assessment tool for comparative subjective self-assessment of hearing benefits with each processor. Use of the preprocessing algorithm known as adaptive dynamic range optimization (ADRO) in the Freedom 24 led to additional improvements over the standard upgrade map for speech comprehension in quiet and showed equivalent performance in noise. Through use of the preprocessing beam-forming algorithm BEAM, subjects demonstrated a highly significant improved signal-to-noise ratio for speech comprehension thresholds (i.e., signal-to-noise ratio for 50% speech comprehension scores) when tested with an adaptive procedure using the Oldenburg
Reetzke, Rachel; Lam, Boji Pak-Wing; Xie, Zilong; Sheng, Li; Chandrasekaran, Bharath
Recognizing speech in adverse listening conditions is a significant cognitive, perceptual, and linguistic challenge, especially for children. Prior studies have yielded mixed results on the impact of bilingualism on speech perception in noise. Methodological variations across studies make it difficult to converge on a conclusion regarding the effect of bilingualism on speech-in-noise performance. Moreover, there is a dearth of speech-in-noise evidence for bilingual children who learn two languages simultaneously. The aim of the present study was to examine the extent to which various adverse listening conditions modulate differences in speech-in-noise performance between monolingual and simultaneous bilingual children. To that end, sentence recognition was assessed in twenty-four school-aged children (12 monolinguals; 12 simultaneous bilinguals, age of English acquisition ≤ 3 yrs.). We implemented a comprehensive speech-in-noise battery to examine recognition of English sentences across different modalities (audio-only, audiovisual), masker types (steady-state pink noise, two-talker babble), and a range of signal-to-noise ratios (SNRs; 0 to -16 dB). Results revealed no difference in performance between monolingual and simultaneous bilingual children across each combination of modality, masker, and SNR. Our findings suggest that when English age of acquisition and socioeconomic status is similar between groups, monolingual and bilingual children exhibit comparable speech-in-noise performance across a range of conditions analogous to everyday listening environments.
Heald, Shannon L. M.; Van Hedger, Stephen C.; Nusbaum, Howard C.
In our auditory environment, we rarely experience the exact acoustic waveform twice. This is especially true for communicative signals that have meaning for listeners. In speech and music, the acoustic signal changes as a function of the talker (or instrument), speaking (or playing) rate, and room acoustics, to name a few factors. Yet, despite this acoustic variability, we are able to recognize a sentence or melody as the same across various kinds of acoustic inputs and determine meaning based on listening goals, expectations, context, and experience. The recognition process relates acoustic signals to prior experience despite variability in signal-relevant and signal-irrelevant acoustic properties, some of which could be considered as “noise” in service of a recognition goal. However, some acoustic variability, if systematic, is lawful and can be exploited by listeners to aid in recognition. Perceivable changes in systematic variability can herald a need for listeners to reorganize perception and reorient their attention to more immediately signal-relevant cues. This view is not incorporated currently in many extant theories of auditory perception, which traditionally reduce psychological or neural representations of perceptual objects and the processes that act on them to static entities. While this reduction is likely done for the sake of empirical tractability, such a reduction may seriously distort the perceptual process to be modeled. We argue that perceptual representations, as well as the processes underlying perception, are dynamically determined by an interaction between the uncertainty of the auditory signal and constraints of context. This suggests that the process of auditory recognition is highly context-dependent in that the identity of a given auditory object may be intrinsically tied to its preceding context. To argue for the flexible neural and psychological updating of sound-to-meaning mappings across speech and music, we draw upon examples
Kellman, Philip J; Garrigan, Patrick
We consider perceptual learning: experience-induced changes in the way perceivers extract information. Often neglected in scientific accounts of learning and in instruction, perceptual learning is a fundamental contributor to human expertise and is crucial in domains where humans show remarkable levels of attainment, such as language, chess, music, and mathematics. In Section 2, we give a brief history and discuss the relation of perceptual learning to other forms of learning. We consider in Section 3 several specific phenomena, illustrating the scope and characteristics of perceptual learning, including both discovery and fluency effects. We describe abstract perceptual learning, in which structural relationships are discovered and recognized in novel instances that do not share constituent elements or basic features. In Section 4, we consider primary concepts that have been used to explain and model perceptual learning, including receptive field change, selection, and relational recoding. In Section 5, we consider the scope of perceptual learning, contrasting recent research, focused on simple sensory discriminations, with earlier work that emphasized extraction of invariance from varied instances in more complex tasks. Contrary to some recent views, we argue that perceptual learning should not be confined to changes in early sensory analyzers. Phenomena at various levels, we suggest, can be unified by models that emphasize discovery and selection of relevant information. In a final section, we consider the potential role of perceptual learning in educational settings. Most instruction emphasizes facts and procedures that can be verbalized, whereas expertise depends heavily on implicit pattern recognition and selective extraction skills acquired through perceptual learning. We consider reasons why perceptual learning has not been systematically addressed in traditional instruction, and we describe recent successful efforts to create a technology of perceptual
Kellman, Philip J.; Garrigan, Patrick
We consider perceptual learning: experience-induced changes in the way perceivers extract information. Often neglected in scientific accounts of learning and in instruction, perceptual learning is a fundamental contributor to human expertise and is crucial in domains where humans show remarkable levels of attainment, such as language, chess, music, and mathematics. In Section 2, we give a brief history and discuss the relation of perceptual learning to other forms of learning. We consider in Section 3 several specific phenomena, illustrating the scope and characteristics of perceptual learning, including both discovery and fluency effects. We describe abstract perceptual learning, in which structural relationships are discovered and recognized in novel instances that do not share constituent elements or basic features. In Section 4, we consider primary concepts that have been used to explain and model perceptual learning, including receptive field change, selection, and relational recoding. In Section 5, we consider the scope of perceptual learning, contrasting recent research, focused on simple sensory discriminations, with earlier work that emphasized extraction of invariance from varied instances in more complex tasks. Contrary to some recent views, we argue that perceptual learning should not be confined to changes in early sensory analyzers. Phenomena at various levels, we suggest, can be unified by models that emphasize discovery and selection of relevant information. In a final section, we consider the potential role of perceptual learning in educational settings. Most instruction emphasizes facts and procedures that can be verbalized, whereas expertise depends heavily on implicit pattern recognition and selective extraction skills acquired through perceptual learning. We consider reasons why perceptual learning has not been systematically addressed in traditional instruction, and we describe recent successful efforts to create a technology of perceptual
Mottron, Laurent; Dawson, Michelle; Soulières, Isabelle; Hubert, Benedicte; Burack, Jake
We propose an "Enhanced Perceptual Functioning" model encompassing the main differences between autistic and non-autistic social and non-social perceptual processing: locally oriented visual and auditory perception, enhanced low-level discrimination, use of a more posterior network in "complex" visual tasks, enhanced perception of first order static stimuli, diminished perception of complex movement, autonomy of low-level information processing toward higher-order operations, and differential relation between perception and general intelligence. Increased perceptual expertise may be implicated in the choice of special ability in savant autistics, and in the variability of apparent presentations within PDD (autism with and without typical speech, Asperger syndrome) in non-savant autistics. The overfunctioning of brain regions typically involved in primary perceptual functions may explain the autistic perceptual endophenotype.
Mackintosh, N J
Although most studies of perceptual learning in human participants have concentrated on the changes in perception assumed to be occurring, studies of nonhuman animals necessarily measure discrimination learning and generalization and remain agnostic on the question of whether changes in behavior reflect changes in perception. On the other hand, animal studies do make it easier to draw a distinction between supervised and unsupervised learning. Differential reinforcement will surely teach animals to attend to some features of a stimulus array rather than to others. But it is an open question as to whether such changes in attention underlie the enhanced discrimination seen after unreinforced exposure to such an array. I argue that most instances of unsupervised perceptual learning observed in animals (and at least some in human animals) are better explained by appeal to well-established principles and phenomena of associative learning theory: excitatory and inhibitory associations between stimulus elements, latent inhibition, and habituation.
Pries, Lotta-Katrin; Guloksuz, Sinan; Menne-Lothmann, Claudia; Decoster, Jeroen; van Winkel, Ruud; Collip, Dina; Delespaul, Philippe; De Hert, Marc; Derom, Catherine; Thiery, Evert; Jacobs, Nele; Wichers, Marieke; Simons, Claudia J. P.; Rutten, Bart P. F.; van Os, Jim
Background: An association between white noise speech illusion and psychotic symptoms has been reported in patients and their relatives. This supports the theory that bottom-up and top-down perceptual processes are involved in the mechanisms underlying perceptual abnormalities. However, findings in
Braun, Silke; Annovazzi, Chiara; Botella, Cristina; Bridler, René; Camussi, Elisabetta; Delfino, Juan P; Mohr, Christine; Moragrega, Ines; Papagno, Costanza; Pisoni, Alberto; Soler, Carla; Seifritz, Erich; Stassen, Hans H
Computerized speech analysis (CSA) is a powerful method that allows one to assess stress-induced mood disturbances and affective disorders through repeated measurements of speaking behavior and voice sound characteristics. Over the past decades CSA has been successfully used in the clinical context to monitor the transition from 'affectively disturbed' to 'normal' among psychiatric patients under treatment. This project, by contrast, aimed to extend the CSA method in such a way that the transition from 'normal' to 'affected' can be detected among subjects of the general population through 10-20 self-assessments. Central to the project was a normative speech study of 5 major languages (English, French, German, Italian, and Spanish). Each language comprised 120 subjects stratified according to gender, age, and education with repeated assessments at 14-day intervals (total n = 697). In a first step, we developed a multivariate model to assess affective state and stress-induced bodily reactions through speaking behavior and voice sound characteristics. Secondly, we determined language-, gender-, and age-specific thresholds that draw a line between 'natural fluctuations' and 'significant changes'. Thirdly, we implemented the model along with the underlying methods and normative data in a self-assessment 'voice app' for laptops, tablets, and smartphones. Finally, a longitudinal self-assessment study of 36 subjects was carried out over 14 days to test the performance of the CSA method in home environments. The data showed that speaking behavior and voice sound characteristics can be quantified in a reproducible and language-independent way. Gender and age explained 15-35% of the observed variance, whereas the educational level had a relatively small effect in the range of 1-3%. The self-assessment 'voice app' was realized in modular form so that additional languages can simply be 'plugged in' once the respective normative data become available. Results of the longitudinal
Saunders, Gabrielle H; Forsline, Anna; Fausti, Stephen A
Measurement of hearing aid outcomes is necessary for demonstration of treatment efficacy, third-party payment, and cost-benefit analysis. Outcomes are usually measured with hearing-related questionnaires and/or tests of speech recognition. However, results from these two types of test often conflict. In this paper, we provide data from a new test measure, known as the Performance-Perceptual Test (PPT), in which subjective and performance aspects of hearing in noise are measured using the same test materials and procedures. A Performance Speech Reception Threshold (SRTN) and a Perceptual SRTN are measured using the Hearing In Noise Test materials and adaptive procedure. A third variable, the discrepancy between these two SRTNs, is also computed. It measures the accuracy with which subjects assess their own hearing ability and is referred to as the Performance-Perceptual Discrepancy (PPDIS). One hundred seven subjects between 24 and 83 yr of age took part. Thirty-three subjects had normal hearing, while the remaining seventy-four had symmetrical sensorineural hearing loss. Of the subjects with impaired hearing, 24 wore hearing aids and 50 did not. All subjects underwent routine audiological examination and completed the PPT and the Hearing Handicap Inventory for the Elderly/Adults on two occasions, between 1 and 2 wk apart. The PPT was conducted for unaided listening with the masker level set to 50, 65, and 80 dB SPL. PPT data show that the subjects with normal hearing have significantly better Performance and Perceptual SRTNs at each test level than the subjects with impaired hearing but that PPDIS values do not differ between the groups. Test-retest reliability for the PPT is excellent (r-values > 0.93 for all conditions). Stepwise multiple regression analysis showed that the Performance SRTN, the PPDIS, and age explain 40% of the variance in reported handicap (Hearing Handicap Inventory for the Elderly/Adults scores). More specifically, poorer performance
Vickers, Deborah A; Backus, Bradford C; Macdonald, Nora K; Rostamzadeh, Niloofar K; Mason, Nisha K; Pandya, Roshni; Marriage, Josephine E; Mahon, Merle H
The assessment of the combined effect of classroom acoustics and sound field amplification (SFA) on children's speech perception within the "live" classroom poses a challenge to researchers. The goals of this study were to determine: (1) Whether personal response system (PRS) hand-held voting cards, together with a closed-set speech perception test (Chear Auditory Perception Test [CAPT]), provide an appropriate method for evaluating speech perception in the classroom; (2) Whether SFA provides better access to the teacher's speech than without SFA for children, taking into account vocabulary age, middle ear dysfunction or ear-canal wax, and home language. Forty-four children from two school-year groups, year 2 (aged 6 years 11 months to 7 years 10 months) and year 3 (aged 7 years 11 months to 8 years 10 months) were tested in two classrooms, using a shortened version of the four-alternative consonant discrimination section of the CAPT. All children used a PRS to register their chosen response, which they selected from four options displayed on the interactive whiteboard. The classrooms were located in a 19th-century school in central London, United Kingdom. Each child sat at their usual position in the room while target speech stimuli were presented either in quiet or in noise. The target speech was presented from the front of the classroom at 65 dBA (calibrated at 1 m) and the presented noise level was 46 dBA measured at the center of the classroom. The older children had an additional noise condition with a noise level of 52 dBA. All conditions were presented twice, once with SFA and once without SFA and the order of testing was randomized. White noise from the teacher's right-hand side of the classroom and International Speech Test Signal from the teacher's left-hand side were used, and the noises were matched at the center point of the classroom (10sec averaging [A-weighted]). Each child's expressive vocabulary age and middle ear status were measured
Van Lierde, K M; Bettens, K; Luyten, A; De Ley, S; Tungotyo, M; Balumukad, D; Galiwango, G; Bauters, W; Vermeersch, H; Hodges, A
The purpose of this study is to describe the speech characteristics in an English-speaking Ugandan boy of 4.5 years who has a rare paramedian craniofacial cleft (unilateral lip, alveolar, palatal, nasal and maxillary cleft, and associated hypertelorism). Closure of the lip together with the closure of the hard and soft palate (one-stage palatal closure) was performed at the age of 5 months. Objective as well as subjective speech assessment techniques were used. The speech samples were perceptually judged for articulation, intelligibility and nasality. The Nasometer was used for the objective measurement of the nasalance values. The most striking communication problems in this child with the rare craniofacial cleft are an incomplete phonetic inventory, a severely impaired speech intelligibility with the presence of very severe hypernasality, mild nasal emission, phonetic disorders (omission of several consonants, decreased intraoral pressure in explosives, insufficient frication of fricatives and the use of a middorsum palatal stop) and phonological disorders (deletion of initial and final consonants and consonant clusters). The increased objective nasalance values are in agreement with the presence of the audible nasality disorders. The results revealed that several phonetic and phonological articulation disorders together with a decreased speech intelligibility and resonance disorders are present in the child with a rare craniofacial cleft. To what extent a secondary surgery for velopharyngeal insufficiency, combined with speech therapy, will improve speech intelligibility, articulation and resonance characteristics is a subject for further research. The results of such analyses may ultimately serve as a starting point for specific surgical and logopedic treatment that addresses the specific needs of children with rare facial clefts. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
The task of a keyword recognition system is to detect the presence of certain words in a conversation based on the linguistic information present in human speech. Such keyword spotting systems have applications in homeland security, telephone surveillance and human-computer interfacing. General procedure of a keyword spotting system involves feature generation and matching. In this work, new set of features that are based on the psycho-acoustic masking nature of human speech are proposed. After developing these features a time aligned pattern matching process was implemented to locate the words in a set of unknown words. A word boundary detection technique based on frame classification using the nonlinear characteristics of speech is also addressed in this work. Validation of this keyword spotting model was done using widely acclaimed Cepstral features. The experimental results indicate the viability of using these perceptually significant features as an augmented feature set in keyword spotting.
Cannito, Michael P; Doiuchi, Maki; Murry, Thomas; Woodson, Gayle E
To examine the perceptual structure of voice attributes in adductor spasmodic dysphonia (ADSD) before and after botulinum toxin treatment and identify acoustic correlates of underlying perceptual factors. Reliability of perceptual judgments is considered in detail. Pre- and posttreatment trial with comparison to healthy controls, using single-blind randomized listener judgments of voice qualities, as well as retrospective comparison with acoustic measurements. Oral readings were recorded from 42 ADSD speakers before and after treatment as well as from their age- and sex-matched controls. Experienced judges listened to speech samples and rated attributes of overall voice quality, breathiness, roughness, and brokenness, using computer-implemented visual analog scaling. Data were adjusted for regression to the mean and submitted to principal components factor analysis. Acoustic waveforms, extracted from the reading samples, were analyzed and measurements correlated with perceptual factor scores. Four reliable perceptual variables of ADSD voice were effectively reduced to two underlying factors that corresponded to hyperadduction, most strongly associated with roughness, and hypoadduction, most strongly associated with breathiness. After treatment, the hyperadduction factor improved, whereas the hypoadduction factor worsened. Statistically significant (P<0.01) correlations were observed between perceived roughness and four acoustic measures, whereas breathiness correlated with aperiodicity and cepstral peak prominence (CPPs). This study supported a two-factor model of ADSD, suggesting perceptual characterization by both hyperadduction and hypoadduction before and after treatment. Responses of the factors to treatment were consistent with previous research. Correlations among perceptual and acoustic variables suggested that multiple acoustic features contributed to the overall impression of roughness. Although CPPs appears to be a partial correlate of perceived
Murphy, Gillian; Greene, Ciara M
Perceptual Load Theory has been proposed as a resolution to the longstanding early versus late selection debate in cognitive psychology. There is much evidence in support of Load Theory but very few applied studies, despite the potential for the model to shed light on everyday attention and distraction. Using a driving simulator, the effect of perceptual and cognitive load on drivers' visual search was assessed. The findings were largely in line with Load Theory, with reduced distractor processing under high perceptual load, but increased distractor processing under high cognitive load. The effect of load on driving behaviour was also analysed, with significant differences in driving behaviour under perceptual and cognitive load. In addition, the effect of perceptual load on drivers' levels of awareness was investigated. High perceptual load significantly increased inattentional blindness and deafness, for stimuli that were both relevant and irrelevant to driving. High perceptual load also increased RTs to hazards. The current study helps to advance Load Theory by illustrating its usefulness outside of traditional paradigms. There are also applied implications for driver safety and roadway design, as the current study suggests that perceptual and cognitive load are important factors in driver attention. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Erb, Julia; Henry, Molly J; Eisner, Frank; Obleser, Jonas
Listeners show a remarkable ability to quickly adjust to degraded speech input. Here, we aimed to identify the neural mechanisms of such short-term perceptual adaptation. In a sparse-sampling, cardiac-gated functional magnetic resonance imaging (fMRI) acquisition, human listeners heard and repeated back 4-band-vocoded sentences (in which the temporal envelope of the acoustic signal is preserved, while spectral information is highly degraded). Clear-speech trials were included as baseline. An additional fMRI experiment on amplitude modulation rate discrimination quantified the convergence of neural mechanisms that subserve coping with challenging listening conditions for speech and non-speech. First, the degraded speech task revealed an "executive" network (comprising the anterior insula and anterior cingulate cortex), parts of which were also activated in the non-speech discrimination task. Second, trial-by-trial fluctuations in successful comprehension of degraded speech drove hemodynamic signal change in classic "language" areas (bilateral temporal cortices). Third, as listeners perceptually adapted to degraded speech, downregulation in a cortico-striato-thalamo-cortical circuit was observable. The present data highlight differential upregulation and downregulation in auditory-language and executive networks, respectively, with important subcortical contributions when successfully adapting to a challenging listening situation.
Scherer, Demian; Wentura, Dirk
Recent theories assume a mutual facilitation in case of semantic overlap for concepts being activated simultaneously. We provide evidence for this claim using a semantic priming paradigm. To test for mutual facilitation of related concepts, a perceptual identification task was employed, presenting prime-target pairs briefly and masked, with an SOA of 0 ms (i.e., prime and target were presented concurrently, one above the other). Participants were instructed to identify the target. In Experiment 1, a cue defining the target was presented at stimulus onset, whereas in Experiment 2 the cue was not presented before the offset of stimuli. Accordingly, in Experiment 2, a post-cue task was merged with the perceptual identification task. We obtained significant semantic priming effects in both experiments. This result is compatible with the view that two concepts can both be activated in parallel and can mutually facilitate each other if they are related.
Kovačić, Damir; Balaban, Evan
The study was carried out to assess the role that five hearing history variables (chronological age, age at onset of deafness, age of first cochlear implant [CI] activation, duration of CI use, and duration of known deafness) play in the ability of CI users to identify speaker gender. Forty-one juvenile CI users participated in two voice gender identification tasks. In a fixed, single-interval task, subjects listened to a single speech item from one of 20 adult male or 20 adult female speakers and had to identify speaker gender. In an adaptive speech-based voice gender discrimination task with the fundamental frequency difference between the voices as the adaptive parameter, subjects listened to a pair of speech items presented in sequential order, one of which was always spoken by an adult female and the other by an adult male. Subjects had to identify the speech item spoken by the female voice. Correlation and regression analyses between perceptual scores in the two tasks and the hearing history variables were performed. Subjects fell into three performance groups: (1) those who could distinguish voice gender in both tasks, (2) those who could distinguish voice gender in the adaptive but not the fixed task, and (3) those who could not distinguish voice gender in either task. Gender identification performance for single voices in the fixed task was significantly and negatively related to the duration of deafness before cochlear implantation (shorter deafness yielded better performance), whereas performance in the adaptive task was weakly but significantly related to age at first activation of the CI device, with earlier activations yielding better scores. The existence of a group of subjects able to perform adaptive discrimination but unable to identify the gender of singly presented voices demonstrates the potential dissociability of the skills required for these two tasks, suggesting that duration of deafness and age of cochlear implantation could have
D'Souza, Dean; D'Souza, Hana; Johnson, Mark H; Karmiloff-Smith, Annette
Typically-developing (TD) infants can construct unified cross-modal percepts, such as a speaking face, by integrating auditory-visual (AV) information. This skill is a key building block upon which higher-level skills, such as word learning, are built. Because word learning is seriously delayed in most children with neurodevelopmental disorders, we assessed the hypothesis that this delay partly results from a deficit in integrating AV speech cues. AV speech integration has rarely been investigated in neurodevelopmental disorders, and never previously in infants. We probed for the McGurk effect, which occurs when the auditory component of one sound (/ba/) is paired with the visual component of another sound (/ga/), leading to the perception of an illusory third sound (/da/ or /tha/). We measured AV integration in 95 infants/toddlers with Down, fragile X, or Williams syndrome, whom we matched on Chronological and Mental Age to 25 TD infants. We also assessed a more basic AV perceptual ability: sensitivity to matching vs. mismatching AV speech stimuli. Infants with Williams syndrome failed to demonstrate a McGurk effect, indicating poor AV speech integration. Moreover, while the TD children discriminated between matching and mismatching AV stimuli, none of the other groups did, hinting at a basic deficit or delay in AV speech processing, which is likely to constrain subsequent language development. Copyright © 2016 Elsevier Inc. All rights reserved.
Hunter, Karla M.; Westwick, Joshua N.; Haleta, Laurie L.
Despite assessment's prominence in higher education, many communication departments still find its implementation problematic. In this case study, we answer a call for heightened research pertaining to the best practices for assessment of large, multisection, standardized public speaking courses. We demonstrate the ease with which the basic course…
A study examined the results of the use of the "Assessing Motivation To Communicate" (AMTC) computerized program with high school students in Anchorage, Alaska, during the 1995-96 school year. The AMTC consists of two self-assessment instruments: the Personal Report of Communication (PRCA-24) and the Willingness to Communicate (WTC).…
Song, Judy H.; Skoe, Erika; Banai, Karen
We investigated training-related improvements in listening in noise and the biological mechanisms mediating these improvements. Training-related malleability was examined using a program that incorporates cognitively based listening exercises to improve speech-in-noise perception. Before and after training, auditory brainstem responses to a speech syllable were recorded in quiet and multitalker noise from adults who ranged in their speech-in-noise perceptual ability. Controls did not undergo training but were tested at intervals equivalent to the trained subjects. Trained subjects exhibited significant improvements in speech-in-noise perception that were retained 6 months later. Subcortical responses in noise demonstrated training-related enhancements in the encoding of pitch-related cues (the fundamental frequency and the second harmonic), particularly for the time-varying portion of the syllable that is most vulnerable to perceptual disruption (the formant transition region). Subjects with the largest strength of pitch encoding at pretest showed the greatest perceptual improvement. Controls exhibited neither neurophysiological nor perceptual changes. We provide the first demonstration that short-term training can improve the neural representation of cues important for speech-in-noise perception. These results implicate and delineate biological mechanisms contributing to learning success, and they provide a conceptual advance to our understanding of the kind of training experiences that can influence sensory processing in adulthood. PMID:21799207
Peelle, Jonathan E; Sommers, Mitchell S
During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing the precision of prediction. Electrophysiological studies demonstrate that oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to acoustic information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration
Peelle, Jonathan E.; Sommers, Mitchell S.
During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing precision of prediction. Electrophysiological studies demonstrate oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to auditory information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration, supported
Full Text Available Background and Aim: A community-based rehabilitation programme, Sri Ramachandra University-Transforming Faces project, was initiated to provide comprehensive management of communication disorders in individuals with CLP in two districts in Tamil Nadu, India. This community-based programme aims to integrate hospital-based services with the community-based initiatives and to enable long-term care. The programme was initiated in Thiruvannamalai (2005 district and extended to Cuddalore (2011. The aim of this study was to identify needs related to speech among children with CLP, enroled in the above community-based programme in two districts in Tamil Nadu, India. Design: This was a cross–sectional study. Participants and Setting: Ten camps were conducted specifically for speech assessments in two districts over a 12-month period. Two hundred and seventeen individuals (116 males and 101 females> 3 years of age reported to the camps. Methods: Investigator (SLP collected data using the speech protocol of the cleft and craniofacial centre. Descriptive analysis and profiling of speech samples were carried out and reported using universal protocol for reporting speech outcomes. Fleiss' Kappa test was used to estimate inter-rater reliability. Results: In this study, inter-rater reliability between three evaluators revealed good agreement for the parameters: resonance, articulatory errors and voice disorder. About 83.8% (n = 151/180 of the participants demonstrated errors in articulation and 69% (n = 124/180 exhibited abnormal resonance. Velopharyngeal port functioning assessment was completed for 55/124 participants. Conclusion: This study allows us to capture a “snapshot” of children with CLP, living in a specific geographical location, and assist in planning intervention programmes.
Full Text Available Background: Spoken word recognition and speech perception tests in quiet are being used as a routine in assessment of the benefit which children and adult cochlear implant users receive from their devices. Cochlear implant users generally demonstrate high level performances in these test materials as they are able to achieve high level speech perception ability in quiet situations. Although these test materials provide valuable information regarding Cochlear Implant (CI users’ performances in optimal listening conditions, they do not give realistic information regarding performances in adverse listening conditions, which is the case in the everyday environment. Aims: The aim of this study was to assess the speech intelligibility performance of post lingual CI users in the presence of noise at different signal-to-noise ratio with the Matrix Test developed for Turkish language. Study Design: Cross-sectional study. Methods: The thirty post lingual implant user adult subjects, who had been using implants for a minimum of one year, were evaluated with Turkish Matrix test. Subjects’ speech intelligibility was measured using the adaptive and non-adaptive Matrix Test in quiet and noisy environments. Results: The results of the study show a correlation between Pure Tone Average (PTA values of the subjects and Matrix test Speech Reception Threshold (SRT values in the quiet. Hence, it is possible to asses PTA values of CI users using the Matrix Test also. However, no correlations were found between Matrix SRT values in the quiet and Matrix SRT values in noise. Similarly, the correlation between PTA values and intelligibility scores in noise was also not significant. Therefore, it may not be possible to assess the intelligibility performance of CI users using test batteries performed in quiet conditions. Conclusion: The Matrix Test can be used to assess the benefit of CI users from their systems in everyday life, since it is possible to perform
Riecke, Lars; Formisano, Elia; Sorger, Bettina; Baskent, Deniz; Gaudrain, Etienne
Speech is crucial for communication in everyday life. Speech-brain entrainment, the alignment of neural activity to the slow temporal fluctuations (envelope) of acoustic speech input, is a ubiquitous element of current theories of speech processing. Associations between speech-brain entrainment and
Perceptual organization--the processes structuring visual information into coherent units--and visual attention--the processes by which some visual information in a scene is selected--are crucial for the perception of our visual environment and to visuomotor behavior. Recent research points to important relations between attentional and organizational processes. Several studies demonstrated that perceptual organization constrains attentional selectivity, and other studies suggest that attention can also constrain perceptual organization. In this chapter I focus on two aspects of the relationship between perceptual organization and attention. The first addresses the question of whether or not perceptual organization can take place without attention. I present findings demonstrating that some forms of grouping and figure-ground segmentation can occur without attention, whereas others require controlled attentional processing, depending on the processes involved and the conditions prevailing for each process. These findings challenge the traditional view, which assumes that perceptual organization is a unitary entity that operates preattentively. The second issue addresses the question of whether perceptual organization can affect the automatic deployment of attention. I present findings showing that the mere organization of some elements in the visual field by Gestalt factors into a coherent perceptual unit (an "object"), with no abrupt onset or any other unique transient, can capture attention automatically in a stimulus-driven manner. Taken together, the findings discussed in this chapter demonstrate the multifaceted, interactive relations between perceptual organization and visual attention.
The prevalence of autism spectrum disorder (ASD) has increased significantly in the last decade as have treatment choices. Nonetheless, the vastly diverse autism topic includes issues related to naming, description, iden-tification, assessment, and differentiation from other neu-rodevelopmental conditions. ASD issues directly impact speech-language pathologists (SLPs) who often see these children as the second contact, after pediatric medical practitioners. Because of shared symptomology, differentiation among neurodevelopmental disorders is crucial as it impacts treatment, educational choices, and the performance trajectory of affected children. To highlight issues in: identification and differentiation of ASD from other communication and language challenges, the prevalence differences between ASD gender phenotypes, and the insufficient consideration of cultural factors in evaluating ASD in children. A second objective was to propose a tool to assist SLPs in the management of autism in children. A universal resource toolkit development project for SLP communities at large is proposed. The resource is comprised of research-based observation and screening tools for caregivers and educators, as well as parent questionnaires for portraying the children's function in the family, cultural com-munity, and educational setting. © 2017 S. Karger AG, Basel.
Monson, Brian Bruce
While human speech and the human voice generate acoustical energy up to (and beyond) 20 kHz, the energy above approximately 5 kHz has been largely neglected. Evidence is accruing that this high-frequency energy contains perceptual information relevant to speech and voice, including percepts of quality, localization, and intelligibility. The present research was an initial step in the long-range goal of characterizing high-frequency energy in singing voice and speech, with particular regard for its perceptual role and its potential for modification during voice and speech production. In this study, a database of high-fidelity recordings of talkers was created and used for a broad acoustical analysis and general characterization of high-frequency energy, as well as specific characterization of phoneme category, voice and speech intensity level, and mode of production (speech versus singing) by high-frequency energy content. Directionality of radiation of high-frequency energy from the mouth was also examined. The recordings were used for perceptual experiments wherein listeners were asked to discriminate between speech and voice samples that differed only in high-frequency energy content. Listeners were also subjected to gender discrimination tasks, mode-of-production discrimination tasks, and transcription tasks with samples of speech and singing that contained only high-frequency content. The combination of these experiments has revealed that (1) human listeners are able to detect very subtle level changes in high-frequency energy, and (2) human listeners are able to extract significant perceptual information from high-frequency energy.
Full Text Available Perceptual hash functions provide a tool for fast and reliable identification of content. We present new audio hash functions based on summarization of the time-frequency spectral characteristics of an audio document. The proposed hash functions are based on the periodicity series of the fundamental frequency and on singular-value description of the cepstral frequencies. They are found, on one hand, to perform very satisfactorily in identification and verification tests, and on the other hand, to be very resilient to a large variety of attacks. Moreover, we address the issue of security of hashes and propose a keying technique, and thereby a key-dependent hash function.
Morton , Hazel; Gunson , Nancie; Marshall , Diarmid; McInnes , Fergus; Ayres , Andrea; Jack , Mervyn
Abstract This paper describes a comprehensive usability evaluation of an automated telephone banking system which employs text-to-speech (TTS) synthesis in offering additional detail on customers? account transactions. The paper describes a series of four experiments in which TTS was employed to offer an extra level of detail to recent transactions listings within an established banking service which otherwise uses recorded speech from a professional recording artist. Results from ...
Full Text Available This paper deals with the impact of transcoding on the speech quality. We have focused mainly on the transcoding between codecs without the negative influence of the network parameters such as packet loss and delay. It has ensured objective and repeatable results from our measurement. The measurement was performed on the Transcoding Measuring System developed especially for this purpose. The system is based on the open source projects and is useful as a design tool for VoIP system administrators. The paper compares the most used codecs from the transcoding perspective. The multiple transcoding between G711, GSM and G729 codecs were performed and the speech quality of these calls was evaluated. The speech quality was measured by Perceptual Evaluation of Speech Quality method, which provides results in Mean Opinion Score used to describe the speech quality on a scale from 1 to 5. The obtained results indicate periodical speech quality degradation on every transcoding between two codecs.
Lamata, Pablo; Gomez, Enrique J; Hernández, Félix Lamata; Oltra Pastor, Alfonso; Sanchez-Margallo, Francisco Miquel; Del Pozo Guerrero, Francisco
Human perceptual capabilities related to the laparoscopic interaction paradigm are not well known. Its study is important for the design of virtual reality simulators, and for the specification of augmented reality applications that overcome current limitations and provide a supersensing to the surgeon. As part of this work, this article addresses the study of laparoscopic pulling forces. Two definitions are proposed to focalize the problem: the perceptual fidelity boundary, limit of human perceptual capabilities, and the Utile fidelity boundary, that encapsulates the perceived aspects actually used by surgeons to guide an operation. The study is then aimed to define the perceptual fidelity boundary of laparoscopic pulling forces. This is approached with an experimental design in which surgeons assess the resistance against pulling of four different tissues, which are characterized with both in vivo interaction forces and ex vivo tissue biomechanical properties. A logarithmic law of tissue consistency perception is found comparing subjective valorizations with objective parameters. A model of this perception is developed identifying what the main parameters are: the grade of fixation of the organ, the tissue stiffness, the amount of tissue bitten, and the organ mass being pulled. These results are a clear requirement analysis for the force feedback algorithm of a virtual reality laparoscopic simulator. Finally, some discussion is raised about the suitability of augmented reality applications around this surgical gesture.
Robin, Donald A.; Jacks, Adam; Hageman, Carlin; Clark, Heather M.; Woodworth, George
This investigation examined the visuomotor tracking abilities of persons with apraxia of speech (AOS) or conduction aphasia (CA). In addition, tracking performance was correlated with perceptual judgments of speech accuracy. Five individuals with AOS and four with CA served as participants, as well as an equal number of healthy controls matched by…
Casini, Laurence; Burle, Boris; Nguyen, Noel
Time is essential to speech. The duration of speech segments plays a critical role in the perceptual identification of these segments, and therefore in that of spoken words. Here, using a French word identification task, we show that vowels are perceived as shorter when attention is divided between two tasks, as compared to a single task control…
Almeida, Diogo; Poeppel, David; Corina, David
The human auditory system distinguishes speech-like information from general auditory signals in a remarkably fast and efficient way. Combining psychophysics and neurophysiology (MEG), we demonstrate a similar result for the processing of visual information used for language communication in users of sign languages. We demonstrate that the earliest visual cortical responses in deaf signers viewing American Sign Language (ASL) signs show specific modulations to violations of anatomic constraints that would make the sign either possible or impossible to articulate. These neural data are accompanied with a significantly increased perceptual sensitivity to the anatomical incongruity. The differential effects in the early visual evoked potentials arguably reflect an expectation-driven assessment of somatic representational integrity, suggesting that language experience and/or auditory deprivation may shape the neuronal mechanisms underlying the analysis of complex human form. The data demonstrate that the perceptual tuning that underlies the discrimination of language and non-language information is not limited to spoken languages but extends to languages expressed in the visual modality.
Reber, Rolf; Wurtz, Pascal; Zimmermann, Thomas D
Perceptual fluency is the subjective experience of ease with which an incoming stimulus is processed. Although perceptual fluency is assessed by speed of processing, it remains unclear how objective speed is related to subjective experiences of fluency. We present evidence that speed at different stages of the perceptual process contributes to perceptual fluency. In an experiment, figure-ground contrast influenced detection of briefly presented words, but not their identification at longer exposure durations. Conversely, font in which the word was written influenced identification, but not detection. Both contrast and font influenced subjective fluency. These findings suggest that speed of processing at different stages condensed into a unified subjective experience of perceptual fluency.
Erb, J.; Henry, M.J.; Eisner, F.; Obleser, J.
Listeners show a remarkable ability to quickly adjust to degraded speech input. Here, we aimed to identify the neural mechanisms of such short-term perceptual adaptation. In a sparse-sampling, cardiac-gated functional magnetic resonance imaging (fMRI) acquisition, human listeners heard and repeated
Ding, Nai; Patel, Aniruddh D; Chen, Lin; Butler, Henry; Luo, Cheng; Poeppel, David
Speech and music have structured rhythms. Here we discuss a major acoustic correlate of spoken and musical rhythms, the slow (0.25-32Hz) temporal modulations in sound intensity and compare the modulation properties of speech and music. We analyze these modulations using over 25h of speech and over 39h of recordings of Western music. We show that the speech modulation spectrum is highly consistent across 9 languages (including languages with typologically different rhythmic characteristics). A different, but similarly consistent modulation spectrum is observed for music, including classical music played by single instruments of different types, symphonic, jazz, and rock. The temporal modulations of speech and music show broad but well-separated peaks around 5 and 2Hz, respectively. These acoustically dominant time scales may be intrinsic features of speech and music, a possibility which should be investigated using more culturally diverse samples in each domain. Distinct modulation timescales for speech and music could facilitate their perceptual analysis and its neural processing. Copyright © 2017 Elsevier Ltd. All rights reserved.
Cooper, Angela; Brouwer, Susanne; Bradlow, Ann R
Speech processing can often take place in adverse listening conditions that involve the mixing of speech and background noise. In this study, we investigated processing dependencies between background noise and indexical speech features, using a speeded classification paradigm (Garner, 1974; Exp. 1), and whether background noise is encoded and represented in memory for spoken words in a continuous recognition memory paradigm (Exp. 2). Whether or not the noise spectrally overlapped with the speech signal was also manipulated. The results of Experiment 1 indicated that background noise and indexical features of speech (gender, talker identity) cannot be completely segregated during processing, even when the two auditory streams are spectrally nonoverlapping. Perceptual interference was asymmetric, whereby irrelevant indexical feature variation in the speech signal slowed noise classification to a greater extent than irrelevant noise variation slowed speech classification. This asymmetry may stem from the fact that speech features have greater functional relevance to listeners, and are thus more difficult to selectively ignore than background noise. Experiment 2 revealed that a recognition cost for words embedded in different types of background noise on the first and second occurrences only emerged when the noise and the speech signal were spectrally overlapping. Together, these data suggest integral processing of speech and background noise, modulated by the level of processing and the spectral separation of the speech and noise.
Several articles addressing topics in speech research are presented. The topics include: exploring the functional significance of physiological tremor: A biospectroscopic approach; differences between experienced and inexperienced listeners to deaf speech; a language-oriented view of reading and its disabilities; Phonetic factors in letter detection; categorical perception; Short-term recall by deaf signers of American sign language; a common basis for auditory sensory storage in perception and immediate memory; phonological awareness and verbal short-term memory; initiation versus execution time during manual and oral counting by stutterers; trading relations in the perception of speech by five-year-old children; the role of the strap muscles in pitch lowering; phonetic validation of distinctive features; consonants and syllable boundaires; and vowel information in postvocalic frictions.
Anne Birgitta Nilsen
Full Text Available The manifesto of the Norwegian terrorist Anders Behring Breivik is based on the “Eurabia” conspiracy theory. This theory is a key starting point for hate speech amongst many right-wing extremists in Europe, but also has ramifications beyond these environments. In brief, proponents of the Eurabia theory claim that Muslims are occupying Europe and destroying Western culture, with the assistance of the EU and European governments. By contrast, members of Al-Qaeda and other extreme Islamists promote the conspiracy theory “the Crusade” in their hate speech directed against the West. Proponents of the latter theory argue that the West is leading a crusade to eradicate Islam and Muslims, a crusade that is similarly facilitated by their governments. This article presents analyses of texts written by right-wing extremists and Muslim extremists in an effort to shed light on how hate speech promulgates conspiracy theories in order to spread hatred and intolerance.The aim of the article is to contribute to a more thorough understanding of hate speech’s nature by applying rhetorical analysis. Rhetorical analysis is chosen because it offers a means of understanding the persuasive power of speech. It is thus a suitable tool to describe how hate speech works to convince and persuade. The concepts from rhetorical theory used in this article are ethos, logos and pathos. The concept of ethos is used to pinpoint factors that contributed to Osama bin Laden's impact, namely factors that lent credibility to his promotion of the conspiracy theory of the Crusade. In particular, Bin Laden projected common sense, good morals and good will towards his audience. He seemed to have coherent and relevant arguments; he appeared to possess moral credibility; and his use of language demonstrated that he wanted the best for his audience.The concept of pathos is used to define hate speech, since hate speech targets its audience's emotions. In hate speech it is the
Fahrenfort, Johannes J.; Van Leeuwen, Jonathan; Olivers, Christian N.L.; Hogendoorn, Hinze
The visual system has the remarkable ability to integrate fragmentary visual input into a perceptually organized collection of surfaces and objects, a process we refer to as perceptual integration. Despite a long tradition of perception research, it is not known whether access to consciousness is
Dartel, M. van; Sprinkhuizen-Kuyper, I.G.; Postma, E.O.; Herik, H.J. van den
Reactive agents are generally believed to be incapable of coping with perceptual ambiguity (i.e., identical sensory states that require different responses). However, a recent finding suggests that reactive agents can cope with perceptual ambiguity in a simple model (Nolfi, 2002). This paper
Baumgärtel, Regina M; Hu, Hongmei; Krawczyk-Becker, Martin; Marquardt, Daniel; Herzke, Tobias; Coleman, Graham; Adiloğlu, Kamil; Bomke, Katrin; Plotz, Karsten; Gerkmann, Timo; Doclo, Simon; Kollmeier, Birger; Hohmann, Volker; Dietz, Mathias
Several binaural audio signal enhancement algorithms were evaluated with respect to their potential to improve speech intelligibility in noise for users of bilateral cochlear implants (CIs). 50% speech reception thresholds (SRT50) were assessed using an adaptive procedure in three distinct, realistic noise scenarios. All scenarios were highly nonstationary, complex, and included a significant amount of reverberation. Other aspects, such as the perfectly frontal target position, were idealized laboratory settings, allowing the algorithms to perform better than in corresponding real-world conditions. Eight bilaterally implanted CI users, wearing devices from three manufacturers, participated in the study. In all noise conditions, a substantial improvement in SRT50 compared to the unprocessed signal was observed for most of the algorithms tested, with the largest improvements generally provided by binaural minimum variance distortionless response (MVDR) beamforming algorithms. The largest overall improvement in speech intelligibility was achieved by an adaptive binaural MVDR in a spatially separated, single competing talker noise scenario. A no-pre-processing condition and adaptive differential microphones without a binaural link served as the two baseline conditions. SRT50 improvements provided by the binaural MVDR beamformers surpassed the performance of the adaptive differential microphones in most cases. Speech intelligibility improvements predicted by instrumental measures were shown to account for some but not all aspects of the perceptually obtained SRT50 improvements measured in bilaterally implanted CI users. © The Author(s) 2015.
Sebkhi, Nordine; Desai, Dhyey; Islam, Mohammad; Lu, Jun; Wilson, Kimberly; Ghovanloo, Maysam
Speech-language pathologists (SLPs) are trained to correct articulation of people diagnosed with motor speech disorders by analyzing articulators' motion and assessing speech outcome while patients speak. To assist SLPs in this task, we are presenting the multimodal speech capture system (MSCS) that records and displays kinematics of key speech articulators, the tongue and lips, along with voice, using unobtrusive methods. Collected speech modalities, tongue motion, lips gestures, and voice are visualized not only in real-time to provide patients with instant feedback but also offline to allow SLPs to perform post-analysis of articulators' motion, particularly the tongue, with its prominent but hardly visible role in articulation. We describe the MSCS hardware and software components, and demonstrate its basic visualization capabilities by a healthy individual repeating the words "Hello World." A proof-of-concept prototype has been successfully developed for this purpose, and will be used in future clinical studies to evaluate its potential impact on accelerating speech rehabilitation by enabling patients to speak naturally. Pattern matching algorithms to be applied to the collected data can provide patients with quantitative and objective feedback on their speech performance, unlike current methods that are mostly subjective, and may vary from one SLP to another.
Benesty, Jacob; Chen, Jingdong
We live in a noisy world! In all applications (telecommunications, hands-free communications, recording, human-machine interfaces, etc.) that require at least one microphone, the signal of interest is usually contaminated by noise and reverberation. As a result, the microphone signal has to be ""cleaned"" with digital signal processing tools before it is played out, transmitted, or stored.This book is about speech enhancement. Different well-known and state-of-the-art methods for noise reduction, with one or multiple microphones, are discussed. By speech enhancement, we mean not only noise red
Jacobs, Robert A
New technologies and new ways of thinking have recently led to rapid expansions in the study of perceptual learning. We describe three themes shared by many of the nine articles included in this topic on Integrated Approaches to Perceptual Learning. First, perceptual learning cannot be studied on its own because it is closely linked to other aspects of cognition, such as attention, working memory, decision making, and conceptual knowledge. Second, perceptual learning is sensitive to both the stimulus properties of the environment in which an observer exists and to the properties of the tasks that the observer needs to perform. Moreover, the environmental and task properties can be characterized through their statistical regularities. Finally, the study of perceptual learning has important implications for society, including implications for science education and medical rehabilitation. Contributed articles relevant to each theme are summarized. Copyright © 2010 Cognitive Science Society, Inc.
Paredes Gallardo, Andreu; Madsen, Sara Miay Kim; Dau, Torsten
Auditory streaming is a perceptual process by which the human auditory system organizes sounds from different sources into perceptually meaningful elements. Segregation of sound sources is important, among others, for understanding speech in noisy environments, which is especially challenging...
Cowan, Gloria; Khatchadourian, Desiree
Women are more intolerant of hate speech than men. This study examined relationality measures as mediators of gender differences in the perception of the harm of hate speech and the importance of freedom of speech. Participants were 107 male and 123 female college students. Questionnaires assessed the perceived harm of hate speech, the importance…
Ballard, Kirrie J; Azizi, Lamiae; Duffy, Joseph R; McNeil, Malcolm R; Halaki, Mark; O'Dwyer, Nicholas; Layfield, Claire; Scholl, Dominique I; Vogel, Adam P; Robin, Donald A
Diagnosis of the speech motor planning/programming disorder, apraxia of speech (AOS), has proven challenging, largely due to its common co-occurrence with the language-based impairment of aphasia. Currently, diagnosis is based on perceptually identifying and rating the severity of several speech features. It is not known whether all, or a subset of the features, are required for a positive diagnosis. The purpose of this study was to assess predictor variables for the presence of AOS after left-hemisphere stroke, with the goal of increasing diagnostic objectivity and efficiency. This population-based case-control study involved a sample of 72 cases, using the outcome measure of expert judgment on presence of AOS and including a large number of independently collected candidate predictors representing behavioral measures of linguistic, cognitive, nonspeech oral motor, and speech motor ability. We constructed a predictive model using multiple imputation to deal with missing data; the Least Absolute Shrinkage and Selection Operator (Lasso) technique for variable selection to define the most relevant predictors, and bootstrapping to check the model stability and quantify the optimism of the developed model. Two measures were sufficient to distinguish between participants with AOS plus aphasia and those with aphasia alone, (1) a measure of speech errors with words of increasing length and (2) a measure of relative vowel duration in three-syllable words with weak-strong stress pattern (e.g., banana, potato). The model has high discriminative ability to distinguish between cases with and without AOS (c-index=0.93) and good agreement between observed and predicted probabilities (calibration slope=0.94). Some caution is warranted, given the relatively small sample specific to left-hemisphere stroke, and the limitations of imputing missing data. These two speech measures are straightforward to collect and analyse, facilitating use in research and clinical settings. Copyright
Nielsen, Jens Bo; Dau, Torsten
Listeners were given the task to identify the stop-consonant [t] in the test-word "stir" when the word was embedded in a carrier sentence. Reverberation was added to the test-word, but not to the carrier, and the ability to identify the [t] decreased because the amplitude modulations associated...... with the [t] were smeared. When a similar amount of reverberation was also added to the carrier sentence, the listeners' ability to identify the stop-consonant was restored. This phenomenon has in previous research been considered as evidence for an extrinsic compensation mechanism for reverberation...... an interference effect that impedes the identification of the stop-consonant. These findings raise doubts about the existence of the compensation mechanism....
Heath, Steve M.; Bishop, Dorothy V. M.; Hogben, John H.; Roach, Neil W.
An influential causal theory attributes dyslexia to visual and/or auditory perceptual deficits. This theory derives from group differences between individuals with dyslexia and controls on a range of psychophysical tasks, but there is substantial variation, both between individuals within a group and from task to task. We addressed two questions. First, do psychophysical measures have sufficient reliability to assess perceptual deficits in individuals? Second, do different psychophysical task...
Speech intelligibility (SI) is important for different fields of research, engineering and diagnostics in order to quantify very different phenomena like the quality of recordings, communication and playback devices, the reverberation of auditoria, characteristics of hearing impairment, benefit using hearing aids or combinations of these things.
Morin, Alain; Hamper, Breanne
Inner speech involvement in self-reflection was examined by reviewing 130 studies assessing brain activation during self-referential processing in key self-domains: agency, self-recognition, emotions, personality traits, autobiographical memory, and miscellaneous (e.g., prospection, judgments). The left inferior frontal gyrus (LIFG) has been shown to be reliably recruited during inner speech production. The percentage of studies reporting LIFG activity for each self-dimension was calculated. Fifty five percent of all studies reviewed indicated LIFG (and presumably inner speech) activity during self-reflection tasks; on average LIFG activation is observed 16% of the time during completion of non-self tasks (e.g., attention, perception). The highest LIFG activation rate was observed during retrieval of autobiographical information. The LIFG was significantly more recruited during conceptual tasks (e.g., prospection, traits) than during perceptual tasks (agency and self-recognition). This constitutes additional evidence supporting the idea of a participation of inner speech in self-related thinking.
Shriberg, Lawrence D.; Lohmeier, Heather L.; Strand, Edythe A.; Jakielski, Kathy J.
A central question in Childhood Apraxia of Speech (CAS) is whether the core phenotype is limited to transcoding (planning/programming) deficits or if speakers with CAS also have deficits in auditory-perceptual "encoding" (representational) and/or "memory" (storage and retrieval of representations) processes. We addressed this and other questions…
A connected digit speech recognition is important in many applications such as automated banking system, catalogue-dialing, automatic data entry, automated banking system, etc. This paper presents an optimum speaker-independent connected digit recognizer for Malayalam language. The system employs Perceptual ...
Accounts of speech perception disagree on whether listeners perceive the acoustic signal (Diehl, Lotto, & Holt, 2004) or the vocal tract gestures that produce the signal (e.g., Fowler, 1986). In this dissertation, I outline a research program using a phenomenon called "perceptual compensation for coarticulation" (Mann, 1980) to examine this…
A. Eerland (Anita); J.A.A. Engelen (Jan A.A.); R.A. Zwaan (Rolf)
textabstractLanguage can be viewed as a set of cues that modulate the comprehender's thought processes. It is a very subtle instrument. For example, the literature suggests that people perceive direct speech (e.g., Joanne said: 'I went out for dinner last night') as more vivid and perceptually
Sadakata, M.; Zanden, L.D.T. van der; Sekiyama, K.
The current study reports specific cases in which a positive transfer of perceptual ability from the music domain to the language domain occurs. We tested whether musical training enhances discrimination and identification performance of L2 speech sounds (timing features, nasal consonants and
Babel, Molly; McGuire, Grant
Research has shown that processing dynamics on the perceiver's end determine aesthetic pleasure. Specifically, typical objects, which are processed more fluently, are perceived as more attractive. We extend this notion of perceptual fluency to judgments of vocal aesthetics. Vocal attractiveness has traditionally been examined with respect to sexual dimorphism and the apparent size of a talker, as reconstructed from the acoustic signal, despite evidence that gender-specific speech patterns are learned social behaviors. In this study, we report on a series of three experiments using 60 voices (30 females) to compare the relationship between judgments of vocal attractiveness, stereotypicality, and gender categorization fluency. Our results indicate that attractiveness and stereotypicality are highly correlated for female and male voices. Stereotypicality and categorization fluency were also correlated for male voices, but not female voices. Crucially, stereotypicality and categorization fluency interacted to predict attractiveness, suggesting the role of perceptual fluency is present, but nuanced, in judgments of human voices. © 2014 Cognitive Science Society, Inc.
Petersen, Kim T; Hansen, Steffen Duus; Sørensen, John Aasted
The increasing performance requirements of multimedia modalities, carrying speech, audio, video, image, and graphics emphasize the need for assessment methods of the total quality of a multimedia system and methods for simultaneous analysis of the system components. It is important to take...
Shriberg, Lawrence D.; Strand, Edythe A.; Fourakis, Marios; Jakielski, Kathy J.; Hall, Sheryl D.; Karlsson, Heather B.; Mabie, Heather L.; McSweeny, Jane L.; Tilkens, Christie M.; Wilson, David L.
Purpose: The purpose of this 2nd article in this supplement is to report validity support findings for the Pause Marker (PM), a proposed single-sign diagnostic marker of childhood apraxia of speech (CAS). Method: PM scores and additional perceptual and acoustic measures were obtained from 296 participants in cohorts with idiopathic and…
...-Speech Services for Individuals with Hearing and Speech Disabilities, Report and Order (Order), document...] Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services; Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and Speech Disabilities...
Cler, Gabriel J; Lee, Jackson C; Mittelman, Talia; Stepp, Cara E; Bohland, Jason W
Delayed auditory feedback (DAF) causes speakers to become disfluent and make phonological errors. Methods for assessing the kinematics of speech errors are lacking, with most DAF studies relying on auditory perceptual analyses, which may be problematic, as errors judged to be categorical may actually represent blends of sounds or articulatory errors. Eight typical speakers produced nonsense syllable sequences under normal and DAF (200 ms). Lip and tongue kinematics were captured with electromagnetic articulography. Time-locked acoustic recordings were transcribed, and the kinematics of utterances with and without perceived errors were analyzed with existing and novel quantitative methods. New multivariate measures showed that for 5 participants, kinematic variability for productions perceived to be error free was significantly increased under delay; these results were validated by using the spatiotemporal index measure. Analysis of error trials revealed both typical productions of a nontarget syllable and productions with articulatory kinematics that incorporated aspects of both the target and the perceived utterance. This study is among the first to characterize articulatory changes under DAF and provides evidence for different classes of speech errors, which may not be perceptually salient. New methods were developed that may aid visualization and analysis of large kinematic data sets. https://doi.org/10.23641/asha.5103067.
Van Engen, Kristin J
This study investigated whether clear speech reduces the cognitive demands of lexical competition by crossing speaking style with lexical difficulty. Younger and older adults identified more words in clear versus conversational speech and more easy words than hard words. An initial analysis suggested that the effect of lexical difficulty was reduced in clear speech, but more detailed analyses within each age group showed this interaction was significant only for older adults. The results also showed that both groups improved over the course of the task and that clear speech was particularly helpful for individuals with poorer hearing: for younger adults, clear speech eliminated hearing-related differences that affected performance on conversational speech. For older adults, clear speech was generally more helpful to listeners with poorer hearing. These results suggest that clear speech affords perceptual benefits to all listeners and, for older adults, mitigates the cognitive challenge associated with identifying words with many phonological neighbors.
Full Text Available Speech recorded from a throat microphone is robust to the surrounding noise, but sounds unnatural unlike the speech recorded from a close-speaking microphone. This paper addresses the issue of improving the perceptual quality of the throat microphone speech by mapping the speech spectra from the throat microphone to the close-speaking microphone. A neural network model is used to capture the speaker-dependent functional relationship between the feature vectors (cepstral coefficients of the two speech signals. A method is proposed to ensure the stability of the all-pole synthesis filter. Objective evaluations indicate the effectiveness of the proposed mapping scheme. The advantage of this method is that the model gives a smooth estimate of the spectra of the close-speaking microphone speech. No distortions are perceived in the reconstructed speech. This mapping technique is also used for bandwidth extension of telephone speech.
Full Text Available Speech perception is known to rely on both auditory and visual information. However, sound specific somatosensory input has been shown also to influence speech perceptual processing (Ito et al., 2009. In the present study we addressed further the relationship between somatosensory information and speech perceptual processing by addressing the hypothesis that the temporal relationship between orofacial movement and sound processing contributes to somatosensory-auditory interaction in speech perception. We examined the changes in event-related potentials in response to multisensory synchronous (simultaneous and asynchronous (90 ms lag and lead somatosensory and auditory stimulation compared to individual unisensory auditory and somatosensory stimulation alone. We used a robotic device to apply facial skin somatosensory deformations that were similar in timing and duration to those experienced in speech production. Following synchronous multisensory stimulation the amplitude of the event-related potential was reliably different from the two unisensory potentials. More importantly, the magnitude of the event-related potential difference varied as a function of the relative timing of the somatosensory-auditory stimulation. Event-related activity change due to stimulus timing was seen between 160-220 ms following somatosensory onset, mostly around the parietal area. The results demonstrate a dynamic modulation of somatosensory-auditory convergence and suggest the contribution of somatosensory information for speech processing process is dependent on the specific temporal order of sensory inputs in speech production.
O'Brien, Mary Grantham
In early stages of classroom language learning, many adult second language (L2) learners communicate primarily with one another, yet we know little about which speech stream characteristics learners tune into or the extent to which they understand this lingua franca communication. In the current study, 25 native English speakers learning German as…
Bonnard, Damien; Lautissier, Sylvie; Bosset-Audoit, Amélie; Coriat, Géraldine; Beraha, Max; Maunoury, Antoine; Martel, Jacques; Darrouzet, Vincent; Bébéar, Jean-Pierre; Dauman, René
An alternative to bilateral cochlear implantation is offered by the Neurelec Digisonic(®) SP Binaural cochlear implant, which allows stimulation of both cochleae within a single device. The purpose of this prospective study was to compare a group of Neurelec Digisonic(®) SP Binaural implant users (denoted BINAURAL group, n = 7) with a group of bilateral adult cochlear implant users (denoted BILATERAL group, n = 6) in terms of speech perception, sound localization, and self-assessment of health status and hearing disability. Speech perception was assessed using word recognition at 60 dB SPL in quiet and in a 'cocktail party' noise delivered through five loudspeakers in the hemi-sound field facing the patient (signal-to-noise ratio = +10 dB). The sound localization task was to determine the source of a sound stimulus among five speakers positioned between -90° and +90° from midline. Change in health status was assessed using the Glasgow Benefit Inventory and hearing disability was evaluated with the Abbreviated Profile of Hearing Aid Benefit. Speech perception was not statistically different between the two groups, even though there was a trend in favor of the BINAURAL group (mean percent word recognition in the BINAURAL and BILATERAL groups: 70 vs. 56.7% in quiet, 55.7 vs. 43.3% in noise). There was also no significant difference with regard to performance in sound localization and self-assessment of health status and hearing disability. On the basis of the BINAURAL group's performance in hearing tasks involving the detection of interaural differences, implantation with the Neurelec Digisonic(®) SP Binaural implant may be considered to restore effective binaural hearing. Based on these first comparative results, this device seems to provide benefits similar to those of traditional bilateral cochlear implantation, with a new approach to stimulate both auditory nerves. Copyright © 2013 S. Karger AG, Basel.
Lotto, A J; Kluender, K R
When members of a series of synthesized stop consonants varying acoustically in F3 characteristics and varying perceptually from /da/ to /ga/ are preceded by /al/, subjects report hearing more /ga/ syllables relative to when each member is preceded by /ar/ (Mann, 1980). It has been suggested that this result demonstrates the existence of a mechanism that compensates for coarticulation via tacit knowledge of articulatory dynamics and constraints, or through perceptual recovery of vocal-tract dynamics. The present study was designed to assess the degree to which these perceptual effects are specific to qualities of human articulatory sources. In three experiments, series of consonant-vowel (CV) stimuli varying in F3-onset frequency (/da/-/ga/) were preceded by speech versions or nonspeech analogues of /al/ and /ar/. The effect of liquid identity on stop consonant labeling remained when the preceding VC was produced by a female speaker and the CV syllable was modeled after a male speaker's productions. Labeling boundaries also shifted when the CV was preceded by a sine wave glide modeled after F3 characteristics of /al/ and /ar/. Identifications shifted even when the preceding sine wave was of constant frequency equal to the offset frequency of F3 from a natural production. These results suggest an explanation in terms of general auditory processes as opposed to recovery of or knowledge of specific articulatory dynamics.
Full Text Available Load Theory (Lavie, 1995; 2005 states that the level of perceptual load in a task (i.e. the amount of information involved in processing task-relevant stimuli determines the efficiency of selective attention. There is evidence that perceptual load affects distractor processing, with increased inattentional blindness under high load. Given that high load can result in individuals failing to report seeing obvious objects, it is conceivable that load may also impair memory for the scene. The current study is the first to assess the effect of perceptual load on eyewitness memory. Across three experiments (two video-based and one in a driving simulator, the effect of perceptual load on eyewitness memory was assessed. The results showed that eyewitnesses were less accurate under high load, in particular for peripheral details. For example, memory for the central character in the video was not affected by load but memory for a witness who passed by the window at the edge of the scene was significantly worse under high load. High load memories were also more open to suggestion, showing increased susceptibility to leading questions. High visual perceptual load also affected recall for auditory information, illustrating a possible cross-modal perceptual load effect on memory accuracy. These results have implications for eyewitness memory researchers and forensic professionals.
Murphy, Gillian; Greene, Ciara M
Load Theory (Lavie, 1995, 2005) states that the level of perceptual load in a task (i.e., the amount of information involved in processing task-relevant stimuli) determines the efficiency of selective attention. There is evidence that perceptual load affects distractor processing, with increased inattentional blindness under high load. Given that high load can result in individuals failing to report seeing obvious objects, it is conceivable that load may also impair memory for the scene. The current study is the first to assess the effect of perceptual load on eyewitness memory. Across three experiments (two video-based and one in a driving simulator), the effect of perceptual load on eyewitness memory was assessed. The results showed that eyewitnesses were less accurate under high load, in particular for peripheral details. For example, memory for the central character in the video was not affected by load but memory for a witness who passed by the window at the edge of the scene was significantly worse under high load. High load memories were also more open to suggestion, showing increased susceptibility to leading questions. High visual perceptual load also affected recall for auditory information, illustrating a possible cross-modal perceptual load effect on memory accuracy. These results have implications for eyewitness memory researchers and forensic professionals.
Fahrenfort, Johannes J; van Leeuwen, Jonathan; Olivers, Christian N L; Hogendoorn, Hinze
The visual system has the remarkable ability to integrate fragmentary visual input into a perceptually organized collection of surfaces and objects, a process we refer to as perceptual integration. Despite a long tradition of perception research, it is not known whether access to consciousness is required to complete perceptual integration. To investigate this question, we manipulated access to consciousness using the attentional blink. We show that, behaviorally, the attentional blink impairs conscious decisions about the presence of integrated surface structure from fragmented input. However, despite conscious access being impaired, the ability to decode the presence of integrated percepts remains intact, as shown through multivariate classification analyses of electroencephalogram (EEG) data. In contrast, when disrupting perception through masking, decisions about integrated percepts and decoding of integrated percepts are impaired in tandem, while leaving feedforward representations intact. Together, these data show that access consciousness and perceptual integration can be dissociated.
Kant, Anjali R; Banik, Arun A
The present study aims to use the model-based test Lexical Neighborhood Test (LNT), to assess speech recognition performance in early and late implanted hearing impaired children with normal and malformed cochlea. The LNT was administered to 46 children with congenital (prelingual) bilateral severe-profound sensorineural hearing loss, using Nucleus 24 cochlear implant. The children were grouped into Group 1-(early implantees with normal cochlea-EI); n = 15, 31/2-61/2 years of age; mean age at implantation-3½ years. Group 2-(late implantees with normal cochlea-LI); n = 15, 6-12 years of age; mean age at implantation-5 years. Group 3-(early implantees with malformed cochlea-EIMC); n = 9; 4.9-10.6 years of age; mean age at implantation-3.10 years. Group 4-(late implantees with malformed cochlea-LIMC); n = 7; 7-12.6 years of age; mean age at implantation-6.3 years. The following were the malformations: dysplastic cochlea, common cavity, Mondini's, incomplete partition-1 and 2 (IP-1 and 2), enlarged IAC. The children were instructed to repeat the words on hearing them. Means of the word and phoneme scores were computed. The LNT can also be used to assess speech recognition performance of hearing impaired children with malformed cochlea. When both easy and hard lists of LNT are considered, although, late implantees (with or without normal cochlea), have achieved higher word scores than early implantees, the differences are not statistically significant. Using LNT for assessing speech recognition enables a quantitative as well as descriptive report of phonological processes used by the children.
Dallas Erik Jonsson School of Engineering & Computer Science EC32 P.O. Box 830688 Richardson, Texas 75083-0688 8. PERFORMING ORGANIZATION REPORT...87 4.3 Whisper Based Processing for ASR ………………………………………….…. 92 5.0 Task 5: SPEAKER STATE ASSESSMENT/ ENVIROMENTAL SNIFFING (SSA/ENVS...Dec. 7-10, 2014  S. Amuda, H. Boril, A. Sangwan, J.H.L. Hansen, T.S. Ibiyemi, “ Engineering analysis and recognition of Nigerian English: An
Danielson, D Kyle; Bruderer, Alison G; Kandhadai, Padmapriya; Vatikiotis-Bateson, Eric; Werker, Janet F
The period between six and 12 months is a sensitive period for language learning during which infants undergo auditory perceptual attunement, and recent results indicate that this sensitive period may exist across sensory modalities. We tested infants at three stages of perceptual attunement (six, nine, and 11 months) to determine 1) whether they were sensitive to the congruence between heard and seen speech stimuli in an unfamiliar language, and 2) whether familiarization with congruent audiovisual speech could boost subsequent non-native auditory discrimination. Infants at six- and nine-, but not 11-months, detected audiovisual congruence of non-native syllables. Familiarization to incongruent, but not congruent, audiovisual speech changed auditory discrimination at test for six-month-olds but not nine- or 11-month-olds. These results advance the proposal that speech perception is audiovisual from early in ontogeny, and that the sensitive period for audiovisual speech perception may last somewhat longer than that for auditory perception alone.
Xu, Jie Jie; Chen, Xi; Lu, Mei Ping; Qiao, Ming Zhe
To investigate the perceptual and acoustic characteristics of the pneumatic artificial larynx (PAL) and evaluate its speech ability and clinical value. Prospective study. The study was conducted in the Voice Lab, Department of Otorhinolaryngology, The First Affiliated Hospital of Nanjing Medical University. Forty-six laryngectomy patients using the PAL were rated for intelligibility and fluency of speech. The voice signals of sustained vowel /a/ for 40 healthy controls and 42 successful patients using the PAL were measured by a computer system. The acoustic parameters and sound spectrographs were analyzed and compared between the two groups. Forty-two of 46 patients using the PAL (91.3%) acquired successful speech capability. The intelligibility scores of 42 successful PAL speakers ranged from 71 to 95 percent, and the intelligibility range of four unsuccessful speakers was 30 to 50 percent. The fluency was judged as good or excellent in 42 successful patients, and poor or fair in four unsuccessful patients. There was no significant difference in average fundamental frequency, maximum intensity, jitter, shimmer, and normalized noise energy (NNE) between 42 successful PAL speakers and 40 healthy controls, while the maximum phonation time (MPT) of PAL speakers was slightly lower than that of the controls. The sound spectrographs of the patients using the PAL approximated those of the healthy controls. The PAL has the advantage of a high percentage of successful vocal rehabilitation. PAL speech is fluent and intelligible. The acoustic characteristics of the PAL are similar to those of a normal voice.
Mickael L. D. Deroche
Full Text Available The inferior parietal lobe (IPL is a region of the cortex believed to participate in speech motor learning. In this study, we investigated whether transcranial direct current stimulation (tDCS of the IPL could influence the extent to which healthy adults (1 adapted to a sensory alteration of their own auditory feedback, and (2 changed their perceptual representation. Seventy subjects completed three tasks: a baseline perceptual task that located the phonetic boundary between the vowels /e/ and /a/; a sensorimotor adaptation task in which subjects produced the word “head” under conditions of altered or unaltered feedback; and a post-adaptation perceptual task identical to the first. Subjects were allocated to four groups which differed in current polarity and feedback manipulation. Subjects who received anodal tDCS to their IPL (i.e., presumably increasing cortical excitability lowered their first formant frequency (F1 by 10% in opposition to the upward shift in F1 in their auditory feedback. Subjects who received the same stimulation with unaltered feedback did not change their production. Subjects who received cathodal tDCS to their IPL (i.e., presumably decreasing cortical excitability showed a 5% adaptation to the F1 alteration similar to subjects who received sham tDCS. A subset of subjects returned a few days later to reiterate the same protocol but without tDCS, enabling assessment of any facilitatory effects of the previous tDCS. All subjects exhibited a 5% adaptation effect. In addition, across all subjects and for the two recording sessions, the phonetic boundary was shifted toward the vowel /e/ being repeated, consistently with the selective adaptation effect, but a correlation between perception and production suggested that anodal tDCS had enhanced this perceptual shift. In conclusion, we successfully demonstrated that anodal tDCS could (1 enhance the motor adaptation to a sensory alteration, and (2 potentially affect the
... disorder; Voice disorders; Vocal disorders; Disfluency; Communication disorder - speech disorder; Speech disorder - stuttering ... evaluation tools that can help identify and diagnose speech disorders: Denver Articulation Screening Examination Goldman-Fristoe Test of ...
Mishra, Jyoti; Rolle, Camarin; Gazzaley, Adam
Healthy aging is associated with a decline in basic perceptual abilities, as well as higher-level cognitive functions such as working memory. In a recent perceptual training study using moving sweeps of Gabor stimuli, Berry et al. (2010) observed that older adults significantly improved discrimination abilities on the most challenging perceptual tasks that presented paired sweeps at rapid rates of 5 and 10 Hz. Berry et al. further showed that this perceptual training engendered transfer-of-benefit to an untrained working memory task. Here, we investigated the neural underpinnings of the improvements in these perceptual tasks, as assessed by event-related potential (ERP) recordings. Early visual ERP components time-locked to stimulus onset were compared pre- and post-training, as well as relative to a no-contact control group. The visual N1 and N2 components were significantly enhanced after training, and the N1 change correlated with improvements in perceptual discrimination on the task. Further, the change observed for the N1 and N2 was associated with the rapidity of the perceptual challenge; the visual N1 (120-150 ms) was enhanced post-training for 10 Hz sweep pairs, while the N2 (240-280 ms) was enhanced for the 5 Hz sweep pairs. We speculate that these observed post-training neural enhancements reflect improvements by older adults in the allocation of attention that is required to accurately dissociate perceptually overlapping stimuli when presented in rapid sequence. This article is part of a Special Issue entitled SI: Memory Å. Copyright © 2014 Elsevier B.V. All rights reserved.
Haley, Katarina L; Shafer, Jennifer N; Harmon, Tyson G; Jacks, Adam
This study was intended to document speech recovery for 1 person with acquired apraxia of speech quantitatively and on the basis of her lived experience. The second author sustained a traumatic brain injury that resulted in acquired apraxia of speech. Over a 2-year period, she documented her recovery through 22 video-recorded monologues. We analyzed these monologues using a combination of auditory perceptual, acoustic, and qualitative methods. Recovery was evident for all quantitative variables examined. For speech sound production, the recovery was most prominent during the first 3 months, but slower improvement was evident for many months. Measures of speaking rate, fluency, and prosody changed more gradually throughout the entire period. A qualitative analysis of topics addressed in the monologues was consistent with the quantitative speech recovery and indicated a subjective dynamic relationship between accuracy and rate, an observation that several factors made speech sound production variable, and a persisting need for cognitive effort while speaking. Speech features improved over an extended time, but the recovery trajectories differed, indicating dynamic reorganization of the underlying speech production system. The relationship among speech dimensions should be examined in other cases and in population samples. The combination of quantitative and qualitative analysis methods offers advantages for understanding clinically relevant aspects of recovery.
Hashizume, Hiroshi; Taki, Yasuyuki; Sassa, Yuko; Thyreau, Benjamin; Asano, Michiko; Asano, Kohei; Takeuchi, Hikaru; Nouchi, Rui; Kotozaki, Yuka; Jeong, Hyeonjeong; Sugiura, Motoaki; Kawashima, Ryuta
Older children are more successful at producing unfamiliar, non-native speech sounds than younger children during the initial stages of learning. To reveal the neuronal underpinning of the age-related increase in the accuracy of non-native speech production, we examined the developmental changes in activation involved in the production of novel speech sounds using functional magnetic resonance imaging. Healthy right-handed children (aged 6-18 years) were scanned while performing an overt repetition task and a perceptual task involving aurally presented non-native and native syllables. Productions of non-native speech sounds were recorded and evaluated by native speakers. The mouth regions in the bilateral primary sensorimotor areas were activated more significantly during the repetition task relative to the perceptual task. The hemodynamic response in the left inferior frontal gyrus pars opercularis (IFG pOp) specific to non-native speech sound production (defined by prior hypothesis) increased with age. Additionally, the accuracy of non-native speech sound production increased with age. These results provide the first evidence of developmental changes in the neural processes underlying the production of novel speech sounds. Our data further suggest that the recruitment of the left IFG pOp during the production of novel speech sounds was possibly enhanced due to the maturation of the neuronal circuits needed for speech motor planning. This, in turn, would lead to improvement in the ability to immediately imitate non-native speech. Copyright © 2014 Wiley Periodicals, Inc.
Full Text Available This paper presents a method of speech recognition by pattern recognition techniques. Learning consists in determining the unique characteristics of a word (cepstral coefficients by eliminating those characteristics that are different from one word to another. For learning and recognition, the system will build a dictionary of words by determining the characteristics of each word to be used in the recognition. Determining the characteristics of an audio signal consists in the following steps: noise removal, sampling it, applying Hamming window, switching to frequency domain through Fourier transform, calculating the magnitude spectrum, filtering data, determining cepstral coefficients.
Full Text Available This is to study the methodological foundations of perceptual mapping in printing industry enterprises. This research has a practice focus which affects the choice of its methodological framework. The authors use such scientific research as analysis of cause-effect relationships, synthesis, problem analysis, expert evaluation and image visualization methods. In this paper, the authors present their assessment of the competitive environment of major printing industry companies in Kirov oblast; their assessment employs perceptual mapping enables by Minitab 14. This technique can be used by experts in the field of marketing and branding to assess the competitive environment in any market. The object of research is printing industry in Kirov oblast. The most important conclusion of this study is that in perceptual mapping, all the parameters are integrated in a single system and provide a more objective view of the company’s market situation.
Full Text Available A growing body of evidence shows that brain oscillations track speech. This mechanism is thought to maximise processing efficiency by allocating resources to important speech information, effectively parsing speech into units of appropriate granularity for further decoding. However, some aspects of this mechanism remain unclear. First, while periodicity is an intrinsic property of this physiological mechanism, speech is only quasi-periodic, so it is not clear whether periodicity would present an advantage in processing. Second, it is still a matter of debate which aspect of speech triggers or maintains cortical entrainment, from bottom-up cues such as fluctuations of the amplitude envelope of speech to higher level linguistic cues such as syntactic structure. We present data from a behavioural experiment assessing the effect of isochronous retiming of speech on speech perception in noise. Two types of anchor points were defined for retiming speech, namely syllable onsets and amplitude envelope peaks. For each anchor point type, retiming was implemented at two hierarchical levels, a slow time scale around 2.5 Hz and a fast time scale around 4 Hz. Results show that while any temporal distortion resulted in reduced speech intelligibility, isochronous speech anchored to P-centers (approximated by stressed syllable vowel onsets was significantly more intelligible than a matched anisochronous retiming, suggesting a facilitative role of periodicity defined on linguistically motivated units in processing speech in noise.
Kates, James M; Arehart, Kathryn H
This paper uses mutual information to quantify the relationship between envelope modulation fidelity and perceptual responses. Data from several previous experiments that measured speech intelligibility, speech quality, and music quality are evaluated for normal-hearing and hearing-impaired listeners. A model of the auditory periphery is used to generate envelope signals, and envelope modulation fidelity is calculated using the normalized cross-covariance of the degraded signal envelope with that of a reference signal. Two procedures are used to describe the envelope modulation: (1) modulation within each auditory frequency band and (2) spectro-temporal processing that analyzes the modulation of spectral ripple components fit to successive short-time spectra. The results indicate that low modulation rates provide the highest information for intelligibility, while high modulation rates provide the highest information for speech and music quality. The low-to-mid auditory frequencies are most important for intelligibility, while mid frequencies are most important for speech quality and high frequencies are most important for music quality. Differences between the spectral ripple components used for the spectro-temporal analysis were not significant in five of the six experimental conditions evaluated. The results indicate that different modulation-rate and auditory-frequency weights may be appropriate for indices designed to predict different types of perceptual relationships.
Poeppel, David; Idsardi, William J; van Wassenhove, Virginie
Speech perception consists of a set of computations that take continuously varying acoustic waveforms as input and generate discrete representations that make contact with the lexical representations stored in long-term memory as output. Because the perceptual objects that are recognized by the speech perception enter into subsequent linguistic computation, the format that is used for lexical representation and processing fundamentally constrains the speech perceptual processes. Consequently, theories of speech perception must, at some level, be tightly linked to theories of lexical representation. Minimally, speech perception must yield representations that smoothly and rapidly interface with stored lexical items. Adopting the perspective of Marr, we argue and provide neurobiological and psychophysical evidence for the following research programme. First, at the implementational level, speech perception is a multi-time resolution process, with perceptual analyses occurring concurrently on at least two time scales (approx. 20-80 ms, approx. 150-300 ms), commensurate with (sub)segmental and syllabic analyses, respectively. Second, at the algorithmic level, we suggest that perception proceeds on the basis of internal forward models, or uses an 'analysis-by-synthesis' approach. Third, at the computational level (in the sense of Marr), the theory of lexical representation that we adopt is principally informed by phonological research and assumes that words are represented in the mental lexicon in terms of sequences of discrete segments composed of distinctive features. One important goal of the research programme is to develop linking hypotheses between putative neurobiological primitives (e.g. temporal primitives) and those primitives derived from linguistic inquiry, to arrive ultimately at a biologically sensible and theoretically satisfying model of representation and computation in speech.
Amitay, Sygal; Zhang, Yu-Xuan; Jones, Pete R; Moore, David R
Perceptual learning has traditionally been portrayed as a bottom-up phenomenon that improves encoding or decoding of the trained stimulus. Cognitive skills such as attention and memory are thought to drive, guide and modulate learning but are, with notable exceptions, not generally considered to undergo changes themselves as a result of training with simple perceptual tasks. Moreover, shifts in threshold are interpreted as shifts in perceptual sensitivity, with no consideration for non-sensory factors (such as response bias) that may contribute to these changes. Accumulating evidence from our own research and others shows that perceptual learning is a conglomeration of effects, with training-induced changes ranging from the lowest (noise reduction in the phase locking of auditory signals) to the highest (working memory capacity) level of processing, and includes contributions from non-sensory factors that affect decision making even on a "simple" auditory task such as frequency discrimination. We discuss our emerging view of learning as a process that increases the signal-to-noise ratio associated with perceptual tasks by tackling noise sources and inefficiencies that cause performance bottlenecks, and present some implications for training populations other than young, smart, attentive and highly-motivated college students. Crown Copyright © 2013. Published by Elsevier Ltd. All rights reserved.
Gao, Yayue; Cao, Shuyang; Qu, Tianshu; Wu, Xihong; Li, Haifeng; Zhang, Jinsheng; Li, Liang
In noisy, multipeople talking environments such as a cocktail party, listeners can use various perceptual and/or cognitive cues to improve recognition of target speech against masking, particularly informational masking. Previous studies have shown that temporally prepresented voice cues (voice primes) improve recognition of target speech against speech masking but not noise masking. This study investigated whether static face image primes that have become target-voice associated (i.e., facial images linked through associative learning with voices reciting the target speech) can be used by listeners to unmask speech. The results showed that in 32 normal-hearing younger adults, temporally prepresenting a voice-priming sentence with the same voice reciting the target sentence significantly improved the recognition of target speech that was masked by irrelevant two-talker speech. When a person's face photograph image became associated with the voice reciting the target speech by learning, temporally prepresenting the target-voice-associated face image significantly improved recognition of target speech against speech masking, particularly for the last two keywords in the target sentence. Moreover, speech-recognition performance under the voice-priming condition was significantly correlated to that under the face-priming condition. The results suggest that learned facial information on talker identity plays an important role in identifying the target-talker's voice and facilitating selective attention to the target-speech stream against the masking-speech stream. © 2014 The Institute of Psychology, Chinese Academy of Sciences and Wiley Publishing Asia Pty Ltd.
Ali Akbar Tahaei
Full Text Available Auditory processing deficits have been hypothesized as an underlying mechanism for stuttering. Previous studies have demonstrated abnormal responses in subjects with persistent developmental stuttering (PDS at the higher level of the central auditory system using speech stimuli. Recently, the potential usefulness of speech evoked auditory brainstem responses in central auditory processing disorders has been emphasized. The current study used the speech evoked ABR to investigate the hypothesis that subjects with PDS have specific auditory perceptual dysfunction. Objectives. To determine whether brainstem responses to speech stimuli differ between PDS subjects and normal fluent speakers. Methods. Twenty-five subjects with PDS participated in this study. The speech-ABRs were elicited by the 5-formant synthesized syllable/da/, with duration of 40 ms. Results. There were significant group differences for the onset and offset transient peaks. Subjects with PDS had longer latencies for the onset and offset peaks relative to the control group. Conclusions. Subjects with PDS showed a deficient neural timing in the early stages of the auditory pathway consistent with temporal processing deficits and their abnormal timing may underlie to their disfluency.
Sheffert, Sonya M; Olson, Elizabeth
In this research, we investigated the effects of voice and face information on the perceptual learning of talkers and on long-term memory for spoken words. In the first phase, listeners were trained over several days to identify voices from words presented auditorily or audiovisually. The training data showed that visual information about speakers enhanced voice learning, revealing cross-modal connections in talker processing akin to those observed in speech processing. In the second phase, the listeners completed an auditory or audiovisual word recognition memory test in which equal numbers of words were spoken by familiar and unfamiliar talkers. The data showed that words presented by familiar talkers were more likely to be retrieved from episodic memory, regardless of modality. Together, these findings provide new information about the representational code underlying familiar talker recognition and the role of stimulus familiarity in episodic word recognition.
Schönmeyr, Björn; Wendby, Lisa; Sharma, Mitali; Jacobson, Lia; Restrepo, Carolina; Campbell, Alex
Many patients with cleft palate deformities worldwide receive treatment at a later age than is recommended for normal speech to develop. The outcomes after late palate repairs in terms of speech and quality of life (QOL) still remain largely unstudied. In the current study, questionnaires were used to assess the patients' perception of speech and QOL before and after primary palate repair. All of the patients were operated at a cleft center in northeast India and had a cleft palate with a normal lip or with a cleft lip that had been previously repaired. A total of 134 patients (7-35 years) were interviewed preoperatively and 46 patients (7-32 years) were assessed in the postoperative survey. The survey showed that scores based on the speech handicap index, concerning speech and speech-related QOL, did not improve postoperatively. In fact, the questionnaires indicated that the speech became more unpredictable (P reported that their self-confidence had improved after the operation. Thus, the majority of interviewed patients who underwent late primary palate repair were satisfied with the surgery. At the same time, speech and speech-related QOL did not improve according to the speech handicap index-based survey. Speech predictability may even become worse and nasal regurgitation may increase after late palate repair, according to these results.
de Bruijn, Marieke J.; ten Bosch, Louis; Kuik, Dirk J.; Witte, Birgit I.; Langendijk, Johannes A.; Leemans, C. Rene; Verdonck-de Leeuw, Irma M.
Speech impairment often occurs in patients after treatment for head and neck cancer. A specific speech characteristic that influences intelligibility and speech quality is voice-onset-time (VOT) in stop consonants. VOT is one of the functionally most relevant parameters that distinguishes voiced and
Murphy, Sandra; Spence, Charles; Dalton, Polly
Selective attention is a crucial mechanism in everyday life, allowing us to focus on a portion of incoming sensory information at the expense of other less relevant stimuli. The circumstances under which irrelevant stimuli are successfully ignored have been a topic of scientific interest for several decades now. Over the last 20 years, the perceptual load theory (e.g. Lavie, 1995) has provided one robust framework for understanding these effects within the visual modality. The suggestion is that successful selection depends on the perceptual demands imposed by the task-relevant information. However, less research has addressed the question of whether the same principles hold in audition and, to date, the existing literature provides a mixed picture. Here, we review the evidence for and against the applicability of perceptual load theory in hearing, concluding that this question still awaits resolution. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
Hashimoto, K.; Yamagishi, J.; Byrne, W.; King, S.; Tokuda, K.
This paper provides an analysis of the impacts of machine translation and speech synthesis on speech-to-speech translation systems. The speech-to-speech translation system consists of three components: speech recognition, machine translation and speech synthesis. Many techniques for integration of speech recognition and machine translation have been proposed. However, speech synthesis has not yet been considered. Therefore, in this paper, we focus on machine translation and speech synthesis, ...
Falk, Simone; Rathcke, Tamara; Dalla Bella, Simone
Repetition can boost memory and perception. However, repeating the same stimulus several times in immediate succession also induces intriguing perceptual transformations and illusions. Here, we investigate the Speech to Song Transformation (S2ST), a massed repetition effect in the auditory modality, which crosses the boundaries between language and music. In the S2ST, a phrase repeated several times shifts to being heard as sung. To better understand this unique cross-domain transformation, we examined the perceptual determinants of the S2ST, in particular the role of acoustics. In 2 Experiments, the effects of 2 pitch properties and 3 rhythmic properties on the probability and speed of occurrence of the transformation were examined. Results showed that both pitch and rhythmic properties are key features fostering the transformation. However, some properties proved to be more conducive to the S2ST than others. Stable tonal targets that allowed for the perception of a musical melody led more often and quickly to the S2ST than scalar intervals. Recurring durational contrasts arising from segmental grouping favoring a metrical interpretation of the stimulus also facilitated the S2ST. This was, however, not the case for a regular beat structure within and across repetitions. In addition, individual perceptual abilities allowed to predict the likelihood of the S2ST. Overall, the study demonstrated that repetition enables listeners to reinterpret specific prosodic features of spoken utterances in terms of musical structures. The findings underline a tight link between language and music, but they also reveal important differences in communicative functions of prosodic structure in the 2 domains.
Kraljic, Tanya; Brennan, Susan E; Samuel, Arthur G
Listeners are faced with enormous variation in pronunciation, yet they rarely have difficulty understanding speech. Although much research has been devoted to figuring out how listeners deal with variability, virtually none (outside of sociolinguistics) has focused on the source of the variation itself. The current experiments explore whether different kinds of variation lead to different cognitive and behavioral adjustments. Specifically, we compare adjustments to the same acoustic consequence when it is due to context-independent variation (resulting from articulatory properties unique to a speaker) versus context-conditioned variation (resulting from common articulatory properties of speakers who share a dialect). The contrasting results for these two cases show that the source of a particular acoustic-phonetic variation affects how that variation is handled by the perceptual system. We also show that changes in perceptual representations do not necessarily lead to changes in production.
Nielsen, Jens Bo; Dau, Torsten; Neher, Tobias
Studies investigating speech-on-speech masking effects commonly use closed-set speech materials such as the coordinate response measure [Bolia et al. (2000). J. Acoust. Soc. Am. 107, 1065-1066]. However, these studies typically result in very low (i.e., negative) speech recognition thresholds (SRTs......) when the competing speech signals are spatially separated. To achieve higher SRTs that correspond more closely to natural communication situations, an open-set, low-context, multi-talker speech corpus was developed. Three sets of 268 unique Danish sentences were created, and each set was recorded...... with one of three professional female talkers. The intelligibility of each sentence in the presence of speech-shaped noise was measured. For each talker, 200 approximately equally intelligible sentences were then selected and systematically distributed into 10 test lists. Test list homogeneity was assessed...
Rämö, Jussi; Christensen, Lasse; Bech, Søren
This paper focuses on validating a perceptual distraction model, which aims to predict user’s perceived distraction caused by audio-on-audio interference, e.g., two competing audio sources within the same listening space. Originally, the distraction model was trained with music-on-music stimuli...... that the model performance is equally good in both zones, i.e., with both speech-on-music and music-on-speech stimuli, and comparable to the previous validation round (RMSE approximately 10%). The results further confirm that the distraction model can be used as a valuable tool in evaluating and optimizing...
Chiodo, Liliane; Majerus, Steve; Mottron, Laurent
The distinction between autism and Asperger syndrome has been abandoned in the DSM-5. However, this clinical categorization largely overlaps with the presence or absence of a speech onset delay which is associated with clinical, cognitive, and neural differences. It is unknown whether these different speech development pathways and associated cognitive differences are involved in the heterogeneity of the restricted interests that characterize autistic adults. This study tested the hypothesis that speech onset delay, or conversely, early mastery of speech, orients the nature and verbal reporting of adult autistic interests. The occurrence of a priori defined descriptors for perceptual and thematic dimensions were determined, as well as the perceived function and benefits, in the response of autistic people to a semi-structured interview on their intense interests. The number of words, grammatical categories, and proportion of perceptual / thematic descriptors were computed and compared between groups by variance analyses. The participants comprised 40 autistic adults grouped according to the presence ( N = 20) or absence ( N = 20) of speech onset delay, as well as 20 non-autistic adults, also with intense interests, matched for non-verbal intelligence using Raven's Progressive Matrices. The overall nature, function, and benefit of intense interests were similar across autistic subgroups, and between autistic and non-autistic groups. However, autistic participants with a history of speech onset delay used more perceptual than thematic descriptors when talking about their interests, whereas the opposite was true for autistic individuals without speech onset delay. This finding remained significant after controlling for linguistic differences observed between the two groups. Verbal reporting, but not the nature or positive function, of intense interests differed between adult autistic individuals depending on their speech acquisition history: oral reporting of
... OTC Relief for Diarrhea Home Diseases and Conditions Speech and Language Delay Condition Speech and Language Delay Share Print Table of Contents1. ... Treatment6. Everyday Life7. Questions8. Resources What is a speech and language delay? A speech and language delay ...
In this paper I argue that one way of explaining what is wrong with hate speech is by critically assessing what kind of freedom free speech involves and, relatedly, what kind of freedom hate speech undermines. More specifically, I argue that the main arguments for freedom of speech (e.g. from truth, from autonomy, and from democracy) rely on a “positive” conception of freedom intended as autonomy and self-mastery (Berlin, 2006), and can only partially help us to understand what is wrong with ...
Civera, M.; Filosi, C. M.; Pugno, N. M.; Silvestrini, M.; Surace, C.; Worden, K.
Vocal cord nodules represent a pathological condition for which the growth of unnatural masses on vocal folds affects the patients. Among other effects, changes in the vocal cords’ overall mass and stiffness alter their vibratory behaviour, thus changing the vocal emission generated by them. This causes dysphonia, i.e. abnormalities in the patients’ voice, which can be analysed and inspected via audio signals. However, the evaluation of voice condition through speech processing is not a trivial task, as standard methods based on the Fourier Transform, fail to fit the non-stationary nature of vocal signals. In this study, four audio tracks, provided by a volunteer patient, whose vocal fold nodules have been surgically removed, were analysed using a relatively new technique: the Hilbert-Huang Transform (HHT) via Empirical Mode Decomposition (EMD); specifically, by using the CEEMDAN (Complete Ensemble EMD with Adaptive Noise) algorithm. This method has been applied here to speech signals, which were recorded before removal surgery and during convalescence, to investigate specific trends. Possibilities offered by the HHT are exposed, but also some limitations of decomposing the signals into so-called intrinsic mode functions (IMFs) are highlighted. The results of these preliminary studies are intended to be a basis for the development of new viable alternatives to the softwares currently used for the analysis and evaluation of pathological voice.
Civera, M; Surace, C; Filosi, C M; Silvestrini, M; Pugno, N M; Worden, K
Vocal cord nodules represent a pathological condition for which the growth of unnatural masses on vocal folds affects the patients. Among other effects, changes in the vocal cords’ overall mass and stiffness alter their vibratory behaviour, thus changing the vocal emission generated by them. This causes dysphonia, i.e. abnormalities in the patients’ voice, which can be analysed and inspected via audio signals. However, the evaluation of voice condition through speech processing is not a trivial task, as standard methods based on the Fourier Transform, fail to fit the non-stationary nature of vocal signals. In this study, four audio tracks, provided by a volunteer patient, whose vocal fold nodules have been surgically removed, were analysed using a relatively new technique: the Hilbert-Huang Transform (HHT) via Empirical Mode Decomposition (EMD); specifically, by using the CEEMDAN (Complete Ensemble EMD with Adaptive Noise) algorithm. This method has been applied here to speech signals, which were recorded before removal surgery and during convalescence, to investigate specific trends. Possibilities offered by the HHT are exposed, but also some limitations of decomposing the signals into so-called intrinsic mode functions (IMFs) are highlighted. The results of these preliminary studies are intended to be a basis for the development of new viable alternatives to the softwares currently used for the analysis and evaluation of pathological voice. (paper)
Hutchinson, Michael J; Paulson, Thomas A W; Eston, Roger; Goosey-Tolfrey, Victoria L
To examine the reliability of a perceptually-regulated maximal exercise test (PRETmax) to measure peak oxygen uptake ([Formula: see text]) during handcycle exercise and to compare peak responses to those derived from a ramp-incremented protocol (RAMP). Twenty recreationally active individuals (14 male, 6 female) completed four trials across a 2-week period, using a randomised, counterbalanced design. Participants completed two RAMP protocols (20 W·min-1) in week 1, followed by two PRETmax in week 2, or vice versa. The PRETmax comprised five, 2-min stages clamped at Ratings of Perceived Exertion (RPE) 11, 13, 15, 17 and 20. Participants changed power output (PO) as often as required to maintain target RPE. Gas exchange variables (oxygen uptake, carbon dioxide production, minute ventilation), heart rate (HR) and PO were collected throughout. Differentiated RPE were collected at the end of each stage throughout trials. For relative [Formula: see text], coefficient of variation (CV) was equal to 4.1% and 4.8%, with ICC(3,1) of 0.92 and 0.85 for repeated measures from PRETmax and RAMP, respectively. Measurement error was 0.15 L·min-1 and 2.11 ml·kg-1·min-1 in PRETmax and 0.16 L·min-1 and 2.29 ml·kg-1·min-1 during RAMP for determining absolute and relative [Formula: see text], respectively. The difference in [Formula: see text] between PRETmax and RAMP was tending towards statistical significance (26.2 ± 5.1 versus 24.3 ± 4.0 ml·kg-1·min-1, P = 0.055). The 95% LoA were -1.9 ± 4.1 (-9.9 to 6.2) ml·kg-1·min-1. The PRETmax can be used as a reliable test to measure [Formula: see text] during handcycle exercise in recreationally active participants. Whilst PRETmax tended towards significantly greater [Formula: see text] values than RAMP, the difference is smaller than measurement error of determining [Formula: see text] from PRETmax and RAMP.
Brouwer, G.J.; Tong, F.; Hagoort, P.; van Ee, R.
We employed a parametric psychophysical design in combination with functional imaging to examine the influence of metric changes in perceptual incongruence on perceptual alternation rates and cortical responses. Subjects viewed a bistable stimulus defined by incongruent depth cues; bistability
de Kok, I.A.; Poppe, Ronald Walter; Heylen, Dirk K.J.
We introduce Iterative Perceptual Learning (IPL), a novel approach to learn computational models for social behavior synthesis from corpora of human–human interactions. IPL combines perceptual evaluation with iterative model refinement. Human observers rate the appropriateness of synthesized
de Kok, I.A.; Poppe, Ronald Walter; Heylen, Dirk K.J.
We introduce Iterative Perceptual Learning (IPL), a novel approach for learning computational models for social behavior synthesis from corpora of human-human interactions. The IPL approach combines perceptual evaluation with iterative model refinement. Human observers rate the appropriateness of
Krueger, Paul M.; van Vugt, Marieke K.; Simen, Patrick; Nystrom, Leigh; Holmes, Philip; Cohen, Jonathan D.
BACKGROUND: We assessed whether evidence accumulation could be observed in the BOLD signal during perceptual decision making. This presents a challenge since the hemodynamic response is slow, while perceptual decisions are typically fast. NEW METHOD: Guided by theoretical predictions of the drift
Dana L Strait
Full Text Available Even in the quietest of rooms, our senses are perpetually inundated by a barrage of sounds, requiring the auditory system to adapt to a variety of listening conditions in order to extract signals of interest (e.g., one speaker’s voice amidst others. Brain networks that promote selective attention are thought to sharpen the neural encoding of a target signal, suppressing competing sounds and enhancing perceptual performance. Here, we ask: does musical training benefit cortical mechanisms that underlie selective attention to speech? To answer this question, we assessed the impact of selective auditory attention on cortical auditory-evoked response variability in musicians and nonmusicians. Outcomes indicate strengthened brain networks for selective auditory attention in musicians in that musicians but not nonmusicians demonstrate decreased prefrontal response variability with auditory attention. Results are interpreted in the context of previous work from our laboratory documenting perceptual and subcortical advantages in musicians for the hearing and neural encoding of speech in background noise. Musicians’ neural proficiency for selectively engaging and sustaining auditory attention to language indicates a potential benefit of music for auditory training. Given the importance of auditory attention for the development of language-related skills, musical training may aid in the prevention, habilitation and remediation of children with a wide range of attention-based language and learning impairments.
Szpiro, Sarit F. A.; Spering, Miriam; Carrasco, Marisa
Perceptual learning improves detection and discrimination of relevant visual information in mature humans, revealing sensory plasticity. Whether visual perceptual learning affects motor responses is unknown. Here we implemented a protocol that enabled us to address this question. We tested a perceptual response (motion direction estimation, in which observers overestimate motion direction away from a reference) and a motor response (voluntary smooth pursuit eye movements). Perceptual training...
Shriberg, Lawrence D.; Strand, Edythe A.; Fourakis, Marios; Jakielski, Kathy J.; Hall, Sheryl D.; Karlsson, Heather B.; Mabie, Heather L.; McSweeny, Jane L.; Tilkens, Christie M.; Wilson, David L.
Purpose: Previous articles in this supplement described rationale for and development of the pause marker (PM), a diagnostic marker of childhood apraxia of speech (CAS), and studies supporting its validity and reliability. The present article assesses the theoretical coherence of the PM with speech processing deficits in CAS. Method: PM and other…
The primary objective of this position paper is to assess the theoretical and empirical support that exists for the Mayo Clinic view of motor speech disorders in general, and for oromotor, nonverbal tasks as a window to speech production processes in particular. Literature both in support of and against the Mayo clinic view and the associated use…
Ward, Roslyn; Leitão, Suze; Strauss, Geoff
This study evaluates perceptual changes in speech production accuracy in six children (3-11 years) with moderate-to-severe speech impairment associated with cerebral palsy before, during, and after participation in a motor-speech intervention program (Prompts for Restructuring Oral Muscular Phonetic Targets). An A1BCA2 single subject research design was implemented. Subsequent to the baseline phase (phase A1), phase B targeted each participant's first intervention priority on the PROMPT motor-speech hierarchy. Phase C then targeted one level higher. Weekly speech probes were administered, containing trained and untrained words at the two levels of intervention, plus an additional level that served as a control goal. The speech probes were analysed for motor-speech-movement-parameters and perceptual accuracy. Analysis of the speech probe data showed all participants recorded a statistically significant change. Between phases A1-B and B-C 6/6 and 4/6 participants, respectively, recorded a statistically significant increase in performance level on the motor speech movement patterns targeted during the training of that intervention. The preliminary data presented in this study make a contribution to providing evidence that supports the use of a treatment approach aligned with dynamic systems theory to improve the motor-speech movement patterns and speech production accuracy in children with cerebral palsy.
Kawabe, Takahiro; Maruya, Kazushi; Nishida, Shin'ya
Human vision has a remarkable ability to perceive two layers at the same retinal locations, a transparent layer in front of a background surface. Critical image cues to perceptual transparency, studied extensively in the past, are changes in luminance or color that could be caused by light absorptions and reflections by the front layer, but such image changes may not be clearly visible when the front layer consists of a pure transparent material such as water. Our daily experiences with transparent materials of this kind suggest that an alternative potential cue of visual transparency is image deformations of a background pattern caused by light refraction. Although previous studies have indicated that these image deformations, at least static ones, play little role in perceptual transparency, here we show that dynamic image deformations of the background pattern, which could be produced by light refraction on a moving liquid's surface, can produce a vivid impression of a transparent liquid layer without the aid of any other visual cues as to the presence of a transparent layer. Furthermore, a transparent liquid layer perceptually emerges even from a randomly generated dynamic image deformation as long as it is similar to real liquid deformations in its spatiotemporal frequency profile. Our findings indicate that the brain can perceptually infer the presence of "invisible" transparent liquids by analyzing the spatiotemporal structure of dynamic image deformation, for which it uses a relatively simple computation that does not require high-level knowledge about the detailed physics of liquid deformation.
Keidser, Gitte; Dillon, Harvey; Convery, Elizabeth; Mejia, Jorge
Large variations in perceptual directional microphone benefit, which far exceed the variation expected from physical performance measures of directional microphones, have been reported in the literature. The cause for the individual variation has not been systematically investigated. To determine the factors that are responsible for the individual variation in reported perceptual directional benefit. A correlational study. Physical performance measures of the directional microphones obtained after they had been fitted to individuals, cognitive abilities of individuals, and measurement errors were related to perceptual directional benefit scores. Fifty-nine hearing-impaired adults with varied degrees of hearing loss participated in the study. All participants were bilaterally fitted with a Motion behind-the-ear device (500 M, 501 SX, or 501 P) from Siemens according to the National Acoustic Laboratories' non-linear prescription, version two (NAL-NL2). Using the Bamford-Kowal-Bench (BKB) sentences, the perceptual directional benefit was obtained as the difference in speech reception threshold measured in babble noise (SRTn) with the devices in directional (fixed hypercardioid) and in omnidirectional mode. The SRTn measurements were repeated three times with each microphone mode. Physical performance measures of the directional microphone included the angle of the microphone ports to loudspeaker axis, the frequency range dominated by amplified sound, the in situ signal-to-noise ratio (SNR), and the in situ three-dimensional, articulation-index weighted directivity index (3D AI-DI). The cognitive tests included auditory selective attention, speed of processing, and working memory. Intraparticipant variation on the repeated SRTn's and the interparticipant variation on the average SRTn were used to determine the effect of measurement error. A multiple regression analysis was used to determine the effect of other factors. Measurement errors explained 52% of the variation
Christiner, Markus; Reiterer, Susanne M
In previous research on speech imitation, musicality, and an ability to sing were isolated as the strongest indicators of good pronunciation skills in foreign languages. We, therefore, wanted to take a closer look at the nature of the ability to sing, which shares a common ground with the ability to imitate speech. This study focuses on whether good singing performance predicts good speech imitation. Forty-one singers of different levels of proficiency were selected for the study and their ability to sing, to imitate speech, their musical talent and working memory were tested. Results indicated that singing performance is a better indicator of the ability to imitate speech than the playing of a musical instrument. A multiple regression revealed that 64% of the speech imitation score variance could be explained by working memory together with educational background and singing performance. A second multiple regression showed that 66% of the speech imitation variance of completely unintelligible and unfamiliar language stimuli (Hindi) could be explained by working memory together with a singer's sense of rhythm and quality of voice. This supports the idea that both vocal behaviors have a common grounding in terms of vocal and motor flexibility, ontogenetic and phylogenetic development, neural orchestration and auditory memory with singing fitting better into the category of "speech" on the productive level and "music" on the acoustic level. As a result, good singers benefit from vocal and motor flexibility, productively and cognitively, in three ways. (1) Motor flexibility and the ability to sing improve language and musical function. (2) Good singers retain a certain plasticity and are open to new and unusual sound combinations during adulthood both perceptually and productively. (3) The ability to sing improves the memory span of the auditory working memory.
Full Text Available In previous research on speech imitation, musicality and an ability to sing were isolated as the strongest indicators of good pronunciation skills in foreign languages. We, therefore, wanted to take a closer look at the nature of the ability to sing, which shares a common ground with the ability to imitate speech. This study focuses on whether good singing performance predicts good speech imitation. Fourty-one singers of different levels of proficiency were selected for the study and their ability to sing, to imitate speech, their musical talent and working memory were tested. Results indicated that singing performance is a better indicator of the ability to imitate speech than the playing of a musical instrument. A multiple regression revealed that 64 % of the speech imitation score variance could be explained by working memory together with educational background and singing performance. A second multiple regression showed that 66 % of the speech imitation variance of completely unintelligible and unfamiliar language stimuli (Hindi could be explained by working memory together with a singer’s sense of rhythm and quality of voice. This supports the idea that both vocal behaviors have a common grounding in terms of vocal and motor flexibility, ontogenetic and phylogenetic development, neural orchestration and sound memory with singing fitting better into the category of "speech" on the productive level and "music" on the acoustic level. As a result, good singers benefit from vocal and motor flexibility, productively and cognitively, in three ways. 1. Motor flexibility and the ability to sing improve language and musical function. 2. Good singers retain a certain plasticity and are open to new and unusual sound combinations during adulthood both perceptually and productively. 3. The ability to sing improves the memory span of the auditory short term memory.
Liang, Jiali; Wilkinson, Krista; Sainburg, Robert L
Previous studies proposed that selecting which hand to use for a reaching task appears to be modulated by a factor described as "task difficulty". However, what features of a task might contribute to greater or lesser "difficulty" in the context of hand selection decisions has yet to be determined. There has been evidence that biomechanical and kinematic factors such as movement smoothness and work can predict patterns of selection across the workspace, suggesting a role of predictive cost analysis in hand-selection. We hypothesize that this type of prediction for hand-selection should recruit substantial cognitive resources and thus should be influenced by cognitive-perceptual loading. We test this hypothesis by assessing the role of cognitive-perceptual loading on hand selection decisions, using a visual search task that presents different levels of difficulty (cognitive-perceptual load), as established in previous studies on overall response time and efficiency of visual search. Although the data are necessarily preliminary due to small sample size, our data suggested an influence of cognitive-perceptual load on hand selection, such that the dominant hand was selected more frequently as cognitive load increased. Interestingly, cognitive-perceptual loading also increased cross-midline reaches with both hands. Because crossing midline is more costly in terms of kinematic and kinetic factors, our findings suggest that cognitive processes are normally engaged to avoid costly actions, and that the choice not-to-cross midline requires cognitive resources. Copyright © 2017 IBRO. Published by Elsevier Ltd. All rights reserved.
Abiri, Ahmad; Tao, Anna; LaRocca, Meg; Guan, Xingmin; Askari, Syed J; Bisley, James W; Dutson, Erik P; Grundfest, Warren S
The principal objective of the experiment was to analyze the effects of the clutch operation of robotic surgical systems on the performance of the operator. The relative coordinate system introduced by the clutch operation can introduce a visual-perceptual mismatch which can potentially have negative impact on a surgeon's performance. We also assess the impact of the introduction of additional tactile sensory information on reducing the impact of visual-perceptual mismatch on the performance of the operator. We asked 45 novice subjects to complete peg transfers using the da Vinci IS 1200 system with grasper-mounted, normal force sensors. The task involves picking up a peg with one of the robotic arms, passing it to the other arm, and then placing it on the opposite side of the view. Subjects were divided into three groups: aligned group (no mismatch), the misaligned group (10 cm z axis mismatch), and the haptics-misaligned group (haptic feedback and z axis mismatch). Each subject performed the task five times, during which the grip force, time of completion, and number of faults were recorded. Compared to the subjects that performed the tasks using a properly aligned controller/arm configuration, subjects with a single-axis misalignment showed significantly more peg drops (p = 0.011) and longer time to completion (p sensors showed no difference between the different groups. The visual-perceptual mismatch created by the misalignment of the robotic controls relative to the robotic arms has a negative impact on the operator of a robotic surgical system. Introduction of other sensory information and haptic feedback systems can help in potentially reducing this effect.
Cardoso-Junior, M M; Scarpel, R A
The main focus of risk management is technical and rational analysis about the operational risks and by those imposed by the occupational environment. In this work one seeks to contribute to the risk perception study and to better comprehend how a group of occupational safety students assesses a set of activities and environmental agents. In this way it was used theory sustained by psychometric paradigm and multivariate analysis tools, mainly multidimensional scaling, generalized Procrustes analysis and facets theory, in order to construct the perceptual map of occupational risks. The results obtained showed that the essential characteristics of risks, which were initially splited in 4 facets were detected and maintained in the perceptual map. It was not possible to reveal the cognitive structure of the group, because the variability of the students was too high. Differences among the risks analyzed could not be detected as well in the perceptual map of the group.
Dornan, Dimity; Hickson, Louise; Murdoch, Bruce; Houston, Todd
This study examined the speech perception, speech, and language developmental progress of 25 children with hearing loss (mean Pure-Tone Average [PTA] 79.37 dB HL) in an auditory verbal therapy program. Children were tested initially and then 21 months later on a battery of assessments. The speech and language results over time were compared with…
Soutar, Geoffrey N.; Clarke, Alexander W.
Proposes a methodology for examining career preferences, which uses perceptual mapping techniques and external preference analysis to assess the attributes individuals believe are important. A study of 158 business students' career preferences suggested the methodology can be useful in analyzing reasons for career preferences. (WAS)
Teaching Speech Acts
Full Text Available In this paper I argue that pragmatic ability must become part of what we teach in the classroom if we are to realize the goals of communicative competence for our students. I review the research on pragmatics, especially those articles that point to the effectiveness of teaching pragmatics in an explicit manner, and those that posit methods for teaching. I also note two areas of scholarship that address classroom needs—the use of authentic data and appropriate assessment tools. The essay concludes with a summary of my own experience teaching speech acts in an advanced-level Portuguese class.
Lohmander, Anette; Persson, Christina; Willadsen, Elisabeth
Background and aim: Adequate velopharyngeal function and speech are main goals in the treatment of cleft palate. The objective was to investigate if there were differences in velopharyngeal competency (VPC) and hypernasality at age 5 years in children with unilateral cleft lip and palate (UCLP...... (136 girls, 255 boys) were available and perceptually analysed. The main outcome measures were VPC and hypernasality from blinded assessments. Results: There were no statistically significant differences between the prevalences in the arms in any of the trials. VPC: Trial 1, A: 58%, B: 61%; Trial 2, A......: 57%, C: 54%; Trial 3, A: 35%, D: 51%. No hypernasality: Trial 1, A: 54%, B: 44%; Trial 2, A: 47%, C: 51%; Trial 3, A: 34%, D: 49%. Conclusions: No differences were found regarding VPC and hypernasality at age 5 years after different methods for primary palatal repair. The burden of care in terms...
Venezia, Jonathan H; Thurman, Steven M; Matchin, William; George, Sahara E; Hickok, Gregory
Recent influential models of audiovisual speech perception suggest that visual speech aids perception by generating predictions about the identity of upcoming speech sounds. These models place stock in the assumption that visual speech leads auditory speech in time. However, it is unclear whether and to what extent temporally-leading visual speech information contributes to perception. Previous studies exploring audiovisual-speech timing have relied upon psychophysical procedures that require artificial manipulation of cross-modal alignment or stimulus duration. We introduce a classification procedure that tracks perceptually relevant visual speech information in time without requiring such manipulations. Participants were shown videos of a McGurk syllable (auditory /apa/ + visual /aka/ = perceptual /ata/) and asked to perform phoneme identification (/apa/ yes-no). The mouth region of the visual stimulus was overlaid with a dynamic transparency mask that obscured visual speech in some frames but not others randomly across trials. Variability in participants' responses (~35 % identification of /apa/ compared to ~5 % in the absence of the masker) served as the basis for classification analysis. The outcome was a high resolution spatiotemporal map of perceptually relevant visual features. We produced these maps for McGurk stimuli at different audiovisual temporal offsets (natural timing, 50-ms visual lead, and 100-ms visual lead). Briefly, temporally-leading (~130 ms) visual information did influence auditory perception. Moreover, several visual features influenced perception of a single speech sound, with the relative influence of each feature depending on both its temporal relation to the auditory signal and its informational content.
Venezia, Jonathan H.; Thurman, Steven M.; Matchin, William; George, Sahara E.; Hickok, Gregory
Recent influential models of audiovisual speech perception suggest that visual speech aids perception by generating predictions about the identity of upcoming speech sounds. These models place stock in the assumption that visual speech leads auditory speech in time. However, it is unclear whether and to what extent temporally-leading visual speech information contributes to perception. Previous studies exploring audiovisual-speech timing have relied upon psychophysical procedures that require artificial manipulation of cross-modal alignment or stimulus duration. We introduce a classification procedure that tracks perceptually-relevant visual speech information in time without requiring such manipulations. Participants were shown videos of a McGurk syllable (auditory /apa/ + visual /aka/ = perceptual /ata/) and asked to perform phoneme identification (/apa/ yes-no). The mouth region of the visual stimulus was overlaid with a dynamic transparency mask that obscured visual speech in some frames but not others randomly across trials. Variability in participants' responses (∼35% identification of /apa/ compared to ∼5% in the absence of the masker) served as the basis for classification analysis. The outcome was a high resolution spatiotemporal map of perceptually-relevant visual features. We produced these maps for McGurk stimuli at different audiovisual temporal offsets (natural timing, 50-ms visual lead, and 100-ms visual lead). Briefly, temporally-leading (∼130 ms) visual information did influence auditory perception. Moreover, several visual features influenced perception of a single speech sound, with the relative influence of each feature depending on both its temporal relation to the auditory signal and its informational content. PMID:26669309
Kayasith, Prakasith; Theeramunkong, Thanaruk
It is a tedious and subjective task to measure severity of a dysarthria by manually evaluating his/her speech using available standard assessment methods based on human perception. This paper presents an automated approach to assess speech quality of a dysarthric speaker with cerebral palsy. With the consideration of two complementary factors, speech consistency and speech distinction, a speech quality indicator called speech clarity index (Ψ) is proposed as a measure of the speaker's ability to produce consistent speech signal for a certain word and distinguished speech signal for different words. As an application, it can be used to assess speech quality and forecast speech recognition rate of speech made by an individual dysarthric speaker before actual exhaustive implementation of an automatic speech recognition system for the speaker. The effectiveness of Ψ as a speech recognition rate predictor is evaluated by rank-order inconsistency, correlation coefficient, and root-mean-square of difference. The evaluations had been done by comparing its predicted recognition rates with ones predicted by the standard methods called the articulatory and intelligibility tests based on the two recognition systems (HMM and ANN). The results show that Ψ is a promising indicator for predicting recognition rate of dysarthric speech. All experiments had been done on speech corpus composed of speech data from eight normal speakers and eight dysarthric speakers.
Szpiro, Sarit F A; Spering, Miriam; Carrasco, Marisa
Perceptual learning improves detection and discrimination of relevant visual information in mature humans, revealing sensory plasticity. Whether visual perceptual learning affects motor responses is unknown. Here we implemented a protocol that enabled us to address this question. We tested a perceptual response (motion direction estimation, in which observers overestimate motion direction away from a reference) and a motor response (voluntary smooth pursuit eye movements). Perceptual training led to greater overestimation and, remarkably, it modified untrained smooth pursuit. In contrast, pursuit training did not affect overestimation in either pursuit or perception, even though observers in both training groups were exposed to the same stimuli for the same time period. A second experiment revealed that estimation training also improved discrimination, indicating that overestimation may optimize perceptual sensitivity. Hence, active perceptual training is necessary to alter perceptual responses, and an acquired change in perception suffices to modify pursuit, a motor response. © 2014 ARVO.
Andreas Maier; Tino Haderlein; Florian Stelzle; Elmar Nöth; Emeka Nkenke; Frank Rosanowski; Anne Schützenberger; Maria Schuster
In patients suffering from head and neck cancer, speech intelligibility is often restricted. For assessment and outcome measurements, automatic speech recognition systems have previously been shown to be appropriate for objective and quick evaluation of intelligibility. In this study we investigate the applicability of the method to speech disorders caused by head and neck cancer. Intelligibility was quantified by speech recognition on recordings of a standard text read by 41 German laryngect...
Dale, Philip S; Hayden, Deborah A
Prompts for Restructuring Oral Muscular Phonetic Targets (PROMPT; Hayden, 2004; Hayden, Eigen, Walker, & Olsen, 2010)-a treatment approach for the improvement of speech sound disorders in children-uses tactile-kinesthetic- proprioceptive (TKP) cues to support and shape movements of the oral articulators. No research to date has systematically examined the efficacy of PROMPT for children with childhood apraxia of speech (CAS). Four children (ages 3;6 [years;months] to 4;8), all meeting the American Speech-Language-Hearing Association (2007) criteria for CAS, were treated using PROMPT. All children received 8 weeks of 2 × per week treatment, including at least 4 weeks of full PROMPT treatment that included TKP cues. During the first 4 weeks, 2 of the 4 children received treatment that included all PROMPT components except TKP cues. This design permitted both between-subjects and within-subjects comparisons to evaluate the effect of TKP cues. Gains in treatment were measured by standardized tests and by criterion-referenced measures based on the production of untreated probe words, reflecting change in speech movements and auditory perceptual accuracy. All 4 children made significant gains during treatment, but measures of motor speech control and untreated word probes provided evidence for more gain when TKP cues were included. PROMPT as a whole appears to be effective for treating children with CAS, and the inclusion of TKP cues appears to facilitate greater effect.
John F Magnotti
Full Text Available During speech perception, humans integrate auditory information from the voice with visual information from the face. This multisensory integration increases perceptual precision, but only if the two cues come from the same talker; this requirement has been largely ignored by current models of speech perception. We describe a generative model of multisensory speech perception that includes this critical step of determining the likelihood that the voice and face information have a common cause. A key feature of the model is that it is based on a principled analysis of how an observer should solve this causal inference problem using the asynchrony between two cues and the reliability of the cues. This allows the model to make predictions abut the behavior of subjects performing a synchrony judgment task, predictive power that does not exist in other approaches, such as post hoc fitting of Gaussian curves to behavioral data. We tested the model predictions against the performance of 37 subjects performing a synchrony judgment task viewing audiovisual speech under a variety of manipulations, including varying asynchronies, intelligibility, and visual cue reliability. The causal inference model outperformed the Gaussian model across two experiments, providing a better fit to the behavioral data with fewer parameters. Because the causal inference model is derived from a principled understanding of the task, model parameters are directly interpretable in terms of stimulus and subject properties.
Alice Estevo Dias
Full Text Available ABSTRACT Speech disorders are common manifestations of Parkinson´s disease. Objective To compare speech articulation in patients according to age at onset of the disease. Methods Fifty patients was divided into two groups: Group I consisted of 30 patients with age at onset between 40 and 55 years; Group II consisted of 20 patients with age at onset after 65 years. All patients were evaluated based on the Unified Parkinson’s Disease Rating Scale scores, Hoehn and Yahr scale and speech evaluation by perceptual and acoustical analysis. Results There was no statistically significant difference between the two groups regarding neurological involvement and speech characteristics. Correlation analysis indicated differences in speech articulation in relation to staging and axial scores of rigidity and bradykinesia for middle and late-onset. Conclusions Impairment of speech articulation did not correlate with age at onset of disease, but was positively related with disease duration and higher scores in both groups.
Society of "" America , Anaheim, CA, Dec. 1986. # Randolph, M. A., and V. W. Zue, "The Role of Syllable Structure in the Acoustic Realizations of Stops...input speech signal is first transformed into a represen- ences in sociolinguistic background, dialect, and vocal tract tation that takes into account...Perceptual Evidence,’ Journal of the Acovuticai Society of America , vol. 59, * no. 5, pp. 1208-1221, May 1976. G. E. Kupec and M. A. Bush, ’Network
One of the most important issues concerning the foundations of conscious perception centers on the question of whether perceptual consciousness is rich or sparse. The overflow argument uses a form of 'iconic memory' to argue that perceptual consciousness is richer (i.e., has a higher capacity) than cognitive access: when observing a complex scene we are conscious of more than we can report or think about. Recently, the overflow argument has been challenged both empirically and conceptually. This paper reviews the controversy, arguing that proponents of sparse perception are committed to the postulation of (i) a peculiar kind of generic conscious representation that has no independent rationale and (ii) an unmotivated form of unconscious representation that in some cases conflicts with what we know about unconscious representation. Copyright © 2011 Elsevier Ltd. All rights reserved.
Full Text Available Color camera characterization, mapping outputs from the camera sensors to an independent color space, such as \\(XYZ\\, is an important step in the camera processing pipeline. Until now, this procedure has been primarily solved by using a \\(3 \\times 3\\ matrix obtained via a least-squares optimization. In this paper, we propose to use the spherical sampling method, recently published by Finlayson al., to perform a perceptual color characterization. In particular, we search for the \\(3 \\times 3\\ matrix that minimizes three different perceptual errors, one pixel based and two spatially based. For the pixel-based case, we minimize the CIE \\(\\Delta E\\ error, while for the spatial-based case, we minimize both the S-CIELAB error and the CID error measure. Our results demonstrate an improvement of approximately 3for the \\(\\Delta E\\ error, 7& for the S-CIELAB error and 13% for the CID error measures.
Parbery-Clark, Alexandra; Strait, Dana L.; Anderson, Samira; Hittner, Emily; Kraus, Nina
Much of our daily communication occurs in the presence of background noise, compromising our ability to hear. While understanding speech in noise is a challenge for everyone, it becomes increasingly difficult as we age. Although aging is generally accompanied by hearing loss, this perceptual decline cannot fully account for the difficulties experienced by older adults for hearing in noise. Decreased cognitive skills concurrent with reduced perceptual acuity are thought to contribute to the difficulty older adults experience understanding speech in noise. Given that musical experience positively impacts speech perception in noise in young adults (ages 18–30), we asked whether musical experience benefits an older cohort of musicians (ages 45–65), potentially offsetting the age-related decline in speech-in-noise perceptual abilities and associated cognitive function (i.e., working memory). Consistent with performance in young adults, older musicians demonstrated enhanced speech-in-noise perception relative to nonmusicians along with greater auditory, but not visual, working memory capacity. By demonstrating that speech-in-noise perception and related cognitive function are enhanced in older musicians, our results imply that musical training may reduce the impact of age-related auditory decline. PMID:21589653
Full Text Available Much of our daily communication occurs in the presence of background noise, compromising our ability to hear. While understanding speech in noise is a challenge for everyone, it becomes increasingly difficult as we age. Although aging is generally accompanied by hearing loss, this perceptual decline cannot fully account for the difficulties experienced by older adults for hearing in noise. Decreased cognitive skills concurrent with reduced perceptual acuity are thought to contribute to the difficulty older adults experience understanding speech in noise. Given that musical experience positively impacts speech perception in noise in young adults (ages 18-30, we asked whether musical experience benefits an older cohort of musicians (ages 45-65, potentially offsetting the age-related decline in speech-in-noise perceptual abilities and associated cognitive function (i.e., working memory. Consistent with performance in young adults, older musicians demonstrated enhanced speech-in-noise perception relative to nonmusicians along with greater auditory, but not visual, working memory capacity. By demonstrating that speech-in-noise perception and related cognitive function are enhanced in older musicians, our results imply that musical training may reduce the impact of age-related auditory decline.
Parbery-Clark, Alexandra; Strait, Dana L; Anderson, Samira; Hittner, Emily; Kraus, Nina
Much of our daily communication occurs in the presence of background noise, compromising our ability to hear. While understanding speech in noise is a challenge for everyone, it becomes increasingly difficult as we age. Although aging is generally accompanied by hearing loss, this perceptual decline cannot fully account for the difficulties experienced by older adults for hearing in noise. Decreased cognitive skills concurrent with reduced perceptual acuity are thought to contribute to the difficulty older adults experience understanding speech in noise. Given that musical experience positively impacts speech perception in noise in young adults (ages 18-30), we asked whether musical experience benefits an older cohort of musicians (ages 45-65), potentially offsetting the age-related decline in speech-in-noise perceptual abilities and associated cognitive function (i.e., working memory). Consistent with performance in young adults, older musicians demonstrated enhanced speech-in-noise perception relative to nonmusicians along with greater auditory, but not visual, working memory capacity. By demonstrating that speech-in-noise perception and related cognitive function are enhanced in older musicians, our results imply that musical training may reduce the impact of age-related auditory decline.
Lenay, Charles; Stewart, John
WORK AIMED AT STUDYING SOCIAL COGNITION IN AN INTERACTIONIST PERSPECTIVE OFTEN ENCOUNTERS SUBSTANTIAL THEORETICAL AND METHODOLOGICAL DIFFICULTIES: identifying the significant behavioral variables; recording them without disturbing the interaction; and distinguishing between: (a) the necessary and sufficient contributions of each individual partner for a collective dynamics to emerge; (b) features which derive from this collective dynamics and escape from the control of the individual partners; and (c) the phenomena arising from this collective dynamics which are subsequently appropriated and used by the partners. We propose a minimalist experimental paradigm as a basis for this conceptual discussion: by reducing the sensory inputs to a strict minimum, we force a spatial and temporal deployment of the perceptual activities, which makes it possible to obtain a complete recording and control of the dynamics of interaction. After presenting the principles of this minimalist approach to perception, we describe a series of experiments on two major questions in social cognition: recognizing the presence of another intentional subject; and phenomena of imitation. In both cases, we propose explanatory schema which render an interactionist approach to social cognition clear and explicit. Starting from our earlier work on perceptual crossing we present a new experiment on the mechanisms of reciprocal recognition of the perceptual intentionality of the other subject: the emergent collective dynamics of the perceptual crossing can be appropriated by each subject. We then present an experimental study of opaque imitation (when the subjects cannot see what they themselves are doing). This study makes it possible to characterize what a properly interactionist approach to imitation might be. In conclusion, we draw on these results, to show how an interactionist approach can contribute to a fully social approach to social cognition.
Erickson, Laura C; Heeg, Elizabeth; Rauschecker, Josef P; Turkeltaub, Peter E
The brain improves speech processing through the integration of audiovisual (AV) signals. Situations involving AV speech integration may be crudely dichotomized into those where auditory and visual inputs contain (1) equivalent, complementary signals (validating AV speech) or (2) inconsistent, different signals (conflicting AV speech). This simple framework may allow the systematic examination of broad commonalities and differences between AV neural processes engaged by various experimental paradigms frequently used to study AV speech integration. We conducted an activation likelihood estimation metaanalysis of 22 functional imaging studies comprising 33 experiments, 311 subjects, and 347 foci examining "conflicting" versus "validating" AV speech. Experimental paradigms included content congruency, timing synchrony, and perceptual measures, such as the McGurk effect or synchrony judgments, across AV speech stimulus types (sublexical to sentence). Colocalization of conflicting AV speech experiments revealed consistency across at least two contrast types (e.g., synchrony and congruency) in a network of dorsal stream regions in the frontal, parietal, and temporal lobes. There was consistency across all contrast types (synchrony, congruency, and percept) in the bilateral posterior superior/middle temporal cortex. Although fewer studies were available, validating AV speech experiments were localized to other regions, such as ventral stream visual areas in the occipital and inferior temporal cortex. These results suggest that while equivalent, complementary AV speech signals may evoke activity in regions related to the corroboration of sensory input, conflicting AV speech signals recruit widespread dorsal stream areas likely involved in the resolution of conflicting sensory signals. Copyright © 2014 Wiley Periodicals, Inc.
Full Text Available Listeners must accomplish two complementary perceptual feats in extracting a message from speech. They must discriminate linguistically-relevant acoustic variability and generalize across irrelevant variability. Said another way, they must categorize speech. Since the mapping of acoustic variability is language-specific, these categories must be learned from experience. Thus, understanding how, in general, the auditory system acquires and represents categories can inform us about the toolbox of mechanisms available to speech perception. This perspective invites consideration of findings from cognitive neuroscience literatures outside of the speech domain as a means of constraining models of speech perception. Although neurobiological models of speech perception have mainly focused on cerebral cortex, research outside the speech domain is consistent with the possibility of significant subcortical contributions in category learning. Here, we review the functional role of one such structure, the basal ganglia. We examine research from animal electrophysiology, human neuroimaging, and behavior to consider characteristics of basal ganglia processing that may be advantageous for speech category learning. We also present emerging evidence for a direct role for basal ganglia in learning auditory categories in a complex, naturalistic task intended to model the incidental manner in which speech categories are acquired. To conclude, we highlight new research questions that arise in incorporating the broader neuroscience research literature in modeling speech perception, and suggest how understanding contributions of the basal ganglia can inform attempts to optimize training protocols for learning non-native speech categories in adulthood.
Full Text Available Speech-language therapists (SLTs working in the context of cultural and linguistic diversity face considerable challenges in providing equitable services to all clients. This is complicated by the fact that the majority of SLTs in South Africa are English or Afrikaans speakers, while the majority of the population have a home language other than English/Afrikaans. Consequently, SLTs are often forced to call on untrained personnel to act as interpreters or translators, and to utilise informally translated materials in the assessment and management of clients with communication impairments. However, variations in translation have the potential to considerably alter intervention plans. This study explored whether the linguistic complexity conveyed in translation of the Western Aphasia Battery (WAB test changed when translated from English to isiZulu by five different first-language IsiZulu speakers. A qualitative comparative research design was adopted and results were analysed using comparative data analysis. Results revealed notable differences in the translations, with most differences relating to vocabulary and semantics. This finding holds clinical implications for the use of informal translators as well as for the utilisation of translated material in the provision of speech-language therapy services in multilingual contexts. This study highlights the need for cautious use of translators and/or translated materials that are not appropriately and systematically adapted for local usage. Further recommendations include a call for intensified efforts in the transformation of the profession within the country, specifically by attracting greater numbers of students who are fluent in African languages.
Christiner, Markus; Reiterer, Susanne M.
In previous research on speech imitation, musicality, and an ability to sing were isolated as the strongest indicators of good pronunciation skills in foreign languages. We, therefore, wanted to take a closer look at the nature of the ability to sing, which shares a common ground with the ability to imitate speech. This study focuses on whether good singing performance predicts good speech imitation. Forty-one singers of different levels of proficiency were selected for the study and their ability to sing, to imitate speech, their musical talent and working memory were tested. Results indicated that singing performance is a better indicator of the ability to imitate speech than the playing of a musical instrument. A multiple regression revealed that 64% of the speech imitation score variance could be explained by working memory together with educational background and singing performance. A second multiple regression showed that 66% of the speech imitation variance of completely unintelligible and unfamiliar language stimuli (Hindi) could be explained by working memory together with a singer's sense of rhythm and quality of voice. This supports the idea that both vocal behaviors have a common grounding in terms of vocal and motor flexibility, ontogenetic and phylogenetic development, neural orchestration and auditory memory with singing fitting better into the category of “speech” on the productive level and “music” on the acoustic level. As a result, good singers benefit from vocal and motor flexibility, productively and cognitively, in three ways. (1) Motor flexibility and the ability to sing improve language and musical function. (2) Good singers retain a certain plasticity and are open to new and unusual sound combinations during adulthood both perceptually and productively. (3) The ability to sing improves the memory span of the auditory working memory. PMID:24319438
... to being completely unable to speak or understand speech. Causes include Hearing disorders and deafness Voice problems, ... or those caused by cleft lip or palate Speech problems like stuttering Developmental disabilities Learning disorders Autism ...
Xie, Jiushu; Wang, Ruiming; Sun, Xun; Chang, Song
The effect of color and shape load on conceptual processing was studied. Perceptual load effects have been found in visual and auditory conceptual processing, supporting the theory of embodied cognition. However, whether different types of visual concepts, such as color and shape, share the same perceptual load effects is unknown. In the current experiment, 32 participants were administered simultaneous perceptual and conceptual tasks to assess the relation between perceptual load and conceptual processing. Keeping color load in mind obstructed color conceptual processing. Hence, perceptual processing and conceptual load shared the same resources, suggesting embodied cognition. Color conceptual processing was not affected by shape pictures, indicating that different types of properties within vision were separate.
Giles, Melanie; Barker, Mary; Hayes, Amanda
Speech language pathologists play an important role in the care of patients with speech, language, or swallowing difficulties that can result from a variety of medical conditions. This article describes how speech language pathologists assess and treat these conditions and the red flags that suggest a referral to a speech language pathologist is indicated.
Rämö, Jussi; Christensen, Lasse; Bech, Søren
This paper focuses on validating a perceptual distraction model, which aims to predict user’s perceived distraction caused by audio-on-audio interference, e.g., two competing audio sources within the same listening space. Originally, the distraction model was trained with music-on-music stimuli...... using a simple loudspeaker setup, consisting of only two loudspeakers, one for the target sound source and the other for the interfering sound source. Recently, the model was successfully validated in a complex personal sound-zone system with speech-on-music stimuli. Second round of validations were...... conducted by physically altering the sound-zone system and running a set of new listening experiments utilizing two sound zones within the sound-zone system. Thus, validating the model using a different sound-zone system with both speech-on-music and music-on-speech stimuli sets. Preliminary results show...
"It's the Way You Talk to Them." The Child's Environment: Early Years Practitioners' Perceptions of Its Influence on Speech and Language Development, Its Assessment and Environment Targeted Interventions
Marshall, Julie; Lewis, Elizabeth
Speech and language delay occurs in approximately 6% of the child population, and interventions to support this group of children focus on the child and/or the communicative environment. Evidence about the effectiveness of interventions that focus on the environment as well as the (reported) practices of speech and language therapists (SLTs) and…
Loukina, Anastassia; Buzick, Heather
This study is an evaluation of the performance of automated speech scoring for speakers with documented or suspected speech impairments. Given that the use of automated scoring of open-ended spoken responses is relatively nascent and there is little research to date that includes test takers with disabilities, this small exploratory study focuses…
Phifer, Gregg, Ed.
The 17 articles in this collection deal with theoretical and practical freedom of speech issues. The topics include: freedom of speech in Marquette Park, Illinois; Nazis in Skokie, Illinois; freedom of expression in the Confederate States of America; Robert M. LaFollette's arguments for free speech and the rights of Congress; the United States…
Edwards, Brent W.; van Tasell, Dianne J.
Hearing aid capabilities have increased dramatically over the past six years, in large part due to the development of small, low-power digital signal processing chips suitable for hearing aid applications. As hearing aid signal processing capabilities increase, there will be new opportunities to apply perceptually based knowledge to technological development. Most hearing loss compensation techniques in today's hearing aids are based on simple estimates of audibility and loudness. As our understanding of the psychoacoustical and physiological characteristics of sensorineural hearing loss improves, the result should be improved design of hearing aids and fitting methods. The state of the art in hearing aids will be reviewed, including form factors, user requirements, and technology that improves speech intelligibility, sound quality, and functionality. General areas of auditory perception that remain unaddressed by current hearing aid technology will be discussed.
Yunusova, Yana; Green, Jordan R; Wang, Jun; Pattee, Gary; Zinman, Lorne
Improved methods for assessing bulbar impairment are necessary for expediting diagnosis of bulbar dysfunction in ALS, for predicting disease progression across speech subsystems, and for addressing the critical need for sensitive outcome measures for ongoing experimental treatment trials. To address this need, we are obtaining longitudinal profiles of bulbar impairment in 100 individuals based on a comprehensive instrumentation-based assessment that yield objective measures. Using instrumental approaches to quantify speech-related behaviors is very important in a field that has primarily relied on subjective, auditory-perceptual forms of speech assessment(1). Our assessment protocol measures performance across all of the speech subsystems, which include respiratory, phonatory (laryngeal), resonatory (velopharyngeal), and articulatory. The articulatory subsystem is divided into the facial components (jaw and lip), and the tongue. Prior research has suggested that each speech subsystem responds differently to neurological diseases such as ALS. The current protocol is designed to test the performance of each speech subsystem as independently from other subsystems as possible. The speech subsystems are evaluated in the context of more global changes to speech performance. These speech system level variables include speaking rate and intelligibility of speech. The protocol requires specialized instrumentation, and commercial and custom software. The respiratory, phonatory, and resonatory subsystems are evaluated using pressure-flow (aerodynamic) and acoustic methods. The articulatory subsystem is assessed using 3D motion tracking techniques. The objective measures that are used to quantify bulbar impairment have been well established in the speech literature and show sensitivity to changes in bulbar function with disease progression. The result of the assessment is a comprehensive, across-subsystem performance profile for each participant. The profile, when compared to
Basilakos, Alexandra; Yourganov, Grigori; den Ouden, Dirk-Bart; Fogerty, Daniel; Rorden, Chris; Feenaughty, Lynda; Fridriksson, Julius
Purpose: Apraxia of speech (AOS) is a consequence of stroke that frequently co-occurs with aphasia. Its study is limited by difficulties with its perceptual evaluation and dissociation from co-occurring impairments. This study examined the classification accuracy of several acoustic measures for the differential diagnosis of AOS in a sample of…
Werker, Janet F.; Pons, Ferran; Dietrich, Christiane; Kajikawa, Sachiyo; Fais, Laurel; Amano, Shigeaki
Across the first year of life, infants show decreased sensitivity to phonetic differences not used in the native language [Werker, J. F., & Tees, R. C. (1984). Cross-language speech perception: evidence for perceptual reorganization during the first year of life. "Infant Behaviour and Development," 7, 49-63]. In an artificial language learning…
Full Text Available The paper deals with the methodology for speech quality measuring in GSM networks using Perceptual Evaluation of Speech Quality (PESQ. The paper brings results of practical measurement of own GSM network build on the Universal Software Radio Peripheral (USRP N210 hardware and OpenBTS software. This OpenBTS station was installed in open terrain, and the speech quality was measured from different distances from the transmitter. The limit parameters of OpenBTS station with USRP N210 were obtained.
Li, Tianhao; Fu, Qian-Jie
To determine whether perceptual adaptation improves voice gender discrimination of spectrally shifted vowels and, if so, which acoustic cues contribute to the improvement. Voice gender discrimination was measured for 10 normal-hearing subjects, during 5 days of adaptation to spectrally shifted vowels, produced by processing the speech of 5 male and 5 female talkers with 16-channel sine-wave vocoders. The subjects were randomly divided into 2 groups; one subjected to 50-Hz, and the other to 200-Hz, temporal envelope cutoff frequencies. No preview or feedback was provided. There was significant adaptation in voice gender discrimination with the 200-Hz cutoff frequency, but significant improvement was observed only for 3 female talkers with F(0) > 180 Hz and 3 male talkers with F(0) gender discrimination under spectral shift conditions with perceptual adaptation, but spectral shift may limit the exclusive use of spectral information and/or the use of formant structure on voice gender discrimination. The results have implications for cochlear implant users and for understanding voice gender discrimination.
Kennedy, Kristen M.; Rodrigue, Karen M.; Head, Denise; Gunning-Dixon, Faith; Raz, Naftali
Our objectives were to assess age differences in perceptual repetition priming and perceptual skill learning, and to determine whether they are mediated by cognitive resources and regional cerebral volume differences. Fragmented picture identification paradigm allows the study of both priming and learning within the same task. We presented this task to 169 adults (ages 18–80), assessed working memory and fluid intelligence, and measured brain volumes of regions that were deemed relevant to th...
Quiroga Martinez, David Ricardo; Hansen, Niels Christian; Højlund, Andreas
play a fundamental role in music perception. The mismatch negativity (MMN) is a brain response that offers a unique insight into these processes. The MMN is elicited by deviants in a series of repetitive sounds and reflects the perception of change in physical and abstract sound regularities. Therefore......, it is regarded as a prediction error signal and a neural correlate of the updating of predictive perceptual models. In music, the MMN has been particularly valuable for the assessment of musical expectations, learning and expertise. However, the MMN paradigm has an important limitation: its ecological validity....... To this aim we will develop a new paradigm using more real-sounding stimuli. Our stimuli will be two-part music excerpts made by adding a melody to a previous design based on the Alberti bass (Vuust et al., 2011). Our second goal is to determine how the complexity of this context affects the predictive...
Harrison N. Jones
Full Text Available Multiple reports have described patients with disordered articulation and prosody, often following acute aphasia, dysarthria, or apraxia of speech, which results in the perception by listeners of a foreign-like accent. These features led to the term foreign accent syndrome (FAS, a speech disorder with perceptual features that suggest an indistinct, non-native speaking accent. Also correctly known as psuedoforeign accent, the speech does not typically match a specific foreign accent, but is rather a constellation of speech features that result in the perception of a foreign accent by listeners. The primary etiologies of FAS are cerebrovascular accidents or traumatic brain injuries which affect cortical and subcortical regions critical to expressive speech and language production. Far fewer cases of FAS associated with psychiatric conditions have been reported. We will present the clinical history, neurological examination, neuropsychological assessment, cognitive-behavioral and biofeedback assessments, and motor speech examination of a patient with FAS without a known vascular, traumatic, or infectious precipitant. Repeated multidisciplinary examinations of this patient provided convergent evidence in support of FAS secondary to conversion disorder. We discuss these findings and their implications for evaluation and treatment of rare neurological and psychiatric conditions.
Neher, Tobias; Grimm, Giso; Hohmann, Volker
In a previous study, ) investigated whether pure-tone average (PTA) hearing loss and working memory capacity (WMC) modulate benefit from different binaural noise reduction (NR) settings. Results showed that listeners with smaller WMC preferred strong over moderate NR even at the expense of poorer speech recognition due to greater speech distortion (SD), whereas listeners with larger WMC did not. To enable a better understanding of these findings, the main aims of the present study were (1) to explore the perceptual consequences of changes to the signal mixture, target speech, and background noise caused by binaural NR, and (2) to determine whether response to these changes varies with WMC and PTA. As in the previous study, four age-matched groups of elderly listeners (with N = 10 per group) characterized by either mild or moderate PTAs and either better or worse performance on a visual measure of WMC participated. Five processing conditions were tested, which were based on the previously used (binaural coherence-based) NR scheme designed to attenuate diffuse signal components at mid to high frequencies. The five conditions differed in terms of the type of processing that was applied (no NR, strong NR, or strong NR with restoration of the long-term stimulus spectrum) and in terms of whether the target speech and background noise were processed in the same manner or whether one signal was left unprocessed while the other signal was processed with the gains computed for the signal mixture. Comparison across these conditions allowed assessing the effects of changes in high-frequency audibility (HFA), SD, and noise attenuation and distortion (NAD). Outcome measures included a dual-task paradigm combining speech recognition with a visual reaction time (VRT) task as well as ratings of perceived effort and overall preference. All measurements were carried out using headphone simulations of a frontal target speaker in a busy cafeteria. Relative to no NR, strong NR was found
Krull, Vidya; Humes, Larry E
The purpose of this experiment was to quantify the contribution of visual text to auditory speech recognition in background noise. Specifically, the authors tested the hypothesis that partially accurate visual text from an automatic speech recognizer could be used successfully to supplement speech understanding in difficult listening conditions in older adults, with normal or impaired hearing. The working hypotheses were based on what is known regarding audiovisual speech perception in the elderly from speechreading literature. We hypothesized that (1) combining auditory and visual text information will result in improved recognition accuracy compared with auditory or visual text information alone, (2) benefit from supplementing speech with visual text (auditory and visual enhancement) in young adults will be greater than that in older adults, and (3) individual differences in performance on perceptual measures would be associated with cognitive abilities. Fifteen young adults with normal hearing, 15 older adults with normal hearing, and 15 older adults with hearing loss participated in this study. All participants completed sentence recognition tasks in auditory-only, text-only, and combined auditory-text conditions. The auditory sentence stimuli were spectrally shaped to restore audibility for the older participants with impaired hearing. All participants also completed various cognitive measures, including measures of working memory, processing speed, verbal comprehension, perceptual and cognitive speed, processing efficiency, inhibition, and the ability to form wholes from parts. Group effects were examined for each of the perceptual and cognitive measures. Audiovisual benefit was calculated relative to performance on auditory- and visual-text only conditions. Finally, the relationship between perceptual measures and other independent measures were examined using principal-component factor analyses, followed by regression analyses. Both young and older adults
Full Text Available Initially, infants are capable of discriminating phonetic contrasts across the world’s languages. Starting between seven and ten months of age, they gradually lose this ability through a process of perceptual narrowing. Although traditionally investigated with isolated speech sounds, such narrowing occurs in a variety of perceptual domains (e.g., faces, visual speech. Thus far, tracking the developmental trajectory of this tuning process has been focused primarily on auditory speech alone, and generally using isolated sounds. But infants learn from speech produced by people talking to them, meaning they learn from a complex audiovisual signal. Here, we use near-infrared spectroscopy to measure blood concentration changes in the bilateral temporal cortices of infants in three different age groups: 3-to-6 months, 7-to-10 months, and 11-to-14-months. Critically, all three groups of infants were tested with continuous audiovisual speech in both their native and another, unfamiliar language. We found that at each age range, infants showed different patterns of cortical activity in response to the native and non-native stimuli. Infants in the youngest group showed bilateral cortical activity that was greater overall in response to non-native relative to native speech; the oldest group showed left lateralized activity in response to native relative to non-native speech. These results highlight perceptual tuning as a dynamic process that happens across modalities and at different levels of stimulus complexity.
Vick, Jennell C.; Moore, Christopher A.
This study examined the relationship between acoustic correlates of stress in trochaic (strong-weak), spondaic (strong-strong), and iambic (weak-strong) nonword bisyllables produced by children (30-50) with normal speech acquisition and children with speech delay. Ratios comparing the acoustic measures (vowel duration, rms, and f0) of the first syllable to the second syllable were calculated to evaluate the extent to which each phonetic parameter was used to mark stress. In addition, a calculation of the variability of jaw movement in each bisyllable was made. Finally, perceptual judgments of accuracy of stress production were made. Analysis of perceptual judgments indicated a robust difference between groups: While both groups of children produced errors in imitating the contrastive lexical stress models (~40%), the children with normal speech acquisition tended to produce trochaic forms in substitution for other stress types, whereas children with speech delay showed no preference for trochees. The relationship between segmental acoustic parameters, kinematic variability, and the ratings of stress by trained listeners will be presented.
Malmenholt, Ann; Lohmander, Anette; McAllister, Anita
The purpose of this study was to investigate current knowledge of the diagnosis childhood apraxia of speech (CAS) in Sweden and compare speech characteristics and symptoms to those of earlier survey findings in mainly English-speakers. In a web-based questionnaire 178 Swedish speech-language pathologists (SLPs) anonymously answered questions about their perception of typical speech characteristics for CAS. They graded own assessment skills and estimated clinical occurrence. The seven top speech characteristics reported as typical for children with CAS were: inconsistent speech production (85%), sequencing difficulties (71%), oro-motor deficits (63%), vowel errors (62%), voicing errors (61%), consonant cluster deletions (54%), and prosodic disturbance (53%). Motor-programming deficits described as lack of automatization of speech movements were perceived by 82%. All listed characteristics were consistent with the American Speech-Language-Hearing Association (ASHA) consensus-based features, Strand's 10-point checklist, and the diagnostic model proposed by Ozanne. The mode for clinical occurrence was 5%. Number of suspected cases of CAS in the clinical caseload was approximately one new patient/year and SLP. The results support and add to findings from studies of CAS in English-speaking children with similar speech characteristics regarded as typical. Possibly, these findings could contribute to cross-linguistic consensus on CAS characteristics.
Alemi, Minoo; Khanlarzadeh, Neda
The analysis of raters' comments on pragmatic assessment of L2 learners is among new and understudied concepts in second language studies. To shed light on this issue, the present investigation targeted important variables such as raters' criteria and rating patterns by analyzing the interlanguage pragmatic assessment process of the Iranian…
Gopi, E S
Digital Speech Processing Using Matlab deals with digital speech pattern recognition, speech production model, speech feature extraction, and speech compression. The book is written in a manner that is suitable for beginners pursuing basic research in digital speech processing. Matlab illustrations are provided for most topics to enable better understanding of concepts. This book also deals with the basic pattern recognition techniques (illustrated with speech signals using Matlab) such as PCA, LDA, ICA, SVM, HMM, GMM, BPN, and KSOM.
Kurylo, Daniel D; Waxman, Richard; Kidron, Rachel; Silverstein, Steven M
Training on visual tasks improves performance on basic and higher order visual capacities. Such improvement has been linked to changes in connectivity among mediating neurons. We investigated whether training effects occur for perceptual grouping. It was hypothesized that repeated engagement of integration mechanisms would enhance grouping processes. Thirty-six participants underwent 15 sessions of training on a visual discrimination task that required perceptual grouping. Participants viewed 20 × 20 arrays of dots or Gabor patches and indicated whether the array appeared grouped as vertical or horizontal lines. Across trials stimuli became progressively disorganized, contingent upon successful discrimination. Four visual dimensions were examined, in which grouping was based on similarity in luminance, color, orientation, and motion. Psychophysical thresholds of grouping were assessed before and after training. Results indicate that performance in all four dimensions improved with training. Training on a control condition, which paralleled the discrimination task but without a grouping component, produced no improvement. In addition, training on only the luminance and orientation dimensions improved performance for those conditions as well as for grouping by color, on which training had not occurred. However, improvement from partial training did not generalize to motion. Results demonstrate that a training protocol emphasizing stimulus integration enhanced perceptual grouping. Results suggest that neural mechanisms mediating grouping by common luminance and/or orientation contribute to those mediating grouping by color but do not share resources for grouping by common motion. Results are consistent with theories of perceptual learning emphasizing plasticity in early visual processing regions.
Huh, Young Eun; Park, Jongkyu; Suh, Mee Kyung; Lee, Sang Eun; Kim, Jumin; Jeong, Yuri; Kim, Hee-Tae; Cho, Jin Whan
In Parkinson variant of multiple system atrophy (MSA-P), patterns of early speech impairment and their distinguishing features from Parkinson's disease (PD) require further exploration. Here, we compared speech data among patients with early-stage MSA-P, PD, and healthy subjects using quantitative acoustic and perceptual analyses. Variables were analyzed for men and women in view of gender-specific features of speech. Acoustic analysis revealed that male patients with MSA-P exhibited more profound speech abnormalities than those with PD, regarding increased voice pitch, prolonged pause time, and reduced speech rate. This might be due to widespread pathology of MSA-P in nigrostriatal or extra-striatal structures related to speech production. Although several perceptual measures were mildly impaired in MSA-P and PD patients, none of these parameters showed a significant difference between patient groups. Detailed speech analysis using acoustic measures may help distinguish between MSA-P and PD early in the disease process. Copyright © 2015 Elsevier Inc. All rights reserved.
Nadig, Aparna; Shaw, Holly
Are there consistent markers of atypical prosody in speakers with high functioning autism (HFA) compared to typically-developing speakers? We examined: (1) acoustic measurements of pitch range, mean pitch and speech rate in conversation, (2) perceptual ratings of conversation for these features and overall prosody, and (3) acoustic measurements of…
The present study has surveyed post-editor trainees’ views and attitudes before and after the introduction of speech technology as a front end to a computer-aided translation workbench. The aim of the survey was (i) to identify attitudes and perceptions among post-editor trainees before performing...... a post-editing task using automatic speech recognition (ASR); and (ii) to assess the degree to which post-editors’ attitudes and expectations to the use of speech technology changed after actually using it. The survey was based on two questionnaires: the first one administered before the participants...
Full Text Available Listeners vary in their ability to understand speech in noisy environments. Hearing sensitivity, as measured by pure-tone audiometry, can only partly explain these results, and cognition has emerged as another key concept. Although cognition relates to speech perception, the exact nature of the relationship remains to be fully understood. This study investigates how different aspects of cognition, particularly working memory and attention, relate to speech intelligibility for various tests.Perceptual accuracy of speech perception represents just one aspect of functioning in a listening environment. Activity and participation limits imposed by hearing loss, in addition to the demands of a listening environment, are also important and may be better captured by self-report questionnaires. Understanding how speech perception relates to self-reported aspects of listening forms the second focus of the study.Forty-four listeners aged between 50-74 years with mild SNHL were tested on speech perception tests differing in complexity from low (phoneme discrimination in quiet, to medium (digit triplet perception in speech-shaped noise to high (sentence perception in modulated noise; cognitive tests of attention, memory, and nonverbal IQ; and self-report questionnaires of general health-related and hearing-specific quality of life.Hearing sensitivity and cognition related to intelligibility differently depending on the speech test: neither was important for phoneme discrimination, hearing sensitivity alone was important for digit triplet perception, and hearing and cognition together played a role in sentence perception. Self-reported aspects of auditory functioning were correlated with speech intelligibility to different degrees, with digit triplets in noise showing the richest pattern. The results suggest that intelligibility tests can vary in their auditory and cognitive demands and their sensitivity to the challenges that auditory environments pose on
Heinrich, Antje; Henshaw, Helen; Ferguson, Melanie A.
Listeners vary in their ability to understand speech in noisy environments. Hearing sensitivity, as measured by pure-tone audiometry, can only partly explain these results, and cognition has emerged as another key concept. Although cognition relates to speech perception, the exact nature of the relationship remains to be fully understood. This study investigates how different aspects of cognition, particularly working memory and attention, relate to speech intelligibility for various tests. Perceptual accuracy of speech perception represents just one aspect of functioning in a listening environment. Activity and participation limits imposed by hearing loss, in addition to the demands of a listening environment, are also important and may be better captured by self-report questionnaires. Understanding how speech perception relates to self-reported aspects of listening forms the second focus of the study. Forty-four listeners aged between 50 and 74 years with mild sensorineural hearing loss were tested on speech perception tests differing in complexity from low (phoneme discrimination in quiet), to medium (digit triplet perception in speech-shaped noise) to high (sentence perception in modulated noise); cognitive tests of attention, memory, and non-verbal intelligence quotient; and self-report questionnaires of general health-related and hearing-specific quality of life. Hearing sensitivity and cognition related to intelligibility differently depending on the speech test: neither was important for phoneme discrimination, hearing sensitivity alone was important for digit triplet perception, and hearing and cognition together played a role in sentence perception. Self-reported aspects of auditory functioning were correlated with speech intelligibility to different degrees, with digit triplets in noise showing the richest pattern. The results suggest that intelligibility tests can vary in their auditory and cognitive demands and their sensitivity to the challenges that
Análise comparativa entre avaliação fonoaudiológica perceptivo-auditiva, análise acústica e laringoscopias indiretas para avaliação vocal em população com queixa vocal Comparative analysis of perceptual evaluation, acoustic analysis and indirect laryngoscopy for vocal assessment of a population with vocal complaint
Full Text Available Com a evolução e o desenvolvimento tecnológico, houve mudanças nos métodos de avaliação da voz, tanto na prática médica como fonoaudiológica. OBJETIVO: Relacionar os resultados da avaliação perceptivo-auditiva vocal, análise acústica e avaliações médicas no diagnóstico de alterações vocais e/ou laríngeas em indivíduos com queixa vocal. FORMA DE ESTUDO: Clínico prospectivo. MATERIAL E MÉTODO: Foram avaliados 29 indivíduos que participaram de uma ação de proteção de saúde. Os sujeitos foram submetidos à avaliação fonoaudiológica peceptivo-auditiva (AFPA, análise acústica (AA, laringoscopia indireta (LI e telelaringoscopia (TL. RESULTADOS: Foram estabelecidas as relações entre os métodos de avaliação médica e fonoaudiológica, verificando possíveis significâncias estatísticas a partir da aplicação do Teste Exato de Fischer. Houve significância estatística na relação entre AFPA e LI, AFPA e TL, LI e TL. CONCLUSÃO: Esta pesquisa realizada numa ação de proteção de saúde vocal mostrou concordância entre a avaliação fonoaudiológica perceptivo-auditiva e as avaliações médicas, bem como os exames médicos entre si no diagnóstico de alterações vocais e/ou laríngeas.As a result of technology evolution and development, methods of voice evaluation have changed both in medical and speech and language pathology practice. AIM: To relate the results of perceptual evaluation, acoustic analysis and medical evaluation in the diagnosis of vocal and/or laryngeal affections of the population with vocal complaint. STUDY DESIGN: Clinical prospective. MATERIAL AND METHOD: 29 people that attended vocal health protection campaign were evaluated. They were submitted to perceptual evaluation (AFPA, acoustic analysis (AA, indirect laryngoscopy (LI and telelaryngoscopy (TL. RESULTS: Correlations between medical and speech language pathology evaluation methods were established, verifying possible statistical
Full Text Available The influence of experience with human speech sounds on speech perception in budgerigars, vocal mimics whose speech exposure can be tightly controlled in a laboratory setting, was measured. Budgerigars were divided into groups that differed in auditory exposure and then tested on a cue-trading identification paradigm with synthetic speech. Phonetic cue trading is a perceptual phenomenon observed when changes on one cue dimension are offset by changes in another cue dimension while still maintaining the same phonetic percept. The current study examined whether budgerigars would trade the cues of voice onset time (VOT and the first formant onset frequency when identifying syllable initial stop consonants and if this would be influenced by exposure to speech sounds. There were a total of four different exposure groups: No speech exposure (completely isolated, Passive speech exposure (regular exposure to human speech, and two Speech-trained groups. After the exposure period, all budgerigars were tested for phonetic cue trading using operant conditioning procedures. Birds were trained to peck keys in response to different synthetic speech sounds that began with "d" or "t" and varied in VOT and frequency of the first formant at voicing onset. Once training performance criteria were met, budgerigars were presented with the entire intermediate series, including ambiguous sounds. Responses on these trials were used to determine which speech cues were used, if a trading relation between VOT and the onset frequency of the first formant was present, and whether speech exposure had an influence on perception. Cue trading was found in all birds and these results were largely similar to those of a group of humans. Results indicated that prior speech experience was not a requirement for cue trading by budgerigars. The results are consistent with theories that explain phonetic cue trading in terms of a rich auditory encoding of the speech signal.
Clark, Torin K.; Lu, Yue M.; Karmali, Faisal
Perceptual decision making is fundamental to a broad range of fields including neurophysiology, economics, medicine, advertising, law, etc. Although recent findings have yielded major advances in our understanding of perceptual decision making, decision making as a function of time and frequency (i.e., decision-making dynamics) is not well understood. To limit the review length, we focus most of this review on human findings. Animal findings, which are extensively reviewed elsewhere, are included when beneficial or necessary. We attempt to put these various findings and data sets, which can appear to be unrelated in the absence of a formal dynamic analysis, into context using published models. Specifically, by adding appropriate dynamic mechanisms (e.g., high-pass filters) to existing models, it appears that a number of otherwise seemingly disparate findings from the literature might be explained. One hypothesis that arises through this dynamic analysis is that decision making includes phasic (high pass) neural mechanisms, an evidence accumulator and/or some sort of midtrial decision-making mechanism (e.g., peak detector and/or decision boundary). PMID:26467513
Chiang, I-Ping; Lin, Chih-Ying; Wang, Kaisheng M
Many companies have launched their products or services online as a new business focus, but only a few of them have survived the competition and made profits. The most important key to an online business's success is to create "brand value" for the customers. Although the concept of online brand has been discussed in previous studies, there is no empirical study on the measurement of online branding. As Web 2.0 emerges to be critical to online branding, the purpose of this study was to measure Taiwan's major Web sites with a number of personality traits to build a perceptual map for online brands. A pretest identified 10 most representative online brand perceptions. The results of the correspondence analysis showed five groups in the perceptual map. This study provided a practical view of the associations and similarities among online brands for potential alliance or branding strategies. The findings also suggested that brand perceptions can be used with identified consumer needs and behaviors to better position online services. The brand perception map in the study also contributed to a better understanding of the online brands in Taiwan.
Ghirardi, Gian Carlo; Romano, Raffaele
Theories including a collapse mechanism have been presented various years ago. They are based on a modification of standard quantum mechanics in which nonlinear and stochastic terms are added to the evolution equation. Their principal merits derive from the fact that they are mathematically precise schemes accounting, on the basis of a unique universal dynamical principle, both for the quantum behavior of microscopic systems as well as for the reduction associated to measurement processes and for the classical behavior of macroscopic objects. Since such theories qualify themselves not as new interpretations but as modifications of the standard theory they can be, in principle, tested against quantum mechanics. Recently, various investigations identifying possible crucial test have been discussed. In spite of the extreme difficulty to perform such tests it seems that recent technological developments allow at least to put precise limits on the parameters characterizing the modifications of the evolution equation. Here we will simply mention some of the recent investigations in this direction, while we will mainly concentrate our attention to the way in which collapse theories account for definite perceptual process. The differences between the case of reductions induced by perceptions and those related to measurement procedures by means of standard macroscopic devices will be discussed. On this basis, we suggest a precise experimental test of collapse theories involving conscious observers. We make plausible, by discussing in detail a toy model, that the modified dynamics can give rise to quite small but systematic errors in the visual perceptual process.
It is often hypothesized that young children's difficulties with producing weak-strong (iambic) prosodic forms arise from perceptual or linguistically based production factors. A third possible contributor to errors in the iambic form may be biological constraints, or biases, of the motor system. In the present study, 7 children with specific language impairment (SLI) and speech deficits were matched to same age peers. Multiple levels of analysis, including kinematic (modulation and stability of movement), acoustic, and transcription, were applied to children's productions of iambic (weak-strong) and trochaic (strong-weak) prosodic forms. Findings suggest that a motor bias toward producing unmodulated rhythmic articulatory movements, similar to that observed in canonical babbling, contribute to children's acquisition of metrical forms. Children with SLI and speech deficits show less mature segmental and speech motor systems, as well as decreased modulation of movement in later developing iambic forms. Further, components of prosodic and segmental acquisition develop independently and at different rates.
Docherty, Nancy M.; McCleery, Amanda; Divilbiss, Marielle; Schumann, Emily B.; Moe, Aubrey; Shakeel, Mohammed K.
Disordered speech in schizophrenia impairs social functioning because it impedes communication with others. Treatment approaches targeting this symptom have been limited by an incomplete understanding of its causes. This study examined the process underpinnings of speech disorder, assessed in terms of communication failure. Contributions of impairments in 2 social cognitive abilities, emotion perception and theory of mind (ToM), to speech disorder were assessed in 63 patients with schizophren...
Speech processing by human listeners derives meaning from acoustic input via intermediate steps involving abstract representations of what has been heard. Recent results from several lines of research are here brought together to shed light on the nature and role of these representations. In spoken-word recognition, representations of phonological form and of conceptual content are dissociable. This follows from the independence of patterns of priming for a word's form and its meaning. The nature of the phonological-form representations is determined not only by acoustic-phonetic input but also by other sources of information, including metalinguistic knowledge. This follows from evidence that listeners can store two forms as different without showing any evidence of being able to detect the difference in question when they listen to speech. The lexical representations are in turn separate from prelexical representations, which are also abstract in nature. This follows from evidence that perceptual learning about speaker-specific phoneme realization, induced on the basis of a few words, generalizes across the whole lexicon to inform the recognition of all words containing the same phoneme. The efficiency of human speech processing has its basis in the rapid execution of operations over abstract representations.
Bitner, Rachel M.; Begault, Durand R.
Humans may be exposed to whole-body vibration in environments where clear speech communications are crucial, particularly during the launch phases of space flight and in high-performance aircraft. Prior research has shown that high levels of vibration cause a decrease in speech intelligibility. However, the effects of whole-body vibration upon speech are not well understood, and no attempt has been made to restore speech distorted by whole-body vibration. In this paper, a model for speech under whole-body vibration is proposed and a method to remove its effect is described. The method described reduces the perceptual effects of vibration, yields higher ASR accuracy scores, and may significantly improve intelligibility. Possible applications include incorporation within communication systems to improve radio-communication systems in environments such a spaceflight, aviation, or off-road vehicle operations.
Zeremdini, Jihen; Ben Messaoud, Mohamed Anouar; Bouzid, Aicha
Humans have the ability to easily separate a composed speech and to form perceptual representations of the constituent sources in an acoustic mixture thanks to their ears. Until recently, researchers attempt to build computer models of high-level functions of the auditory system. The problem of the composed speech segregation is still a very challenging problem for these researchers. In our case, we are interested in approaches that are addressed to the monaural speech segregation. For this purpose, we study in this paper the computational auditory scene analysis (CASA) to segregate speech from monaural mixtures. CASA is the reproduction of the source organization achieved by listeners. It is based on two main stages: segmentation and grouping. In this work, we have presented, and compared several studies that have used CASA for speech separation and recognition.
A.A. Salah (Albert Ali); O. Tanrı dağ
htmlabstractHumans perceive the world through different perceptual modalities, which are processed in the brain by modality-specific areas and structures. However, there also exist multimodal neurons and areas, specialized in integrating perceptual information to enhance or suppress brain response.
Olsen, Sune L.; Agerkvist, Finn T.; MacDonald, Ewen
While non-linear distortion in loudspeakers decreases audio quality, the perceptual consequences can vary substantially. This paper investigates the metric Rnonlin  which was developed to predict subjective measurements of sound quality in nonlinear systems. The generalisability of the metric...... the perceptual consequences of non-linear distortion....
Full Text Available Barsalou's (1999 perceptual theory of knowledge echoes the pre-20th century tradition of conceptualizing all knowledge as inherently perceptual. Hence conceptual space has an infinite number of dimensions and heavily relies on perceptual experience. Osgood's (1952 semantic differential technique was developed as a bridge between perception and semantics. We updated Osgood's methodology in order to investigate current issues in visual cognition by: (1 using a 2D rather than a 1D space to place the concepts, (2 having dimensions that were perceptual while the targets were conceptual, (3 coupling visual experience with another two perceptual domains (audition and touch, (4 analyzing the data using MDS (not factor analysis. In three experiments, subjects (N = 57 judged five concrete and five abstract words on seven bipolar scales in three perceptual modalities. The 2D space led to different patterns of response compared to the classic 1D space. MDS revealed that perceptual modalities are not equally informative for mapping word-meaning distances (Mantel min = −.23; Mantel max = .88. There was no reliable differences due to test administration modality (paper vs. computer, nor scale orientation. The present findings are consistent with multidimensionality of conceptual space, a perceptual basis for knowledge, and dynamic characteristics of concepts discussed in contemporary theories.
Bedford, Felice L.
Addresses two questions that may be unique to perceptual learning: What are the circumstances that produce learning? and What is the content of learning? Suggests a critical principle for each question. Provides a discussion of perceptual learning theory, how learning occurs, and what gets learned. Includes a 121-item bibliography. (DR)
Maniscalco, Brian; McCurdy, Li Yan; Odegaard, Brian; Lau, Hakwan
Why do experimenters give subjects short breaks in long behavioral experiments? Whereas previous studies suggest it is difficult to maintain attention and vigilance over long periods of time, it is unclear precisely what mechanisms benefit from rest after short experimental blocks. Here, we evaluate decline in both perceptual performance and metacognitive sensitivity (i.e., how well confidence ratings track perceptual decision accuracy) over time and investigate whether characteristics of prefrontal cortical areas correlate with these measures. Whereas a single-process signal detection model predicts that these two forms of fatigue should be strongly positively correlated, a dual-process model predicts that rates of decline may dissociate. Here, we show that these measures consistently exhibited negative or near-zero correlations, as if engaged in a trade-off relationship, suggesting that different mechanisms contribute to perceptual and metacognitive decisions. Despite this dissociation, the two mechanisms likely depend on common resources, which could explain their trade-off relationship. Based on structural MRI brain images of individual human subjects, we assessed gray matter volume in the frontal polar area, a region that has been linked to visual metacognition. Variability of frontal polar volume correlated with individual differences in behavior, indicating the region may play a role in supplying common resources for both perceptual and metacognitive vigilance. Additional experiments revealed that reduced metacognitive demand led to superior perceptual vigilance, providing further support for this hypothesis. Overall, results indicate that during breaks between short blocks, it is the higher-level perceptual decision mechanisms, rather than lower-level sensory machinery, that benefit most from rest. Perceptual task performance declines over time (the so-called vigilance decrement), but the relationship between vigilance in perception and metacognition has
Eisner, Frank; McGettigan, Carolyn; Faulkner, Andrew; Rosen, Stuart; Scott, Sophie K
This study investigated the neural plasticity associated with perceptual learning of a cochlear implant (CI) simulation. Normal-hearing listeners were trained with vocoded and spectrally shifted speech simulating a CI while cortical responses were measured with functional magnetic resonance imaging (fMRI). A condition in which the vocoded speech was spectrally inverted provided a control for learnability and adaptation. Behavioral measures showed considerable individual variability both in the ability to learn to understand the degraded speech, and in phonological working memory capacity. Neurally, left-lateralized regions in superior temporal sulcus and inferior frontal gyrus (IFG) were sensitive to the learnability of the simulations, but only the activity in prefrontal cortex correlated with interindividual variation in intelligibility scores and phonological working memory. A region in left angular gyrus (AG) showed an activation pattern that reflected learning over the course of the experiment, and covariation of activity in AG and IFG was modulated by the learnability of the stimuli. These results suggest that variation in listeners' ability to adjust to vocoded and spectrally shifted speech is partly reflected in differences in the recruitment of higher-level language processes in prefrontal cortex, and that this variability may further depend on functional links between the left inferior frontal gyrus and angular gyrus. Differences in the engagement of left inferior prefrontal cortex, and its covariation with posterior parietal areas, may thus underlie some of the variation in speech perception skills that have been observed in clinical populations of CI users.
Li, Jin-rang; Sun, Yan-yan; Xu, Wen
To design a speech voice sample text with all phonemes in Mandarin for subjective auditory perceptual evaluation of voice disorders. The principles for design of a speech voice sample text are: The short text should include the 21 initials and 39 finals, this may cover all the phonemes in Mandarin. Also, the short text should have some meanings. A short text was made out. It had 155 Chinese words, and included 21 initials and 38 finals (the final, ê, was not included because it was rarely used in Mandarin). Also, the text covered 17 light tones and one "Erhua". The constituent ratios of the initials and finals presented in this short text were statistically similar as those in Mandarin according to the method of similarity of the sample and population (r = 0.742, P text were statistically not similar as those in Mandarin (r = 0.731, P > 0.05). A speech voice sample text with all phonemes in Mandarin was made out. The constituent ratios of the initials and finals presented in this short text are similar as those in Mandarin. Its value for subjective auditory perceptual evaluation of voice disorders need further study.
Sandor, Aniko; Moses, Haifa
Speech alarms have been used extensively in aviation and included in International Building Codes (IBC) and National Fire Protection Association's (NFPA) Life Safety Code. However, they have not been implemented on space vehicles. Previous studies conducted at NASA JSC showed that speech alarms lead to faster identification and higher accuracy. This research evaluated updated speech and tone alerts in a laboratory environment and in the Human Exploration Research Analog (HERA) in a realistic setup.
Moerman, Mieke; Martens, Jean-Pierre; Dejonckere, Philippe
This article is a compilation of own research performed during the European COoperation in Science and Technology (COST) action 2103: 'Advance Voice Function Assessment', an initiative of voice and speech processing teams consisting of physicists, engineers, and clinicians. This manuscript concerns analyzing largely irregular voicing types, namely substitution voicing (SV) and adductor spasmodic dysphonia (AdSD). A specific perceptual rating scale (IINFVo) was developed, and the Auditory Model Based Pitch Extractor (AMPEX), a piece of software that automatically analyses running speech and generates pitch values in background noise, was applied. The IINFVo perceptual rating scale has been shown to be useful in evaluating SV. The analysis of strongly irregular voices stimulated a modification of the European Laryngological Society's assessment protocol which was originally designed for the common types of (less severe) dysphonia. Acoustic analysis with AMPEX demonstrates that the most informative features are, for SV, the voicing-related acoustic features and, for AdSD, the perturbation measures. Poor correlations between self-assessment and acoustic and perceptual dimensions in the assessment of highly irregular voices argue for a multidimensional approach.
Mostert, Pim; Kok, Peter; de Lange, Floris P.
A key question within systems neuroscience is how the brain translates physical stimulation into a behavioral response: perceptual decision making. To answer this question, it is important to dissociate the neural activity underlying the encoding of sensory information from the activity underlying the subsequent temporal integration into a decision variable. Here, we adopted a decoding approach to empirically assess this dissociation in human magnetoencephalography recordings. We used a funct...
An introduction is given to the the anatomy and the function of the ear, basic psychoacoustic matters (hearing threshold, loudness, masking), the speech signal and speech intelligibility. The lecture note is written for the course: Fundamentals of Acoustics and Noise Control (51001)......An introduction is given to the the anatomy and the function of the ear, basic psychoacoustic matters (hearing threshold, loudness, masking), the speech signal and speech intelligibility. The lecture note is written for the course: Fundamentals of Acoustics and Noise Control (51001)...
It is becoming increasingly apparent that all forms of communication-including voice-will be transmitted through packet-switched networks based on the Internet Protocol (IP). Therefore, the design of modern devices that rely on speech interfaces, such as cell phones and PDAs, requires a complete and up-to-date understanding of the basics of speech coding. Outlines key signal processing algorithms used to mitigate impairments to speech quality in VoIP networksOffering a detailed yet easily accessible introduction to the field, Principles of Speech Coding provides an in-depth examination of the
Full Text Available Language therapy has trafficked from a medical focus until a preventive focus. However, difficulties are evidenced in the development of this last task, because he is devoted bigger space to the correction of the disorders of the language. Because the speech disorders is the dysfunction with more frequently appearance, acquires special importance the preventive work that is developed to avoid its appearance. Speech education since early age of the childhood makes work easier for prevent the appearance of speech disorders in the children. The present work has as objective to offer different activities for the prevention of the speech disorders.
Poock, G. K.; Martin, B. J.
This was an applied investigation examining the ability of a speech recognition system to recognize speakers' inputs when the speakers were under different stress levels. Subjects were asked to speak to a voice recognition system under three conditions: (1) normal office environment, (2) emotional stress, and (3) perceptual-motor stress. Results indicate a definite relationship between voice recognition system performance and the type of low stress reference patterns used to achieve recognition.
Nicolás Alejandro Serrano
Full Text Available The main objective of this paper is to show that perceptual conceptualism can be understood as an empirically meaningful position and, furthermore, that there is some degree of empirical support for its main theses. In order to do this, I will start by offering an empirical reading of the conceptualist position, and making three predictions from it. Then, I will consider recent experimental results from cognitive sciences that seem to point towards those predictions. I will conclude that, while the evidence offered by those experiments is far from decisive, it is enough not only to show that conceptualism is an empirically meaningful position but also that there is empirical support for it.
Bocast, Christopher S.
A portfolio dissertation that began as acoustic ecology and matured into perceptual ecology, centered on ecomusicology, bioacoustics, and translational audio-based media works with environmental perspectives. The place of music in Western eco-cosmology through time provides a basis for structuring an environmental history of human sound perception. That history suggests that music may stabilize human mental activity, and that an increased musical practice may be essential for the human project. An overview of recent antecedents preceding the emergence of acoustic ecology reveals structural foundations from 20th century culture that underpin modern sound studies. The contextual role that Aldo Leopold, Jacob von Uexkull, John Cage, Marshall McLuhan, and others played in anticipating the development of acoustic ecology as an interdiscipline is detailed. This interdisciplinary aspect of acoustic ecology is defined and defended, while new developments like soundscape ecology are addressed, though ultimately sound studies will need to embrace a broader concept of full-spectrum "sensory" or "perceptual" ecology. The bioacoustic fieldwork done on spawning sturgeon emphasized this necessity. That study yielded scientific recordings and spectrographic analyses of spawning sounds produced by lake sturgeon, Acipenser fulvescens, during reproduction in natural habitats in the Lake Winnebago watershed in Wisconsin. Recordings were made on the Wolf and Embarrass River during the 2011-2013 spawning seasons. Several specimens were dissected to investigate possible sound production mechanisms; no sonic musculature was found. Drumming sounds, ranging from 5 to 7 Hz fundamental frequency, verified the infrasonic nature of previously undocumented "sturgeon thunder". Other characteristic noises of sturgeon spawning including low-frequency rumbles and hydrodynamic sounds were identified. Intriguingly, high-frequency signals resembling electric organ discharges were discovered. These
recognition process. The relationship between acoustic results and perceptual accuracy is limited in this study suggesting that listeners incorporate acoustic and non-acoustic information to maximize speech intelligibility.
Menegueti, Katia Ignacio; Mangilli, Laura Davison; Alonso, Nivaldo; Andrade, Claudia Regina Furquim de
To characterize the profile and speech characteristics of patients undergoing primary palatoplasty in a Brazilian university hospital, considering the time of intervention (early, before two years of age; late, after two years of age). Participants were 97 patients of both genders with cleft palate and/or cleft and lip palate, assigned to the Speech-language Pathology Department, who had been submitted to primary palatoplasty and presented no prior history of speech-language therapy. Patients were divided into two groups: early intervention group (EIG) - 43 patients undergoing primary palatoplasty before 2 years of age and late intervention group (LIG) - 54 patients undergoing primary palatoplasty after 2 years of age. All patients underwent speech-language pathology assessment. The following parameters were assessed: resonance classification, presence of nasal turbulence, presence of weak intraoral air pressure, presence of audible nasal air emission, speech understandability, and compensatory articulation disorder (CAD). At statistical significance level of 5% (p≤0.05), no significant difference was observed between the groups in the following parameters: resonance classification (p=0.067); level of hypernasality (p=0.113), presence of nasal turbulence (p=0.179); presence of weak intraoral air pressure (p=0.152); presence of nasal air emission (p=0.369), and speech understandability (p=0.113). The groups differed with respect to presence of compensatory articulation disorders (p=0.020), with the LIG presenting higher occurrence of altered phonemes. It was possible to assess the general profile and speech characteristics of the study participants. Patients submitted to early primary palatoplasty present better speech profile.
Yao, Bo; Scheepers, Christoph
In human communication, direct speech (e.g., Mary said: "I'm hungry") is perceived to be more vivid than indirect speech (e.g., Mary said [that] she was hungry). However, the processing consequences of this distinction are largely unclear. In two experiments, participants were asked to either orally (Experiment 1) or silently (Experiment 2, eye-tracking) read written stories that contained either a direct speech or an indirect speech quotation. The context preceding those quotations described a situation that implied either a fast-speaking or a slow-speaking quoted protagonist. It was found that this context manipulation affected reading rates (in both oral and silent reading) for direct speech quotations, but not for indirect speech quotations. This suggests that readers are more likely to engage in perceptual simulations of the reported speech act when reading direct speech as opposed to meaning-equivalent indirect speech quotations, as part of a more vivid representation of the former. Copyright © 2011 Elsevier B.V. All rights reserved.
Full Text Available Foreign-accented speech often presents a challenging listening condition. In addition to deviations from the target speech norms related to the inexperience of the nonnative speaker, listener characteristics may play a role in determining intelligibility levels. We have previously shown that an implicit visual bias for associating East Asian faces and foreignness predicts the listeners’ perceptual ability to process Korean-accented English audiovisual speech (Yi et al., 2013. Here, we examine the neural mechanism underlying the influence of listener bias to foreign faces on speech perception. In a functional magnetic resonance imaging (fMRI study, native English speakers listened to native- and Korean-accented English sentences, with or without faces. The participants’ Asian-foreign association was measured using an implicit association test (IAT, conducted outside the scanner. We found that foreign-accented speech evoked greater activity in the bilateral primary auditory cortices and the inferior frontal gyri, potentially reflecting greater computational demand. Higher IAT scores, indicating greater bias, were associated with increased BOLD response to foreign-accented speech with faces in the primary auditory cortex, the early node for spectrotemporal analysis. We conclude the following: (1 foreign-accented speech perception places greater demand on the neural systems underlying speech perception; (2 face of the talker can exaggerate the perceived foreignness of foreign-accented speech; (3 implicit Asian-foreign association is associated with decreased neural efficiency in early spectrotemporal processing.
Full Text Available A 40-year-old, non-aphasic, right-handed, and polyglot (L1: French, L2: Dutch, L3: English woman with a 12 year history of addiction to opiates and psychoactive substances, and clear psychiatric problems, presented with a foreign accent of sudden onset in L1. Speech evolved towards a mostly fluent output, despite a stutter-like behavior and a marked grammatical output disorder. The psychogenic etiology of the accent foreignness was construed based upon the patient’s complex medical history, and psychodiagnostic, neuropsychological, and neurolinguistic assessments. The presence of a foreign accent was affirmed by a perceptual accent rating and attribution experiment. It is argued that this patient provides additional evidence demonstrating the outdatedness of Whitaker’s (1982 definition of Foreign Accent Syndrome, as only one of the four operational criteria was unequivocally applicable to our patient: her accent foreignness was not only recognized by her relatives and the medical staff, but also by a group of native French-speaking laymen. However, our patient defied the three remaining criteria, as central nervous system damage could not conclusively be demonstrated, psychodiagnostic assessment raised the hypothesis of a conversion disorder, and the patient was a polyglot whose newly gained accent was associated with a range of foreign languages, which exceeded the ones she spoke.
Chen, Zhaocong; Wong, Francis C K; Jones, Jeffery A; Li, Weifeng; Liu, Peng; Chen, Xi; Liu, Hanjun
Speech perception and production are intimately linked. There is evidence that speech motor learning results in changes to auditory processing of speech. Whether speech motor control benefits from perceptual learning in speech, however, remains unclear. This event-related potential study investigated whether speech-sound learning can modulate the processing of feedback errors during vocal pitch regulation. Mandarin speakers were trained to perceive five Thai lexical tones while learning to associate pictures with spoken words over 5 days. Before and after training, participants produced sustained vowel sounds while they heard their vocal pitch feedback unexpectedly perturbed. As compared to the pre-training session, the magnitude of vocal compensation significantly decreased for the control group, but remained consistent for the trained group at the post-training session. However, the trained group had smaller and faster N1 responses to pitch perturbations and exhibited enhanced P2 responses that correlated significantly with their learning performance. These findings indicate that the cortical processing of vocal pitch regulation can be shaped by learning new speech-sound associations, suggesting that perceptual learning in speech can produce transfer effects to facilitating the neural mechanisms underlying the online monitoring of auditory feedback regarding vocal production.
Understanding of the behavioural, cognitive and neural underpinnings of speech production is of interest theoretically, and is important for understanding disorders of speech production and how to assess and treat such disorders in the clinic. This paper addresses two claims about the neuromotor control of speech production: (1) speech is subserved by a distinct, specialised motor control system and (2) speech is holistic and cannot be decomposed into smaller primitives. Both claims have gained traction in recent literature, and are central to a task-dependent model of speech motor control. The purpose of this paper is to stimulate thinking about speech production, its disorders and the clinical implications of these claims. The paper poses several conceptual and empirical challenges for these claims - including the critical importance of defining speech. The emerging conclusion is that a task-dependent model is called into question as its two central claims are founded on ill-defined and inconsistently applied concepts. The paper concludes with discussion of methodological and clinical implications, including the potential utility of diadochokinetic (DDK) tasks in assessment of motor speech disorders and the contraindication of nonspeech oral motor exercises to improve speech function.
Werker, J F; Tees, R C
To comprehend and produce language, we must be able to recognize the sound patterns of our language and the rules for how these sounds "map on" to meaning. Human infants are born with a remarkable array of perceptual sensitivities that allow them to detect the basic properties that are common to the world's languages. During the first year of life, these sensitivities undergo modification reflecting an exquisite tuning to just that phonological information that is needed to map sound to meaning in the native language. We review this transition from language-general to language-specific perceptual sensitivity that occurs during the first year of life and consider whether the changes propel the child into word learning. To account for the broad-based initial sensitivities and subsequent reorganizations, we offer an integrated transactional framework based on the notion of a specialized perceptual-motor system that has evolved to serve human speech, but which functions in concert with other developing abilities. In so doing, we highlight the links between infant speech perception, babbling, and word learning.
Spriet, Ann; Van Deun, Lieselot; Eftaxiadis, Kyriaky; Laneau, Johan; Moonen, Marc; van Dijk, Bas; van Wieringen, Astrid; Wouters, Jan
This paper evaluates the benefit of the two-microphone adaptive beamformer BEAM in the Nucleus Freedom cochlear implant (CI) system for speech understanding in background noise by CI users. A double-blind evaluation of the two-microphone adaptive beamformer BEAM and a hardware directional microphone was carried out with five adult Nucleus CI users. The test procedure consisted of a pre- and post-test in the lab and a 2-wk trial period at home. In the pre- and post-test, the speech reception threshold (SRT) with sentences and the percentage correct phoneme scores for CVC words were measured in quiet and background noise at different signal-to-noise ratios. Performance was assessed for two different noise configurations (with a single noise source and with three noise sources) and two different noise materials (stationary speech-weighted noise and multitalker babble). During the 2-wk trial period at home, the CI users evaluated the noise reduction performance in different listening conditions by means of the SSQ questionnaire. In addition to the perceptual evaluation, the noise reduction performance of the beamformer was measured physically as a function of the direction of the noise source. Significant improvements of both the SRT in noise (average improvement of 5-16 dB) and the percentage correct phoneme scores (average improvement of 10-41%) were observed with BEAM compared to the standard hardware directional microphone. In addition, the SSQ questionnaire and subjective evaluation in controlled and real-life scenarios suggested a possible preference for the beamformer in noisy environments. The evaluation demonstrates that the adaptive noise reduction algorithm BEAM in the Nucleus Freedom CI-system may significantly increase the speech perception by cochlear implantees in noisy listening conditions. This is the first monolateral (adaptive) noise reduction strategy actually implemented in a mainstream commercial CI.
Cosman, Joshua D; Vecera, Shaun P
Attentional capture by abrupt onsets can be modulated by several factors, including the complexity, or perceptual load, of a scene. We have recently demonstrated that observers are less likely to be captured by abruptly appearing, task-irrelevant stimuli when they perform a search that is high, as opposed to low, in perceptual load (Cosman & Vecera, 2009), consistent with perceptual load theory. However, recent results indicate that onset frequency can influence stimulus-driven capture, with infrequent onsets capturing attention more often than did frequent onsets. Importantly, in our previous task, an abrupt onset was present on every trial, and consequently, attentional capture might have been affected by both onset frequency and perceptual load. In the present experiment, we examined whether onset frequency influences attentional capture under conditions of high perceptual load. When onsets were presented frequently, we replicated our earlier results; attentional capture by onsets was modulated under conditions of high perceptual load. Importantly, however, when onsets were presented infrequently, we observed robust capture effects. These results conflict with a strong form of load theory and, instead, suggest that exposure to the elements of a task (e.g., abrupt onsets) combines with high perceptual load to modulate attentional capture by task-irrelevant information.
Full Text Available This study describes the methodology used for designing a database of speech under real stress. Based on limits of existing stress databases, we used a communication task via a computer game to collect speech data. To validate the presence of stress, known psychophysiological indicators such as heart rate and electrodermal activity, as well as subjective self-assessment were used. This paper presents the data from first 5 speakers (3 men, 2 women who participated in initial tests of the proposed design. In 4 out of 5 speakers increases in fundamental frequency and intensity of speech were registered. Similarly, in 4 out of 5 speakers heart rate was significantly increased during the task, when compared with reference measurement from before the task. These first results show that proposed design might be appropriate for building a speech under stress database. However, there are still considerations that need to be addressed.
Kent, Ray D; Vorperian, Houri K
This review summarizes research on disorders of speech production in Down syndrome (DS) for the purposes of informing clinical services and guiding future research. Review of the literature was based on searches using MEDLINE, Google Scholar, PsycINFO, and HighWire Press, as well as consideration of reference lists in retrieved documents (including online sources). Search terms emphasized functions related to voice, articulation, phonology, prosody, fluency, and intelligibility. The following conclusions pertain to four major areas of review: voice, speech sounds, fluency and prosody, and intelligibility. The first major area is voice. Although a number of studies have reported on vocal abnormalities in DS, major questions remain about the nature and frequency of the phonatory disorder. Results of perceptual and acoustic studies have been mixed, making it difficult to draw firm conclusions or even to identify sensitive measures for future study. The second major area is speech sounds. Articulatory and phonological studies show that speech patterns in DS are a combination of delayed development and errors not seen in typical development. Delayed (i.e., developmental) and disordered (i.e., nondevelopmental) patterns are evident by the age of about 3 years, although DS-related abnormalities possibly appear earlier, even in infant babbling. The third major area is fluency and prosody. Stuttering and/or cluttering occur in DS at rates of 10%-45%, compared with about 1% in the general population. Research also points to significant disturbances in prosody. The fourth major area is intelligibility. Studies consistently show marked limitations in this area, but only recently has the research gone beyond simple rating scales.
Sündermann, Oliver; Hauschildt, Marit; Ehlers, Anke
Background Intrusive reexperiencing in posttraumatic stress disorder (PTSD) is commonly triggered by stimuli with perceptual similarity to those present during the trauma. Information processing theories suggest that perceptual processing during the trauma and enhanced perceptual priming contribute to the easy triggering of intrusive memories by these cues. Methods Healthy volunteers (N = 51) watched neutral and trauma picture stories on a computer screen. Neutral objects that were unrelated to the content of the stories briefly appeared in the interval between the pictures. Dissociation and data-driven processing (as indicators of perceptual processing) and state anxiety during the stories were assessed with self-report questionnaires. After filler tasks, participants completed a blurred object identification task to assess priming and a recognition memory task. Intrusive memories were assessed with telephone interviews 2 weeks and 3 months later. Results Neutral objects were more strongly primed if they occurred in the context of trauma stories than if they occurred during neutral stories, although the effect size was only moderate (ηp2=.08) and only significant when trauma stories were presented first. Regardless of story order, enhanced perceptual priming predicted intrusive memories at 2-week follow-up (N = 51), but not at 3 months (n = 40). Data-driven processing, dissociation and anxiety increases during the trauma stories also predicted intrusive memories. Enhanced perceptual priming and data-driven processing were associated with lower verbal intelligence. Limitations It is unclear to what extent these findings generalize to real-life traumatic events and whether they are specific to negative emotional events. Conclusions The results provide some support for the role of perceptual processing and perceptual priming in reexperiencing symptoms. PMID:23207970
Cera, Maysa Luchesi; Ortiz, Karin Zazo; Bertolucci, Paulo Henrique Ferreira; Minett, Thaís Soares Cianciarullo
Alzheimer's disease (AD) affects not only memory but also other cognitive functions, such as orientation, language, praxis, attention, visual perception, or executive function. Most studies on oral communication in AD focus on aphasia; however, speech and orofacial apraxias are also present in these patients. The aim of this study was to investigate the presence of speech and orofacial apraxias in patients with AD with the hypothesis that apraxia severity is strongly correlated with disease severity. Ninety participants in different stages of AD (mild, moderate, and severe) underwent the following assessments: Clinical Dementia Rating, Mini-Mental State Examination, Lawton Instrumental Activities of Daily Living, a specific speech and orofacial praxis assessment, and the oral agility subtest of the Boston diagnostic aphasia examination. The mean age was 80.2 ± 7.2 years and 73% were women. Patients with AD had significantly lower scores than normal controls for speech praxis (mean difference=-2.9, 95% confidence interval (CI)=-3.3 to -2.4) and orofacial praxis (mean difference=-4.9, 95% CI=-5.4 to -4.3). Dementia severity was significantly associated with orofacial apraxia severity (moderate AD: β =-19.63, p= 0.011; and severe AD: β =-51.68, p speech apraxia severity (moderate AD: β = 7.07, p = 0.001; and severe AD: β =8.16, p Speech and orofacial apraxias were evident in patients with AD and became more pronounced with disease progression.
Preston, Jonathan L.; Seki, Ayumi
Purpose: To describe (a) the assessment of residual speech sound disorders (SSDs) in bilinguals by distinguishing speech patterns associated with second language acquisition from patterns associated with misarticulations and (b) how assessment of domains such as speech motor control and phonological awareness can provide a more complete…
Jürgens, Tim; Brand, Thomas
This study compares the phoneme recognition performance in speech-shaped noise of a microscopic model for speech recognition with the performance of normal-hearing listeners. "Microscopic" is defined in terms of this model twofold. First, the speech recognition rate is predicted on a phoneme-by-phoneme basis. Second, microscopic modeling means that the signal waveforms to be recognized are processed by mimicking elementary parts of human's auditory processing. The model is based on an approach by Holube and Kollmeier [J. Acoust. Soc. Am. 100, 1703-1716 (1996)] and consists of a psychoacoustically and physiologically motivated preprocessing and a simple dynamic-time-warp speech recognizer. The model is evaluated while presenting nonsense speech in a closed-set paradigm. Averaged phoneme recognition rates, specific phoneme recognition rates, and phoneme confusions are analyzed. The influence of different perceptual distance measures and of the model's a-priori knowledge is investigated. The results show that human performance can be predicted by this model using an optimal detector, i.e., identical speech waveforms for both training of the recognizer and testing. The best model performance is yielded by distance measures which focus mainly on small perceptual distances and neglect outliers.
Waqas, A.; Muhammad, T.; Jamal, H.
Speech is the most essential method of correspondence of humankind. Cell telephony, portable hearing assistants and, hands free are specific provisions in this respect. The performance of these communication devices could be affected because of distortions which might augment them. There are two essential sorts of distortions that might be recognized, specifically: convolutive and additive noises. These mutilations contaminate the clean speech and make it unsatisfactory to human audiences i.e. perceptual value and intelligibility of speech signal diminishes. The objective of speech upgrade systems is to enhance the quality and understandability of speech to make it more satisfactory to audiences. This paper recommends a modified hybrid approach for single channel devices to process the noisy signals considering only the effect of background noises. It is a mixture of pre-processing relative spectral amplitude (RASTA) filter, which is approximated by a straight forward 4th order band-pass filter, and conventional minimum mean square error short time spectral amplitude (MMSE STSA85) estimator. To analyze the performance of the algorithm an objective parameter called Perceptual estimation of speech quality (PESQ) is measured. The results show that the modified algorithm performs well to remove the background noises. SIMULINK implementation is also performed and its profile report has been generated to observe the execution time. (author)
Meijers, A.W.M.; Tsohatzidis, S.L.
From its early development in the 1960s, speech act theory always had an individualistic orientation. It focused exclusively on speech acts performed by individual agents. Paradigmatic examples are ‘I promise that p’, ‘I order that p’, and ‘I declare that p’. There is a single speaker and a single
Authoritarian teaching practices in ballet inhibit the use of private speech. This paper highlights the critical importance of private speech in the cognitive development of young ballet students, within what is largely a non-verbal art form. It draws upon research by Russian psychologist Lev Vygotsky and contemporary socioculturalists, to…
Kane, Peter E., Ed.
The 11 articles in this collection deal with theoretical and practical freedom of speech issues. The topics covered are (1) the United States Supreme Court and communication theory; (2) truth, knowledge, and a democratic respect for diversity; (3) denial of freedom of speech in Jock Yablonski's campaign for the presidency of the United Mine…
Shearer, William M.
Written for students in the fields of speech correction and audiology, the text deals with the following: structures involved in respiration; the skeleton and the processes of inhalation and exhalation; phonation and pitch, the larynx, and esophageal speech; muscles involved in articulation; muscles involved in resonance; and the anatomy of the…
Kane, Peter E., Ed.
This issue of "Free Speech" contains the following articles: "Daniel Schoor Relieved of Reporting Duties" by Laurence Stern, "The Sellout at CBS" by Michael Harrington, "Defending Dan Schorr" by Tome Wicker, "Speech to the Washington Press Club, February 25, 1976" by Daniel Schorr, "Funds…
Masapollo, Matthew; Polka, Linda; Ménard, Lucie
To learn to produce speech, infants must effectively monitor and assess their own speech output. Yet very little is known about how infants perceive speech produced by an infant, which has higher voice pitch and formant frequencies compared to adult or child speech. Here, we tested whether pre-babbling infants (at 4-6 months) prefer listening to…
Marschik, Peter B; Vollmann, Ralf; Bartl-Pokorny, Katrin D; Green, Vanessa A; van der Meer, Larah; Wolin, Thomas; Einspieler, Christa
We assessed various aspects of speech-language and communicative functions of an individual with the preserved speech variant of Rett syndrome (RTT) to describe her developmental profile over a period of 11 years. For this study, we incorporated the following data resources and methods to assess speech-language and communicative functions during pre-, peri- and post-regressional development: retrospective video analyses, medical history data, parental checklists and diaries, standardized tests on vocabulary and grammar, spontaneous speech samples and picture stories to elicit narrative competences. Despite achieving speech-language milestones, atypical behaviours were present at all times. We observed a unique developmental speech-language trajectory (including the RTT typical regression) affecting all linguistic and socio-communicative sub-domains in the receptive as well as the expressive modality. Future research should take into consideration a potentially considerable discordance between formal and functional language use by interpreting communicative acts on a more cautionary note.
This paper will firstly examine the International framework of human rights law and its guidelines for safeguarding the right to freedom of speech in the press. Secondly, it will describe the constitutional and other legal rights protecting freedom of speech in Indonesia and assess their compatibility with the right to freedom of speech under the International human rights law framework. Thirdly it will consider the impact of Indonesia's constitutional law and criminal and civil law, includin...
Vanessa de Sousa
Full Text Available Abstract: Although the relationship between perceptual motor skills and attention is reported in the literature, few studies have empirically explored this association. Thus, the objective of this study was to investigate the relationship between these constructs, using the Bender-Gestalt Test: Gradual Scoring System (B-SPG and the Psychological Battery for Attention Assessment (BPA. The participants were 320 children from four public schools in a city located in the South of the state of Minas Gerais, with ages ranging from seven to 10 years (M = 8.39, SD = 1.10 and 196 (55.9 % female. The results showed negative, moderate and significant correlations between the total scores of the instruments, indicating the relationship between the constructs. Although the data has confirmed the existence of a relationship between perceptual motor skills and attention, further studies with samples from other regions are necessary.
Liu, Ping; Forte, Jason; Sewell, David; Carter, Olivia
Contrast-based early visual processing has largely been considered to involve autonomous processes that do not need the support of cognitive resources. However, as spatial attention is known to modulate early visual perceptual processing, we explored whether cognitive load could similarly impact contrast-based perception. We used a dual-task paradigm to assess the impact of a concurrent working memory task on the performance of three different early visual tasks. The results from Experiment 1 suggest that cognitive load can modulate early visual processing. No effects of cognitive load were seen in Experiments 2 or 3. Together, the findings provide evidence that under some circumstances cognitive load effects can penetrate the early stages of visual processing and that higher cognitive function and early perceptual processing may not be as independent as was once thought.
Coutinho, Eduardo; Dibben, Nicola
There is strong evidence of shared acoustic profiles common to the expression of emotions in music and speech, yet relatively limited understanding of the specific psychoacoustic features involved. This study combined a controlled experiment and computational modelling to investigate the perceptual codes associated with the expression of emotion in the acoustic domain. The empirical stage of the study provided continuous human ratings of emotions perceived in excerpts of film music and natural speech samples. The computational stage created a computer model that retrieves the relevant information from the acoustic stimuli and makes predictions about the emotional expressiveness of speech and music close to the responses of human subjects. We show that a significant part of the listeners' second-by-second reported emotions to music and speech prosody can be predicted from a set of seven psychoacoustic features: loudness, tempo/speech rate, melody/prosody contour, spectral centroid, spectral flux, sharpness, and roughness. The implications of these results are discussed in the context of cross-modal similarities in the communication of emotion in the acoustic domain.
Babel, Molly; McAuliffe, Michael; Haber, Graham
This study examines spontaneous phonetic accommodation of a dialect with distinct categories by speakers who are in the process of merging those categories. We focus on the merger of the NEAR and SQUARE lexical sets in New Zealand English, presenting New Zealand participants with an unmerged speaker of Australian English. Mergers-in-progress are a uniquely interesting sound change as they showcase the asymmetry between speech perception and production. Yet, we examine mergers using spontaneous phonetic imitation, which is phenomenon that is necessarily a behavior where perceptual input influences speech production. Phonetic imitation is quantified by a perceptual measure and an acoustic calculation of mergedness using a Pillai-Bartlett trace. The results from both analyses indicate spontaneous phonetic imitation is moderated by extra-linguistic factors such as the valence of assigned conditions and social bias. We also find evidence for a decrease in the degree of mergedness in post-exposure productions. Taken together, our results suggest that under the appropriate conditions New Zealanders phonetically accommodate to Australian English and that in the process of speech imitation, mergers-in-progress can, but do not consistently, become less merged.
Full Text Available This study examines spontaneous phonetic accommodation of a dialect with distinct categories by speakers who are in the process of merging those categories. We focus on the merger of the NEAR and SQUARE lexical sets in New Zealand English, presenting New Zealand participants with an unmerged speaker of Australian English. Mergers-in-progress are a uniquely interesting sound change as they showcase the asymmetry between speech perception and production. Yet, we examine mergers using spontaneous phonetic imitation, which is phenomenon that is necessarily a behavior where perceptual input influences speech production. Phonetic imitation is quantified by a perceptual measure and an acoustic calculation of mergedness using a Pillai-Bartlett trace. The results from both analyses indicate spontaneous phonetic imitation is moderated by extra-linguistic factors such as the valence of assigned conditions and social bias. We also find evidence for a decrease in the degree of mergedness in post-exposure productions. Taken together, our results suggest that under the appropriate conditions New Zealanders phonetically accommodate to Australian English and that in the process of speech imitation, mergers-in-progress can, but do not consistently, become less merged.
Full Text Available Over the course of development, speech sounds that are contrastive in one’s native language tend to become perceived categorically: that is, listeners are unaware of variation within phonetic categories while showing excellent sensitivity to speech sounds that span linguistically meaningful phonetic category boundaries. The end stage of this developmental process is that the perceptual systems that handle acoustic-phonetic information show special tuning to native language contrasts, and as such, category-level information appears to be present at even fairly low levels of the neural processing stream. Research on adults acquiring non-native speech categories offers an avenue for investigating the interplay of category-level information and perceptual sensitivities to these sounds as speech categories emerge. In particular, one can observe the neural changes that unfold as listeners learn not only to perceive acoustic distinctions that mark non-native speech sound contrasts, but also to map these distinctions onto category-level representations. An emergent literature on the neural basis of novel and non-native speech sound learning offers new insight into this question. In this review, I will examine this literature in order to answer two key questions. First, where in the neural pathway does sensitivity to category-level phonetic information first emerge over the trajectory of speech sound learning? Second, how do frontal and temporal brain areas work in concert over the course of non-native speech sound learning? Finally, in the context of this literature I will describe a model of speech sound learning in which rapidly-adapting access to categorical information in the frontal lobes modulates the sensitivity of stable, slowly-adapting responses in the temporal lobes.
Norton, Daniel J.; McBain, Ryan K.; Ongur, Dost; Chen, Yue
Schizophrenia patients exhibit perceptual and cognitive deficits, including in visual motion processing. Given that cognitive systems depend upon perceptual inputs, improving patients' perceptual abilities may be an effective means of cognitive intervention. In healthy people, motion perception can be enhanced through perceptual learning, but it…
Liberman, A. M.
This interim status report on speech research discusses the following topics: On Vagueness and Fictions as Cornerstones of a Theory of Perceiving and Acting: A Comment on Walter (1983); The Informational Support for Upright Stance; Determining the Extent of Coarticulation-effects of Experimental Design; The Roles of Phoneme Frequency, Similarity, and Availability in the Experimental Elicitation of Speech Errors; On Learning to Speak; The Motor Theory of Speech Perception Revised; Linguistic and Acoustic Correlates of the Perceptual Structure Found in an Individual Differences Scaling Study of Vowels; Perceptual Coherence of Speech: Stability of Silence-cued Stop Consonants; Development of the Speech Perceptuomotor System; Dependence of Reading on Orthography-Investigations in Serbo-Croatian; The Relationship between Knowledge of Derivational Morphology and Spelling Ability in Fourth, Sixth, and Eighth Graders; Relations among Regular and Irregular, Morphologically-Related Words in the Lexicon as Revealed by Repetition Priming; Grammatical Priming of Inflected Nouns by the Gender of Possessive Adjectives; Grammatical Priming of Inflected Nouns by Inflected Adjectives; Deaf Signers and Serial Recall in the Visual Modality-Memory for Signs, Fingerspelling, and Print; Did Orthographies Evolve?; The Development of Children's Sensitivity to Factors Inf luencing Vowel Reading.
Başkent, Deniz; Gaudrain, Etienne
Evidence for transfer of musical training to better perception of speech in noise has been mixed. Unlike speech-in-noise, speech-on-speech perception utilizes many of the skills that musical training improves, such as better pitch perception and stream segregation, as well as use of higher-level
Novelli-Olmstead, Tina; Ling, Daniel
Seven hearing impaired children (five to seven years old) assigned to the Speakers group made highly significant gains in speech production and auditory discrimination of speech, while Listeners made only slight speech production gains and no gains in auditory discrimination. Combined speech and auditory training was more effective than auditory…
Maher, Stephen; Ekstrom, Tor; Chen, Yue
Perception of subtle facial expressions is essential for social functioning; yet it is unclear if human perceptual sensitivities differ in detecting varying types of facial emotions. Evidence diverges as to whether salient negative versus positive emotions (such as sadness versus happiness) are preferentially processed. Here, we measured perceptual thresholds for the detection of four types of emotion in faces--happiness, fear, anger, and sadness--using psychophysical methods. We also evaluated the association of the perceptual performances with facial morphological changes between neutral and respective emotion types. Human observers were highly sensitive to happiness compared with the other emotional expressions. Further, this heightened perceptual sensitivity to happy expressions can be attributed largely to the emotion-induced morphological change of a particular facial feature (end-lip raise).
Gijs Joost Brouwer
Full Text Available We employed a parametric psychophysical design in combination with functional imaging to examine the influence of metric changes in perceptual incongruence on perceptual alternation rates and cortical responses. Subjects viewed a bistable stimulus defined by incongruent depth cues; bistability resulted from incongruence between binocular disparity and monocular perspective cues that specify different slants (slant rivalry. Psychophysical results revealed that perceptual alternation rates were positively correlated with the degree of perceived incongruence. Functional imaging revealed systematic increases in activity that paralleled the psychophysical results within anterior intraparietal sulcus, prior to the onset of perceptual alternations. We suggest that this cortical activity predicts the frequency of subsequent alternations, implying a putative causal role for these areas in initiating bistable perception. In contrast, areas implicated in form and depth processing (LOC and V3A were sensitive to the degree of slant, but failed to show increases in activity when these cues were in conflict.
Gilbert, Charles D; Li, Wu; Piech, Valentin
The visual cortex retains the capacity for experience-dependent changes, or plasticity, of cortical function and cortical circuitry, throughout life. These changes constitute the mechanism of perceptual learning in normal visual experience and in recovery of function after CNS damage. Such plasticity can be seen at multiple stages in the visual pathway, including primary visual cortex. The manifestation of the functional changes associated with perceptual learning involve both long term modification of cortical circuits during the course of learning, and short term dynamics in the functional properties of cortical neurons. These dynamics are subject to top-down influences of attention, expectation and perceptual task. As a consequence, each cortical area is an adaptive processor, altering its function in accordance to immediate perceptual demands.
Full Text Available Significant insights into visual cognition have come from studying real-world perceptual expertise. Many have previously reviewed empirical findings and theoretical developments from this work. Here we instead provide a brief perspective on approaches, considerations, and challenges to studying real-world perceptual expertise. We discuss factors like choosing to use real-world versus artificial object domains of expertise, selecting a target domain of real-world perceptual expertise, recruiting experts, evaluating their level of expertise, and experimentally testing experts in the lab and online. Throughout our perspective, we highlight expert birding (also called birdwatching as an example, as it has been used as a target domain for over two decades in the perceptual expertise literature.
Calvillo, Dustin P; Jackson, Russell E
Inattentional blindness is the failure to notice unexpected objects in a visual scene while engaging in an attention-demanding task. We examined the effects of animacy and perceptual load on inattentional blindness. Participants searched for a category exemplar under low or high perceptual load. On the last trial, the participants were exposed to an unexpected object that was either animate or inanimate. Unexpected objects were detected more frequently when they were animate rather than inanimate, and more frequently with low than with high perceptual loads. We also measured working memory capacity and found that it predicted the detection of unexpected objects, but only with high perceptual loads. The results are consistent with the animate-monitoring hypothesis, which suggests that animate objects capture attention because of the importance of the detection of animate objects in ancestral hunter-gatherer environments.
Aziz, Azza Adel; Shohdi, Sahar; Osman, Dalia Mostafa; Habib, Emad Iskander
Childhood apraxia of speech is a neurological childhood speech-sound disorder in which the precision and consistency of movements underlying speech are impaired in the absence of neuromuscular deficits. Children with childhood apraxia of speech and those with multiple phonological disorder share some common phonological errors that can be misleading in diagnosis. This study posed a question about a possible significant difference in language, speech and non-speech oral performances between children with childhood apraxia of speech, multiple phonological disorder and normal children that can be used for a differential diagnostic purpose. 30 pre-school children between the ages of 4 and 6 years served as participants. Each of these children represented one of 3 possible subject-groups: Group 1: multiple phonological disorder; Group 2: suspected cases of childhood apraxia of speech; Group 3: control group with no communication disorder. Assessment procedures included: parent interviews; testing of non-speech oral motor skills and testing of speech skills. Data showed that children with suspected childhood apraxia of speech showed significantly lower language score only in their expressive abilities. Non-speech tasks did not identify significant differences between childhood apraxia of speech and multiple phonological disorder groups except for those which required two sequential motor performances. In speech tasks, both consonant and vowel accuracy were significantly lower and inconsistent in childhood apraxia of speech group than in the multiple phonological disorder group. Syllable number, shape and sequence accuracy differed significantly in the childhood apraxia of speech group than the other two groups. In addition, children with childhood apraxia of speech showed greater difficulty in processing prosodic features indicating a clear need to address these variables for differential diagnosis and treatment of children with childhood apraxia of speech. Copyright (c
Schmälzle, Ralf; Häcker, Frank E K; Honey, Christopher J; Hasson, Uri
Powerful speeches can captivate audiences, whereas weaker speeches fail to engage their listeners. What is happening in the brains of a captivated audience? Here, we assess audience-wide functional brain dynamics during listening to speeches of varying rhetorical quality. The speeches were given by German politicians and evaluated as rhetorically powerful or weak. Listening to each of the speeches induced similar neural response time courses, as measured by inter-subject correlation analysis, in widespread brain regions involved in spoken language processing. Crucially, alignment of the time course across listeners was stronger for rhetorically powerful speeches, especially for bilateral regions of the superior temporal gyri and medial prefrontal cortex. Thus, during powerful speeches, listeners as a group are more coupled to each other, suggesting that powerful speeches are more potent in taking control of the listeners' brain responses. Weaker speeches were processed more heterogeneously, although they still prompted substantially correlated responses. These patterns of coupled neural responses bear resemblance to metaphors of resonance, which are often invoked in discussions of speech impact, and contribute to the literature on auditory attention under natural circumstances. Overall, this approach opens up possibilities for research on the neural mechanisms mediating the reception of entertaining or persuasive messages. © The Author (2015). Published by Oxford University Press. For Permissions, please email: firstname.lastname@example.org.
Cataldo, Dana Michelle; Migliano, Andrea Bamberg; Vinicius, Lucio
The 'technological hypothesis' proposes that gestural language evolved in early hominins to enable the cultural transmission of stone tool-making skills, with speech appearing later in response to the complex lithic industries of more recent hominins. However, no flintknapping study has assessed the efficiency of speech alone (unassisted by gesture) as a tool-making transmission aid. Here we show that subjects instructed by speech alone underperform in stone tool-making experiments in comparison to subjects instructed through either gesture alone or 'full language' (gesture plus speech), and also report lower satisfaction with their received instruction. The results provide evidence that gesture was likely to be selected over speech as a teaching aid in the earliest hominin tool-makers; that speech could not have replaced gesturing as a tool-making teaching aid in later hominins, possibly explaining the functional retention of gesturing in the full language of modern humans; and that speech may have evolved for reasons unrelated to tool-making. We conclude that speech is unlikely to have evolved as tool-making teaching aid superior to gesture, as claimed by the technological hypothesis, and therefore alternative views should be considered. For example, gestural language may have evolved to enable tool-making in earlier hominins, while speech may have later emerged as a response to increased trade and more complex inter- and intra-group interactions in Middle Pleistocene ancestors of Neanderthals and Homo sapiens; or gesture and speech may have evolved in parallel rather than in sequence.
Ash, Sharon; McMillan, Corey; Gross, Rachel G; Cook, Philip; Gunawardena, Delani; Morgan, Brianna; Boller, Ashley; Siderowf, Andrew; Grossman, Murray
Few studies have examined connected speech in demented and non-demented patients with Parkinson's disease (PD). We assessed the speech production of 35 patients with Lewy body spectrum disorder (LBSD), including non-demented PD patients, patients with PD dementia (PDD), and patients with dementia with Lewy bodies (DLB), in a semi-structured narrative speech sample in order to characterize impairments of speech fluency and to determine the factors contributing to reduced speech fluency in these patients. Both demented and non-demented PD patients exhibited reduced speech fluency, characterized by reduced overall speech rate and long pauses between sentences. Reduced speech rate in LBSD correlated with measures of between-utterance pauses, executive functioning, and grammatical comprehension. Regression analyses related non-fluent speech, grammatical difficulty, and executive difficulty to atrophy in frontal brain regions. These findings indicate that multiple factors contribute to slowed speech in LBSD, and this is mediated in part by disease in frontal brain regions. Copyright Â© 2011 Elsevier Inc. All rights reserved.
McCann, Robert S.; Foyle, David C.; Johnston, James C.; Hart, Sandra G. (Technical Monitor)
Previous work using Head-Up Displays (HUDs) suggests that the visual system parses the HUD and the outside world into distinct perceptual groups, with attention deployed sequentially to first one group and then the other. New experiments show that both groups can be processed in parallel in a divided attention search task, even though subjects have just processed a stimulus in one perceptual group or the other. Implications for models of visual attention will be discussed.
Within attention studies, Lavie's load theory (Lavie & Tsal, 1994; Lavie, Hirst, de Fockert, & Viding, 2004) presented an account that could settle the question whether attention selects stimuli to be processed at an early or late stage of cognitive processing. This theory relied on the concepts of "perceptual load" and "attentional capacity", proposing that attentional resources are automatically allocated to stimuli, but when the perceptual load of the stimuli exceeds person's capacity, tas...
Perceptual dialectology is dedicated to the formal study of folk linguistic perceptions. Through an amalgamation of social psychology, ethnography, dialectology, sociolinguistics, cultural geography and myriad other fields, perceptual dialectology provides a methodology to gain insight to overt folk language attitudes, knowledge of regional distribution, and the importance of language variation and change (Preston 1989, 1999a). This study conducts the first investigation of folk percept...
Full Text Available Prosodic phrasing, i.e. division of speech into intonation units, represents a phenomenon which is central to language comprehension. Incorrect prosodic boundary markings may lead to serious misunderstandings and ambiguous interpretations of utterances. The present paper investigates prosodic competencies of Czech students of French in the domain of prosodic phrasing in French read speech. Two texts of different length are examined through a perceptual method to observe how Czech speakers of French (B1–B2 level of CEFR divide read speech into prosodic units compared to French native speakers.
Jones, Elizabeth A.; And Others
This study used an iterative Delphi survey process of about 600 faculty, employers, and policymakers to identify writing, speech and listening, and critical thinking skills that college graduates should achieve to become effective employees and citizens (National Education Goal 6). Participants reached a consensus about the importance in critical…
Harley, Trevor A.
Environmentally contaminated speech errors (irrelevant words or phrases derived from the speaker's environment and erroneously incorporated into speech) are hypothesized to occur at a high level of speech processing, but with a relatively late insertion point. The data indicate that speech production processes are not independent of other…
Meuwese, Julia D I; Post, Ruben A G; Scholte, H Steven; Lamme, Victor A F
It has been proposed that visual attention and consciousness are separate [Koch, C., & Tsuchiya, N. Attention and consciousness: Two distinct brain processes. Trends in Cognitive Sciences, 11, 16-22, 2007] and possibly even orthogonal processes [Lamme, V. A. F. Why visual attention and awareness are different. Trends in Cognitive Sciences, 7, 12-18, 2003]. Attention and consciousness converge when conscious visual percepts are attended and hence become available for conscious report. In such a view, a lack of reportability can have two causes: the absence of attention or the absence of a conscious percept. This raises an important question in the field of perceptual learning. It is known that learning can occur in the absence of reportability [Gutnisky, D. A., Hansen, B. J., Iliescu, B. F., & Dragoi, V. Attention alters visual plasticity during exposure-based learning. Current Biology, 19, 555-560, 2009; Seitz, A. R., Kim, D., & Watanabe, T. Rewards evoke learning of unconsciously processed visual stimuli in adult humans. Neuron, 61, 700-707, 2009; Seitz, A. R., & Watanabe, T. Is subliminal learning really passive? Nature, 422, 36, 2003; Watanabe, T., Náñez, J. E., & Sasaki, Y. Perceptual learning without perception. Nature, 413, 844-848, 2001], but it is unclear which of the two ingredients-consciousness or attention-is not necessary for learning. We presented textured figure-ground stimuli and manipulated reportability either by masking (which only interferes with consciousness) or with an inattention paradigm (which only interferes with attention). During the second session (24 hr later), learning was assessed neurally and behaviorally, via differences in figure-ground ERPs and via a detection task. Behavioral and neural learning effects were found for stimuli presented in the inattention paradigm and not for masked stimuli. Interestingly, the behavioral learning effect only became apparent when performance feedback was given on the task to measure learning
Macdonald, James S P; Lavie, Nilli
In this article, we establish a new phenomenon of "inattentional deafness" and highlight the level of load on visual attention as a critical determinant of this phenomenon. In three experiments, we modified an inattentional blindness paradigm to assess inattentional deafness. Participants made either a low- or high-load visual discrimination concerning a cross shape (respectively, a discrimination of line color or of line length with a subtle length difference). A brief pure tone was presented simultaneously with the visual task display on a final trial. Failures to notice the presence of this tone (i.e., inattentional deafness) reached a rate of 79% in the high-visual-load condition, significantly more than in the low-load condition. These findings establish the phenomenon of inattentional deafness under visual load, thereby extending the load theory of attention (e.g., Lavie, Journal of Experimental Psychology. Human Perception and Performance, 25, 596-616, 1995) to address the cross-modal effects of visual perceptual load.
Choi, Ja Young; Hu, Elly R; Perrachione, Tyler K
The nondeterministic relationship between speech acoustics and abstract phonemic representations imposes a challenge for listeners to maintain perceptual constancy despite the highly variable acoustic realization of speech. Talker normalization facilitates speech processing by reducing the degrees of freedom for mapping between encountered speech and phonemic representations. While this process has been proposed to facilitate the perception of ambiguous speech sounds, it is currently unknown whether talker normalization is affected by the degree of potential ambiguity in acoustic-phonemic mapping. We explored the effects of talker normalization on speech processing in a series of speeded classification paradigms, parametrically manipulating the potential for inconsistent acoustic-phonemic relationships across talkers for both consonants and vowels. Listeners identified words with varying potential acoustic-phonemic ambiguity across talkers (e.g., beet/boat vs. boot/boat) spoken by single or mixed talkers. Auditory categorization of words was always slower when listening to mixed talkers compared to a single talker, even when there was no potential acoustic ambiguity between target sounds. Moreover, the processing cost imposed by mixed talkers was greatest when words had the most potential acoustic-phonemic overlap across talkers. Models of acoustic dissimilarity between target speech sounds did not account for the pattern of results. These results suggest (a) that talker normalization incurs the greatest processing cost when disambiguating highly confusable sounds and (b) that talker normalization appears to be an obligatory component of speech perception, taking place even when the acoustic-phonemic relationships across sounds are unambiguous.
Harris, Laurence R; Herpers, Rainer; Hofhammer, Thomas; Jenkin, Michael
Might the gravity levels found on other planets and on the moon be sufficient to provide an adequate perception of upright for astronauts? Can the amount of gravity required be predicted from the physiological threshold for linear acceleration? The perception of upright is determined not only by gravity but also visual information when available and assumptions about the orientation of the body. Here, we used a human centrifuge to simulate gravity levels from zero to earth gravity along the long-axis of the body and measured observers' perception of upright using the Oriented Character Recognition Test (OCHART) with and without visual cues arranged to indicate a direction of gravity that differed from the body's long axis. This procedure allowed us to assess the relative contribution of the added gravity in determining the perceptual upright. Control experiments off the centrifuge allowed us to measure the relative contributions of normal gravity, vision, and body orientation for each participant. We found that the influence of 1 g in determining the perceptual upright did not depend on whether the acceleration was created by lying on the centrifuge or by normal gravity. The 50% threshold for centrifuge-simulated gravity's ability to influence the perceptual upright was at around 0.15 g, close to the level of moon gravity but much higher than the threshold for detecting linear acceleration along the long axis of the body. This observation may partially explain the instability of moonwalkers but is good news for future missions to Mars.
Parks, Nathan A; Beck, Diane M; Kramer, Arthur F
The perceptual load theory of attention proposes that the degree to which visual distractors are processed is a function of the attentional demands of a task-greater demands increase filtering of irrelevant distractors. The spatial configuration of such filtering is unknown. Here, we used steady-state visual evoked potentials (SSVEPs) in conjunction with time-domain event-related potentials (ERPs) to investigate the distribution of load-induced distractor suppression and task-relevant enhancement in the visual field. Electroencephalogram (EEG) was recorded while subjects performed a foveal go/no-go task that varied in perceptual load. Load-dependent distractor suppression was assessed by presenting a contrast reversing ring at one of three eccentricities (2, 6, or 11°) during performance of the go/no-go task. Rings contrast reversed at 8.3 Hz, allowing load-dependent changes in distractor processing to be tracked in the frequency-domain. ERPs were calculated to the onset of stimuli in the load task to examine load-dependent modulation of task-relevant processing. Results showed that the amplitude of the distractor SSVEP (8.3 Hz) was attenuated under high perceptual load (relative to low load) at the most proximal (2°) eccentricity but not at more eccentric locations (6 or 11°). Task-relevant ERPs revealed a significant increase in N1 amplitude under high load. These results are consistent with a center-surround configuration of load-induced enhancement and suppression in the visual field.
Mostert, Pim; Kok, Peter; de Lange, Floris P
A key question within systems neuroscience is how the brain translates physical stimulation into a behavioral response: perceptual decision making. To answer this question, it is important to dissociate the neural activity underlying the encoding of sensory information from the activity underlying the subsequent temporal integration into a decision variable. Here, we adopted a decoding approach to empirically assess this dissociation in human magnetoencephalography recordings. We used a functional localizer to identify the neural signature that reflects sensory-specific processes, and subsequently traced this signature while subjects were engaged in a perceptual decision making task. Our results revealed a temporal dissociation in which sensory processing was limited to an early time window and consistent with occipital areas, whereas decision-related processing became increasingly pronounced over time, and involved parietal and frontal areas. We found that the sensory processing accurately reflected the physical stimulus, irrespective of the eventual decision. Moreover, the sensory representation was stable and maintained over time when it was required for a subsequent decision, but unstable and variable over time when it was task-irrelevant. In contrast, decision-related activity displayed long-lasting sustained components. Together, our approach dissects neuro-anatomically and functionally distinct contributions to perceptual decisions.
Kellman, Philip J
Recent advances in the learning sciences offer remarkable potential to improve medical education and maximize the benefits of emerging medical technologies. This article describes 2 major innovation areas in the learning sciences that apply to simulation and other aspects of medical learning: Perceptual learning (PL) and adaptive learning technologies. PL technology offers, for the first time, systematic, computer-based methods for teaching pattern recognition, structural intuition, transfer, and fluency. Synergistic with PL are new adaptive learning technologies that optimize learning for each individual, embed objective assessment, and implement mastery criteria. The author describes the Adaptive Response-Time-based Sequencing (ARTS) system, which uses each learner's accuracy and speed in interactive learning to guide spacing, sequencing, and mastery. In recent efforts, these new technologies have been applied in medical learning contexts, including adaptive learning modules for initial medical diagnosis and perceptual/adaptive learning modules (PALMs) in dermatology, histology, and radiology. Results of all these efforts indicate the remarkable potential of perceptual and adaptive learning technologies, individually and in combination, to improve learning in a variety of medical domains. Reprint & Copyright © 2013 Association of Military Surgeons of the U.S.