WorldWideScience

Sample records for intense intonation-based speech

  1. Intonational Division of a Speech Flow in the Kazakh Language

    Science.gov (United States)

    Bazarbayeva, Zeynep M.; Zhalalova, Akshay M.; Ormakhanova, Yenlik N.; Ospangaziyeva, Nazgul B.; Karbozova, Bulbul D.

    2016-01-01

    The purpose of this research is to analyze the speech intonation of the French, Kazakh, English and Russian languages. The study considers the functions of intonation components (melody, duration, and intensity) in poetry and spoken language. It finds that a set of prosodic means is used to convey the intonational specifics of sounding…

  2. Phrase-Final Syllable Lengthening and Intonation in Early Child Speech.

    Science.gov (United States)

    Snow, David

    1994-01-01

    To test opposing theories about the relationship between intonation and syllable timing, these boundary features were compared in a longitudinal study of 9 children's speech development between the mean ages of 16 and 25 months. Results suggest that young children acquire the skills that control intonation earlier than they do skills of final…

  3. Intonational speech prosody encoding in the human auditory cortex.

    Science.gov (United States)

    Tang, C; Hamilton, L S; Chang, E F

    2017-08-25

    Speakers of all human languages regularly use intonational pitch to convey linguistic meaning, such as to emphasize a particular word. Listeners extract pitch movements from speech and evaluate the shape of intonation contours independent of each speaker's pitch range. We used high-density electrocorticography to record neural population activity directly from the brain surface while participants listened to sentences that varied in intonational pitch contour, phonetic content, and speaker. Cortical activity at single electrodes over the human superior temporal gyrus selectively represented intonation contours. These electrodes were intermixed with, yet functionally distinct from, sites that encoded different information about phonetic features or speaker identity. Furthermore, the representation of intonation contours directly reflected the encoding of speaker-normalized relative pitch but not absolute pitch. Copyright © 2017 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
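
    A minimal sketch (not the authors' analysis code) of the speaker normalization described above: relative pitch can be approximated by z-scoring log F0 within each speaker, which removes absolute pitch level and range while preserving contour shape. The function name and toy data are illustrative assumptions.

      import numpy as np

      def relative_pitch(f0_hz, eps=1e-9):
          # Speaker-normalized relative pitch: z-score of log F0.
          # A multiplicative shift in F0 (a higher- or lower-pitched
          # speaker) becomes additive in log space and is removed here.
          log_f0 = np.log(np.asarray(f0_hz, dtype=float) + eps)
          return (log_f0 - log_f0.mean()) / log_f0.std()

      # The same contour spoken by a low- and a high-pitched speaker
      # yields identical relative-pitch representations:
      low = np.array([100.0, 120.0, 150.0, 110.0, 95.0])
      high = low * 2.0  # one octave higher
      print(np.allclose(relative_pitch(low), relative_pitch(high)))  # True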

  4. Processing melodic contour and speech intonation in congenital amusics with Mandarin Chinese.

    Science.gov (United States)

    Jiang, Cunmei; Hamm, Jeff P; Lim, Vanessa K; Kirk, Ian J; Yang, Yufang

    2010-07-01

    Congenital amusia is a disorder in the perception and production of musical pitch. It has been suggested that early exposure to a tonal language may compensate for the pitch disorder (Peretz, 2008). If so, it is reasonable to expect that there would be different characterizations of pitch perception in music and speech in congenital amusics who speak a tonal language, such as Mandarin. In this study, a group of 11 adults with amusia whose first language was Mandarin were tested with melodic contour and speech intonation discrimination and identification tasks. The participants with amusia were impaired in discriminating and identifying melodic contour. These abnormalities were also detected in identifying both speech and non-linguistic analogue derived patterns for the Mandarin intonation tasks. In addition, there was an overall trend for the participants with amusia to show deficits with respect to controls in the intonation discrimination tasks for both speech and non-linguistic analogues. These findings suggest that the amusics' melodic pitch deficits may extend to the perception of speech, and could potentially result in some language deficits in those who speak a tonal language. Copyright (c) 2010 Elsevier Ltd. All rights reserved.

  5. L’unité intonative dans les textes oralisés // Intonation unit in read speech

    Directory of Open Access Journals (Sweden)

    Lea Tylečková

    2015-12-01

    Prosodic phrasing, i.e. division of speech into intonation units, represents a phenomenon which is central to language comprehension. Incorrect prosodic boundary markings may lead to serious misunderstandings and ambiguous interpretations of utterances. The present paper investigates the prosodic competencies of Czech students of French in the domain of prosodic phrasing in French read speech. Two texts of different length are examined through a perceptual method to observe how Czech speakers of French (B1–B2 level of CEFR) divide read speech into prosodic units compared to French native speakers.

  6. Vocal Performance and Speech Intonation: Bob Dylan’s “Like a Rolling Stone”

    Directory of Open Access Journals (Sweden)

    Michael Daley

    2007-03-01

    This article proposes a linguistic analysis of a recorded performance of a single verse of one of Dylan’s most popular songs—the originally released studio recording of “Like A Rolling Stone”—and describes more specifically the ways in which intonation relates to lyrics and performance. This analysis is used as source material for a close reading of the semantic, affective, and “playful” meanings of the performance, and is compared with some published accounts of the song’s reception. The author has drawn on the linguistic methodology formulated by Michael Halliday, who has found speech intonation (which includes pitch movement, timbre, syllabic rhythm, and loudness) to be an integral part of English grammar and crucial to the transmission of certain kinds of meaning. Speech intonation is a deeply rooted and powerfully meaningful aspect of human communication. This article argues that it is plausible that a system so powerful in speech might have some bearing on the communication of meaning in sung performance.

  7. Characterizing Intonation Deficit in Motor Speech Disorders: An Autosegmental-Metrical Analysis of Spontaneous Speech in Hypokinetic Dysarthria, Ataxic Dysarthria, and Foreign Accent Syndrome

    Science.gov (United States)

    Lowit, Anja; Kuschmann, Anja

    2012-01-01

    Purpose: The autosegmental-metrical (AM) framework represents an established methodology for intonational analysis in unimpaired speaker populations but has found little application in describing intonation in motor speech disorders (MSDs). This study compared the intonation patterns of unimpaired participants (CON) and those with Parkinson's…

  8. Discourse intonation and second language acquisition: Three genre-based studies

    Science.gov (United States)

    Wennerstrom, Ann Kristin

    1997-12-01

    This dissertation investigates intonation in the discourse of nonnative speakers of English. It is proposed that intonation functions as a grammar of cohesion, contributing to the coherence of the text. Based on a componential model of intonation adapted from Pierrehumbert and Hirschberg (1990), three empirical studies were conducted in different genres of spoken discourse: academic lectures, conversations, and oral narratives. Using computerized speech technology, excerpts of taped discourse were measured to determine how intonation associated with various constituents of text. All speakers were tested for overall English level on tests adapted from the SPEAK Test (ETS, 1985). Comparisons using native speaker data were also conducted. The first study investigated intonation in lectures given by Chinese teaching assistants. Multivariate analyses showed that intonation was a significant factor contributing to better scores on an exam of overall comprehensibility in English. The second study investigated the role of intonation in the turn-taking system in conversations between native and nonnative speakers of English. The final study considered emotional aspects of intonation in narratives, using the framework of Labov and Waletzky (1967). In sum, adult nonnative speakers can acquire intonation as part of their overall language development, although there is evidence against any specific order of acquisition. Intonation contributes to coherence by indicating the relationship between the current utterance and what is assumed to already be in participants' mental representations of the discourse. It also performs a segmentation function, denoting hierarchical relationships among utterances and/or turns. It is suggested that while pitch can be a resource in cross-cultural communication to show emotion and attitude, the grammatical aspects of intonation must be acquired gradually.

  9. Unenthusiastic Europeans or Affected English: the Impact of Intonation on the Overall Make-up of Speech

    Directory of Open Access Journals (Sweden)

    Smiljana Komar

    2005-06-01

    Attitudes and emotions are expressed by linguistic as well as extra-linguistic features. The linguistic features comprise the lexis, the word-order and the intonation of the utterance. The purpose of this article is to examine the impact of intonation on our perception of speech. I will attempt to show that our expression, as well as our perception and understanding of attitudes and emotions are realized in accordance with the intonation patterns typical of the mother tongue. When listening to non-native speakers using our mother tongue we expect and tolerate errors in pronunciation, grammar and lexis but are quite ignorant and intolerant of non-native intonation patterns. Foreigners often sound unenthusiastic to native English ears. On the basis of the results obtained from an analysis of speech produced by 21 non-native speakers of English, including Slovenes, I will show that the reasons for such an impression of being unenthusiastic stem from different tonality and tonicity rules, as well as from the lack of the fall-rise tone and a very narrow pitch range with no or very few pitch jumps or slumps.

  10. Intonational Structures in Romanian Yes-No Questions

    Directory of Open Access Journals (Sweden)

    Vasile Apopei

    2006-05-01

    This paper presents conclusions resulting from an intonational analysis of Romanian yes-no questions. The analysis divides and structures F0 curves into intonational units. Each intonational unit is described by a tone sequence, using ToBI labels to annotate the most important phonetic events: pitch accents and boundary tones. The authors propose a description of the resulting F0 contour patterns in terms of intonational unit structures described by their tone sequences. We consider this description suitable for the variety of melodic contours resulting from different speakers and different focalizations in their utterances. Section 3 presents the intonational variants found in our speech corpus analysis. The conclusions of the yes-no question analysis are important for linguistic studies and for Romanian speech synthesis.
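
    As a rough illustration of the representation described above, an intonational unit can be modeled as a tone sequence of ToBI-style pitch accents plus a boundary tone. This is a hypothetical sketch; the labels and the example question are illustrative, not the authors' inventory.

      from dataclasses import dataclass, field
      from typing import List

      @dataclass
      class IntonationalUnit:
          # One intonational unit described by its ToBI tone sequence.
          text: str
          pitch_accents: List[str] = field(default_factory=list)
          boundary_tone: str = "H%"

          def tone_sequence(self):
              # The unit's full tonal description: accents, then boundary.
              return self.pitch_accents + [self.boundary_tone]

      # Hypothetical annotation of a Romanian yes-no question:
      unit = IntonationalUnit(text="Vii maine?",  # "Are you coming tomorrow?"
                              pitch_accents=["L+H*"], boundary_tone="H%")
      print(unit.tone_sequence())  # ['L+H*', 'H%']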

  11. Amusia results in abnormal brain activity following inappropriate intonation during speech comprehension.

    Science.gov (United States)

    Jiang, Cunmei; Hamm, Jeff P; Lim, Vanessa K; Kirk, Ian J; Chen, Xuhai; Yang, Yufang

    2012-01-01

    Pitch processing is a critical ability on which humans' tonal musical experience depends, and which is also of paramount importance for decoding prosody in speech. Congenital amusia refers to deficits in the ability to properly process musical pitch, and recent evidence has suggested that this musical pitch disorder may impact upon the processing of speech sounds. Here we present the first electrophysiological evidence demonstrating that individuals with amusia who speak Mandarin Chinese are impaired in classifying prosody as appropriate or inappropriate during a speech comprehension task. When presented with inappropriate prosody stimuli, control participants elicited a larger P600 and smaller N100 relative to the appropriate condition. In contrast, amusics did not show significant differences between the appropriate and inappropriate conditions in either the N100 or the P600 component. This provides further evidence that the pitch perception deficits associated with amusia may also affect intonation processing during speech comprehension in those who speak a tonal language such as Mandarin, and suggests music and language share some cognitive and neural resources.

  12. Amusia results in abnormal brain activity following inappropriate intonation during speech comprehension.

    Directory of Open Access Journals (Sweden)

    Cunmei Jiang

    Pitch processing is a critical ability on which humans' tonal musical experience depends, and which is also of paramount importance for decoding prosody in speech. Congenital amusia refers to deficits in the ability to properly process musical pitch, and recent evidence has suggested that this musical pitch disorder may impact upon the processing of speech sounds. Here we present the first electrophysiological evidence demonstrating that individuals with amusia who speak Mandarin Chinese are impaired in classifying prosody as appropriate or inappropriate during a speech comprehension task. When presented with inappropriate prosody stimuli, control participants elicited a larger P600 and smaller N100 relative to the appropriate condition. In contrast, amusics did not show significant differences between the appropriate and inappropriate conditions in either the N100 or the P600 component. This provides further evidence that the pitch perception deficits associated with amusia may also affect intonation processing during speech comprehension in those who speak a tonal language such as Mandarin, and suggests music and language share some cognitive and neural resources.

  13. Intonation Processing in Congenital Amusia: Discrimination, Identification and Imitation

    Science.gov (United States)

    Liu, Fang; Patel, Aniruddh D.; Fourcin, Adrian; Stewart, Lauren

    2010-01-01

    This study investigated whether congenital amusia, a neuro-developmental disorder of musical perception, also has implications for speech intonation processing. In total, 16 British amusics and 16 matched controls completed five intonation perception tasks and two pitch threshold tasks. Compared with controls, amusics showed impaired performance…

  14. Melodic Intonation Therapy: Back to Basics for Future Research

    Directory of Open Access Journals (Sweden)

    Anna Zumbansen

    2014-01-01

    We present a critical review of the literature on melodic intonation therapy (MIT), one of the most formalized treatments used by speech-language therapists in Broca’s aphasia. We suggest basic clarifications to enhance the scientific support of this promising treatment. First, MIT is a program, not a single speech facilitation technique. The goal of MIT is to restore propositional speech. The rationale is that patients can learn a new way to speak through singing by using language-capable regions of the right cerebral hemisphere. We argue that many treatment programs covered in systematic reviews on MIT’s efficacy do not match MIT’s therapeutic goal and rationale. Second, we distinguish between the immediate effect of MIT’s main speech facilitation technique (i.e., intoned speech) and the effect of the entire program on language recovery. Many results in the MIT literature can be explained by this duration factor. Finally, we propose that MIT can be viewed as a treatment of apraxia of speech more than aphasia. This issue should be explored in future experimental studies.

  15. At the edge of intonation

    DEFF Research Database (Denmark)

    Niebuhr, Oliver

    2012-01-01

    The paper is concerned with the 'edge of intonation' in a twofold sense. It focuses on utterance-final F0 movements and crosses the traditional segment-prosody divide by investigating the interplay of F0 and voiceless fricatives in speech production. An experiment was performed for German with four...

  16. Intonation processing in congenital amusia: discrimination, identification and imitation.

    Science.gov (United States)

    Liu, Fang; Patel, Aniruddh D; Fourcin, Adrian; Stewart, Lauren

    2010-06-01

    This study investigated whether congenital amusia, a neuro-developmental disorder of musical perception, also has implications for speech intonation processing. In total, 16 British amusics and 16 matched controls completed five intonation perception tasks and two pitch threshold tasks. Compared with controls, amusics showed impaired performance on discrimination, identification and imitation of statements and questions that were characterized primarily by pitch direction differences in the final word. This intonation-processing deficit in amusia was largely associated with a psychophysical pitch direction discrimination deficit. These findings suggest that amusia impacts upon one's language abilities in subtle ways, and support previous evidence that pitch processing in language and music involves shared mechanisms.

  17. Listening instead of reading : The influence of voice intonation in auditory health persuasion aimed at increasing fruit and vegetable intake

    NARCIS (Netherlands)

    Elbert, Sarah; Dijkstra, Arie

    2013-01-01

    Purpose. In auditory health persuasion, the speaker’s speech becomes salient, as there is no visual information available. Intonation of speech is one important aspect that may influence persuasion. It was experimentally tested to what extent different levels of intonation are related to persuasion.

  18. YOUNG LEARNERS’ RHYTHMIC AND INTONATION SKILLS THROUGH DRAMA

    Directory of Open Access Journals (Sweden)

    Olena Beskorsa

    2016-11-01

    The article is devoted to the problem of implementing drama techniques in the process of developing young learners’ rhythmic and intonation skills. The main task of learning a foreign language is using it as a means of communication in oral and written forms. The author shows that drama techniques successfully integrate all types of speech activity, and notes that this method shifts the focus from teaching grammatically correct speech to training clear and effective communication. The author emphasizes that sentence stress and speed of speech have the greatest influence on rhythm. Applying these drama techniques is thought to increase primary school pupils’ motivation to master language skills, and it provides a positive psychological climate in English classes. The teacher’s role tends to be minimized: teachers act as facilitators. In the author’s opinion, if teachers impose their authority when implementing drama activities in the classroom, the educational value of drama techniques will never be realized. It is also argued that rhythmic and intonation skills should not be left to form spontaneously; their development has to be conducted in stages (presentation and production) to make pupils’ speech fluent and pronunciation clear, introducing exercises based on drama techniques. At the presentation stage, the following exercises have the most methodological value: speed dictations, dictogloss, asking questions to practise recognizing word boundaries, matching phrases to stress patterns, marking stresses and weak forms, and authentic listening. At the production stage, play reading and play production are suggested. The following kinds of drama texts are recommended for teaching primary school children: jazz chants, poems, scripted plays and simple scenes from different movie genres. It is also proved that drama techniques and

  19. The Effects of Modified Melodic Intonation Therapy on Nonfluent Aphasia: A Pilot Study

    Science.gov (United States)

    Conklyn, Dwyer; Novak, Eric; Boissy, Adrienne; Bethoux, Francois; Chemali, Kamal

    2012-01-01

    Objective: Positive results have been reported with melodic intonation therapy (MIT) in nonfluent aphasia patients with damage to their left-brain speech processes, using the patient's intact ability to sing to promote functional language. This pilot study sought to determine the immediate effects of introducing modified melodic intonation therapy…

  20. Intonation Contrast in Cantonese Speakers with Hypokinetic Dysarthria Associated with Parkinson's Disease

    Science.gov (United States)

    Ma, Joan K.-Y.; Whitehill, Tara L.; So, Susanne Y.-S.

    2010-01-01

    Purpose: Speech produced by individuals with hypokinetic dysarthria associated with Parkinson's disease (PD) is characterized by a number of features including impaired speech prosody. The purpose of this study was to investigate intonation contrasts produced by this group of speakers. Method: Speech materials with a question-statement contrast…

  1. Intonation processing deficits of emotional words among Mandarin Chinese speakers with congenital amusia: an ERP study.

    Science.gov (United States)

    Lu, Xuejing; Ho, Hao Tam; Liu, Fang; Wu, Daxing; Thompson, William F

    2015-01-01

    Congenital amusia is a disorder that is known to affect the processing of musical pitch. Although individuals with amusia rarely show language deficits in daily life, a number of findings point to possible impairments in speech prosody that amusic individuals may compensate for by drawing on linguistic information. Using EEG, we investigated (1) whether the processing of speech prosody is impaired in amusia and (2) whether emotional linguistic information can compensate for this impairment. Twenty Chinese amusics and 22 matched controls were presented pairs of emotional words spoken with either statement or question intonation while their EEG was recorded. Their task was to judge whether the intonations were the same. Amusics exhibited impaired performance on the intonation-matching task for emotional linguistic information, as their performance was significantly worse than that of controls. EEG results showed a reduced N2 response to incongruent intonation pairs in amusics compared with controls, which likely reflects impaired conflict processing in amusia. However, our EEG results also indicated that amusics were intact in early sensory auditory processing, as revealed by a comparable N1 modulation in both groups. We propose that the impairment in discriminating speech intonation observed among amusic individuals may arise from an inability to access information extracted at early processing stages. This, in turn, could reflect a disconnection between low-level and high-level processing.

  2. Model-based prosodic analysis of charismatic speech

    DEFF Research Database (Denmark)

    Mixdorff, Hansjörg; Niebuhr, Oliver; Hönemann, Angelika

    2018-01-01

    This study examines at a new level of quantitative detail the intonation and timing properties of charismatic speech by comparing two popular CEOs, Steve Jobs and Mark Zuckerberg, who are known from informal observations and formal perception experiments alike to be more or less charismatic...

  3. The Intonation-Syntax Interface in the Speech of Individuals with Parkinson's Disease

    Science.gov (United States)

    MacPherson, Megan K.; Huber, Jessica E.; Snow, David P.

    2011-01-01

    Purpose: This study examined the effect of Parkinson's disease (PD) on the intonational marking of final and nonfinal syntactic boundaries and investigated whether the effect of PD on intonation was sex specific. Method: Eight women and 8 men with PD and 16 age- and sex-matched control participants read a passage at comfortable pitch, rate, and…

  4. Wh-question intonation in Peninsular Spanish: Multiple contours and the effect of task type

    Directory of Open Access Journals (Sweden)

    Nicholas C. Henriksen

    2009-06-01

    This paper reports on an experimental investigation of wh-question intonation in Peninsular Spanish. Speech data were collected from six Peninsular Spanish speakers from León, Spain, and oral production data were elicited under two conditions: a computerized sentence reading task and an information-gap task-oriented dialogue. The latter task was an adaptation of the HCRC Map Task method (cf. Anderson et al., 1991) and was designed to elicit multiple wh-question productions in an unscripted and more spontaneous speech style than the standard sentence reading task. Results indicate that four contours exist in the tonal inventory of the six speakers. The two most frequent contours were a final rise contour and a nuclear circumflex contour. Systematic task-based differences were found for four of the six speakers, indicating that sentence reading task data alone may not accurately reflect spontaneous speech tonal patterns (cf. Cruttenden, 2007; but see also Lickley, Schepman, & Ladd, 2005). The experimental findings serve to clarify a number of assumptions about the syntax-prosody interface underlying wh-question utterance signaling; they also have implications for research methods in intonation and task-based variation in laboratory phonology.

  5. An experimental test of the relationship between voice intonation and persuasion in the domain of health

    NARCIS (Netherlands)

    Elbert, Sarah P.; Dijkstra, Arie

    2014-01-01

    Objective: In the process of behaviour change, intonation of speech is an important aspect that may influence persuasion when auditory messages are used. In two experiments, we tested to what extent different levels of intonation are related to persuasion and whether for some recipients the threat

  6. Intonation Processing Deficits of Emotional Words among Mandarin Chinese Speakers with Congenital Amusia: An ERP Study

    Directory of Open Access Journals (Sweden)

    Xuejing Lu

    2015-04-01

    Background: Amusia is a disorder that is known to affect the processing of musical pitch. Although individuals with amusia rarely show language deficits in daily life, a number of findings point to possible impairments in speech prosody that amusic perceivers may compensate for by drawing on linguistic information. Using EEG, we investigated (1) whether the processing of speech prosody is impaired in amusia and (2) whether emotional linguistic information can ease this process. Method: Twenty Chinese amusics and 22 matched controls were presented pairs of emotional words spoken with either statement or question intonation while their EEG was recorded. Their task was to judge whether the intonations were the same. Results: Emotional linguistic information did not facilitate amusics’ performance on the intonation-matching task, as their performance was significantly worse than that of controls. EEG results showed a reduced N2 response to incongruent intonation pairs in amusics compared with controls, which likely reflects impaired conflict processing in amusia. However, at an earlier processing stage, our EEG results indicate that amusics were intact in early sensory auditory processing, as revealed by a comparable N1 modulation in both groups. Conclusion: We propose that the impairment in discriminating speech intonation observed among amusic individuals may arise from an inability to access information extracted at early processing stages. This, in turn, could reflect a disconnection between low-level and high-level processing.

  7. An experimental test of the relationship between voice intonation and persuasion in the domain of health.

    Science.gov (United States)

    Elbert, Sarah P; Dijkstra, Arie

    2014-01-01

    In the process of behaviour change, intonation of speech is an important aspect that may influence persuasion when auditory messages are used. In two experiments, we tested to what extent different levels of intonation are related to persuasion and whether for some recipients the threat posed by the message information might become too strong to face. In Study 1, 130 respondents listened to a health message with either a low, moderate or high level of intonation. In Study 2 (N = 143), the same manipulations of intonation were applied but half of the respondents were affirmed before they listened to the persuasive message. Intention to increase fruit and vegetable intake was used as a dependent variable. Both studies showed that high intonation led to a significant drop in intention among respondents who perceived their own health as good. After self-affirmation, persuasion was increased. A high level of intonation seems to induce self-regulatory defences in people who do not see the necessity to change their health behaviour, whereas people with poor perceived health might perceive potential to change. The use of a normal level of intonation in auditory health messages is recommended.

  8. Detecting Abnormal Word Utterances in Children With Autism Spectrum Disorders: Machine-Learning-Based Voice Analysis Versus Speech Therapists.

    Science.gov (United States)

    Nakai, Yasushi; Takiguchi, Tetsuya; Matsui, Gakuyo; Yamaoka, Noriko; Takada, Satoshi

    2017-10-01

    Abnormal prosody is often evident in the voice intonations of individuals with autism spectrum disorders. We compared a machine-learning-based voice analysis with human hearing judgments made by 10 speech therapists for classifying children with autism spectrum disorders (n = 30) and typical development (n = 51). Using stimuli limited to single-word utterances, machine-learning-based voice analysis was superior to speech therapist judgments. There was a significantly higher true-positive than false-negative rate for machine-learning-based voice analysis but not for speech therapists. Results are discussed in terms of some artificiality of clinician judgments based on single-word utterances, and the objectivity machine-learning-based voice analysis adds to judging abnormal prosody.
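
    A minimal sketch of the general approach, not the study's actual pipeline: summarize each single-word utterance's pitch track with a few prosodic statistics and train a standard classifier. The features, model choice and toy data are assumptions for illustration (requires scikit-learn).

      import numpy as np
      from sklearn.model_selection import cross_val_score
      from sklearn.svm import SVC

      def prosodic_features(f0_hz):
          # Crude prosody summary of one single-word utterance.
          f0 = np.asarray(f0_hz, dtype=float)
          return [f0.mean(), f0.std(), f0.max() - f0.min()]

      rng = np.random.default_rng(0)
      # Toy pitch tracks standing in for 30 ASD and 51 TD children,
      # with exaggerated F0 variability in the ASD group:
      X = [prosodic_features(rng.normal(200, s, 50))
           for s in [40] * 30 + [15] * 51]
      y = [1] * 30 + [0] * 51
      print(cross_val_score(SVC(), X, y, cv=5).mean())  # high on this toy data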

  9. On the intonation of German intonation questions

    DEFF Research Database (Denmark)

    Petrone, Caterina; Niebuhr, Oliver

    2014-01-01

    German questions and statements are distinguished not only by lexical and syntactic but also by intonational means. This study revisits, for Northern Standard German, how questions are signalled intonationally in utterances that have neither lexical nor syntactic cues. Starting from natural…, but represents a separate attitudinal meaning dimension. Moreover, the findings support that both prenuclear and nuclear fundamental frequency (F0) patterns must be taken into account in the analysis of tune meaning…

  10. Typologizing native language influence on intonation in a second language: Three transfer phenomena in Japanese EFL learners

    Science.gov (United States)

    Albin, Aaron Lee

    While a substantial body of research has accumulated regarding how intonation is acquired in a second language (L2), the topic has historically received relatively little attention from mainstream models of L2 phonology. As such, a unified theoretical framework suited to address unique acquisitional challenges specific to this domain of L2 knowledge (such as form-function mapping) has been lacking. The theoretical component of the dissertation makes progress on this front by taking up the issue of crosslinguistic transfer in L2 intonation. Using Mennen's (2015) L2 Intonation Learning theory as a point of departure, the available empirical studies are synthesized into a typology of the different possible ways two languages' intonation systems can mismatch as well as the concomitant implications for transfer. Next, the methodological component of the dissertation presents a framework for overcoming challenges in the analysis of L2 learners' intonation production due to the interlanguage mixing of their native and L2 systems. The proposed method involves first creating a stylization of the learner's intonation contour and then running queries to extract phonologically-relevant features of interest for a particular research question. A novel approach to stylization is also introduced that not only allows for transitions between adjacent pitch targets to have a nonlinear shape but also explicitly parametrizes and stores this nonlinearity for analysis. Finally, these two strands are integrated in a third, empirical component to the dissertation. Three kinds of intonation transfer, representing nodes from different branches of the typology, are examined in Japanese learners of English as a Foreign Language (EFL). For each kind of transfer, fourteen sentences were selected from a large L2 speech corpus (English Speech Database Read by Japanese Students), and productions of each sentence by approximately 20-30 learners were analyzed using the proposed method. Results
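
    The stylization idea above lends itself to a compact sketch: pitch targets joined by transitions whose curvature is itself a stored, analyzable parameter. The parametrization below (a simple power-law exponent) is an illustrative assumption, not the dissertation's actual model.

      import numpy as np

      def transition(t, f0_from, f0_to, shape=1.0):
          # Interpolate between two pitch targets over normalized time
          # t in [0, 1]. shape == 1.0 is linear; other values bend the
          # curve, and the exponent is kept as a queryable parameter.
          return f0_from + (f0_to - f0_from) * t ** shape

      def stylize(targets, shapes, samples_per_segment=20):
          # Build a contour (Hz) from time-ordered pitch targets.
          t = np.linspace(0.0, 1.0, samples_per_segment)
          segments = [transition(t, a, b, s)
                      for (a, b), s in zip(zip(targets, targets[1:]), shapes)]
          return np.concatenate(segments)

      # A rise-fall with a slow-then-fast (concave) rising transition:
      contour = stylize(targets=[110.0, 180.0, 120.0], shapes=[2.5, 1.0])
      print(contour.round(1)[:5])  # first samples of the rise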

  11. SOME EXAMPLES OF APPLIED SYSTEMS WITH SPEECH INTERFACE

    Directory of Open Access Journals (Sweden)

    V. A. Zhitko

    2017-01-01

    Three examples of applied systems with a speech interface are considered in the article. The first two provide the end user with the opportunity to ask a question verbally and to hear the response from the system, complementing traditional I/O via the keyboard and computer screen. The third example, the «IntonTrainer» system, provides the user with the possibility of voice interaction and is designed for in-depth self-learning of the intonation of oral speech.

  12. A Technique for Estimating Intensity of Emotional Expressions and Speaking Styles in Speech Based on Multiple-Regression HSMM

    Science.gov (United States)

    Nose, Takashi; Kobayashi, Takao

    In this paper, we propose a technique for estimating the degree or intensity of emotional expressions and speaking styles appearing in speech. The key idea is based on a style control technique for speech synthesis using a multiple regression hidden semi-Markov model (MRHSMM), and the proposed technique can be viewed as the inverse of the style control. In the proposed technique, the acoustic features of spectrum, power, fundamental frequency, and duration are simultaneously modeled using the MRHSMM. We derive an algorithm for estimating explanatory variables of the MRHSMM, each of which represents the degree or intensity of emotional expressions and speaking styles appearing in acoustic features of speech, based on a maximum likelihood criterion. We show experimental results to demonstrate the ability of the proposed technique using two types of speech data, simulated emotional speech and spontaneous speech with different speaking styles. It is found that the estimated values have correlation with human perception.
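
    A deliberately simplified sketch of this "inverse style control" idea: in the MRHSMM the state output means are linear in a style vector s (mu = H s + b), so given observed features the maximum-likelihood estimate of s is a weighted least-squares solution. Here a single Gaussian stands in for the full hidden semi-Markov model, and H, b and Sigma are made-up numbers.

      import numpy as np

      H = np.array([[3.0], [1.5], [0.5]])   # style-to-mean regression matrix
      b = np.array([120.0, 5.0, 0.2])       # neutral-style mean features
      Sigma_inv = np.diag([1 / 25.0, 1 / 4.0, 1 / 0.01])  # inverse covariance

      def estimate_style(x):
          # ML estimate of style intensity s from one feature vector x:
          # argmax of N(x; H s + b, Sigma) over s, i.e. generalized
          # least squares with the model covariance as the weight.
          A = H.T @ Sigma_inv @ H
          return np.linalg.solve(A, H.T @ Sigma_inv @ (x - b))

      # Features synthesized at style intensity 0.8 are recovered exactly:
      x_observed = b + H[:, 0] * 0.8
      print(estimate_style(x_observed))  # [0.8]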

  13. Auditory Perception, Suprasegmental Speech Processing, and Vocabulary Development in Chinese Preschoolers.

    Science.gov (United States)

    Wang, Hsiao-Lan S; Chen, I-Chen; Chiang, Chun-Han; Lai, Ying-Hui; Tsao, Yu

    2016-10-01

    The current study examined the associations between basic auditory perception, speech prosodic processing, and vocabulary development in Chinese kindergartners; specifically, whether early basic auditory perception may be related to linguistic prosodic processing in Chinese Mandarin vocabulary acquisition. A series of language, auditory, and linguistic prosodic tests were given to 100 preschool children who had not yet learned how to read Chinese characters. The results suggested that lexical tone sensitivity and intonation production were significantly correlated with children's general vocabulary abilities. In particular, tone awareness was associated with comprehensive language development, whereas intonation production was associated with both comprehensive and expressive language development. Regression analyses revealed that tone sensitivity accounted for 36% of the unique variance in vocabulary development, whereas intonation production accounted for 6% of the variance in vocabulary development. Moreover, auditory frequency discrimination was significantly correlated with lexical tone sensitivity, syllable duration discrimination, and intonation production in Mandarin Chinese, and it contributed significantly to tone sensitivity and intonation production. Auditory frequency discrimination may thus indirectly affect early vocabulary development through Chinese speech prosody. © The Author(s) 2016.

  14. THE INTONATIONAL ORGANIZATION IN THE VIOLIN CONCERTO MOMENTS

    Directory of Open Access Journals (Sweden)

    CIOBANU GHENADIE

    2017-06-01

    The article examines the intonational organization of the Concerto for Violin and Symphony Orchestra „Moments”. One perspective of the analysis of the Concerto's intonational system highlights traditional narrative types, such as the meditative monologue, dialogue, and motion, that are closely related to the concept and imagistic world of the work. Likewise, the personification of the solo instrument in an instrumental concerto is determined mostly by the intonation system used by the composer. The concretization of intonations, and their combination and integration into an actual system of meanings, forms the problematics of the musical work. At the same time, the intonation system reflects the ideas, images and sound collisions of the work. The „new tone” is an objective matter and constitutes the expression of a new melodic-harmonic organizational order of the sound material. The intonation of a musical work consists of the combination of specific properties generated from compositional practices. In the contemporary history of music there is a variety of individual systems of organizing sound pitch. These specific intonation systems, however, are based on some common principles of organization. The personification of the solo instrument in terms of intonation becomes an important task for a composer in the concerto genre. One concern for composers of concertos for violin and orchestra is how the orchestral context in which the solo instrument is placed influences the sonority of the violin's intonation.

  15. Treatment Intensity and Childhood Apraxia of Speech

    Science.gov (United States)

    Namasivayam, Aravind K.; Pukonen, Margit; Goshulak, Debra; Hard, Jennifer; Rudzicz, Frank; Rietveld, Toni; Maassen, Ben; Kroll, Robert; van Lieshout, Pascal

    2015-01-01

    Background: Intensive treatment has been repeatedly recommended for the treatment of speech deficits in childhood apraxia of speech (CAS). However, differences in treatment outcomes as a function of treatment intensity have not been systematically studied in this population. Aim: To investigate the effects of treatment intensity on outcome…

  16. Influences of Infant-Directed Speech on Early Word Recognition

    Science.gov (United States)

    Singh, Leher; Nestor, Sarah; Parikh, Chandni; Yull, Ashley

    2009-01-01

    When addressing infants, many adults adopt a particular type of speech, known as infant-directed speech (IDS). IDS is characterized by exaggerated intonation, as well as reduced speech rate, shorter utterance duration, and grammatical simplification. It is commonly asserted that IDS serves in part to facilitate language learning. Although…

  17. Treatment intensity and childhood apraxia of speech

    NARCIS (Netherlands)

    Namasivayam, Aravind K.; Pukonen, Margit; Goshulak, Debra; Hard, Jennifer; Rudzicz, Frank; Rietveld, Toni; Maassen, Ben; Kroll, Robert; van Lieshout, Pascal

    Background: Intensive treatment has been repeatedly recommended for the treatment of speech deficits in childhood apraxia of speech (CAS). However, differences in treatment outcomes as a function of treatment intensity have not been systematically studied in this population. Aim: To investigate the…

  18. Do Korean L2 learners have a "foreign accent" when they speak French? Production and perception experiments on rhythm and intonation

    OpenAIRE

    Grandon , Bénédicte; Yoo , Hiyon

    2014-01-01

    French and Korean are two languages with similar prosodic characteristics as far as rhythm and intonation are concerned. In this paper, we present the results of production and perception tests where we describe the prosodic characteristics of Korean L2 learners of French. Our aim is to analyze the impression of "foreign accent" for two prosodic components (intonation and rhythm) of speech produced by Korean L2 learners of French and the perception of this "accent" by ...

  19. Emotional prosody of task-irrelevant speech interferes with the retention of serial order.

    Science.gov (United States)

    Kattner, Florian; Ellermeier, Wolfgang

    2018-04-09

    Task-irrelevant speech and other temporally changing sounds are known to interfere with the short-term memorization of ordered verbal materials, as compared to silence or stationary sounds. It has been argued that this disruption of short-term memory (STM) may be due to (a) interference of automatically encoded acoustical fluctuations with the process of serial rehearsal or (b) attentional capture by salient task-irrelevant information. To disentangle the contributions of these 2 processes, the authors investigated whether the disruption of serial recall is due to the semantic or acoustical properties of task-irrelevant speech (Experiment 1). They found that performance was affected by the prosody (emotional intonation), but not by the semantics (word meaning), of irrelevant speech, suggesting that the disruption of serial recall is due to interference of precategorically encoded changing-state sound (with higher fluctuation strength of emotionally intonated speech). The authors further demonstrated a functional distinction between this form of distraction and attentional capture by contrasting the effect of (a) speech prosody and (b) sudden prosody deviations on both serial and nonserial STM tasks (Experiment 2). Although serial recall was again sensitive to the emotional prosody of irrelevant speech, performance on a nonserial missing-item task was unaffected by the presence of neutral or emotionally intonated speech sounds. In contrast, sudden prosody changes tended to impair performance on both tasks, suggesting an independent effect of attentional capture. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  20. L2 English Intonation: Relations between Form-Meaning Associations, Access to Meaning, and L1 Transfer

    Science.gov (United States)

    Ortega-Llebaria, Marta; Colantoni, Laura

    2014-01-01

    Although there is consistent evidence that higher levels of processing, such as learning the form-meaning associations specific to the second language (L2), are a source of difficulty in acquiring L2 speech, no study has addressed how these levels interact in shaping L2 perception and production of intonation. We examine the hypothesis of whether…

  1. Empirical Analysis of Intonation Activities in EFL Student’s Books

    Directory of Open Access Journals (Sweden)

    Dušan Nikolić

    2018-05-01

    Intonation instruction has repeatedly proved a challenge for EFL teachers, who avoid getting involved in intonation teaching more than their EFL textbooks demand of them. Since a great number of teachers rely on EFL textbooks when implementing intonation practice, the intonation activities in EFL materials are often central to their classrooms. Even though research on intonation instruction is well documented, few papers have explored intonation activities in EFL materials. The present study thus provides an empirical analysis of intonation activities in five EFL student's book series by exploring the overall coverage of intonation activities across the series and the quality of these activities. The results reveal that intonation activities are underrepresented in the EFL student's books and that discourse intonation deserves more attention in the activities. Considerations for EFL teachers and publishers are also discussed.

  2. Automatic discrimination between laughter and speech

    NARCIS (Netherlands)

    Truong, K.; Leeuwen, D. van

    2007-01-01

    Emotions can be recognized by audible paralinguistic cues in speech. By detecting these paralinguistic cues that can consist of laughter, a trembling voice, coughs, changes in the intonation contour etc., information about the speaker’s state and emotion can be revealed. This paper describes the

  3. Discussing about Functions of English Intonation

    Institute of Scientific and Technical Information of China (English)

    黎辉

    2009-01-01

    Intonation - the rise and fall of pitch in our voices - plays a crucial role in how we express meaning. Many people think that pronunciation is what makes up an accent. Pronunciation may indeed be very important for an understandable accent, but it is intonation that gives the final touch that makes an accent native. Intonation is the "music" of a language, and is perhaps the most important element of a good accent. Often we hear someone speaking with perfect grammar and perfect formation of the sounds of English, but with a little something that gives them away as not being a native speaker. This paper looks in particular at three key functions of intonation: to express our attitude, to structure our messages to one another, and to focus attention on particular parts of what we are saying.

  4. Word Prosody and Intonation of Sgaw Karen

    Science.gov (United States)

    West, Luke Alexander

    The prosodic, and specifically intonation, systems of Tibeto-Burman languages have received less attention in research than those of other families. This study investigates the word prosody and intonation of Sgaw Karen, a tonal Tibeto-Burman language of eastern Burma, and finds similarities to both closely related Tibeto-Burman languages and the more distant Sinitic languages like Mandarin. Sentences of varying lengths with controlled tonal environments were elicited from a total of 12 participants (5 male). In terms of word prosody, Sgaw Karen does not exhibit word stress cues, but does maintain a prosodic distinction between the more prominent major syllable and the phonologically reduced minor syllable. In terms of intonation, Sgaw Karen patterns like related Pwo Karen in its limited use of post-lexical tone, which is only present at Intonation Phrase (IP) boundaries. Unlike the intonation systems of Pwo Karen and Mandarin, however, Sgaw Karen exhibits downstep across its Accentual Phrases (AP), similarly to phenomena identified in Tibetan and Burmese.

  5. Music and Speech Perception in Children Using Sung Speech.

    Science.gov (United States)

    Nie, Yingjiu; Galvin, John J; Morikawa, Michael; André, Victoria; Wheeler, Harley; Fu, Qian-Jie

    2018-01-01

    This study examined music and speech perception in normal-hearing children with some or no musical training. Thirty children (mean age = 11.3 years), 15 with and 15 without formal music training participated in the study. Music perception was measured using a melodic contour identification (MCI) task; stimuli were a piano sample or sung speech with a fixed timbre (same word for each note) or a mixed timbre (different words for each note). Speech perception was measured in quiet and in steady noise using a matrix-styled sentence recognition task; stimuli were naturally intonated speech or sung speech with a fixed pitch (same note for each word) or a mixed pitch (different notes for each word). Significant musician advantages were observed for MCI and speech in noise but not for speech in quiet. MCI performance was significantly poorer with the mixed timbre stimuli. Speech performance in noise was significantly poorer with the fixed or mixed pitch stimuli than with spoken speech. Across all subjects, age at testing and MCI performance were significantly correlated with speech performance in noise. MCI and speech performance in quiet was significantly poorer for children than for adults from a related study using the same stimuli and tasks; speech performance in noise was significantly poorer for young than for older children. Long-term music training appeared to benefit melodic pitch perception and speech understanding in noise in these pediatric listeners.

  6. Production of Intonation Patterns of Non-English Major Student Teachers

    Directory of Open Access Journals (Sweden)

    Emily Luib-Beltran

    2015-12-01

    The study described the production of intonation patterns of non-English-major student teachers during their on-campus teaching. A qualitative research method was used to analyze the data and describe their intonation patterns. The utterances were investigated for the distinction between falling and rising intonation in wh-questions and yes/no questions. In the analysis, an interview guide was used to gather data on the language profile of the student teachers. The data confirm that the student teachers' mother tongue (Filipino) was commonly used in most of their verbal exchanges. It is worth noting that the student teachers' utterances displayed evidence of variation in intonation patterns on wh-questions and yes/no questions. The erratic production of intonation patterns resulted from a common linguistic phenomenon in which the student teachers tended to carry the intonation and pronunciation rules of their mother tongue (Filipino) into their English spoken discourse. This qualitative study implies that there is interference from the Filipino language in the production of the student teachers' intonation patterns, which describes the Philippine English intonation pattern for wh-questions and yes/no questions. Forthcoming studies may obtain more valuable insights by gathering geographically varied samples that include student teachers across disciplines.

  7. Intonation and Attitude in Nigerian English | Akinjobi | Lwati: A ...

    African Journals Online (AJOL)

    Intonation is an important phenomenon in language believed to have strong effect on communication. It is often said in reference to the primacy of intonation meaning over lexical content that "It's not what you said, it's how you said it!”. To this effect, a number of scholars have argued that intonation conveys different attitudes ...

  8. Coding of intonational meanings beyond F0

    DEFF Research Database (Denmark)

    Niebuhr, Oliver

    2008-01-01

    …to H+L*L-% and L*+HL-% within the autosegmental metrical (AM) model. Aspirations in early-peak contexts were characterized by (a) "short", (b) "high-intensity" noise with (c) "low" frequency values for the spectral energy maximum above the lower spectral energy boundary. The opposite holds… for aspirations accompanying late-peak productions. Starting from the acoustic analysis, a perception experiment was performed using a variant of the semantic differential paradigm. The stimuli were varied in the duration and intensity pattern as well as the spectral energy pattern of the final /t/ aspiration…, so far solely associated with intonation contours. Hence, the traditionally separated segmental and suprasegmental coding levels seem to be more intertwined than previously thought…

  9. Interaction and notation of tone, stress and intonation in Ika Igbo ...

    African Journals Online (AJOL)

    This prominence has been proved by both acoustic and perceptual analyses. Hence in Ika, every intonation group (or tonal intonation group) has a nucleus (a stressed syllable) and it is this nucleus that bears the intonation. This is how stress interacts with intonation in Ika. Tone, on its part, is also affected by this interaction ...

  10. Vowel Generation for Children with Cerebral Palsy using Myocontrol of a Speech Synthesizer

    Directory of Open Access Journals (Sweden)

    Chuanxin M Niu

    2015-01-01

    For children with severe cerebral palsy (CP), social and emotional interactions can be significantly limited due to impaired speech motor function. However, if it is possible to extract continuous voluntary control signals from the electromyograph (EMG) of limb muscles, then EMG may be used to drive the synthesis of intelligible speech with controllable speed, intonation and articulation. We report an important first step: the feasibility of controlling a vowel synthesizer using non-speech muscles. A classic formant-based speech synthesizer is adapted to allow the lowest two formants to be controlled by surface EMG from skeletal muscles. EMG signals are filtered using a non-linear Bayesian filtering algorithm that provides the high bandwidth and accuracy required for speech tasks. The frequencies of the first two formants determine points in a 2D plane, and vowels are targets on this plane. We focus on testing the overall feasibility of producing intelligible English vowels with myocontrol using two straightforward EMG-formant mappings. More mappings can be tested in the future to optimize the intelligibility. Vowel generation was tested on 10 healthy adults and 4 patients with dyskinetic CP. Five English vowels were generated by subjects in pseudo-random order, after only 10 minutes of device familiarization. The fraction of vowels correctly identified by 4 naive listeners exceeded 80% for the vowels generated by healthy adults and 57% for vowels generated by patients with CP. Our goal is a continuous virtual voice with personalized intonation and articulation that will restore not only the intellectual content but also the social and emotional content of speech for children and adults with severe movement disorders.
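
    A toy sketch of the 2D formant-plane idea described above (not the paper's synthesizer or its Bayesian EMG filter): two smoothed EMG activations are mapped to an (F1, F2) point, which is classified against canonical vowel targets. The mapping ranges and target values are textbook-style assumptions.

      import numpy as np

      # Approximate F1/F2 targets (Hz) for five English vowels:
      VOWELS = {"i": (270, 2290), "e": (530, 1840), "a": (730, 1090),
                "o": (570, 840), "u": (300, 870)}

      def emg_to_formants(a1, a2):
          # Map two smoothed EMG activations in [0, 1] to an (F1, F2) point.
          return 250 + 550 * a1, 800 + 1600 * a2

      def nearest_vowel(f1, f2):
          # Classify the formant point against the vowel targets;
          # F2 distance is down-weighted since its range is ~3x wider.
          return min(VOWELS, key=lambda v: np.hypot(f1 - VOWELS[v][0],
                                                    (f2 - VOWELS[v][1]) / 3))

      f1, f2 = emg_to_formants(0.05, 0.95)  # low F1, high F2
      print(nearest_vowel(f1, f2))          # 'i' (high front vowel)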

  11. Foundations of Intonational Meaning: Anatomical and Physiological Factors.

    Science.gov (United States)

    Gussenhoven, Carlos

    2016-04-01

    Like non-verbal communication, paralinguistic communication is rooted in anatomical and physiological factors. Paralinguistic form-meaning relations arise from the way these affect speech production, with some fine-tuning by the cultural and linguistic context. The effects have been classified as "biological codes," following the terminological lead of John Ohala's Frequency Code. Intonational morphemes, though arguably non-arbitrary in principle, are in fact heavily biased toward these paralinguistic meanings. Paralinguistic and linguistic meanings for four biological codes are illustrated. In addition to the Frequency Code, the Effort Code, and the Respiratory Code, the Sirenic Code is introduced here, which is based on the use of whispery phonation, widely seen as being responsible for the signaling and perception of feminine attractiveness and sometimes used to express interrogativity in language. In the context of the evolution of language, the relations between physiological conditions and the resulting paralinguistic and linguistic meanings will need to be clarified. Copyright © 2016 The Authors. Topics in Cognitive Science published by Wiley Periodicals, Inc. on behalf of Cognitive Science Society.

  12. Unruly intonation

    Directory of Open Access Journals (Sweden)

    Michael George Ashby

    2017-12-01

    In this paper we call into question the value of ‘rules’ concerning intonation to the learner of English. Are there predictive rules of sufficient generality and power to make them worth learning explicitly, or would learners’ time be better spent on habit-forming drills of common patterns? Examining a typical test passage for advanced students, we show that in all three systems of tonality, tonicity and tone, known ‘rules’ account only for a proportion of the ‘right’ or expected answers. There are plentiful instances where competent native speakers agree over the selection of a pattern, though no rule seems to guide their choice. We recommend that the utility of ‘rules’ should be evaluated in relation to the frequency of occurrence of the structures to which they apply, in the relevant types of discourse; that more attention be given to idiomatic expressions, and the prosodic patterns associated with particular lexical items; and that learners should be equipped with simple practical heuristics (e.g. for using punctuation as a guide to intonation when reading aloud).

  13. Phonetic and phonological imitation of intonation in two varieties of Italian

    Directory of Open Access Journals (Sweden)

    Mariapaola D'Imperio

    2014-11-01

    The aim of this study was to test whether both phonetic and phonological representations of intonation can be rapidly modified when imitating utterances belonging to a different regional variety of the same language. Our main hypothesis was that tonal alignment, just as other phonetic features of speech, would be rapidly modified by Italian speakers when imitating pitch accents of a different (Southern) variety of Italian. In particular, we tested whether Bari Italian (BI) speakers would produce later peaks for their native rising L+H* (question) pitch accent in the process of imitating Neapolitan Italian (NI) rising L*+H accents. Also, we tested whether BI speakers are able to modify other phonetic properties (pitch level) as well as phonological characteristics (changes in tonal composition) of the same contour. In a follow-up study, we tested if the reverse was also true, i.e. whether NI speakers would produce earlier peaks within the L*+H accent in the process of imitating the L+H* of BI questions, despite the presence of a contrast between two rising accents in this variety. Our results show that phonetic detail of tonal alignment can be successfully modified by both BI and NI speakers when imitating a model speaker of the other variety. The hypothesis of a selective imitation process preventing alignment modifications in NI was hence not supported. Moreover, the effect was significantly stronger for low-frequency words. Participants were also able to imitate other phonetic cues, in that they modified global utterance pitch level. Concerning phonological convergence, speakers modified the tonal specification of the edge tones in order to resemble that of the other variety by either suppressing or increasing the presence of a final H%. Hence, our data show that intonation imitation leads to fast modification of both phonetic and phonological intonation representations, including detail of tonal alignment and pitch scaling.

  14. Cortical oscillations and entrainment in speech processing during working memory load

    DEFF Research Database (Denmark)

    Hjortkjær, Jens; Märcher-Rørsted, Jonatan; Fuglsang, Søren A

    2018-01-01

    Neuronal oscillations are thought to play an important role in working memory (WM) and speech processing. Listening to speech in real-life situations is often cognitively demanding but it is unknown whether WM load influences how auditory cortical activity synchronizes to speech features. Here, we developed an auditory n-back paradigm to investigate cortical entrainment to speech envelope fluctuations under different degrees of WM load. We measured the electroencephalogram, pupil dilations and behavioural performance from 22 subjects listening to continuous speech with an embedded n-back task. The speech stimuli consisted of long spoken number sequences created to match natural speech in terms of sentence intonation, syllabic rate and phonetic content. To burden different WM functions during speech processing, listeners performed an n-back task on the speech sequences in different levels of background noise.

  15. The Mechanism of Speech Processing in Congenital Amusia: Evidence from Mandarin Speakers

    OpenAIRE

    Liu, Fang; Jiang, Cunmei; Thompson, William Forde; Xu, Yi; Yang, Yufang; Stewart, Lauren

    2012-01-01

    Congenital amusia is a neuro-developmental disorder of pitch perception that causes severe problems with music processing but only subtle difficulties in speech processing. This study investigated speech processing in a group of Mandarin speakers with congenital amusia. Thirteen Mandarin amusics and thirteen matched controls participated in a set of tone and intonation perception tasks and two pitch threshold tasks. Compared with controls, amusics showed impaired performance on word discrimination ...

  16. A tutorial on the principles of harmonic intonation for trombonists

    Science.gov (United States)

    Keener, Michael Kenneth

    A Tutorial on the Principles of Harmonic Intonation for Trombonists includes a manual containing background information, explanations of the principles of harmonic intonation, and printed musical examples for use in learning and practicing the concepts of harmonic intonation. An audio compact disc containing music files corresponding to the printed music completes the set. This tutorial is designed to allow performing musicians and students to practice intonation skills with the pitch-controlled music on the compact disc. The music on the CD was recorded in movable-comma just intonation, replicating performance parameters of wind, string, and vocal ensembles. The compact disc includes sixty tracks of ear-training exercises and interval studies with which to practice intonation perception and adjustment. Tuning notes and examples of equal-tempered intervals and just intervals are included on the CD. The intonation exercises consist of musical major scales, duets, trios, and quartet phrases to be referenced while playing the printed music. The CD tracks allow the performer to play scales in unison (or practice other harmonic intervals) or the missing part of the corresponding duet, trio, or quartet exercise. Instructions in the manual guide the user through a process that can help prepare musicians for more accurate musical ensemble performance. The contextual essay that accompanies the tutorial includes a description of the tutorial, a review of related literature, methodology of construction of the tutorial, evaluations and outcomes, conclusions and recommendations for further research, and a selected bibliography.
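
    To make concrete the kind of discrepancy such ear training targets, the sketch below compares a few just-intonation interval ratios with their 12-tone equal-tempered counterparts. The ratios and cent values are standard music-theory figures; the code itself is only an illustration and is not part of the tutorial.

```python
import math

def cents(ratio: float) -> float:
    """Interval size in cents (1200 cents = one octave)."""
    return 1200 * math.log2(ratio)

# Just-intonation ratios compared with 12-tone equal temperament (ET),
# where every semitone is exactly 100 cents.
just_intervals = {
    "major third":   5 / 4,  # ET approximates this with 400 cents
    "perfect fifth": 3 / 2,  # ET: 700 cents
    "minor seventh": 7 / 4,  # ET: 1000 cents
}

for name, ratio in just_intervals.items():
    just_c = cents(ratio)
    et_c = round(just_c / 100) * 100   # nearest equal-tempered interval
    print(f"{name:14s} just = {just_c:6.1f} cents, "
          f"ET = {et_c:4d}, difference = {just_c - et_c:+5.1f} cents")
```

    The just major third, for instance, comes out about 13.7 cents narrower than its equal-tempered version, a difference well within what trained ensemble players adjust for by ear.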

  17. A Study of Intonation in the Soccer Results.

    Science.gov (United States)

    Bonnet, G.

    1980-01-01

    Reports a study which illustrates that a listener can anticipate the score of the opposing team in sports match results from the variation in the announcer's intonation. Investigates how reliable this prediction is and what linguistic features it involves. Relates these findings to general problems in intonation contour interpretation. (PMJ)

  18. Intonation, Scientism, and "Archetypality".

    Science.gov (United States)

    Vanderslice, Ralph

    This paper reviews Philip Lieberman's "Intonation, Perception, and Language," (Research Monograph No. 38) Cambridge, Massachusetts, M.I.T. Press, 1967. The review is also scheduled to appear in the "Journal of Linguistics." (JD)

  19. Intensive treatment with ultrasound visual feedback for speech sound errors in childhood apraxia

    Directory of Open Access Journals (Sweden)

    Jonathan L Preston

    2016-08-01

    Ultrasound imaging is an adjunct to traditional speech therapy that has been shown to be beneficial in the remediation of speech sound errors. Ultrasound biofeedback can be utilized during therapy to provide clients additional knowledge about their tongue shapes when attempting to produce sounds that are in error. The additional feedback may assist children with childhood apraxia of speech in stabilizing motor patterns, thereby facilitating more consistent and accurate productions of sounds and syllables. However, due to its specialized nature, ultrasound visual feedback is a technology that is not widely available to clients. Short-term intensive treatment programs are one option that can be utilized to expand access to ultrasound biofeedback. Schema-based motor learning theory suggests that short-term intensive treatment programs (massed practice) may assist children in acquiring more accurate motor patterns. In this case series, three participants ages 10-14 diagnosed with childhood apraxia of speech attended 16 hours of speech therapy over a two-week period to address residual speech sound errors. Two participants had distortions on rhotic sounds, while the third participant demonstrated lateralization of sibilant sounds. During therapy, cues were provided to assist participants in obtaining a tongue shape that facilitated a correct production of the erred sound. Additional practice without ultrasound was also included. Results suggested that all participants showed signs of acquisition of sounds in error. Generalization and retention results were mixed. One participant showed generalization and retention of sounds that were treated; one showed generalization but limited retention; and the third showed no evidence of generalization or retention. Individual characteristics that may facilitate generalization are discussed. Short-term intensive treatment programs using ultrasound biofeedback may result in the acquisition of more accurate motor patterns.

  20. THE INTONATION AND SOUND CHARACTERISTICS OF ADVERTISING PRONUNCIATION STYLE

    Directory of Open Access Journals (Sweden)

    Chernyavskaya Elena Sergeevna

    2014-06-01

    The article aims at describing the intonation and sound characteristics of the advertising phonetic style. On the basis of acoustic analysis of transcripts of radio advertising recordings broadcast at different radio stations, and of processing a representative set of phrases with the help of special computer programs, the author determines the parameters of suprasegmental means. The article shows that the stylistic parameters of the advertising phonetic style are oriented toward modern orthoepy, and that the distinctive sound of radio advertising is determined by two tendencies: a reduction in the duration of stressed vowels in terminal and non-terminal words, and an increase in the duration of pre-tonic and post-tonic vowels in non-terminal words in a phrase. The article also shows that the characteristic rhythmic structure of terminal and non-terminal words in radio advertising is formed by levelling the length of stressed and unstressed sounds. The specificity of the intonational structure of an advertising text consists in the following peculiarities: the matching of syntactic and syntagmatic division, which serves to mark out the blocks of semantic models forming the text of radio advertising; the allocation of keywords to separate syntagmas; the shaping of the informative parts of the advertising text by means of symmetric length correlation of minimal speech segments; and the combination of inter-style prosodic elements within the sounding text. The analysis thus leads to the conclusion that the texts of spoken advertising are produced in a special pronunciation style, marked in particular by patterns of sound duration.

  1. Testing a model of intonation in a tone language.

    Science.gov (United States)

    Lindau, M

    1986-09-01

    Schematic fundamental frequency curves of simple statements and questions are generated for Hausa, a two-tone language of Nigeria, using a modified version of an intonational model developed by Gårding and Bruce [Nordic Prosody II, edited by T. Fretheim (Tapir, Trondheim, 1981), pp. 33-39]. In this model, rules for intonation and tones are separated. Intonation is represented as sloping grids of (near) parallel lines, inside which tones are placed. The tones are associated with turning points of the fundamental frequency contour. Local rules may also modify the exact placement of a tone within the grid. The continuous fundamental frequency contour is modeled by concatenating the tonal points using polynomial equations. Thus the final pitch contour is modeled as an interaction between global and local factors. The slope of the intonational grid lines depends at least on sentence type (statement or question), sentence length, and tone pattern. The model is tested by reference to data from nine speakers of Kano Hausa.
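
    As a rough illustration of the modeling approach described (the actual rules and parameter values belong to the paper and are not reproduced here), the sketch below places H and L tones on declining grid lines and concatenates smooth cubic segments between the resulting turning points. All numbers are invented.

```python
import numpy as np

def f0_contour(turning_points, n=200):
    """Concatenate smooth cubic segments through tonal turning points.

    turning_points: list of (time_s, f0_hz) pairs for the H and L tones
    placed on the grid. Each segment is a cubic with zero slope at both
    ends, a simple stand-in for the polynomial concatenation in the model.
    """
    times = np.linspace(turning_points[0][0], turning_points[-1][0], n)
    f0 = np.empty_like(times)
    for (t0, y0), (t1, y1) in zip(turning_points, turning_points[1:]):
        seg = (times >= t0) & (times <= t1)
        u = (times[seg] - t0) / (t1 - t0)  # normalized time within segment
        f0[seg] = y0 + (y1 - y0) * (3 * u**2 - 2 * u**3)  # smoothstep cubic
    return times, f0

# A sloping grid for a schematic statement: H tones sit on a declining
# topline, L tones on a declining baseline (downdrift). Invented values.
topline = lambda t: 140 - 30 * t   # Hz
baseline = lambda t: 110 - 25 * t  # Hz
tone_times = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
points = [(t, topline(t) if tone == "H" else baseline(t))
          for t, tone in zip(tone_times, "HLHLHL")]
t, f0 = f0_contour(points)
print(f0[:5].round(1))
```

    In the model proper, the grid slope would change with sentence type, length and tone pattern, and local rules could shift individual turning points within the grid.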

  2. Stability of syllable intonation in two generations (Zilbes intonāciju stabilitāte divu paaudžu laikā)

    Directory of Open Access Journals (Sweden)

    Velta Rūķe-Draviņa

    2011-07-01

    STABILITY OF SYLLABLE INTONATION IN TWO GENERATIONS. Material for the history of Latvian intonations. Summary: The purpose of this study is to investigate: 1) the retention of Latvian syllable intonation in the course of an individual's lifetime, and 2) the possibility of transference of an intonation system from one generation to another in bilingual surroundings. This work analyzes three idiolects in the same family, namely, a Latvian-Swedish bilingual's (born 1946, Stockholm) idiosyncrasies of pronunciation in comparison to the language of his parents, who each represent the intonation system of their native region: the mother — Selian, the father — Southwestern-Kursa dialects. This investigation determines that an intonation system acquired in childhood and stabilized at grammar-school age remains stable later on in the life of the individual, and does not change even with prolonged contact with speakers of other intonation systems in the same family over a period of thirty years. If each of the parents has his or her own intonation system, and only one of them speaks with the intonations common to the literary language, then there is a high probability that the child will acquire the latter variant. In ideally balanced bilingual circumstances, a bilingual individual can learn genuine intonations in both languages, i.e. in this case in Swedish the phonological contrast between "accent 1" and "accent 2", and in Latvian the contrast between the level tone and the non-level tone.

  3. Intonation as an interface between language and affect.

    Science.gov (United States)

    Grandjean, Didier; Bänziger, Tanja; Scherer, Klaus R

    2006-01-01

    The vocal expression of human emotions is embedded within language and the study of intonation has to take into account two interacting levels of information--emotional and semantic meaning. In addition to the discussion of this dual coding system, an extension of Brunswik's lens model is proposed. This model includes the influences of conventions, norms, and display rules (pull effects) and psychobiological mechanisms (push effects) on emotional vocalizations produced by the speaker (encoding) and the reciprocal influences of these two aspects on attributions made by the listener (decoding), allowing the dissociation and systematic study of the production and perception of intonation. Three empirical studies are described as examples of possibilities of dissociating these different phenomena at the behavioral and neurological levels in the study of intonation.

  4. Neural Segregation of Concurrent Speech: Effects of Background Noise and Reverberation on Auditory Scene Analysis in the Ventral Cochlear Nucleus.

    Science.gov (United States)

    Sayles, Mark; Stasiak, Arkadiusz; Winter, Ian M

    2016-01-01

    Concurrent complex sounds (e.g., two voices speaking at once) are perceptually disentangled into separate "auditory objects". This neural processing often occurs in the presence of acoustic-signal distortions from noise and reverberation (e.g., in a busy restaurant). A difference in periodicity between sounds is a strong segregation cue under quiet, anechoic conditions. However, noise and reverberation exert differential effects on speech intelligibility under "cocktail-party" listening conditions. Previous neurophysiological studies have concentrated on understanding auditory scene analysis under ideal listening conditions. Here, we examine the effects of noise and reverberation on periodicity-based neural segregation of concurrent vowels /a/ and /i/, in the responses of single units in the guinea-pig ventral cochlear nucleus (VCN): the first processing station of the auditory brain stem. In line with human psychoacoustic data, we find reverberation significantly impairs segregation when vowels have an intonated pitch contour, but not when they are spoken on a monotone. In contrast, noise impairs segregation independent of intonation pattern. These results are informative for models of speech processing under ecologically valid listening conditions, where noise and reverberation abound.

  5. Long Intonations and Kalophonia

    DEFF Research Database (Denmark)

    Troelsgård, Christian

    2008-01-01

    The paper shows that the melodic material in quasi-improvised intonation melodies (echemata) in the period from c. 1200 to 1400 developed from being drawn from the traditional genres - especially the psaltikon and asmatikon - to acquiring a more free and decidedly 'kalophonic' character.

  6. Temporal and spatio-temporal vibrotactile displays for voice fundamental frequency: an initial evaluation of a new vibrotactile speech perception aid with normal-hearing and hearing-impaired individuals.

    Science.gov (United States)

    Auer, E T; Bernstein, L E; Coulter, D C

    1998-10-01

    Four experiments were performed to evaluate a new wearable vibrotactile speech perception aid that extracts fundamental frequency (F0) and displays the extracted F0 as a single-channel temporal or an eight-channel spatio-temporal stimulus. Specifically, we investigated the perception of intonation (i.e., question versus statement) and emphatic stress (i.e., stress on the first, second, or third word) under Visual-Alone (VA), Visual-Tactile (VT), and Tactile-Alone (TA) conditions and compared performance using the temporal and spatio-temporal vibrotactile display. Subjects were adults with normal hearing in experiments I-III and adults with severe to profound hearing impairments in experiment IV. Both versions of the vibrotactile speech perception aid successfully conveyed intonation. Vibrotactile stress information was successfully conveyed, but vibrotactile stress information did not enhance performance in VT conditions beyond performance in VA conditions. In experiment III, which involved only intonation identification, a reliable advantage for the spatio-temporal display was obtained. Differences between subject groups were obtained for intonation identification, with more accurate VT performance by those with normal hearing. Possible effects of long-term hearing status are discussed.

  7. Young Children's Intonational Marking of New, Given and Contrastive Referents

    Science.gov (United States)

    Grünloh, Thomas; Lieven, Elena; Tomasello, Michael

    2015-01-01

    In the current study we investigate whether 2- and 3-year-old German children use intonation productively to mark the informational status of referents. Using a story-telling task, we compared children's and adults' intonational realization via pitch accent (H*, L* and de-accentuation) of New, Given, and Contrastive referents. Both children and…

  8. The normalities and abnormalities associated with speech in psychometrically-defined schizotypy.

    Science.gov (United States)

    Cohen, Alex S; Auster, Tracey L; McGovern, Jessica E; MacAulay, Rebecca K

    2014-12-01

    Speech deficits are thought to be an important feature of schizotypy--defined as the personality organization reflecting a putative liability for schizophrenia. There is reason to suspect that these deficits manifest as a function of limited cognitive resources. To evaluate this idea, we examined speech from individuals with psychometrically-defined schizotypy during a low cognitively-demanding task versus a relatively high cognitively-demanding task. A range of objective, computer-based measures of speech was employed, tapping speech production (silence, number and length of pauses, number and length of utterances), speech variability (global and local intonation and emphasis) and speech content (word fillers, idea density). Data for control (n=37) and schizotypy (n=39) groups were examined. Results did not confirm our hypotheses. While the cognitive-load task reduced speech expressivity for subjects as a group for most variables, the schizotypy group was not more pathological in speech characteristics compared to the control group. Interestingly, some aspects of speech in schizotypal versus control subjects were healthier under high cognitive load. Moreover, schizotypal subjects performed better, at a trend level, than controls on the cognitively demanding task. These findings hold important implications for our understanding of the neurocognitive architecture associated with the schizophrenia-spectrum. Of particular note is the apparent mismatch between self-reported schizotypal traits and objective performance, and the resiliency of speech under cognitive stress in persons with high levels of schizotypy. Copyright © 2014 Elsevier B.V. All rights reserved.

  9. Word final schwa is driven by intonation-The case of Bari Italian.

    Science.gov (United States)

    Grice, Martine; Savino, Michelina; Roettger, Timo B

    2018-04-01

    In order to convey pragmatic functions, a speaker has to select an intonation contour (the tune) in addition to the words that are to be spoken (the text). The tune and text are assumed to be independent of each other, such that any one intonation contour can be produced on different phrases, regardless of the number and nature of the segments they are made up of. However, if the segmental string is too short, certain tunes, especially those with a rising component, call for adjustments to the text. In Italian, for instance, loan words such as "chat" can be produced with a word final schwa when this word occurs at the end of a question. This paper investigates this word final schwa in the Bari variety in a number of different intonation contours. Although its presence and duration are to some extent dependent on idiosyncratic properties of speakers and words, schwa is largely conditioned by intonation. Schwa cannot thus be considered a mere phonetic artefact, since it is relevant for phonology, in that it facilitates the production of communicatively relevant intonation contours.

  10. Rule-Based Storytelling Text-to-Speech (TTS) Synthesis

    Directory of Open Access Journals (Sweden)

    Ramli Izzad

    2016-01-01

    In recent years, various real-life applications such as talking books, gadgets and humanoid robots have drawn attention to research in the area of expressive speech synthesis. Speech synthesis is widely used in various applications. However, there is a growing need for expressive speech synthesis, especially for communication and robotics. In this paper, global and local rules are developed to convert neutral speech to storytelling-style speech for the Malay language. To generate the rules, modifications of prosodic parameters such as pitch, intensity, duration, tempo and pauses are considered. The modifications of prosodic parameters were derived by performing prosodic analysis on stories collected from an experienced female and an experienced male storyteller. The global and local rules are applied at sentence level and synthesized using HNM. Subjective tests were conducted to evaluate the quality of the synthesized storytelling speech under both rules in terms of naturalness, intelligibility, and similarity to the original storytelling speech. The results showed that the global rule gives better results than the local rule.
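
    As a toy illustration of what a "global rule" of this kind might look like (the actual rules, parameter values and HNM resynthesis belong to the paper and are not reproduced here), the following sketch rescales pitch, pitch range, intensity and duration of a neutral utterance:

```python
# Hypothetical "global rule" converting neutral prosody toward a
# storytelling style. All scaling factors are invented for illustration;
# the paper derives its rules from prosodic analysis of recorded
# storytellers and resynthesizes the result with HNM.

GLOBAL_RULE = {
    "pitch_scale": 1.15,        # raise overall F0 by 15%
    "pitch_range_scale": 1.3,   # widen F0 excursions around the mean
    "intensity_boost_db": 2.0,  # slightly louder
    "tempo_scale": 0.9,         # 0.9x tempo, i.e. ~11% longer durations
}

def apply_global_rule(syllables, rule):
    """syllables: list of dicts with f0 (Hz), intensity (dB), duration (s)."""
    mean_f0 = sum(s["f0"] for s in syllables) / len(syllables)
    styled = []
    for s in syllables:
        # Expand the F0 excursion around the utterance mean, then shift it.
        f0 = mean_f0 + (s["f0"] - mean_f0) * rule["pitch_range_scale"]
        styled.append({
            "f0": f0 * rule["pitch_scale"],
            "intensity": s["intensity"] + rule["intensity_boost_db"],
            "duration": s["duration"] / rule["tempo_scale"],
        })
    return styled

neutral = [{"f0": 120, "intensity": 60, "duration": 0.18},
           {"f0": 140, "intensity": 63, "duration": 0.22}]
print(apply_global_rule(neutral, GLOBAL_RULE))
```

    A local rule would apply analogous modifications selectively, for example to particular words or sentence positions, rather than uniformly across the utterance.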

  11. Discourse Intonation - Making It Work

    Directory of Open Access Journals (Sweden)

    Tatjana Paunović

    2008-06-01

    Discourse Intonation (DI) (Brazil 1997; Chun 2002) seems to be particularly well suited for use in the EFL classroom, much more so than the rather complex traditional models (e.g. O’Connor and Arnold 1973) or some recent phonological theories. Yet if L2 teachers are to be provided with clear guidelines on how to incorporate DI into communicative language teaching, much more empirical research is needed with L2 students of different L1 backgrounds to uncover the specific problems they face. The small-scale study presented here examines how 15 second-year students of the English Department in Niš manage intonation in a reading task. The analysis focuses on the components singled out by Chun (2002) as crucial for language learners: sentence stress (nuclear tone placement), terminal contour (direction of pitch change), and key (pitch range) at transition points.

  12. Music and speech prosody: a common rhythm.

    Science.gov (United States)

    Hausen, Maija; Torppa, Ritva; Salmela, Viljami R; Vainio, Martti; Särkämö, Teppo

    2013-01-01

    Disorders of music and speech perception, known as amusia and aphasia, have traditionally been regarded as dissociated deficits based on studies of brain damaged patients. This has been taken as evidence that music and speech are perceived by largely separate and independent networks in the brain. However, recent studies of congenital amusia have broadened this view by showing that the deficit is associated with problems in perceiving speech prosody, especially intonation and emotional prosody. In the present study the association between the perception of music and speech prosody was investigated with healthy Finnish adults (n = 61) using an on-line music perception test including the Scale subtest of Montreal Battery of Evaluation of Amusia (MBEA) and Off-Beat and Out-of-key tasks as well as a prosodic verbal task that measures the perception of word stress. Regression analyses showed that there was a clear association between prosody perception and music perception, especially in the domain of rhythm perception. This association was evident after controlling for music education, age, pitch perception, visuospatial perception, and working memory. Pitch perception was significantly associated with music perception but not with prosody perception. The association between music perception and visuospatial perception (measured using analogous tasks) was less clear. Overall, the pattern of results indicates that there is a robust link between music and speech perception and that this link can be mediated by rhythmic cues (time and stress).

  15. Telling in-tune from out-of-tune: widespread evidence for implicit absolute intonation.

    Science.gov (United States)

    Van Hedger, Stephen C; Heald, Shannon L M; Huang, Alex; Rutstein, Brooke; Nusbaum, Howard C

    2017-04-01

    Absolute pitch (AP) is the rare ability to name or produce an isolated musical note without the aid of a reference note. One skill thought to be unique to AP possessors is the ability to provide absolute intonation judgments (e.g., classifying an isolated note as "in-tune" or "out-of-tune"). Recent work has suggested that absolute intonation perception among AP possessors is not crystallized in a critical period of development, but is dynamically maintained by the listening environment, in which the vast majority of Western music is tuned to a specific cultural standard. Given that all listeners of Western music are constantly exposed to this specific cultural tuning standard, our experiments address whether absolute intonation perception extends beyond AP possessors. We demonstrate that non-AP listeners are able to accurately judge the intonation of completely isolated notes. Both musicians and nonmusicians showed evidence for absolute intonation recognition when listening to familiar timbres (piano and violin). When testing unfamiliar timbres (triangle and inverted sine waves), only musicians showed weak evidence of absolute intonation recognition (Experiment 2). Overall, these results highlight a previously unknown similarity between AP and non-AP possessors' long-term musical note representations, including evidence of sensitivity to frequency.
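
    Operationally, "in-tune" versus "out-of-tune" here is defined relative to the Western tuning standard (A4 = 440 Hz, twelve-tone equal temperament). A minimal sketch of that classification, with an invented decision threshold:

```python
import math

A4 = 440.0  # the Western tuning standard listeners are routinely exposed to

def cents_from_nearest_note(freq_hz: float) -> float:
    """Deviation in cents from the nearest 12-TET note, relative to A4 = 440 Hz."""
    semitones = 12 * math.log2(freq_hz / A4)
    return 100 * (semitones - round(semitones))

def is_in_tune(freq_hz: float, tolerance_cents: float = 20.0) -> bool:
    """Invented threshold: within 20 cents of a note counts as 'in tune'."""
    return abs(cents_from_nearest_note(freq_hz)) < tolerance_cents

c5 = A4 * 2 ** (3 / 12)                  # C5, three semitones above A4
print(is_in_tune(c5))                    # True  (~0 cents off)
print(is_in_tune(c5 * 2 ** (0.4 / 12)))  # False (~40 cents sharp)
```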

  16. How Consciousness-Raising Affects Intonation and Facilitates Reading Comprehension

    Science.gov (United States)

    Shariati, Mohammad

    2007-01-01

    This paper reports on an investigation about the relation between a student's conscious awareness of the structure of a sentence and the degree of his/her intonation accuracy as well as his/her reading comprehension. The research was done based on the hypothesis that: "if the students are made conscious of the infrastructure of lengthy…

  17. Enseigner l'intonation: L'ordinateur a la rescousse (Teaching Intonation: The Computer to the Rescue).

    Science.gov (United States)

    Canto-Knoerr, Helene

    1992-01-01

    A discussion of intonation instruction in second-language teaching looks at the role, objectives, and scope of such instruction; the tools teachers need to accomplish it (background in theory, practical guidelines, a method, appropriate research, and equipment); and the need for and possible design of related software. (MSE)

  18. Amusia Results in Abnormal Brain Activity following Inappropriate Intonation during Speech Comprehension

    OpenAIRE

    Jiang, Cunmei; Hamm, Jeff P.; Lim, Vanessa K.; Kirk, Ian J.; Chen, Xuhai; Yang, Yufang

    2012-01-01

    Pitch processing is a critical ability on which humans' tonal musical experience depends, and which is also of paramount importance for decoding prosody in speech. Congenital amusia refers to deficits in the ability to properly process musical pitch, and recent evidence has suggested that this musical pitch disorder may impact upon the processing of speech sounds. Here we present the first electrophysiological evidence demonstrating that individuals with amusia who speak Mandarin Chinese are ...

  19. Efficacy of melody-based aphasia therapy may strongly depend on rhythm and conversational speech formulas

    Directory of Open Access Journals (Sweden)

    Benjamin Stahl

    2014-04-01

    Left-hemisphere stroke patients suffering from language and speech disorders are often able to sing entire pieces of text fluently. This finding has inspired a number of melody-based rehabilitation programs – most notable among them a treatment known as Melodic Intonation Therapy – as well as two fundamental research questions. When the experimental design focuses on one point in time (cross section), one may determine whether or not singing has an immediate effect on syllable production in patients with language and speech disorders. When the design focuses on changes over several points in time (longitudinal section), one may gain insight as to whether or not singing has a long-term effect on language and speech recovery. The current work addresses both of these questions with two separate experiments that investigate the interplay of melody, rhythm and lyric type in 32 patients with non-fluent aphasia and apraxia of speech (Stahl et al., 2011; Stahl et al., 2013). Taken together, the experiments deliver three main results. First, singing and rhythmic pacing proved to be equally effective in facilitating immediate syllable production and long-term language and speech recovery. Controlling for various influences such as prosody, syllable duration and phonetic complexity, the data did not reveal any advantage of singing over rhythmic speech. This result was independent of lesion size and lesion location in the patients. Second, patients with extensive left-sided basal ganglia lesions produced more correct syllables when their speech was paced by rhythmic drumbeats. This observation is consistent with the idea that regular auditory cues may partially compensate for corticostriatal damage and thereby improve speech-motor planning (Grahn & Watson, 2013). Third, conversational speech formulas and well-known song lyrics yielded higher rates of correct syllable production than novel word sequences – whether patients were singing or speaking.

  20. The integration of prosodic speech in high functioning autism: a preliminary FMRI study.

    Directory of Open Access Journals (Sweden)

    Isabelle Hesling

    2010-07-01

    Autism is a neurodevelopmental disorder characterized by a specific triad of symptoms: abnormalities in social interaction, abnormalities in communication, and restricted activities and interests. While verbal autistic subjects may present a correct mastery of the formal aspects of speech, they have difficulties in prosody (the music of speech), leading to communication disorders. A few behavioural studies have revealed a prosodic impairment in children with autism, and among the few fMRI studies aiming at assessing the neural network involved in language, none has specifically studied prosodic speech. The aim of the present study was to characterize specific prosodic components such as linguistic prosody (intonation, rhythm and emphasis) and emotional prosody, and to correlate them with the neural network underlying them. We used a behavioural test (Profiling Elements of the Prosodic System, PEPS) and fMRI to characterize prosodic deficits and investigate the neural network underlying prosodic processing. Results revealed the existence of a link between perceptive and productive prosodic deficits for some prosodic components (rhythm, emphasis and affect) in HFA, and also revealed that the neural network involved in prosodic speech perception exhibits abnormal activation in the left SMG as compared to controls (activation positively correlated with intonation and emphasis), as well as an absence of deactivation patterns in regions involved in the default mode. These prosodic impairments could result not only from abnormalities in activation patterns but also from an inability to adequately use the strategy of default-network inhibition, both mechanisms that have to be considered in accounting for decreased task performance in High Functioning Autism.

  1. Melodic intonation therapy in chronic aphasia: Evidence from a pilot randomized controlled trial

    NARCIS (Netherlands)

    I. van der Meulen (Ineke); W.M.E. van de Sandt-Koenderman (Mieke); Heijenbrok, M.H. (Majanka H.); E.G. Visch-Brink (Evy); Ribber, G.M. (Gerard M.)

    2016-01-01

    textabstractMelodic Intonation Therapy (MIT) is a language production therapy for severely nonfluent aphasic patients using melodic intoning and rhythm to restore language. Although many studies have reported its beneficial effects on language production, randomized controlled trials (RCT) examining

  2. Musical Intonation of Wind Instruments and Temperature

    Science.gov (United States)

    Zendri, G.; Valdan, M.; Gratton, L. M.; Oss, S.

    2015-01-01

    Wind musical instruments are affected in their intonation by temperature. We show how to account for these effects in a simple experiment, and provide results in languages accessible to both physics and music professionals.

  3. Choral Singing: Using Gestures as an Aid to Intonation and Tone Quality

    Directory of Open Access Journals (Sweden)

    Weider Martins

    2016-10-01

    The work of tone quality and intonation is a fundamental task of the conductor, and body movements associated with learning are favorable attitudes for effective learning. The objective of this article is to describe self-perception in the use of gesture for intonation and tone quality among singers in different contexts. The sample consisted of 45 volunteers divided into two groups, an amateur group (N=30) and a layman group (N=15), who underwent an evaluation by means of the Voice and Corporal Expression Protocol, as well as a trial of the methodology used by conductors Henry Leck and Randy Stenson (2012). The results showed improved body awareness in 87% of cases on average; just over 85% improved intonation and respiratory control; 80% found it easier to reach higher notes; 56% (layman group) and 73% (amateur group) felt less strain when singing. In both groups, the findings of this study confirm that the use of corporal gesture is beneficial, whether in a context of amateurs or laymen, improving intonation, resonance and respiratory support, with less strain.

  4. Discrimination and preference of speech and non-speech sounds in autism patients

    Institute of Scientific and Technical Information of China (English)

    王崇颖; 江鸣山; 徐旸; 马斐然; 石锋

    2011-01-01

    Objective: To explore the discrimination and preference of speech and non-speech sounds in autism patients. Methods: Ten people (5 children and 5 adults) diagnosed with autism according to the criteria of the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) were selected from the database of the Nankai University Center for Behavioural Science. Together with 10 age-matched healthy controls, the people with autism were tested in three experiments on speech sounds, pure tones and intonation, with stimuli recorded and modified in Praat, a voice-analysis software package. Their discrimination and preference judgments were collected orally. Exact probability values were calculated. Results: There were no significant differences in the discrimination of speech sounds, pure tones and intonation between autism patients and controls (P > 0.05), while controls preferred speech and non-speech sounds with higher pitch more than the autism group did (e.g., -100Hz/+50Hz: 2 vs. 7, P < 0.05; 50Hz/250Hz: 4 vs. 10, P < 0.05), and the autism group preferred non-speech sounds with lower pitch (100Hz/250Hz: 6 vs. 3, P < 0.05). No significant difference was found in the preference for intonation between autism patients and controls (P > 0.05). Conclusion: People with autism show impaired auditory processing of speech and non-speech sounds.

  5. The Therapeutic Effect of Speechvive on Prosody in Parkinson's Disease

    Science.gov (United States)

    Kiefer, Brianna Rose

    It is well known that physiological impairments secondary to Parkinson's Disease (PD) negatively impact speech production. Individuals with PD display vocal, prosodic, resonant, and articulatory abnormalities which reduce communicative effectiveness. Prosody is a broad term which refers to the alterations in pitch, duration, and loudness used by speakers to convey important linguistic and paralinguistic information during speech. Little is known about the prosodic abnormalities associated with PD relative to healthy older adults; however, it is well known that individuals with PD display impairments in their ability to modulate the acoustic cues (pitch, duration, intensity) associated with prosodic inflection in speech. The literature presently lacks sufficient evidence to support treatment paradigms commonly used to address dysprosody in PD. Thus, there is a significant need to develop and investigate potential evidence-based treatment paradigms for dysprosody associated with PD. The present study aimed to examine the potential treatment effects of the SpeechVive device on dysprosody in PD. Acoustic recordings were obtained from 15 individuals with PD during a reading task. Participants read the passage at the start of the study and 12 weeks later, after wearing the SpeechVive device for the intervening weeks. Main outcome measures examined productions of contrastive stress, intonation contours, rate, and patterns of pausing. The results revealed that participants increased vocal intensity levels during the production of stressed words and increased the standard deviation of pitch in their productions of intonation contours. Lastly, the device was found to improve participants' abilities to pause relative to syntactic boundaries.

  6. The Intonation of Noun Phrase Subjects and Clause-Modifying ...

    African Journals Online (AJOL)

    Oladipupo, Rotimi O. - Centre for Foundation Education, Bells University of Technology ... native Englishes, especially at the level of phonology, this study investigates ... Keywords: Intonation tunes, Nigerian English, Noun Phrase Subjects ...

  7. Cortical oscillations and entrainment in speech processing during working memory load.

    Science.gov (United States)

    Hjortkjaer, Jens; Märcher-Rørsted, Jonatan; Fuglsang, Søren A; Dau, Torsten

    2018-02-02

    Neuronal oscillations are thought to play an important role in working memory (WM) and speech processing. Listening to speech in real-life situations is often cognitively demanding but it is unknown whether WM load influences how auditory cortical activity synchronizes to speech features. Here, we developed an auditory n-back paradigm to investigate cortical entrainment to speech envelope fluctuations under different degrees of WM load. We measured the electroencephalogram, pupil dilations and behavioural performance from 22 subjects listening to continuous speech with an embedded n-back task. The speech stimuli consisted of long spoken number sequences created to match natural speech in terms of sentence intonation, syllabic rate and phonetic content. To burden different WM functions during speech processing, listeners performed an n-back task on the speech sequences in different levels of background noise. Increasing WM load at higher n-back levels was associated with a decrease in posterior alpha power as well as increased pupil dilations. Frontal theta power increased at the start of the trial and increased additionally with higher n-back level. The observed alpha-theta power changes are consistent with visual n-back paradigms, suggesting general oscillatory correlates of WM processing load. Speech entrainment was measured as a linear mapping between the envelope of the speech signal and low-frequency cortical activity. Increasing the WM load (but not the background noise level) decreased cortical speech envelope entrainment. Although entrainment persisted under high load, our results suggest a top-down influence of WM processing on cortical speech entrainment. © 2018 The Authors. European Journal of Neuroscience published by Federation of European Neuroscience Societies and John Wiley & Sons Ltd.
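
    For readers unfamiliar with envelope-entrainment analyses, the sketch below shows one common way such a linear mapping can be estimated: a lowpass speech envelope, a bank of time-lagged copies of it, and a ridge-regularized regression onto one EEG channel. The data are synthetic and the pipeline is a generic temporal-response-function sketch, not the authors' actual analysis.

```python
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

fs = 64  # Hz; a typical analysis rate after downsampling EEG
rng = np.random.default_rng(0)

# Synthetic stand-ins: a lowpass "speech envelope" and one EEG channel
# that tracks it at a 100 ms lag plus noise. Real data would come from
# the experiment; these exist only to make the sketch runnable.
n = fs * 60
b, a = butter(2, 8 / (fs / 2), btype="low")      # keep < 8 Hz fluctuations
envelope = filtfilt(b, a, np.abs(hilbert(rng.standard_normal(n))))
eeg = np.roll(envelope, fs // 10) + 0.5 * rng.standard_normal(n)

# Forward model: predict EEG from time-lagged copies of the envelope
# (a temporal response function), estimated with ridge regression.
lags = np.arange(0, fs // 2)                     # 0-500 ms lags
X = np.column_stack([np.roll(envelope, lag) for lag in lags])
ridge = 1e2
w = np.linalg.solve(X.T @ X + ridge * np.eye(len(lags)), X.T @ eeg)

# Entrainment strength: correlation between predicted and measured EEG.
r = np.corrcoef(X @ w, eeg)[0, 1]
print(f"envelope tracking r = {r:.2f}")
```

    A drop in this correlation under higher n-back levels would correspond to the load-related decrease in entrainment the study reports.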

  8. Rasper: a Mechatronic Noise-Intoner

    OpenAIRE

    Zareei, Mo; Kapur, Ajay; Carnegie, Dale A.

    2014-01-01

    Over the past few decades, there has been an increasing number of musical instruments and works of sound art that incorporate robotics and mechatronics. This paper proposes a new approach to the classification of such works and focuses on those whose ideological roots can be sought in Luigi Russolo's noise-intoners (intonarumori). It presents a discussion of works in which mechatronics is used to investigate new and traditionally perceived as "extra-musical" sonic territories, and introduces Ras...

  9. Intervention efficacy and intensity for children with speech sound disorder.

    Science.gov (United States)

    Allen, Melissa M

    2013-06-01

    Clinicians do not have an evidence base they can use to recommend optimum intervention intensity for preschool children who present with speech sound disorder (SSD). This study examined the effect of dose frequency on phonological performance and the efficacy of the multiple oppositions approach. Fifty-four preschool children with SSD were randomly assigned to one of three intervention conditions. Two intervention conditions received the multiple oppositions approach either 3 times per week for 8 weeks (P3) or once weekly for 24 weeks (P1). A control (C) condition received a storybook intervention. Percentage of consonants correct (PCC) was evaluated at 8 weeks and after 24 sessions. PCC gain was examined after a 6-week maintenance period. The P3 condition had a significantly better phonological outcome than the P1 and C conditions at 8 weeks and than the P1 condition after 24 weeks. There were no significant differences between the P1 and C conditions. There was no significant difference between the P1 and P3 conditions in PCC gain during the maintenance period. Preschool children with SSD who received the multiple oppositions approach made significantly greater gains when they were provided with a more intensive dose frequency and when cumulative intervention intensity was held constant.

  10. Automatic speech recognition (ASR) based approach for speech therapy of aphasic patients: A review

    Science.gov (United States)

    Jamal, Norezmi; Shanta, Shahnoor; Mahmud, Farhanahani; Sha'abani, MNAH

    2017-09-01

    This paper reviews state-of-the-art automatic speech recognition (ASR) based approaches for speech therapy of aphasic patients. Aphasia is a condition in which the affected person suffers from a speech and language disorder resulting from a stroke or brain injury. Since there is a growing body of evidence indicating the possibility of improving the symptoms at an early stage, ASR based solutions are increasingly being researched for speech and language therapy. ASR is a technology that converts human speech into text by matching it against the system's library. This is particularly useful in speech rehabilitation therapies as it provides accurate, real-time evaluation of speech input from individuals with speech disorders. ASR based approaches for speech therapy recognize the speech input from the aphasic patient and provide real-time feedback on their mistakes. However, the accuracy of ASR is dependent on many factors such as phoneme recognition, speech continuity, speaker and environmental differences, as well as the depth of our knowledge of human language understanding. Hence, the review examines recent developments in ASR technology and its performance for individuals with speech and language disorders.

  11. Changes in N400 Topography Following Intensive Speech Language Therapy for Individuals with Aphasia

    Science.gov (United States)

    Wilson, K. Ryan; O'Rourke, Heather; Wozniak, Linda A.; Kostopoulos, Ellina; Marchand, Yannick; Newman, Aaron J.

    2012-01-01

    Our goal was to characterize the effects of intensive aphasia therapy on the N400, an electrophysiological index of lexical-semantic processing. Immediately before and after 4 weeks of intensive speech-language therapy, people with aphasia performed a task in which they had to determine whether spoken words were a "match" or a "mismatch" to…

  13. The mechanism of speech processing in congenital amusia: evidence from Mandarin speakers.

    Science.gov (United States)

    Liu, Fang; Jiang, Cunmei; Thompson, William Forde; Xu, Yi; Yang, Yufang; Stewart, Lauren

    2012-01-01

    Congenital amusia is a neuro-developmental disorder of pitch perception that causes severe problems with music processing but only subtle difficulties in speech processing. This study investigated speech processing in a group of Mandarin speakers with congenital amusia. Thirteen Mandarin amusics and thirteen matched controls participated in a set of tone and intonation perception tasks and two pitch threshold tasks. Compared with controls, amusics showed impaired performance on word discrimination in natural speech and their gliding tone analogs. They also performed worse than controls on discriminating gliding tone sequences derived from statements and questions, and showed elevated thresholds for pitch change detection and pitch direction discrimination. However, they performed as well as controls on word identification, and on statement-question identification and discrimination in natural speech. Overall, tasks that involved multiple acoustic cues to communicative meaning were not impacted by amusia. Only when the tasks relied mainly on pitch sensitivity did amusics show impaired performance compared to controls. These findings help explain why amusia only affects speech processing in subtle ways. Further studies on a larger sample of Mandarin amusics and on amusics of other language backgrounds are needed to consolidate these results.

  14. A Russian Keyword Spotting System Based on Large Vocabulary Continuous Speech Recognition and Linguistic Knowledge

    Directory of Open Access Journals (Sweden)

    Valentin Smirnov

    2016-01-01

    The paper describes the key concepts of a word spotting system for Russian based on large vocabulary continuous speech recognition. Key algorithms and system settings are described, including the pronunciation variation algorithm, and experimental results on real-life telecom data are provided. A description of the system architecture and the user interface is also given. The system is based on the CMU Sphinx open-source speech recognition platform and on the linguistic models and algorithms developed by Speech Drive LLC. The effective combination of baseline statistical methods, real-world training data, and the intensive use of linguistic knowledge led to a quality result applicable to industrial use.

  15. Intonation and compensation of fretted string instruments

    Science.gov (United States)

    Varieschi, Gabriele; Gower, Christina

    2011-04-01

    We discuss theoretical and physical models that are useful for analyzing the intonation of musical instruments such as guitars and mandolins and can be used to improve the tuning on these instruments. The placement of frets on the fingerboard is designed according to mathematical rules and the assumption of an ideal string. The analysis becomes more complicated when we include the effects of deformation of the string and inharmonicity due to other string characteristics. As a consequence, perfect intonation of all the notes on the instrument cannot be achieved, but complex compensation procedures can be introduced to minimize the problem. To test the validity of these procedures, we performed extensive measurements using standard monochord sonometers and other acoustical devices, confirming the correctness of our theoretical models. These experimental activities can be integrated into acoustics courses and laboratories and can become a more advanced version of basic experiments with monochords and sonometers. This work was supported by a grant from the Frank R. Seaver College of Science and Engineering, Loyola Marymount University.
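
    The "mathematical rule" for fret placement under twelve-tone equal temperament is simple enough to state in a few lines; the sketch below computes ideal fret distances for an idealized, perfectly flexible string. The compensation procedures the paper develops correct the small sharpening that real, stiff strings add to these ideal positions; the 650 mm scale length is just a typical value.

```python
SCALE_LENGTH_MM = 650.0  # typical classical guitar scale length

def fret_distance_from_nut(n: int, scale: float = SCALE_LENGTH_MM) -> float:
    """Ideal 12-TET fret position for a perfectly flexible string:
    each fret shortens the vibrating length by a factor of 2**(1/12)."""
    return scale * (1 - 2 ** (-n / 12))

for n in (1, 5, 7, 12):
    print(f"fret {n:2d}: {fret_distance_from_nut(n):6.2f} mm from nut")
# Fret 12 lands at exactly half the scale length (325.00 mm); a real,
# stiff string frets slightly sharp, which is why saddle and nut
# compensation are needed in practice.
```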

  16. Examining Acoustic and Kinematic Measures of Articulatory Working Space: Effects of Speech Intensity.

    Science.gov (United States)

    Whitfield, Jason A; Dromey, Christopher; Palmer, Panika

    2018-04-18

    The purpose of this study was to examine the effect of speech intensity on acoustic and kinematic vowel space measures and conduct a preliminary examination of the relationship between kinematic and acoustic vowel space metrics calculated from continuously sampled lingual marker and formant traces. Young adult speakers produced 3 repetitions of 2 different sentences at 3 different loudness levels. Lingual kinematic and acoustic signals were collected and analyzed. Acoustic and kinematic variants of several vowel space metrics were calculated from the formant frequencies and the position of 2 lingual markers. Traditional metrics included triangular vowel space area and the vowel articulation index. Acoustic and kinematic variants of sentence-level metrics based on the articulatory-acoustic vowel space and the vowel space hull area were also calculated. Both acoustic and kinematic variants of the sentence-level metrics significantly increased with an increase in loudness, whereas no statistically significant differences in traditional vowel-point metrics were observed for either the kinematic or acoustic variants across the 3 loudness conditions. In addition, moderate-to-strong relationships between the acoustic and kinematic variants of the sentence-level vowel space metrics were observed for the majority of participants. These data suggest that both kinematic and acoustic vowel space metrics that reflect the dynamic contributions of both consonant and vowel segments are sensitive to within-speaker changes in articulation associated with manipulations of speech intensity.
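
    As an illustration of one of the traditional vowel-point metrics mentioned, the sketch below computes a triangular vowel space area from corner-vowel formant frequencies using the shoelace formula. The formant values are textbook-style illustrative numbers, not data from the study.

```python
def polygon_area(points):
    """Shoelace formula for the area of a polygon given (x, y) vertices."""
    n = len(points)
    s = sum(points[i][0] * points[(i + 1) % n][1]
            - points[(i + 1) % n][0] * points[i][1]
            for i in range(n))
    return abs(s) / 2

# Corner vowels /i/, /a/, /u/ as (F1, F2) in Hz; illustrative values
# in the range of typical adult male formants.
corner_vowels = {"i": (270, 2290), "a": (730, 1090), "u": (300, 870)}
tvsa = polygon_area(list(corner_vowels.values()))
print(f"triangular vowel space area = {tvsa:,.0f} Hz^2")  # ~308,600 Hz^2
```

    The sentence-level metrics discussed in the study differ in that they are computed over continuously sampled formant or marker traces rather than over a few vowel-point means, which is what makes them sensitive to loudness manipulations here.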

  17. Spectral integration in speech and non-speech sounds

    Science.gov (United States)

    Jacewicz, Ewa

    2005-04-01

    Spectral integration (or formant averaging) was proposed in vowel perception research to account for the observation that a reduction of the intensity of one of two closely spaced formants (as in /u/) produced a predictable shift in vowel quality [Delattre et al., Word 8, 195-210 (1952)]. A related observation was reported in psychoacoustics, indicating that when the components of a two-tone periodic complex differ in amplitude and frequency, its perceived pitch is shifted toward that of the more intense tone [Helmholtz, App. XIV (1875/1948)]. Subsequent research in both fields focused on the frequency interval that separates these two spectral components, in an attempt to determine the size of the bandwidth for spectral integration to occur. This talk will review the accumulated evidence for and against spectral integration within the hypothesized limit of 3.5 Bark for static and dynamic signals in speech perception and psychoacoustics. Based on similarities in the processing of speech and non-speech sounds, it is suggested that spectral integration may reflect a general property of the auditory system. A larger frequency bandwidth, possibly close to 3.5 Bark, may be utilized in integrating acoustic information, including speech, complex signals, or sound quality of a violin.
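
    The 3.5 Bark criterion is straightforward to operationalize. Assuming Traunmüller's (1990) standard approximation of the Bark scale, the sketch below checks whether two formants fall within the hypothesized integration bandwidth; the example frequencies are illustrative.

```python
def hz_to_bark(f_hz: float) -> float:
    """Traunmueller's (1990) approximation of the Bark critical-band scale."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def within_integration_band(f_a_hz: float, f_b_hz: float,
                            limit_bark: float = 3.5) -> bool:
    """Do two spectral components fall inside the hypothesized
    3.5-Bark integration window?"""
    return abs(hz_to_bark(f_a_hz) - hz_to_bark(f_b_hz)) < limit_bark

# A back /u/-like token with closely spaced F1 and F2 is a candidate
# for spectral integration; the widely spaced formants of /i/ are not.
print(within_integration_band(300, 600))    # True  (~2.7 Bark apart)
print(within_integration_band(270, 2290))   # False (~11 Bark apart)
```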

  18. Unifying Speech and Language in a Developmentally Sensitive Model of Production.

    Science.gov (United States)

    Redford, Melissa A

    2015-11-01

    Speaking is an intentional activity. It is also a complex motor skill; one that exhibits protracted development and the fully automatic character of an overlearned behavior. Together these observations suggest an analogy with skilled behavior in the non-language domain. This analogy is used here to argue for a model of production that is grounded in the activity of speaking and structured during language acquisition. The focus is on the plan that controls the execution of fluent speech; specifically, on the units that are activated during the production of an intonational phrase. These units are schemas: temporally structured sequences of remembered actions and their sensory outcomes. Schemas are activated and inhibited via associated goals, which are linked to specific meanings. Schemas may fuse together over developmental time with repeated use to form larger units, thereby affecting the relative timing of sequential action in participating schemas. In this way, the hierarchical structure of the speech plan and ensuing rhythm patterns of speech are a product of development. Individual schemas may also become differentiated during development, but only if subsequences are associated with meaning. The necessary association of action and meaning gives rise to assumptions about the primacy of certain linguistic forms in the production process. Overall, schema representations connect usage-based theories of language to the action of speaking.

  19. Influence of intensive phonomotor rehabilitation on apraxia of speech.

    Science.gov (United States)

    Kendall, Diane L; Rodriguez, Amy D; Rosenbek, John C; Conway, Tim; Gonzalez Rothi, Leslie J

    2006-01-01

    In this phase I rehabilitation study, we investigated the effects of an intensive phonomotor rehabilitation program on verbal production in a 73-year-old male, 11 years post onset of a left-hemisphere stroke, who exhibited apraxia of speech and aphasia. In the context of a single-subject design, we studied whether treatment would improve phoneme production and generalize to repetition of multisyllabic words, words of increasing length, discourse, and measures of self-report. We predicted that a predominant motor impairment would respond to intensive phonomotor rehabilitation. While able to learn to produce individual sounds, the subject did not exhibit generalization to other aspects of motor production. Discourse production was judged perceptually slower in rate and less effortful, but also less natural. Finally, self-report indicated less apprehension toward speaking with unfamiliar people, increased telephone use, and increased ease of communication.

  20. Perception of Music and Speech in Adolescents with Cochlear Implants – A Pilot Study on Effects of Intensive Musical Ear Training

    DEFF Research Database (Denmark)

    Petersen, Bjørn; Sørensen, Stine Derdau; Pedersen, Ellen Raben

    … measures of rehabilitation are important throughout adolescence. Music training may provide a beneficial method of strengthening not only music perception, but also linguistic skills, particularly prosody. The purpose of this study was to examine perception of music and speech and music engagement of adolescent CI users and the potential effects of an intensive musical ear training program. METHODS: Eleven adolescent CI users participated in a short intensive training program involving music-making activities and computer-based listening exercises. Ten normal-hearing (NH) age-mates formed a reference group, who followed their standard school schedule and received no music training. Before and after the intervention period, both groups completed a set of tests for perception of music, speech, and emotional prosody. In addition, the participants filled out a questionnaire which examined music listening habits and enjoyment. …

  1. Non-invasive brain stimulation enhances the effects of Melodic Intonation Therapy

    Directory of Open Access Journals (Sweden)

    Bradley W. Vines

    2011-09-01

    Full Text Available Research has suggested that a fronto-temporal network in the right hemisphere may be responsible for mediating Melodic Intonation Therapy’s positive effects on speech recovery. We investigated the potential for a non-invasive brain stimulation technique, transcranial direct current stimulation (tDCS), to augment the benefits of MIT in patients with non-fluent aphasia by modulating neural activity in the brain during treatment with MIT. The polarity of the current applied to the scalp determines the effects of tDCS on the underlying tissue: anodal tDCS increases excitability, whereas cathodal tDCS decreases excitability. We applied anodal tDCS to the posterior inferior frontal gyrus (IFG) of the right hemisphere, an area that has been shown to both contribute to singing through the mapping of sounds to articulatory actions and serve as a key region in the process of recovery from aphasia, particularly in patients with large left hemispheric lesions. The stimulation was applied while patients were treated with MIT by a trained therapist. Six patients with moderate to severe non-fluent aphasia underwent three consecutive days of anodal-tDCS+MIT, and an equivalent series of sham-tDCS+MIT. The two treatment series were separated by one week, and the order in which the treatments were administered was randomized. Compared to the effects of sham-tDCS+MIT, anodal-tDCS+MIT led to significant improvements in fluency of speech. These results support the hypothesis that, as the brain seeks to reorganize and compensate for damage to left-hemisphere language centers, combining anodal-tDCS with MIT may further recovery from post-stroke aphasia by enhancing activity in a right-hemisphere sensorimotor network for articulation.

  2. Treatment for Acquired Apraxia of Speech: Examination of Treatment Intensity and Practice Schedule

    Science.gov (United States)

    Wambaugh, Julie L.; Nessler, Christina; Cameron, Rosalea; Mauszycki, Shannon C.

    2013-01-01

    Purpose: The authors designed this investigation to extend the development of a treatment for acquired apraxia of speech (AOS)--sound production treatment (SPT)--by examining the effects of 2 treatment intensities and 2 schedules of practice. Method: The authors used a multiple baseline design across participants and behaviors with 4 speakers with…

  3. The Effects of Gesture and Movement Training on the Intonation of Children's Singing in Vocal Warm-Up Sessions

    Science.gov (United States)

    Liao, Mei-Ying; Davidson, Jane W.

    2016-01-01

    The main purpose of the current study was to examine the effects of gesture and movement training for beginning children's choirs with regard to improving intonation. It was a between-subjects design with one independent variable, Training Technique (TT). One dependent variable was measured: intonation in the singing of vocal pattern warm-up…

  4. Tonal Targets in Early Child English, Spanish, and Catalan

    Science.gov (United States)

    Astruc, Lluisa; Payne, Elinor; Post, Brechtje; Vanrell, Maria del Mar; Prieto, Pilar

    2013-01-01

    This study analyses the scaling and alignment of low and high intonational targets in the speech of 27 children--nine English-speaking, nine Catalan-speaking and nine Spanish-speaking--between the ages of two and six years. We compared the intonational patterns of words controlled for number of syllables and stress position in the child speech to…

  5. Perceptual Learning of Intonation Contour Categories in Adults and 9- to 11-Year-Old Children: Adults Are More Narrow-Minded

    Science.gov (United States)

    Kapatsinski, Vsevolod; Olejarczuk, Paul; Redford, Melissa A.

    2017-01-01

    We report on rapid perceptual learning of intonation contour categories in adults and 9- to 11-year-old children. Intonation contours are temporally extended patterns, whose perception requires temporal integration and therefore poses significant working memory challenges. Both children and adults form relatively abstract representations of…

  6. Effects of Early Bilingual Experience with a Tone and a Non-Tone Language on Speech-Music Integration.

    Directory of Open Access Journals (Sweden)

    Salomi S Asaridou

    Full Text Available We investigated music and language processing in a group of early bilinguals who spoke a tone language and a non-tone language (Cantonese and Dutch). We assessed online speech-music processing interactions, that is, interactions that occur when speech and music are processed simultaneously in songs, with a speeded classification task. In this task, participants judged sung pseudowords either musically (based on the direction of the musical interval) or phonologically (based on the identity of the sung vowel). We also assessed longer-term effects of linguistic experience on musical ability, that is, the influence of extensive prior experience with language when processing music. These effects were assessed with a task in which participants had to learn to identify musical intervals and with four pitch-perception tasks. Our hypothesis was that due to their experience in two different languages using lexical versus intonational tone, the early Cantonese-Dutch bilinguals would outperform the Dutch control participants. In online processing, the Cantonese-Dutch bilinguals processed speech and music more holistically than controls. This effect seems to be driven by experience with a tone language, in which integration of segmental and pitch information is fundamental. Regarding longer-term effects of linguistic experience, we found no evidence for a bilingual advantage in either the music-interval learning task or the pitch-perception tasks. Together, these results suggest that being a Cantonese-Dutch bilingual does not have any measurable longer-term effects on pitch and music processing, but does have consequences for how speech and music are processed jointly.

  7. Speech Transduction Based on Linguistic Content

    DEFF Research Database (Denmark)

    Juel Henrichsen, Peter; Christiansen, Thomas Ulrich

    Digital hearing aids use a variety of advanced digital signal processing methods in order to improve speech intelligibility. These methods are based on knowledge about the acoustics outside the ear as well as psychoacoustics. This paper investigates the recent observation that speech elements with a high degree of information can be robustly identified based on basic acoustic properties, i.e., function words have greater spectral tilt than content words for each of the 18 Danish talkers investigated. In this paper we examine these spectral tilt differences as a function of time, based on a speech material six times the duration of previous investigations. Our results show that the correlation of spectral tilt with information content is relatively constant across time, even if averaged across talkers. This indicates that it is possible to devise a robust method for estimating information density. …
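
    A minimal sketch of one way to measure spectral tilt consistent with this description is to regress the dB power spectrum on log-frequency and take the slope in dB per octave. The band limits, Welch settings, and the synthetic token below are assumptions; the paper's exact procedure may differ.

      import numpy as np
      from scipy.signal import welch

      def spectral_tilt(x, sr):
          # Power spectral density of the token, then a least-squares
          # slope of level (dB) against log2 frequency (dB per octave)
          f, pxx = welch(x, fs=sr, nperseg=512)
          keep = (f > 100) & (f < 5000)      # speech-relevant band (assumption)
          db = 10.0 * np.log10(pxx[keep])
          slope, _ = np.polyfit(np.log2(f[keep]), db, 1)
          return slope

      sr = 16000
      t = np.arange(sr) / sr
      x = np.random.randn(sr) * np.exp(-t)   # stand-in for a speech token
      print(f"tilt = {spectral_tilt(x, sr):.2f} dB/octave")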

  8. Specific neural traces for intonational discourse categories as revealed by human-evoked potentials.

    Science.gov (United States)

    Borràs-Comes, Joan; Costa-Faidella, Jordi; Prieto, Pilar; Escera, Carles

    2012-04-01

    The neural representation of segmental and tonal phonological distinctions has been shown by means of the MMN ERP, yet this is not the case for intonational discourse contrasts. In Catalan, a rising-falling intonational sequence can be perceived as a statement or as a counterexpectational question, depending exclusively on the size of the pitch range interval of the rising movement. We tested here, using the MMN, whether such categorical distinctions elicited distinct neurophysiological patterns of activity, supporting their specific neural representation. From a behavioral identification experiment, we set the boundary between the two categories and defined four stimuli across the continuum. Although the physical distance between each pair of stimuli was kept constant, the central pair represented an across-category contrast, whereas the other pairs represented within-category contrasts. These four auditory stimuli were contrasted by pairs in three different oddball blocks. The mean amplitude of the MMN was larger for the across-category contrast, suggesting that intonational contrasts in the target language can be encoded automatically in the auditory cortex. These results are in line with recent findings in other fields of linguistics, showing that, when a boundary between categories is crossed, the MMN response is not just larger but rather includes a separate subcomponent.
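
    The MMN quantification described here can be illustrated with a toy difference-wave computation: the deviant-minus-standard difference is averaged over a latency window. The sampling rate, window, and simulated ERPs below are assumptions for illustration, not the study's data.

      import numpy as np

      # Sketch: mismatch negativity (MMN) as the deviant-minus-standard
      # difference wave, quantified as mean amplitude in a latency window.
      sr = 500                        # EEG sampling rate in Hz (assumption)
      t = np.arange(-0.1, 0.5, 1 / sr)  # epoch from -100 to 500 ms

      # Hypothetical averaged ERPs (microvolts) at one electrode
      standard = np.random.randn(len(t)) * 0.2
      deviant = standard - 1.0 * np.exp(-((t - 0.15) ** 2) / (2 * 0.03 ** 2))

      difference = deviant - standard
      window = (t >= 0.1) & (t <= 0.25)       # typical MMN window (assumption)
      print(f"mean MMN amplitude: {difference[window].mean():.2f} uV")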

  9. Intonational Meaning in Institutional Settings: The Role of Syntagmatic Relations

    Science.gov (United States)

    Wichmann, Anne

    2010-01-01

    This paper addresses the power of intonation to convey interpersonal or attitudinal meaning. Speakers have been shown to accommodate to each other in the course of conversation, and this convergence may be perceived as a sign of empathy. Accommodation often involves paradigmatic choices--choosing the same words, gestures, regional accent or…

  10. Infants' Perception of the Intonation of Broad and Narrow Focus

    Science.gov (United States)

    Butler, Joseph; Vigário, Marina; Frota, Sónia

    2016-01-01

    Infants perceive intonation contrasts early in development in contrast to lexical stress but similarly to lexical pitch accent. Previous studies have mostly focused on pitch height/direction contrasts; however, languages use a variety of pitch features to signal meaning, including differences in pitch timing. In this study, we investigate infants'…

  11. The Influence of F0 Discontinuity on Intonational Cues to Word Segmentation

    DEFF Research Database (Denmark)

    Welby, Pauline; Niebuhr, Oliver

    2016-01-01

    The paper presents the results of a 2AFC offline word-identification experiment by [1], reanalyzed to investigate how F0 discontinuities due to voiceless fricatives and voiceless stops affect cues to word segmentation in accentual phrase-initial rises (APRs) of French, relative to a reference condition with liquid and nasal consonants. Although preliminary due to the small sample size, we found initial evidence that voiceless consonants degrade F0 cues to word segmentation relative to liquids and nasals. In addition, this degradation seems to be stronger for voiceless stops than for voiceless fricatives … pitch impressions created by the fricative noise. Our results call for follow-up studies that use French APRs as a testing ground for this intonational model and also examine the precise nature of intonational cues to word segmentation.

  12. On the universal and language-specific perception of paralinguistic intonational meaning

    NARCIS (Netherlands)

    Chen, A.J.

    2005-01-01

    This thesis presents a collection of studies on the perception of paralinguistic intonational meanings, which stem from three biological codes: the Frequency Code, the Effort Code, and the Production Code. On the one hand, these studies have shown that listeners, regardless of language background, …

  13. Measuring prosodic deficits in oral discourse by speakers with fluent aphasia

    Directory of Open Access Journals (Sweden)

    Tan Lee

    2015-04-01

    Full Text Available Introduction: Prosody refers to the part of phonology that includes speech rhythm, stress, and intonation (Gandour, 1998). Proper intonation is crucial for expressing one’s emotion and linguistic meaning. Most studies examining prosodic deficits have focused primarily on Broca’s aphasia (e.g., Danly & Shapiro, 1982). The current study proposed a computer-assisted method for systematic and objective examination of intonation patterns in aphasic oral discourse. The speech materials were in Hong Kong Cantonese, which is known for being rich in tones. Since the surface F0 contour is the result of a complicated interplay between sentence-level intonation and syllable-level lexical tones, the challenge lies in how to extract meaningful representations of intonation from acoustic signals. Methods: Four individuals with fluent aphasia (two anomic and two transcortical sensory) and four gender-, age-, and education-matched controls participated. Based on the Cantonese AphasiaBank protocol (Kong, Law, & Lee, 2009), narrative samples and corresponding audio recordings were collected using discourse tasks of personal monologue, picture and sequential description, and story-telling. There were eight recordings for each subject. Each oral discourse was divided into sentences by manual inspection of the orthographic transcription and the respective acoustic signal. A sentence was defined as a sequence of words that in principle covers a complete thought. However, it was common in spontaneous oral discourse, especially in aphasia, that some of the sentences did not end with a completed expression but switched to a new topic. Occasionally, an obvious interjection was observed during an attempt to restart a statement. Phoneme-level automatic time alignment was performed on each audio recording using a hidden Markov model (HMM) based forced-alignment technique (Lee, Kong, Chan, & Wang, 2013). F0 was estimated from the acoustic signal at intervals of 0.01 second by …
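
    As a hedged illustration of this F0-estimation step only, the sketch below extracts an F0 contour at 0.01-second intervals with librosa's pyin tracker; the file name, sampling rate, and pitch range are assumptions, and the study's own extractor may well differ.

      import librosa
      import numpy as np

      # Sketch: estimate F0 every 0.01 s ("speech.wav" is a placeholder)
      y, sr = librosa.load("speech.wav", sr=16000)
      hop = int(0.01 * sr)                    # 10-ms analysis step
      f0, voiced, _ = librosa.pyin(
          y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"),
          sr=sr, hop_length=hop)
      times = librosa.times_like(f0, sr=sr, hop_length=hop)
      # Keep voiced frames only; unvoiced frames are NaN in pyin's output
      contour = np.column_stack([times[voiced], f0[voiced]])
      print(contour[:5])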

  14. The Relationship between Speech Production and Speech Perception Deficits in Parkinson's Disease

    Science.gov (United States)

    De Keyser, Kim; Santens, Patrick; Bockstael, Annelies; Botteldooren, Dick; Talsma, Durk; De Vos, Stefanie; Van Cauwenberghe, Mieke; Verheugen, Femke; Corthals, Paul; De Letter, Miet

    2016-01-01

    Purpose: This study investigated the possible relationship between hypokinetic speech production and speech intensity perception in patients with Parkinson's disease (PD). Method: Participants included 14 patients with idiopathic PD and 14 matched healthy controls (HCs) with normal hearing and cognition. First, speech production was objectified…

  15. Melodic Intonation Therapy in chronic aphasia: evidence from a pilot randomized controlled trial

    Directory of Open Access Journals (Sweden)

    Ineke Van Der Meulen

    2016-11-01

    Full Text Available Abstract: Melodic Intonation Therapy (MIT) is a language production therapy for severely non-fluent aphasic patients that uses melodic intoning and rhythm to restore language. Although many studies have reported its beneficial effects on language production, randomized controlled trials (RCTs) examining the efficacy of MIT are rare. In an earlier publication, we presented the results of an RCT on MIT in subacute aphasia and found that MIT was effective on trained and untrained items. Further, we observed a clear trend toward improved functional language use after MIT. Subacute aphasic patients receiving MIT improved considerably on language tasks measuring connected speech and daily-life verbal communication. Here, we present the results of a pilot RCT on MIT in chronic aphasia and compare these to the results observed in subacute aphasia. We used a multicenter waiting-list randomized controlled trial design. Patients with chronic (>1 year post-stroke) aphasia were randomly allocated to the experimental group (6 weeks MIT) or to the control group (6 weeks no intervention, followed by 6 weeks MIT). Assessments were done at baseline (T1), after 6 weeks (T2), and 6 weeks later (T3). Efficacy was evaluated at T2 using univariable linear regression analyses. Outcome measures were chosen to examine several levels of therapy success: improvement on trained items, generalization to untrained items, and generalization to verbal communication. Of 17 included patients, 10 were allocated to the experimental condition and 7 to the control condition. MIT significantly improved repetition of trained items (β=13.32, p=.02). This effect did not remain stable at follow-up assessment. In contrast to earlier studies, we found only a limited and temporary effect of MIT, without generalization to untrained material or to functional communication. The results further suggest that the effect of MIT in chronic aphasia is more restricted than its effect in earlier stages post stroke. This …

  16. Parent-child interaction in motor speech therapy.

    Science.gov (United States)

    Namasivayam, Aravind Kumar; Jethava, Vibhuti; Pukonen, Margit; Huynh, Anna; Goshulak, Debra; Kroll, Robert; van Lieshout, Pascal

    2018-01-01

    This study measures the reliability and sensitivity of a modified Parent-Child Interaction Observation scale (PCIOs) used to monitor the quality of parent-child interaction. The scale is part of a home-training program employed with direct motor speech intervention for children with speech sound disorders. Eighty-four preschool age children with speech sound disorders were provided either high- (2×/week/10 weeks) or low-intensity (1×/week/10 weeks) motor speech intervention. Clinicians completed the PCIOs at the beginning, middle, and end of treatment. Inter-rater reliability (Kappa scores) was determined by an independent speech-language pathologist who assessed videotaped sessions at the midpoint of the treatment block. Intervention sensitivity of the scale was evaluated using a Friedman test for each item and then followed up with Wilcoxon pairwise comparisons where appropriate. We obtained fair-to-good inter-rater reliability (Kappa = 0.33-0.64) for the PCIOs using only video-based scoring. Child-related items were more strongly influenced by differences in treatment intensity than parent-related items, where a greater number of sessions positively influenced parent learning of treatment skills and child behaviors. The adapted PCIOs is reliable and sensitive to monitor the quality of parent-child interactions in a 10-week block of motor speech intervention with adjunct home therapy. Implications for rehabilitation: Parent-centered therapy is considered a cost-effective method of speech and language service delivery. However, parent-centered models may be difficult to implement for treatments such as developmental motor speech interventions that require a high degree of skill and training. For children with speech sound disorders and motor speech difficulties, a translated and adapted version of the parent-child observation scale was found to be sufficiently reliable and sensitive to assess changes in the quality of the parent-child interactions during …

  17. Intensive speech and language therapy in patients with chronic aphasia after stroke: a randomised, open-label, blinded-endpoint, controlled trial in a health-care setting.

    Science.gov (United States)

    Breitenstein, Caterina; Grewe, Tanja; Flöel, Agnes; Ziegler, Wolfram; Springer, Luise; Martus, Peter; Huber, Walter; Willmes, Klaus; Ringelstein, E Bernd; Haeusler, Karl Georg; Abel, Stefanie; Glindemann, Ralf; Domahs, Frank; Regenbrecht, Frank; Schlenck, Klaus-Jürgen; Thomas, Marion; Obrig, Hellmuth; de Langen, Ernst; Rocker, Roman; Wigbers, Franziska; Rühmkorf, Christina; Hempen, Indra; List, Jonathan; Baumgaertner, Annette

    2017-04-15

    … adverse events during therapy or treatment deferral (one car accident [in the control group], two common colds [one patient per group], three gastrointestinal or cardiac symptoms [all intervention group], two recurrent strokes [one in the intervention group before initiation of treatment, and one before group assignment had occurred]); all were unrelated to study participation. 3 weeks of intensive speech and language therapy significantly enhanced verbal communication in people aged 70 years or younger with chronic aphasia after stroke, providing an effective evidence-based treatment approach in this population. Future studies should examine the minimum treatment intensity required for meaningful treatment effects, and determine whether treatment effects cumulate over repeated intervention periods. Funded by the German Federal Ministry of Education and Research and the German Society for Aphasia Research and Treatment. Copyright © 2017 Elsevier Ltd. All rights reserved.

  18. Predicting speech intelligibility in adverse conditions: evaluation of the speech-based envelope power spectrum model

    DEFF Research Database (Denmark)

    Jørgensen, Søren; Dau, Torsten

    2011-01-01

    The speech-based envelope power spectrum model (sEPSM) [Jørgensen and Dau (2011). J. Acoust. Soc. Am. 130 (3), 1475-1487] estimates the envelope signal-to-noise ratio (SNRenv) of distorted speech and accurately describes the speech recognition thresholds (SRT) for normal-hearing listeners. … The model is evaluated in adverse conditions by comparing predictions to measured data from [Kjems et al. (2009). J. Acoust. Soc. Am. 126 (3), 1415-1426], where speech is mixed with four different interferers, including speech-shaped noise, bottle noise, car noise, and cafe noise. The model accounts well for the differences in intelligibility observed for the different interferers. None of the standardized models successfully describe these data.
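
    The envelope-SNR idea can be caricatured in a few lines: compare the envelope-fluctuation power of the noisy mixture with that of the noise alone. The sketch below does this in a single band, whereas the actual sEPSM uses gammatone and modulation filterbanks plus a decision back end; the cutoff and test signals are assumptions.

      import numpy as np
      from scipy.signal import hilbert, butter, sosfiltfilt

      def envelope(x, sr, cutoff=150.0):
          # Hilbert magnitude, low-pass filtered to keep slow fluctuations
          sos = butter(4, cutoff, btype="low", fs=sr, output="sos")
          return sosfiltfilt(sos, np.abs(hilbert(x)))

      def snr_env(mixture, noise, sr):
          def fluct_power(x):
              e = envelope(x, sr)
              e = e - e.mean()                # envelope fluctuations only
              return np.mean(e ** 2)
          p_mix, p_noise = fluct_power(mixture), fluct_power(noise)
          return 10 * np.log10(max(p_mix - p_noise, 1e-10) / p_noise)

      sr = 16000
      t = np.arange(sr) / sr
      speech = np.random.randn(sr) * (1 + np.sin(2 * np.pi * 4 * t))
      noise = np.random.randn(sr) * 0.5
      print(f"SNRenv ~ {snr_env(speech + noise, noise, sr):.1f} dB")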

  19. Low-dimensional recurrent neural network-based Kalman filter for speech enhancement.

    Science.gov (United States)

    Xia, Youshen; Wang, Jun

    2015-07-01

    This paper proposes a new recurrent neural network-based Kalman filter for speech enhancement, based on a noise-constrained least squares estimate. The parameters of the speech signal, modeled as an autoregressive process, are first estimated by using the proposed recurrent neural network, and the speech signal is then recovered by Kalman filtering. The proposed recurrent neural network is globally asymptotically stable to the noise-constrained estimate. Because the noise-constrained estimate performs robustly against non-Gaussian noise, the proposed recurrent neural network-based speech enhancement algorithm can minimize the estimation error of the Kalman filter parameters in non-Gaussian noise. Furthermore, having a low-dimensional model feature, the proposed neural network-based speech enhancement algorithm is much faster than two existing recurrent neural network-based speech enhancement algorithms. Simulation results show that the proposed recurrent neural network-based speech enhancement algorithm achieves good performance with fast computation and noise reduction. Copyright © 2015 Elsevier Ltd. All rights reserved.
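
    A minimal sketch of the Kalman-filtering stage follows: with the AR parameters assumed given (the paper estimates them with the proposed recurrent network), each noisy sample is tracked as the first element of a companion-form state. The AR order, coefficients, and noise variances below are toy values, not the paper's settings.

      import numpy as np

      # Sketch: classical Kalman-filter speech enhancement with a known
      # AR(p) speech model and white observation noise.
      def kalman_enhance(y, a, q, r):
          """y: noisy samples; a: AR coefficients [a1..ap];
          q: process-noise variance; r: observation-noise variance."""
          p = len(a)
          F = np.zeros((p, p))                 # companion-form transition
          F[0, :] = a
          F[1:, :-1] = np.eye(p - 1)
          H = np.zeros((1, p)); H[0, 0] = 1.0  # observe the newest sample
          Q = np.zeros((p, p)); Q[0, 0] = q
          x, P = np.zeros((p, 1)), np.eye(p)
          out = np.empty_like(y)
          for t, yt in enumerate(y):
              x = F @ x                        # predict
              P = F @ P @ F.T + Q
              k = P @ H.T / (H @ P @ H.T + r)  # Kalman gain
              x = x + k * (yt - (H @ x).item())  # update with innovation
              P = (np.eye(p) - k @ H) @ P
              out[t] = x[0, 0]
          return out

      rng = np.random.default_rng(0)
      clean = np.zeros(2000)
      for t in range(2, 2000):                 # AR(2) "speech-like" signal
          clean[t] = 1.3 * clean[t-1] - 0.4 * clean[t-2] + rng.normal(0, 0.1)
      noisy = clean + rng.normal(0, 0.3, 2000)
      enhanced = kalman_enhance(noisy, a=[1.3, -0.4], q=0.01, r=0.09)
      print(np.mean((noisy - clean) ** 2), np.mean((enhanced - clean) ** 2))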

  20. Improved Kalman Filter-Based Speech Enhancement with Perceptual Post-Filtering

    Institute of Scientific and Technical Information of China (English)

    WEI Jianqiang; DU Limin; YAN Zhaoli; ZENG Hui

    2004-01-01

    In this paper, a Kalman filter-based speech enhancement algorithm with some improvements over previous work is presented. A new technique based on spectral subtraction is used for separating speech and noise characteristics from noisy speech and for computing the speech and noise autoregressive (AR) parameters. In order to obtain Kalman filter output with high audible quality, a perceptual post-filter is placed at the output of the Kalman filter to smooth the enhanced speech spectra. Extensive experiments indicate that this newly proposed method works well.

  1. Study of accent-based music speech protocol development for improving voice problems in stroke patients with mixed dysarthria.

    Science.gov (United States)

    Kim, Soo Ji; Jo, Uiri

    2013-01-01

    Based on the anatomical and functional commonality between singing and speech, various types of musical elements have been employed in music therapy research for speech rehabilitation. The aim of this study was to develop an accent-based music speech protocol to address voice problems of stroke patients with mixed dysarthria. Subjects were 6 stroke patients with mixed dysarthria who received individual music therapy sessions. Each session lasted 30 minutes, and 12 sessions, including pre- and post-test, were administered to each patient. To examine the protocol's efficacy, the measures of maximum phonation time (MPT), fundamental frequency (F0), average intensity (dB), jitter, shimmer, noise-to-harmonics ratio (NHR), and diadochokinesis (DDK) were compared between pre- and post-test and analyzed with a paired-sample t-test. The results showed that the measures of MPT, F0, dB, and sequential motion rates (SMR) significantly increased after administering the protocol. Also, there were statistically significant differences in the measures of shimmer and alternating motion rates (AMR) of the syllable /kʌ/ between pre- and post-test. The results indicated that the accent-based music speech protocol may improve speech motor coordination, including respiration, phonation, articulation, resonance, and prosody, of patients with dysarthria. This suggests the possibility of utilizing the music speech protocol to maximize immediate treatment effects in the course of a long-term treatment for patients with dysarthria.
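
    Several of these pre/post acoustic measures can be pulled from a recording with Praat via the parselmouth package; a minimal sketch follows, assuming a mono file named sample.wav and common Praat default thresholds (the study's exact analysis settings are not given, and MPT would come from a separate sustained-vowel task).

      import parselmouth
      from parselmouth.praat import call

      snd = parselmouth.Sound("sample.wav")            # placeholder file
      pitch = snd.to_pitch()
      f0 = call(pitch, "Get mean", 0, 0, "Hertz")      # mean F0
      db = call(snd.to_intensity(), "Get mean", 0, 0, "energy")  # mean dB
      pp = call(snd, "To PointProcess (periodic, cc)", 75, 500)
      jitter = call(pp, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)
      shimmer = call([snd, pp], "Get shimmer (local)",
                     0, 0, 0.0001, 0.02, 1.3, 1.6)
      hnr = call(snd.to_harmonicity_cc(), "Get mean", 0, 0)  # HNR, related
      print(f0, db, jitter, shimmer, hnr)                    # to the NHR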

  2. A word by any other intonation: fMRI evidence for implicit memory traces for pitch contours of spoken words in adult brains.

    Directory of Open Access Journals (Sweden)

    Michael Inspector

    Full Text Available OBJECTIVES: Intonation may serve as a cue for facilitated recognition and processing of spoken words, and it has been suggested that the pitch contour of spoken words is implicitly remembered. Thus, using the repetition suppression (RS) effect of BOLD-fMRI signals, we tested whether the same spoken words are differentially processed in language and auditory brain areas depending on whether or not they retain an arbitrary intonation pattern. EXPERIMENTAL DESIGN: Words were presented repeatedly in three blocks for passive and active listening tasks. There were three prosodic conditions, in each of which a different set of words was used and specific task-irrelevant intonation changes were applied: (i) all words in a set were presented with a flat, monotonous pitch contour; (ii) each word had an arbitrary pitch contour that was kept constant throughout the three repetitions; (iii) each word had a different arbitrary pitch contour in each of its repetitions. PRINCIPAL FINDINGS: The repeated presentations of words with a set pitch contour resulted in robust behavioral priming effects as well as in significant RS of the BOLD signals in primary auditory cortex (BA 41), temporal areas (BA 21/22) bilaterally, and in Broca's area. However, changing the intonation of the same words on each successive repetition resulted in reduced behavioral priming and the abolition of RS effects. CONCLUSIONS: Intonation patterns are retained in memory even when the intonation is task-irrelevant. Implicit memory traces for the pitch contour of spoken words were reflected in facilitated neuronal processing in auditory and language-associated areas. Thus, the results lend support to the notion that prosody, and specifically pitch contour, is strongly associated with the memory representation of spoken words.

  3. English As A Foreign Language (Efl) Learning Through Classroom Interaction:An Investigation Of Participants’ Collaborative Use Of Speech Prosody In Classroom Activities In A Secondary Efl Classroom

    OpenAIRE

    Zhao, Xin

    2014-01-01

    Conversational prosody or tone of voice (e.g. intonation, pauses, speech rate etc.) plays an essential role in our daily communication. Research studies in various contexts have shown that prosody can function as an interactional device for the management of our social interaction (Hellermann, 2003, Wennerstrom, 2001, Wells and Macfarlane, 1998, Couper-Kuhlen, 1996). However, not much research focus has been given to the pedagogical implications of conversational prosody in classroom teaching...

  4. The Clinical Practice of Speech and Language Therapists with Children with Phonologically Based Speech Sound Disorders

    Science.gov (United States)

    Oliveira, Carla; Lousada, Marisa; Jesus, Luis M. T.

    2015-01-01

    Children with speech sound disorders (SSD) represent a large number of speech and language therapists' caseloads. The intervention with children who have SSD can involve different therapy approaches, and these may be articulatory or phonologically based. Some international studies reveal a widespread application of articulatory based approaches in…

  5. Japanese reported speech: Against a direct-indirect distinction

    NARCIS (Netherlands)

    Maier, E.; Ogata, N.

    2008-01-01

    English direct discourse is easily recognized by e.g. the lack of a complementizer, the quotation marks (or the intonational contour they induce), and verbatim ('shifted') pronouns. Japanese employs the same complementizer for all reports, does not have a consistent intonational quotation marking,

  6. Speech Clarity Index (Ψ): A Distance-Based Speech Quality Indicator and Recognition Rate Prediction for Dysarthric Speakers with Cerebral Palsy

    Science.gov (United States)

    Kayasith, Prakasith; Theeramunkong, Thanaruk

    It is a tedious and subjective task to measure the severity of dysarthria by manually evaluating a speaker's speech using available standard assessment methods based on human perception. This paper presents an automated approach to assess the speech quality of a dysarthric speaker with cerebral palsy. With consideration of two complementary factors, speech consistency and speech distinction, a speech quality indicator called the speech clarity index (Ψ) is proposed as a measure of the speaker's ability to produce consistent speech signals for a given word and distinguishable speech signals for different words. As an application, it can be used to assess speech quality and forecast the speech recognition rate of an individual dysarthric speaker's speech before the exhaustive implementation of an automatic speech recognition system for that speaker. The effectiveness of Ψ as a speech recognition rate predictor is evaluated by rank-order inconsistency, correlation coefficient, and root-mean-square of difference. The evaluations were done by comparing its predicted recognition rates with those predicted by the standard methods, called the articulatory and intelligibility tests, based on two recognition systems (HMM and ANN). The results show that Ψ is a promising indicator for predicting the recognition rate of dysarthric speech. All experiments were done on a speech corpus composed of speech data from eight normal speakers and eight dysarthric speakers.
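
    The paper's exact formula for Ψ is not reproduced here, but its two ingredients can be sketched with dynamic time warping over acoustic features: consistency as the mean distance between repetitions of the same word, distinction as the mean distance between different words. The ratio used below, and the synthetic feature matrices standing in for MFCCs, are assumptions.

      import numpy as np
      import librosa

      def dtw_dist(a, b):
          # Length-normalised DTW cost between two feature matrices
          D, _ = librosa.sequence.dtw(X=a, Y=b, metric="euclidean")
          return D[-1, -1] / (a.shape[1] + b.shape[1])

      def clarity_index(words):
          """words: dict word -> list of (n_feats, n_frames) matrices,
          e.g. MFCCs of each recorded repetition of that word."""
          items = list(words.items())
          within = [dtw_dist(a, b)
                    for _, reps in items
                    for i, a in enumerate(reps) for b in reps[i + 1:]]
          between = [dtw_dist(a, b)
                     for i in range(len(items))
                     for j in range(i + 1, len(items))
                     for a in items[i][1] for b in items[j][1]]
          # Higher value: consistent repetitions, well-separated words
          return np.mean(between) / np.mean(within)

      # Synthetic stand-ins for MFCCs of two words, three repetitions each
      rng = np.random.default_rng(0)
      words = {w: [rng.normal(loc=3.0 * k, size=(13, 40)) for _ in range(3)]
               for k, w in enumerate(["ball", "cat"])}
      print(f"Psi ~ {clarity_index(words):.2f}")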

  7. Spatiotemporal frequency characteristics of cerebral oscillations during the perception of fundamental frequency contour changes in one-syllable intonation.

    Science.gov (United States)

    Ueno, Sanae; Okumura, Eiichi; Remijn, Gerard B; Yoshimura, Yuko; Kikuchi, Mitsuru; Shitamichi, Kiyomi; Nagao, Kikuko; Mochiduki, Masayuki; Haruta, Yasuhiro; Hayashi, Norio; Munesue, Toshio; Tsubokawa, Tsunehisa; Oi, Manabu; Nakatani, Hideo; Higashida, Haruhiro; Minabe, Yoshio

    2012-05-02

    Accurate perception of fundamental frequency (F0) contour changes in the human voice is important for understanding a speaker's intonation, and consequently also his/her attitude. In this study, we investigated the neural processes involved in the perception of F0 contour changes in the Japanese one-syllable interjection "ne" in 21 native-Japanese listeners. A passive oddball paradigm was applied in which "ne" with a high falling F0 contour, used when urging a reaction from the listener, was randomly presented as a rare deviant among a frequent "ne" syllable with a flat F0 contour (i.e., meaningless intonation). We applied an adaptive spatial filtering method to the neuromagnetic time course recorded by whole-head magnetoencephalography (MEG) and estimated the spatiotemporal frequency dynamics of event-related cerebral oscillatory changes in the oddball paradigm. Our results demonstrated a significant elevation of beta band event-related desynchronization (ERD) in the right temporal and frontal areas, in time windows from 100 to 300 and from 300 to 500 ms after the onset of deviant stimuli (high falling F0 contour). This is the first study to reveal detailed spatiotemporal frequency characteristics of cerebral oscillations during the perception of intonational (not lexical) F0 contour changes in the human voice. The results further confirmed that the right hemisphere is associated with perception of intonational F0 contour information in the human voice, especially in early time windows. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  8. A Randomized, Rater-Blinded, Parallel Trial of Intensive Speech Therapy in Sub-Acute Post-Stroke Aphasia: The SP-I-R-IT Study

    Science.gov (United States)

    Martins, Isabel Pavao; Leal, Gabriela; Fonseca, Isabel; Farrajota, Luisa; Aguiar, Marta; Fonseca, Jose; Lauterbach, Martin; Goncalves, Luis; Cary, M. Carmo; Ferreira, Joaquim J.; Ferro, Jose M.

    2013-01-01

    Background: There is conflicting evidence regarding the benefits of intensive speech and language therapy (SLT), particularly because intensity is often confounded with total SLT provided. Aims: A two-centre, randomized, rater-blinded, parallel study was conducted to compare the efficacy of 100 h of SLT in a regular (RT) versus intensive (IT)…

  9. The Computational Processing of Intonational Prominence: A Functional Prosody Perspective

    OpenAIRE

    Nakatani, Christine Hisayo

    1997-01-01

    Intonational prominence, or accent, is a fundamental prosodic feature that is said to contribute to discourse meaning. This thesis outlines a new, computational theory of the discourse interpretation of prominence, from a FUNCTIONAL PROSODY perspective. Functional prosody makes the following two important assumptions: first, there is an aspect of prominence interpretation that centrally concerns discourse processes, namely the discourse focusing nature of prominence; and second, the role of p...

  10. Computer-based Programs in Speech Therapy of Dyslalia and Dyslexia- Dysgraphia

    Directory of Open Access Journals (Sweden)

    Mirela Danubianu

    2010-04-01

    Full Text Available During the last years, researchers and therapists in speech therapy have been increasingly concerned with the elaboration and use of computer programs in speech disorders therapy. The main objective of this study was to evaluate the therapeutic effectiveness of computer-based programs for the Romanian language in speech therapy. In this study, we present experimental research assessing the effectiveness of computer programs in the therapy of the speech disorders dyslalia, dyslexia, and dysgraphia. Methodologically, the use of the computer in the therapeutic phases was carried out with the help of computer-based programs (Logomon, Dislex-Test, etc.) that we elaborated and experimented with during several years of therapeutic activity. The sample used in our experiments was composed of 120 subjects: for each speech disorder, two groups of 60 children were selected, 30 for the experimental ('computer-based') group and 30 for the control ('classical method') group. The study hypotheses verified whether the results obtained by the subjects in the experimental group improved significantly after using the computer-based program, compared to the subjects in the control group, who did not use this program but received classical therapy. The hypotheses were confirmed for the speech disorders included in this research; the conclusions of the study confirm the advantages of using computer-based programs within speech therapy for correcting these disorders, as well as the positive influence these programs have on the development of children’s personality.

  11. The Spanish of Ponce, Puerto Rico: A Phonetic, Phonological, and Intonational Analysis

    Science.gov (United States)

    Luna, Kenneth Vladimir

    2010-01-01

    This study investigates four aspects of Puerto Rican Spanish as represented in the Autonomous Municipality of Ponce: the behavior of coda /ɾ/, the behavior of /r/, the different realizations of coda /s/, and its intonational phonology. Previous studies on Puerto Rican Spanish report that coda /ɾ/ is normally realized as…

  12. Model based Binaural Enhancement of Voiced and Unvoiced Speech

    DEFF Research Database (Denmark)

    Kavalekalam, Mathew Shaji; Christensen, Mads Græsbøll; Boldt, Jesper B.

    2017-01-01

    This paper deals with the enhancement of speech in the presence of non-stationary babble noise. A binaural speech enhancement framework is proposed which takes into account both the voiced and unvoiced speech production models. The usage of this model in enhancement requires the short-term predictor (STP) parameters and the pitch information to be estimated. This paper uses a codebook-based approach for estimating the STP parameters, and a parametric binaural method is proposed for estimating the pitch parameters. Improvements in objective score are shown when using the voiced/unvoiced speech model…

  13. Identification of Changes along a Continuum of Speech Intonation is Impaired in Congenital Amusia.

    Science.gov (United States)

    Hutchins, Sean; Gosselin, Nathalie; Peretz, Isabelle

    2010-01-01

    A small number of individuals have severe musical problems that have neuro-genetic underpinnings. This musical disorder is termed "congenital amusia," an umbrella term for lifelong musical disabilities that cannot be attributed to deafness, lack of exposure, or brain damage after birth. Amusics seem to lack the ability to detect fine pitch differences in tone sequences. However, differences between statements and questions, which vary in final pitch, are well perceived by most congenital amusic individuals. We hypothesized that the origin of this apparent domain-specificity of the disorder lies in the range of pitch variations, which are very coarse in speech as compared to music. Here, we tested this hypothesis by using a continuum of gradually increasing final pitch in both speech and tone sequences. To this aim, nine amusic cases and nine matched controls were presented with statements and questions that varied on a pitch continuum from falling to rising in 11 steps. The sentences were either naturally spoken or were tone sequence versions of these. The task was to categorize the sentences as statements or questions and the tone sequences as falling or rising. In each case, the observation of an S-shaped identification function indicates that amusics can accurately identify unambiguous examples of statements and questions but have problems with fine variations between these endpoints. Thus, the results indicate that a deficient pitch perception might compromise music, not because it is specialized for that domain but because music requirements are more fine-grained.

  14. Deep neural network and noise classification-based speech enhancement

    Science.gov (United States)

    Shi, Wenhua; Zhang, Xiongwei; Zou, Xia; Han, Wei

    2017-07-01

    In this paper, a speech enhancement method using noise classification and Deep Neural Network (DNN) was proposed. Gaussian mixture model (GMM) was employed to determine the noise type in speech-absent frames. DNN was used to model the relationship between noisy observation and clean speech. Once the noise type was determined, the corresponding DNN model was applied to enhance the noisy speech. GMM was trained with mel-frequency cepstrum coefficients (MFCC) and the parameters were estimated with an iterative expectation-maximization (EM) algorithm. Noise type was updated by spectrum entropy-based voice activity detection (VAD). Experimental results demonstrate that the proposed method could achieve better objective speech quality and smaller distortion under stationary and non-stationary conditions.
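
    The noise-classification front end described here might look like the following sketch: one GMM per noise type, fit with EM on MFCCs, with the best-scoring model selecting which enhancement DNN to apply. The file names, component counts, and feature settings are placeholders, not the paper's configuration.

      import librosa
      from sklearn.mixture import GaussianMixture

      def mfcc_frames(y, sr):
          # (n_frames, 13) MFCC matrix, one row per analysis frame
          return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T

      sr = 16000
      noise_models = {}
      for name in ["babble", "factory", "car"]:        # hypothetical types
          y, _ = librosa.load(f"{name}_train.wav", sr=sr)
          gmm = GaussianMixture(n_components=8, covariance_type="diag")
          noise_models[name] = gmm.fit(mfcc_frames(y, sr))  # EM training

      # Score speech-absent frames of a new recording against each model
      y, _ = librosa.load("noise_only_frames.wav", sr=sr)
      feats = mfcc_frames(y, sr)
      scores = {n: g.score(feats) for n, g in noise_models.items()}
      best = max(scores, key=scores.get)
      print("detected noise type:", best)   # selects the matching DNN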

  15. A homology sound-based algorithm for speech signal interference

    Science.gov (United States)

    Jiang, Yi-jiao; Chen, Hou-jin; Li, Ju-peng; Zhang, Zhan-song

    2015-12-01

    Aiming at secure analog speech communication, a homology sound-based algorithm for speech signal interference is proposed in this paper. We first split the speech signal into phonetic fragments by a short-term energy method and establish an interference-noise cache library from the phonetic fragments. Then we implement the homology sound interference by mixing randomly selected interferential fragments with the original speech in real time. Computer simulation results indicate that the interference produced by this algorithm has the advantages of real-time operation, randomness, and high correlation with the original signal, compared with traditional noise-interference methods such as white-noise interference. After further study, the proposed algorithm may be readily used in secure speech communication.
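
    A hedged sketch of the first step, splitting speech into fragments by short-term energy, is given below; the frame length and threshold ratio are assumptions, not the paper's parameters.

      import numpy as np

      def energy_segments(x, sr, frame_ms=20, thresh_ratio=0.1):
          # Frame the signal and compute per-frame short-term energy
          n = int(sr * frame_ms / 1000)
          frames = x[: len(x) // n * n].reshape(-1, n)
          energy = (frames ** 2).sum(axis=1)
          active = energy > thresh_ratio * energy.max()
          # Collect runs of consecutive active frames as fragments
          segments, start = [], None
          for i, a in enumerate(active):
              if a and start is None:
                  start = i
              elif not a and start is not None:
                  segments.append((start * n, i * n)); start = None
          if start is not None:
              segments.append((start * n, len(frames) * n))
          return segments   # sample-index ranges of phonetic fragments

      sr = 8000
      x = np.concatenate([np.zeros(sr // 2), np.random.randn(sr),
                          np.zeros(sr // 2)])
      print(energy_segments(x, sr))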

  16. L'entourage d'un Fait d'expression Comme Guide a la Traduction ...

    African Journals Online (AJOL)

    Speech is an act of communication. The speech environment consists of the context, situation of communication, mimicry and intonation which surround the act of enunciation resulting in meaning. The translator is always faced with choices to make, but the speech environment helps to delimit speech and enables him ...

  17. Under-resourced speech recognition based on the speech manifold

    CSIR Research Space (South Africa)

    Sahraeian, R

    2015-09-01

    Full Text Available Conventional acoustic modeling involves estimating many parameters to effectively model feature distributions. The sparseness of speech and text data, however, degrades the reliability of the estimation process and makes speech recognition a...

  18. Identification of changes along a continuum of speech intonation is impaired in congenital amusia

    Directory of Open Access Journals (Sweden)

    Sean eHutchins

    2010-12-01

    Full Text Available A small number of individuals have severe musical problems that have neuro-genetic underpinnings. This musical disorder is termed congenital amusia, an umbrella term for lifelong musical disabilities that cannot be attributed to deafness, lack of exposure, or brain damage after birth. Amusics seem to lack the ability to detect fine pitch differences in tone sequences. However, differences between statements and questions, which vary in final pitch, are well perceived by most congenital amusic individuals. We hypothesized that the origin of this apparent domain-specificity of the disorder lies in the range of pitch variations, which are very coarse in speech as compared to music. Here, we tested this hypothesis by using a continuum of gradually increasing final pitch in both speech and tone sequences. To this aim, nine amusic cases and nine matched controls were presented with statements and questions that varied on a pitch continuum from falling to rising in 11 steps. The sentences were either naturally spoken or were tone sequence versions of these. The task was to categorize the sentences as statements or questions and the tone sequences as falling or rising. In each case, the observation of an S-shaped identification function indicates that amusics can accurately identify unambiguous examples of statements and questions but have problems with fine variations between these endpoints. Thus, the results indicate that a deficient pitch perception might compromise music, not because it is specialized for that domain but because music requirements are more fine-grained.

  19. A diphone-based speech-synthesis system for British English

    NARCIS (Netherlands)

    Pijper, de J.R.

    1987-01-01

    This article describes a keyboard-to-speech system for British English synthetic speech based on diphones. It concentrates on the development and composition of the diphone inventory and briefly describes a computer program which makes it possible to quickly concatenate diphones and synthesise

  20. Model-Based Synthesis of Visual Speech Movements from 3D Video

    Directory of Open Access Journals (Sweden)

    Edge, James D.

    2009-01-01

    Full Text Available We describe a method for the synthesis of visual speech movements using a hybrid unit selection/model-based approach. Speech lip movements are captured using a 3D stereo face capture system and split up into phonetic units. A dynamic parameterisation of this data is constructed which maintains the relationship between lip shapes and velocities; within this parameterisation a model of how lips move is built and is used in the animation of visual speech movements from speech audio input. The mapping from audio parameters to lip movements is disambiguated by selecting only the most similar stored phonetic units to the target utterance during synthesis. By combining properties of model-based synthesis (e.g., HMMs, neural nets) with unit selection we improve the quality of our speech synthesis.

  1. Computational speech segregation based on an auditory-inspired modulation analysis

    DEFF Research Database (Denmark)

    May, Tobias; Dau, Torsten

    2014-01-01

    A monaural speech segregation system is presented that estimates the ideal binary mask from noisy speech based on the supervised learning of amplitude modulation spectrogram (AMS) features. Instead of using linearly scaled modulation filters with constant absolute bandwidth, an auditory-inspired modulation filterbank is employed. … In addition, the system exploits information about speech activity present in neighboring time-frequency units. In order to evaluate the generalization performance of the system to unseen acoustic conditions, the speech segregation system is trained with a limited set of low signal-to-noise ratio (SNR) conditions, but tested over a wide range. …
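
    For concreteness, the training target itself, the ideal binary mask, can be computed from premixed speech and noise as in the sketch below; the local criterion of -5 dB and the STFT settings are assumptions, not the paper's values.

      import numpy as np
      import librosa

      def ideal_binary_mask(speech, noise, lc_db=-5.0, n_fft=512, hop=256):
          # 1 where the local SNR in a time-frequency unit exceeds the
          # criterion, 0 elsewhere (requires the premixed signals)
          S = np.abs(librosa.stft(speech, n_fft=n_fft, hop_length=hop)) ** 2
          N = np.abs(librosa.stft(noise, n_fft=n_fft, hop_length=hop)) ** 2
          local_snr = 10 * np.log10(S / (N + 1e-12) + 1e-12)
          return (local_snr > lc_db).astype(float)

      sr = 16000
      speech = np.random.randn(sr)            # stand-ins for real signals
      noise = np.random.randn(sr) * 0.5
      mask = ideal_binary_mask(speech, noise)
      print(mask.shape, mask.mean())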

  2. INTEGRATING MACHINE TRANSLATION AND SPEECH SYNTHESIS COMPONENT FOR ENGLISH TO DRAVIDIAN LANGUAGE SPEECH TO SPEECH TRANSLATION SYSTEM

    Directory of Open Access Journals (Sweden)

    J. SANGEETHA

    2015-02-01

    Full Text Available This paper presents an interface between the machine translation and speech synthesis components, which converts English speech into Tamil text and then into Tamil speech, in an English-to-Tamil speech-to-speech translation system. The speech translation system consists of three modules: automatic speech recognition, machine translation, and text-to-speech synthesis. Many procedures for integrating speech recognition and machine translation have been proposed, but the speech synthesis component has not yet been considered. In this paper, we therefore focus on the integration of machine translation and speech synthesis, and report a subjective evaluation to investigate the impact of the speech synthesis, the machine translation, and the integration of the two components. Here we implement a hybrid machine translation system (a combination of rule-based and statistical machine translation) and a concatenative, syllable-based speech synthesis technique. In order to retain the naturalness and intelligibility of the synthesized speech, Auto-Associative Neural Network (AANN) prosody prediction is used in this work. The results of this investigation demonstrate that the naturalness and intelligibility of the synthesized speech are strongly influenced by the fluency and correctness of the translated text.

  3. The Suitability of Cloud-Based Speech Recognition Engines for Language Learning

    Science.gov (United States)

    Daniels, Paul; Iwago, Koji

    2017-01-01

    As online automatic speech recognition (ASR) engines become more accurate and more widely implemented with call software, it becomes important to evaluate the effectiveness and the accuracy of these recognition engines using authentic speech samples. This study investigates two of the most prominent cloud-based speech recognition engines--Apple's…

  4. Comparing grapheme-based and phoneme-based speech recognition for Afrikaans

    CSIR Research Space (South Africa)

    Basson, WD

    2012-11-01

    Full Text Available This paper compares the recognition accuracy of a phoneme-based automatic speech recognition system with that of a grapheme-based system, using Afrikaans as case study. The first system is developed using a conventional pronunciation dictionary...

  5. Designing the Database of Speech Under Stress

    Directory of Open Access Journals (Sweden)

    Sabo Róbert

    2017-12-01

    Full Text Available This study describes the methodology used for designing a database of speech under real stress. Based on the limits of existing stress databases, we used a communication task via a computer game to collect speech data. To validate the presence of stress, known psychophysiological indicators such as heart rate and electrodermal activity, as well as subjective self-assessment, were used. This paper presents the data from the first 5 speakers (3 men, 2 women) who participated in initial tests of the proposed design. In 4 out of 5 speakers, increases in fundamental frequency and intensity of speech were registered. Similarly, in 4 out of 5 speakers, heart rate was significantly increased during the task compared with a reference measurement taken before the task. These first results show that the proposed design might be appropriate for building a speech-under-stress database. However, there are still considerations that need to be addressed.

  6. Prayer as the Subject of Speech Activity and its Communicative Intention

    OpenAIRE

    Khomulenko Tamara; Kuznetsov Oleksiy; Fomenko Karyna

    2018-01-01

    The article reveals the phenomenon of prayer as a form of communication between individual and God. It is shown that prayer as a subject of speech activity is the leading form of religious discourse. Based on the analysis of the definitions and classifications of prayer in psychological and psycholinguistic studies, the author concluded that the communicative intention of prayer is highly variable and emphasized the significant heuristic potential of the psychodiagnostic method of prayer for ...

  7. Aristotle on Sentence Types and Forms of Speech

    Directory of Open Access Journals (Sweden)

    Gábor Bolonyai

    2005-12-01

    Full Text Available According to the Hermeneutics, Ch. 4, the analysis of non-assertoric sentences such as wishes, commands, etc. belongs to rhetoric or poetics. They are, however, examined neither in the Rhetoric nor in the Poetics, where (Ch. 20) their treatment is explicitly excluded from the art of poetry and referred to that of delivery or performance. The paper gives an explanation for this discrepancy, based on an interpretation of Aristotle's rejection of Protagoras' criticism of Homer. The sophist found fault with the first line of the Iliad, where Homer invokes the Muse by the imperative Menin aeide, thea, thus uttering a command while believing that he is expressing a prayer. Aristotle's grounds for rejecting this criticism remain implicit, but it appears very likely that he thought that, if uttered or performed in the right manner, the sentence could be taken as a prayer. From this observation, which is certainly valid in this particular case, he drew the conclusion that performative or vocal features in themselves, i.e. rhythm, intonation, and volume of sound, are always sufficient to identify particular "figures of speech", as he calls non-assertoric sentence types in the Poetics. This conclusion is, however, not entirely justified. Performative features are not always enough to differentiate between two "figures of speech"; the possible range of verbal moods and sentence types is likewise determined by morphological marks (e.g. mood signs), syntactical features (word order), and lexical items (certain adverbs or particles). Aristotle's decision to dismiss figures of speech altogether from the field of lexis may also have contributed to the later development of keeping linguistics and theory of style apart as two separate branches of inquiry.

  8. Dysprosody nonassociated with neurological diseases--a case report.

    Science.gov (United States)

    Pinto, José Antonio; Corso, Renato José; Guilherme, Ana Cláudia Rocha; Pinho, Sílvia Rebelo; Nóbrega, Monica de Oliveira

    2004-03-01

    Dysprosody, also known as pseudo-foreign dialect, is the rarest neurological speech disorder. It is characterized by alterations in intensity, in the timing of utterance segments, and in the rhythm, cadence, and intonation of words. The term refers to changes in the duration, fundamental frequency, and intensity of tonic and atonic syllables of the sentences spoken, which deprive an individual's particular speech of its characteristics. The cause of this disease is usually associated with neurological pathologies such as cerebrovascular accidents, cranioencephalic traumatisms, and brain tumors. The authors report a case of dysprosody attended to at the Núcleo de Otorrinolaringologia e Cirurgia de Cabeça e Pescoço de São Paulo (NOSP): a female patient with bilateral grade III Reinke's edema and normal neurological examinations who began presenting characteristics of a German dialect following larynx microsurgery.

  9. The Speech multi features fusion perceptual hash algorithm based on tensor decomposition

    Science.gov (United States)

    Huang, Y. B.; Fan, M. H.; Zhang, Q. Y.

    2018-03-01

    With constant progress in modern speech communication technologies, speech data are prone to attack by noise or malicious tampering. To give a speech perceptual hash algorithm strong robustness and high efficiency, this paper puts forward a speech perceptual hash algorithm based on tensor decomposition and multiple features. The algorithm analyses perceptual speech features: each speech component is obtained by wavelet packet decomposition, and the LPCC, LSP, and ISP features of each speech component are extracted to constitute the speech feature tensor. Speech authentication is done by generating hash values through quantification of the feature matrix against its mid-value (median). Experimental results show that the proposed algorithm is robust to content-preserving operations compared with similar algorithms, and is able to resist attack from common background noise. Also, the algorithm is computationally efficient and able to meet the real-time requirements of speech communication, completing speech authentication quickly.
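
    The mid-value quantisation step can be sketched as follows; the random feature matrix here is merely a stand-in for the paper's LPCC/LSP/ISP tensor features, and the bit-error-rate matcher is a generic choice rather than the paper's exact decision rule.

      import numpy as np

      def median_hash(features):
          # One hash bit per entry: 1 if above the column median, else 0
          med = np.median(features, axis=0)
          return (features > med).astype(np.uint8).ravel()

      def bit_error_rate(h1, h2):
          # Hash matching for authentication: small BER -> same content
          return np.mean(h1 != h2)

      rng = np.random.default_rng(1)
      f = rng.random((20, 12))                 # stand-in feature matrix
      noisy = f + rng.normal(0, 0.01, f.shape) # content-preserving change
      print(bit_error_rate(median_hash(f), median_hash(noisy)))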

  10. Model-Based Speech Signal Coding Using Optimized Temporal Decomposition for Storage and Broadcasting Applications

    Science.gov (United States)

    Athaudage, Chandranath R. N.; Bradley, Alan B.; Lech, Margaret

    2003-12-01

    A dynamic programming-based optimization strategy for a temporal decomposition (TD) model of speech and its application to low-rate speech coding in storage and broadcasting is presented. In previous work with the spectral stability-based event localizing (SBEL) TD algorithm, the event localization was performed based on a spectral stability criterion. Although this approach gave reasonably good results, there was no guarantee of the optimality of the event locations. In the present work, we have optimized the event localizing task using a dynamic programming-based optimization strategy. Simulation results show that an improved TD model accuracy can be achieved. A methodology for incorporating the optimized TD algorithm within the standard MELP speech coder for the efficient compression of speech spectral information is also presented. The performance evaluation results revealed that the proposed speech coding scheme achieves 50%-60% compression of speech spectral information with negligible degradation in the decoded speech quality.

  11. Speech-based Class Attendance

    Science.gov (United States)

    Faizel Amri, Umar; Nur Wahidah Nik Hashim, Nik; Hazrin Hany Mohamad Hanif, Noor

    2017-11-01

    In the department of engineering, students are required to fulfil at least 80 percent of class attendance. The conventional method requires a student to sign his/her initials on the attendance sheet. However, this method is prone to cheating, with one student signing for a fellow classmate who is absent. We develop our hypothesis according to a verse in the Holy Qur’an (95:4), “We have created men in the best of mould”. Based on the verse, we believe each psychological characteristic of a human being is unique, and thus their speech characteristics should be unique. In this paper we present the development of a speech biometric-based attendance system. The system requires the user’s voice to be enrolled as training data, which is saved in the system to register the user. Subsequent voice samples from the user serve as test data to be verified against the stored training data. The system uses PSD (Power Spectral Density) and Transition Parameters as the feature extraction method for the voices. Euclidean and Mahalanobis distances are used to verify the user’s voice. For this research, ten subjects, five female and five male, were chosen to test the performance of the system. The system performance in terms of recognition rate was found to be 60% correct identification of individuals.
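
    The verification step can be sketched directly from the two distances named above. The following Python fragment is a minimal illustration, assuming each user is enrolled as a matrix of feature vectors (e.g., PSD-derived parameters per frame); the helper names, threshold, and regularization term are ours, not the paper's.

        import numpy as np

        def mahalanobis(x, mu, cov):
            # Distance from x to the enrolled distribution, weighted by its covariance.
            diff = x - mu
            return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

        def verify(test_frames, enrolled_frames, threshold=3.0):
            # Accept the claimed identity if the mean test feature vector lies
            # close to the enrolled (trained) feature distribution.
            mu = enrolled_frames.mean(axis=0)
            cov = np.cov(enrolled_frames, rowvar=False)
            cov += 1e-6 * np.eye(cov.shape[0])             # regularize for invertibility
            d_euclid = float(np.linalg.norm(test_frames.mean(axis=0) - mu))
            d_mahal = mahalanobis(test_frames.mean(axis=0), mu, cov)
            return d_mahal < threshold, d_euclid, d_mahal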

  12. Epoch-based analysis of speech signals

    Indian Academy of Sciences (India)

    on speech production characteristics, but also helps in accurate analysis of speech. … include time delay estimation, speech enhancement from single and multi- … log( E[k] / Σ_{l=0}^{K−1} E[l] ), (7) where K is the number of samples in the …

  13. A Novel AMR-WB Speech Steganography Based on Diameter-Neighbor Codebook Partition

    Directory of Open Access Journals (Sweden)

    Junhui He

    2018-01-01

    Full Text Available Steganography is a means of covert communication without revealing the occurrence and the real purpose of communication. The adaptive multirate wideband (AMR-WB is a widely adopted format in mobile handsets and is also the recommended speech codec for VoLTE. In this paper, a novel AMR-WB speech steganography is proposed based on the diameter-neighbor codebook partition algorithm. Different embedding capacities may be achieved by adjusting the iterative parameters during codebook division. The experimental results prove that the presented AMR-WB steganography may provide higher and more flexible embedding capacity without inducing perceptible distortion compared with the state-of-the-art methods. With 48 iterations of cluster merging, twice the embedding capacity of the complementary-neighbor-vertices-based embedding method may be obtained with a decrease of only around 2% in speech quality and much the same undetectability. Moreover, both the quality of the stego speech and the security regarding statistical steganalysis are better than the recent speech steganography based on neighbor-index-division codebook partition.
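
    The embedding principle behind codebook-partition steganography is easy to sketch: the codebook is split into groups, and each secret bit restricts the quantizer to one group, so the decoder can read the bit back from the transmitted index. The Python sketch below assumes the partition labels are already computed; the diameter-neighbor algorithm itself is the paper's contribution and is not reproduced here.

        import numpy as np

        def embed_bit(vector, codebook, labels, bit):
            # Quantize using only codewords whose partition label equals the
            # secret bit; the chosen index itself carries the hidden bit.
            candidates = np.where(labels == bit)[0]
            dists = np.linalg.norm(codebook[candidates] - vector, axis=1)
            return int(candidates[np.argmin(dists)])       # transmitted VQ index

        def extract_bit(index, labels):
            # The receiver only needs the shared partition to read the bit.
            return int(labels[index])

    Restricting the quantizer to half the codebook costs some quantization accuracy, which is why the partition (and the number of cluster-merging iterations) trades embedding capacity against speech quality.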

  14. Automated recognition of helium speech. Phase I: Investigation of microprocessor based analysis/synthesis system

    Science.gov (United States)

    Jelinek, H. J.

    1986-01-01

    This is the Final Report of Electronic Design Associates on its Phase I SBIR project. The purpose of this project is to develop a method for correcting helium speech, as experienced in diver-surface communication. The goal of the Phase I study was to design, prototype, and evaluate a real-time helium speech corrector system based upon digital signal processing techniques. The general approach was to develop hardware (an IBM PC board) to digitize helium speech and software (a LAMBDA computer-based simulation) to translate the speech. As planned in the study proposal, this initial prototype may now be used to assess the expected performance of a self-contained real-time system which uses an identical algorithm. The Final Report details the work carried out to produce the prototype system. Four major project tasks were completed: (1) a signal processing scheme for converting helium speech to normal-sounding speech was generated; (2) the signal processing scheme was simulated on a general-purpose (LAMBDA) computer, actual helium speech was supplied to the simulation, and the converted speech was generated; (3) an IBM-PC based 14-bit data input/output board was designed and built; (4) a bibliography of references on speech processing was generated.

  15. Pitch modelling for the Nguni languages

    CSIR Research Space (South Africa)

    Govender, N

    2007-06-01

    Full Text Available … by varying the levels of pitch, intensity and duration in the voice. An overview of intonation as observed in a variety of languages is provided in [1] … (given the nature of laryngograph data in voiced speech) and thus either could be used as the basis for the experiments. The pitch values extracted by Yin for all the laryngograph databases were consequently used as the basis for our comparisons. Pitch …

  16. Speech Intelligibility Prediction Based on Mutual Information

    DEFF Research Database (Denmark)

    Jensen, Jesper; Taal, Cees H.

    2014-01-01

    This paper deals with the problem of predicting the average intelligibility of noisy and potentially processed speech signals, as observed by a group of normal-hearing listeners. We propose a model which performs this prediction based on the hypothesis that intelligibility is monotonically related to the mutual information between critical-band amplitude envelopes of the clean signal and the corresponding noisy/processed signal. The resulting intelligibility predictor turns out to be a simple function of the mean-square error (mse) that arises when estimating a clean critical-band amplitude using a minimum mean-square error (mmse) estimator based on the noisy/processed amplitude. The proposed model predicts that speech intelligibility cannot be improved by any processing of noisy critical-band amplitudes. Furthermore, the proposed intelligibility predictor performs well (ρ > 0.95) in predicting …
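
    Under a jointly Gaussian assumption, the mutual information between clean and degraded envelopes reduces to a function of their normalized correlation, and hence of the mmse. The Python sketch below illustrates that relationship; it is our simplification, not the authors' exact model, which operates on critical-band amplitude envelopes of real speech.

        import numpy as np

        def band_mutual_information(clean_env, degraded_env):
            # For jointly Gaussian variables, I = -0.5 * log(1 - rho^2),
            # where 1 - rho^2 is the normalized mmse of a linear estimator.
            rho = np.corrcoef(clean_env, degraded_env)[0, 1]
            mse = max(1.0 - rho ** 2, 1e-12)
            return -0.5 * np.log(mse)

        def intelligibility_index(clean_bands, degraded_bands):
            # Average the per-band MI across critical-band amplitude envelopes.
            return float(np.mean([band_mutual_information(c, d)
                                  for c, d in zip(clean_bands, degraded_bands)]))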

  17. Auditory analysis for speech recognition based on physiological models

    Science.gov (United States)

    Jeon, Woojay; Juang, Biing-Hwang

    2004-05-01

    To address the limitations of traditional cepstrum or LPC based front-end processing methods for automatic speech recognition, more elaborate methods based on physiological models of the human auditory system may be used to achieve more robust speech recognition in adverse environments. For this purpose, a modified version of a model of the primary auditory cortex featuring a three dimensional mapping of auditory spectra [Wang and Shamma, IEEE Trans. Speech Audio Process. 3, 382-395 (1995)] is adopted and investigated for its use as an improved front-end processing method. The study is conducted in two ways: first, by relating the model's redundant representation to traditional spectral representations and showing that the former not only encompasses information provided by the latter, but also reveals more relevant information that makes it superior in describing the identifying features of speech signals; and second, by observing the statistical features of the representation for various classes of sound to show how different identifying features manifest themselves as specific patterns on the cortical map, thereby becoming a place-coded data set on which detection theory could be applied to simulate auditory perception and cognition.

  18. Prediction of speech intelligibility based on a correlation metric in the envelope power spectrum domain

    DEFF Research Database (Denmark)

    Relano-Iborra, Helia; May, Tobias; Zaar, Johannes

    A powerful tool to investigate speech perception is the use of speech intelligibility prediction models. Recently, a model was presented, termed the correlation-based speech-based envelope power spectrum model (sEPSMcorr) [1], based on the auditory processing of the multi-resolution speech-based Envel...

  19. ІНТОНАЦІЯ ЯК ЗАСІБ ОФОРМЛЕННЯ ОКЛИЧНИХ РЕЧЕНЬ / INTONATION AS A MEANS OF EXPRESSION OF EXCLAMATION SENTENCES

    Directory of Open Access Journals (Sweden)

    Галина НАВЧУК

    2016-03-01

    Full Text Available The article considers intonation as a fully-fledged language tool that takes part in conveying the semantic side of an utterance and serves as a fundamental unit of speech communication, since it is a natural form of the existence of language and a means of expression. It is argued that in exclamatory sentences intonation is a means of conveying complex emotions and subtle shades of experience: as a means of delivery it performs multiple functions, clarifying and specifying the content of the statement by expressing particular emotions. It is reasoned that intonation, together with its sound basis, carries in material form four basic types of information: (1) emotional; (2) physiological; (3) semantic; (4) linguistic. The author analyses examples of different types of intonational expression in exclamatory sentences. It is demonstrated that the diversity of intonation types and kinds of exclamatory sentences in speech is determined by the specific situation of communication. Intonation, in its particular expression, conveys the goals and objectives the speaker poses, his mood, the type of relationship that has developed between the interlocutors, the culture of the speaker, his emotional state, etc. The idea is substantiated that the intonation of exclamatory sentences for the most part only reinforces feelings already verbalized by lexical, morphological and syntactic language means, without being the only means of their expression. Its role increases substantially in sentences whose form and content do not coincide, as well as in the interpretation of non-divided structures and rhetorical questions, and in decoding the content of statements. It was found that in realizing the communicative content of a sentence, logical stress, which focuses on the most important word of the utterance, is no less important; moreover, the nature of the logical stress influences the structure of the tonal expression.

  20. Speech Silicon: An FPGA Architecture for Real-Time Hidden Markov-Model-Based Speech Recognition

    Directory of Open Access Journals (Sweden)

    Schuster Jeffrey

    2006-01-01

    Full Text Available This paper examines the design of an FPGA-based system-on-a-chip capable of performing continuous speech recognition on medium sized vocabularies in real time. Through the creation of three dedicated pipelines, one for each of the major operations in the system, we were able to maximize the throughput of the system while simultaneously minimizing the number of pipeline stalls in the system. Further, by implementing a token-passing scheme between the later stages of the system, the complexity of the control was greatly reduced and the amount of active data present in the system at any time was minimized. Additionally, through in-depth analysis of the SPHINX 3 large vocabulary continuous speech recognition engine, we were able to design models that could be efficiently benchmarked against a known software platform. These results, combined with the ability to reprogram the system for different recognition tasks, serve to create a system capable of performing real-time speech recognition in a vast array of environments.

  1. Speech Silicon: An FPGA Architecture for Real-Time Hidden Markov-Model-Based Speech Recognition

    Directory of Open Access Journals (Sweden)

    Alex K. Jones

    2006-11-01

    Full Text Available This paper examines the design of an FPGA-based system-on-a-chip capable of performing continuous speech recognition on medium sized vocabularies in real time. Through the creation of three dedicated pipelines, one for each of the major operations in the system, we were able to maximize the throughput of the system while simultaneously minimizing the number of pipeline stalls in the system. Further, by implementing a token-passing scheme between the later stages of the system, the complexity of the control was greatly reduced and the amount of active data present in the system at any time was minimized. Additionally, through in-depth analysis of the SPHINX 3 large vocabulary continuous speech recognition engine, we were able to design models that could be efficiently benchmarked against a known software platform. These results, combined with the ability to reprogram the system for different recognition tasks, serve to create a system capable of performing real-time speech recognition in a vast array of environments.

  2. Texting while driving: is speech-based text entry less risky than handheld text entry?

    Science.gov (United States)

    He, J; Chaparro, A; Nguyen, B; Burge, R J; Crandall, J; Chaparro, B; Ni, R; Cao, S

    2014-11-01

    Research indicates that using a cell phone to talk or text while maneuvering a vehicle impairs driving performance. However, few published studies directly compare the distracting effects of texting using a hands-free (i.e., speech-based interface) versus handheld cell phone, which is an important issue for legislation, automotive interface design and driving safety training. This study compared the effect of speech-based versus handheld text entries on simulated driving performance by asking participants to perform a car following task while controlling the duration of a secondary text-entry task. Results showed that both speech-based and handheld text entries impaired driving performance relative to the drive-only condition by causing more variation in speed and lane position. Handheld text entry also increased the brake response time and increased variation in headway distance. Text entry using a speech-based cell phone was less detrimental to driving performance than handheld text entry. Nevertheless, the speech-based text entry task still significantly impaired driving compared to the drive-only condition. These results suggest that speech-based text entry disrupts driving, but reduces the level of performance interference compared to text entry with a handheld device. In addition, the difference in the distraction effect caused by speech-based and handheld text entry is not simply due to the difference in task duration. Copyright © 2014 Elsevier Ltd. All rights reserved.

  3. Towards Real-Time Speech Emotion Recognition for Affective E-Learning

    Science.gov (United States)

    Bahreini, Kiavash; Nadolski, Rob; Westera, Wim

    2016-01-01

    This paper presents the voice emotion recognition part of the FILTWAM framework for real-time emotion recognition in affective e-learning settings. FILTWAM (Framework for Improving Learning Through Webcams And Microphones) intends to offer timely and appropriate online feedback based upon learner's vocal intonations and facial expressions in order…

  4. Short-term effect of short, intensive speech therapy on articulation and resonance in Ugandan patients with cleft (lip and) palate

    NARCIS (Netherlands)

    Anke Luyten; H. Vermeersch; A. Hodges; K. Bettens; K. van Lierde; G. Galiwango

    2016-01-01

    Objectives: The purpose of the current study was to assess the short-term effectiveness of short and intensive speech therapy provided to patients with cleft (lip and) palate (C(L)P) in terms of articulation and resonance. Methods: Five Ugandan patients (age: 7.3-19.6 years) with non-syndromic C(L)P

  5. Effects of irrelevant speech and traffic noise on speech perception and cognitive performance in elementary school children.

    Science.gov (United States)

    Klatte, Maria; Meis, Markus; Sukowski, Helga; Schick, August

    2007-01-01

    The effects of background noise of moderate intensity on short-term storage and processing of verbal information were analyzed in 6- to 8-year-old children. In line with adult studies on the "irrelevant sound effect" (ISE), serial recall of visually presented digits was severely disrupted by background speech that the children did not understand. Train noises of equal intensity, however, had no effect. Similar results were demonstrated with tasks requiring storage and processing of heard information. Memory for nonwords, execution of oral instructions and categorizing speech sounds were significantly disrupted by irrelevant speech. The affected functions play a fundamental role in the acquisition of spoken and written language. Implications concerning current models of the ISE and the acoustic conditions in schools and kindergartens are discussed.

  6. Speech Compression

    Directory of Open Access Journals (Sweden)

    Jerry D. Gibson

    2016-06-01

    Full Text Available Speech compression is a key technology underlying digital cellular communications, VoIP, voicemail, and voice response systems. We trace the evolution of speech coding based on the linear prediction model, highlight the key milestones in speech coding, and outline the structures of the most important speech coding standards. Current challenges, future research directions, fundamental limits on performance, and the critical open problem of speech coding for emergency first responders are all discussed.

  7. Man machine interface based on speech recognition

    International Nuclear Information System (INIS)

    Jorge, Carlos A.F.; Aghina, Mauricio A.C.; Mol, Antonio C.A.; Pereira, Claudio M.N.A.

    2007-01-01

    This work reports the development of a Man Machine Interface based on speech recognition. The system must recognize spoken commands and execute the desired tasks without manual intervention by operators. The range of applications goes from the execution of commands in an industrial plant's control room to navigation and interaction in virtual environments. Results are reported for isolated word recognition, the isolated words corresponding to the spoken commands. In the pre-processing stage, relevant parameters are extracted from the speech signals using the cepstral analysis technique; these are used for isolated word recognition and correspond to the inputs of an artificial neural network that performs the recognition task. (author)
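
    As a rough illustration of the pre-processing stage, the fragment below computes the real cepstrum of one windowed frame with NumPy; the low-order coefficients would then form the input vector of the neural network. The frame length and the number of coefficients kept are arbitrary choices, not values from the paper.

        import numpy as np

        def real_cepstrum(frame, n_coeffs=13):
            # Cepstral analysis: IFFT of the log magnitude spectrum. The first
            # few coefficients summarize the spectral envelope of the frame.
            spectrum = np.fft.rfft(frame * np.hamming(len(frame)))
            log_mag = np.log(np.abs(spectrum) + 1e-10)
            return np.fft.irfft(log_mag)[:n_coeffs]

        # One feature vector per frame; a word is a sequence of such vectors.
        frame = np.random.default_rng(1).normal(size=256)
        print(real_cepstrum(frame).shape)                  # (13,)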

  8. Kalman filter for speech enhancement in cocktail party scenarios using a codebook-based approach

    DEFF Research Database (Denmark)

    Kavalekalam, Mathew Shaji; Christensen, Mads Græsbøll; Gran, Fredrik

    2016-01-01

    Enhancement of speech in non-stationary background noise is a challenging task, and conventional single-channel speech enhancement algorithms have not been able to improve speech intelligibility in such scenarios. The work proposed in this paper investigates a single-channel Kalman filter based … trained codebook over a generic speech codebook in relation to the performance of the speech enhancement system.
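
    A Kalman filter for speech enhancement typically models clean speech as an autoregressive (AR) process observed in additive noise. The sketch below is a generic single-channel formulation in Python; in the codebook-based approach, the AR coefficients and excitation variance would be selected per short frame from the trained codebook rather than fixed, and all parameter values here are placeholders.

        import numpy as np

        def kalman_enhance(noisy, a, var_w, var_v):
            # State: the last p clean samples; s_t = sum_i a[i]*s_{t-i} + w_t,
            # observed as y_t = s_t + v_t with noise variance var_v.
            p = len(a)
            F = np.zeros((p, p))
            F[0, :] = a                      # AR coefficients drive the top row
            F[1:, :-1] = np.eye(p - 1)       # shift register for past samples
            H = np.zeros(p); H[0] = 1.0
            x, P = np.zeros(p), np.eye(p)
            out = np.empty(len(noisy))
            for t, y in enumerate(noisy):
                x = F @ x                    # predict
                P = F @ P @ F.T
                P[0, 0] += var_w             # excitation noise enters the newest sample
                k = P @ H / (H @ P @ H + var_v)    # Kalman gain
                x = x + k * (y - H @ x)      # update with the noisy observation
                P = P - np.outer(k, H) @ P
                out[t] = x[0]                # current clean-speech estimate
            return out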

  9. Real-Time Control of an Articulatory-Based Speech Synthesizer for Brain Computer Interfaces.

    Directory of Open Access Journals (Sweden)

    Florent Bocquelet

    2016-11-01

    Full Text Available Restoring natural speech in paralyzed and aphasic people could be achieved using a Brain-Computer Interface (BCI) controlling a speech synthesizer in real-time. To reach this goal, a prerequisite is to develop a speech synthesizer producing intelligible speech in real-time with a reasonable number of control parameters. We present here an articulatory-based speech synthesizer that can be controlled in real-time for future BCI applications. This synthesizer converts movements of the main speech articulators (tongue, jaw, velum, and lips) into intelligible speech. The articulatory-to-acoustic mapping is performed using a deep neural network (DNN) trained on electromagnetic articulography (EMA) data recorded on a reference speaker synchronously with the produced speech signal. This DNN is then used in both offline and online modes to map the position of sensors glued on different speech articulators into acoustic parameters that are further converted into an audio signal using a vocoder. In offline mode, highly intelligible speech could be obtained as assessed by perceptual evaluation performed by 12 listeners. Then, to anticipate future BCI applications, we further assessed the real-time control of the synthesizer by both the reference speaker and new speakers, in a closed-loop paradigm using EMA data recorded in real time. A short calibration period was used to compensate for differences in sensor positions and articulatory differences between new speakers and the reference speaker. We found that real-time synthesis of vowels and consonants was possible with good intelligibility. In conclusion, these results open the way to future speech BCI applications using such an articulatory-based speech synthesizer.

  10. The Error Is the Clue: Breakdown In Human-Machine Interaction

    National Research Council Canada - National Science Library

    Martinovsky, Bilyana; Traum, David

    2006-01-01

    .... Human reactions to these irritating features typically appear in the following order: tiredness, tolerance, anger, confusion, irony, humor, exhaustion, uncertainty, lack of desire to communicate. The studied features of human expressions of irritation in non-face-to-face interaction are: intonation, emphatic speech, elliptic speech, speed of speech, extra-linguistic signs, speed of verbal action, and overlap.

  11. Stable 1-Norm Error Minimization Based Linear Predictors for Speech Modeling

    DEFF Research Database (Denmark)

    Giacobello, Daniele; Christensen, Mads Græsbøll; Jensen, Tobias Lindstrøm

    2014-01-01

    In linear prediction of speech, the 1-norm error minimization criterion has been shown to provide a valid alternative to the 2-norm minimization criterion. However, unlike 2-norm minimization, 1-norm minimization does not guarantee the stability of the corresponding all-pole filter and can generate saturations when this is used to synthesize speech. In this paper, we introduce two new methods to obtain intrinsically stable predictors with the 1-norm minimization. The first method is based on constraining the roots of the predictor to lie within the unit circle by reducing the numerical range … based linear prediction for modeling and coding of speech.

  12. Successful and rapid response of speech bulb reduction program combined with speech therapy in velopharyngeal dysfunction: a case report.

    Science.gov (United States)

    Shin, Yu-Jeong; Ko, Seung-O

    2015-12-01

    Velopharyngeal dysfunction in cleft palate patients following the primary palate repair may result in nasal air emission, hypernasality, articulation disorders and poor intelligibility of speech. Among conservative treatment methods, a speech aid prosthesis combined with speech therapy is widely used. However, because this treatment takes more than a year and has low predictability, some clinicians prefer a surgical intervention. Thus, the purpose of this report was to draw attention to the effectiveness of speech aid prostheses by introducing a case that was successfully treated. In this clinical report, a speech bulb reduction program with intensive speech therapy was applied to a patient with velopharyngeal dysfunction, who was successfully treated within 5 months, an unusually short period for speech aid therapy. Furthermore, the advantages of pre-operative speech aid therapy are discussed.

  13. Pitch peak alignment in an outside focus realization : Frisian-Dutch bilingual speakers’ declarative and imperative intonation patterns

    NARCIS (Netherlands)

    Nota, Amber; Hilton, Nanna; Coler, Matthew

    2016-01-01

    This paper investigates intonational pitch variations and pitch peak alignment in declarative sentences and is part of a larger study of declarative, interrogative and imperative grammatical constructions in the Frisian-Dutch contact situation. Frisian is a minority language spoken in the province

  14. A computational comparison of theory and practice of scale intonation in Byzantine chant

    DEFF Research Database (Denmark)

    Panteli, Maria; Purwins, Hendrik

    2013-01-01

    Byzantine Chant performance practice is quantitatively compared to the Chrysanthine theory. The intonation of scale degrees is quantified, based on pitch class profiles. An analysis procedure is introduced that consists of the following steps: 1) Pitch class histograms are calculated via non-parametric kernel smoothing. 2) Histogram peaks are detected. 3) Phrase ending analysis aids the finding of the tonic to align histogram peaks. 4) The theoretical scale degrees are mapped to the practical ones. 5) A schema of statistical tests detects significant deviations of the theoretical scale tuning from the estimated ones in performance practice. The analysis of 94 echoi shows a tendency of the singer to level out theoretical particularities of the echos that stand outside the general norm of the octoechos: theoretically extremely large scale steps are diminished in performance.
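
    Steps 1 and 2 of the procedure can be sketched compactly: fold all pitch estimates into one octave, smooth them with a Gaussian kernel, and pick the histogram peaks as the practised scale degrees. The Python fragment below is an illustration under those assumptions; the cent resolution, bandwidth, and prominence threshold are our choices, not the paper's, and wrap-around at the octave boundary is ignored.

        import numpy as np
        from scipy.signal import find_peaks
        from scipy.stats import gaussian_kde

        def scale_degree_peaks(pitch_cents):
            # Fold pitch values into one octave (1200 cents) and estimate a
            # smooth pitch class histogram by Gaussian kernel smoothing.
            folded = np.asarray(pitch_cents) % 1200.0
            grid = np.arange(0.0, 1200.0)
            hist = gaussian_kde(folded, bw_method=0.02)(grid)
            # Histogram peaks correspond to the scale degrees in practice.
            peaks, _ = find_peaks(hist, prominence=hist.max() * 0.05)
            return hist, grid[peaks]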

  15. Transcription of Russian intonation ToRI, an interactive research tool and learning module on the internet

    NARCIS (Netherlands)

    Odé, C.

    2008-01-01

    This article discusses a new system for the Transcription of Russian Intonation, ToRI, on the Internet. Section 1 presents a general outline of the system. The terminology used in ToRI is defined in an online glossary, from which Section 2 gives the following examples: pitch accent and

  16. Atendimento fonoaudiológico intensivo em pacientes operados de fissura labiopalatina: relato de casos Intensive speech therapy in patients operated for cleft lip and palate: case report

    Directory of Open Access Journals (Sweden)

    Maria do Rosário Ferreira Lima

    2007-09-01

    Full Text Available Due to the lack of speech therapists to care for patients with cleft lip and palate in several regions of Brazil, new care programs must be developed for these individuals. Intensive speech therapy has been reported in the literature as an alternative modality. This work relates the experience with some cases of intensive speech therapy and compares the speech production performance of four patients operated for cleft palate before and after intensive speech therapy. Three adults and one adolescent presenting compensatory articulation disorders were treated during the school holiday period. Each patient received three hours of therapy daily for ten days, divided into individual and group sessions. At the beginning and end of the therapy period, the patients were evaluated by a speech therapist who did not take part in the treatment. A sample of spontaneous speech, counting from 1 to 20, and the repetition of a list of words and sentences containing oral stop and fricative phonemes were also video-recorded. All patients showed satisfactory progress with the intensive therapy, with adequate production of the trained phonemes in directed speech, though still requiring follow-up therapy for their automatization. Intensive therapy proved to be an effective and feasible alternative in these cases, and may also serve as a strategy at the start of conventional speech therapy.

  17. Exploring Australian speech-language pathologists' use and perceptions of non-speech oral motor exercises.

    Science.gov (United States)

    Rumbach, Anna F; Rose, Tanya A; Cheah, Mynn

    2018-01-29

    To explore Australian speech-language pathologists' use of non-speech oral motor exercises, and rationales for using/not using non-speech oral motor exercises in clinical practice. A total of 124 speech-language pathologists practising in Australia, working with paediatric and/or adult clients with speech sound difficulties, completed an online survey. The majority of speech-language pathologists reported that they did not use non-speech oral motor exercises when working with paediatric or adult clients with speech sound difficulties. However, more than half of the speech-language pathologists working with adult clients who have dysarthria reported using non-speech oral motor exercises with this population. The most frequently reported rationale for using non-speech oral motor exercises in speech sound difficulty management was to improve awareness/placement of articulators. The majority of speech-language pathologists agreed there is no clear clinical or research evidence base to support non-speech oral motor exercise use with clients who have speech sound difficulties. This study provides an overview of Australian speech-language pathologists' reported use and perceptions of non-speech oral motor exercises' applicability and efficacy in treating paediatric and adult clients who have speech sound difficulties. The research findings provide speech-language pathologists with insight into how and why non-speech oral motor exercises are currently used, and adds to the knowledge base regarding Australian speech-language pathology practice of non-speech oral motor exercises in the treatment of speech sound difficulties. Implications for Rehabilitation Non-speech oral motor exercises refer to oral motor activities which do not involve speech, but involve the manipulation or stimulation of oral structures including the lips, tongue, jaw, and soft palate. Non-speech oral motor exercises are intended to improve the function (e.g., movement, strength) of oral structures. The

  18. Predicting speech intelligibility in conditions with nonlinearly processed noisy speech

    DEFF Research Database (Denmark)

    Jørgensen, Søren; Dau, Torsten

    2013-01-01

    The speech-based envelope power spectrum model (sEPSM; [1]) was proposed in order to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII). The sEPSM applies the signal-to-noise ratio in the envelope domain (SNRenv), which was demonstrated to successfully predict speech intelligibility in conditions with nonlinearly processed noisy speech, such as processing with spectral subtraction. Moreover, a multi-resolution version (mr-sEPSM) was demonstrated to account for speech intelligibility in various conditions with stationary and fluctuating …
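
    The core quantity, SNRenv, compares the envelope power of the speech with that of the interfering noise. The fragment below is a deliberately crude single-band, single-resolution illustration using the Hilbert envelope; the actual sEPSM/mr-sEPSM computes this across gammatone channels and a modulation filterbank, and estimates the noise envelope from the noise alone rather than by subtraction, so treat this only as a sketch of the idea.

        import numpy as np
        from scipy.signal import hilbert

        def snr_env_db(clean, noisy):
            # AC-coupled Hilbert envelopes of one band of the clean and
            # noisy/processed signals.
            env_c = np.abs(hilbert(clean));  env_c -= env_c.mean()
            env_n = np.abs(hilbert(noisy));  env_n -= env_n.mean()
            residual = env_n - env_c         # crude stand-in for the noise envelope
            p_s = np.mean(env_c ** 2)
            p_n = np.mean(residual ** 2) + 1e-12
            return 10.0 * np.log10(p_s / p_n)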

  19. Recruitment of Language-, Emotion- and Speech-Timing Associated Brain Regions for Expressing Emotional Prosody: Investigation of Functional Neuroanatomy with fMRI.

    Science.gov (United States)

    Mitchell, Rachel L C; Jazdzyk, Agnieszka; Stets, Manuela; Kotz, Sonja A

    2016-01-01

    We aimed to progress understanding of prosodic emotion expression by establishing brain regions active when expressing specific emotions, those activated irrespective of the target emotion, and those whose activation intensity varied depending on individual performance. BOLD contrast data were acquired whilst participants spoke nonsense words in happy, angry or neutral tones, or performed jaw-movements. Emotion-specific analyses demonstrated that when expressing angry prosody, activated brain regions included the inferior frontal and superior temporal gyri, the insula, and the basal ganglia. When expressing happy prosody, the activated brain regions also included the superior temporal gyrus, insula, and basal ganglia, with additional activation in the anterior cingulate. Conjunction analysis confirmed that the superior temporal gyrus and basal ganglia were activated regardless of the specific emotion concerned. Nevertheless, disjunctive comparisons between the expression of angry and happy prosody established that anterior cingulate activity was significantly higher for angry prosody than for happy prosody production. Degree of inferior frontal gyrus activity correlated with the ability to express the target emotion through prosody. We conclude that expressing prosodic emotions (vs. neutral intonation) requires generic brain regions involved in comprehending numerous aspects of language, emotion-related processes such as experiencing emotions, and in the time-critical integration of speech information.

  20. Recruitment of language-, emotion- and speech timing associated brain regions for expressing emotional prosody: Investigation of functional neuroanatomy with fMRI.

    Directory of Open Access Journals (Sweden)

    Rachel L. C. Mitchell

    2016-10-01

    Full Text Available We aimed to progress understanding of prosodic emotion expression by establishing brain regions active when expressing specific emotions, those activated irrespective of the target emotion, and those whose activation intensity varied depending on individual performance. BOLD contrast data were acquired whilst participants spoke nonsense words in happy, angry or neutral tones, or performed jaw-movements. Emotion-specific analyses demonstrated that when expressing angry prosody, activated brain regions included the inferior frontal and superior temporal gyri, the insula, and the basal ganglia. When expressing happy prosody, the activated brain regions also included the superior temporal gyrus, insula, and basal ganglia, with additional activation in the anterior cingulate. Conjunction analysis confirmed that the superior temporal gyrus and basal ganglia were activated regardless of the specific emotion concerned. Nevertheless, disjunctive comparisons between the expression of angry and happy prosody established that anterior cingulate activity was significantly higher for angry prosody than for happy prosody production. Degree of inferior frontal gyrus activity correlated with the ability to express the target emotion through prosody. We conclude that expressing prosodic emotions (vs. neutral intonation) requires generic brain regions involved in comprehending numerous aspects of language, emotion-related processes such as experiencing emotions, and in the time-critical integration of speech information.

  1. Neural correlates of quality perception for complex speech signals

    CERN Document Server

    Antons, Jan-Niklas

    2015-01-01

    This book interconnects two essential disciplines to study the perception of speech: Neuroscience and Quality of Experience, which to date have rarely been used together for the purposes of research on speech quality perception. In five key experiments, the book demonstrates the application of standard clinical methods in neurophysiology on the one hand, and of methods used in fields of research concerned with speech quality perception on the other. Using this combination, the book shows that speech stimuli with different lengths and different quality impairments are accompanied by physiological reactions related to quality variations, e.g., a positive peak in an event-related potential. Furthermore, it demonstrates that – in most cases – quality impairment intensity has an impact on the intensity of physiological reactions.

  2. Teaching intonation to deaf persons through visual displays

    NARCIS (Netherlands)

    Spaai, G.W.G.; Elsendoorn, B.A.G.; Coninx, F.

    1993-01-01

    The broken auditory feedback loop of a deaf person inhibits the monitoring of his or her speech. This, in turn, frequently creates deficiencies in the segmental and suprasegmental aspects of speech. The incorrect suprasegmental aspects of speech, for example, often involve the generation of

  3. Relationship between Speech Intelligibility and Speech Comprehension in Babble Noise

    Science.gov (United States)

    Fontan, Lionel; Tardieu, Julien; Gaillard, Pascal; Woisard, Virginie; Ruiz, Robert

    2015-01-01

    Purpose: The authors investigated the relationship between the intelligibility and comprehension of speech presented in babble noise. Method: Forty participants listened to French imperative sentences (commands for moving objects) in a multitalker babble background for which intensity was experimentally controlled. Participants were instructed to…

  4. Typical versus delayed speech onset influences verbal reporting of autistic interests.

    Science.gov (United States)

    Chiodo, Liliane; Majerus, Steve; Mottron, Laurent

    2017-01-01

    The distinction between autism and Asperger syndrome has been abandoned in the DSM-5. However, this clinical categorization largely overlaps with the presence or absence of a speech onset delay, which is associated with clinical, cognitive, and neural differences. It is unknown whether these different speech development pathways and associated cognitive differences are involved in the heterogeneity of the restricted interests that characterize autistic adults. This study tested the hypothesis that speech onset delay, or conversely, early mastery of speech, orients the nature and verbal reporting of adult autistic interests. The occurrence of a priori defined descriptors for perceptual and thematic dimensions was determined, as well as the perceived function and benefits, in the responses of autistic people to a semi-structured interview on their intense interests. The number of words, grammatical categories, and the proportion of perceptual/thematic descriptors were computed and compared between groups by variance analyses. The participants comprised 40 autistic adults grouped according to the presence (N = 20) or absence (N = 20) of speech onset delay, as well as 20 non-autistic adults, also with intense interests, matched for non-verbal intelligence using Raven's Progressive Matrices. The overall nature, function, and benefit of intense interests were similar across autistic subgroups, and between autistic and non-autistic groups. However, autistic participants with a history of speech onset delay used more perceptual than thematic descriptors when talking about their interests, whereas the opposite was true for autistic individuals without speech onset delay. This finding remained significant after controlling for linguistic differences observed between the two groups. Verbal reporting, but not the nature or positive function, of intense interests differed between adult autistic individuals depending on their speech acquisition history: oral reporting of

  5. From where to what: a neuroanatomically based evolutionary model of the emergence of speech in humans [version 2; referees: 1 approved, 2 approved with reservations

    Directory of Open Access Journals (Sweden)

    Oren Poliva

    2016-01-01

    Full Text Available In the brain of primates, the auditory cortex connects with the frontal lobe via the temporal pole (auditory ventral stream; AVS) and via the inferior parietal lobe (auditory dorsal stream; ADS). The AVS is responsible for sound recognition, and the ADS for sound-localization, voice detection and integration of calls with faces. I propose that the primary role of the ADS in non-human primates is the detection and response to contact calls. These calls are exchanged between tribe members (e.g., mother-offspring) and are used for monitoring location. Detection of contact calls occurs by the ADS identifying a voice, localizing it, and verifying that the corresponding face is out of sight. Once a contact call is detected, the primate produces a contact call in return via descending connections from the frontal lobe to a network of limbic and brainstem regions. Because the ADS of present day humans also performs speech production, I further propose an evolutionary course for the transition from contact call exchange to an early form of speech. In accordance with this model, structural changes to the ADS endowed early members of the genus Homo with partial vocal control. This development was beneficial as it enabled offspring to modify their contact calls with intonations for signaling high or low levels of distress to their mother. Eventually, individuals were capable of participating in yes-no question-answer conversations. In these conversations the offspring emitted a low-level distress call for inquiring about the safety of objects (e.g., food), and his/her mother responded with a high- or low-level distress call to signal approval or disapproval of the interaction. Gradually, the ADS and its connections with brainstem motor regions became more robust and vocal control became more volitional. Speech emerged once vocal control was sufficient for inventing novel calls.

  6. From where to what: a neuroanatomically based evolutionary model of the emergence of speech in humans [version 3; referees: 1 approved, 2 approved with reservations

    Directory of Open Access Journals (Sweden)

    Oren Poliva

    2017-09-01

    Full Text Available In the brain of primates, the auditory cortex connects with the frontal lobe via the temporal pole (auditory ventral stream; AVS) and via the inferior parietal lobe (auditory dorsal stream; ADS). The AVS is responsible for sound recognition, and the ADS for sound-localization, voice detection and integration of calls with faces. I propose that the primary role of the ADS in non-human primates is the detection and response to contact calls. These calls are exchanged between tribe members (e.g., mother-offspring) and are used for monitoring location. Detection of contact calls occurs by the ADS identifying a voice, localizing it, and verifying that the corresponding face is out of sight. Once a contact call is detected, the primate produces a contact call in return via descending connections from the frontal lobe to a network of limbic and brainstem regions. Because the ADS of present day humans also performs speech production, I further propose an evolutionary course for the transition from contact call exchange to an early form of speech. In accordance with this model, structural changes to the ADS endowed early members of the genus Homo with partial vocal control. This development was beneficial as it enabled offspring to modify their contact calls with intonations for signaling high or low levels of distress to their mother. Eventually, individuals were capable of participating in yes-no question-answer conversations. In these conversations the offspring emitted a low-level distress call for inquiring about the safety of objects (e.g., food), and his/her mother responded with a high- or low-level distress call to signal approval or disapproval of the interaction. Gradually, the ADS and its connections with brainstem motor regions became more robust and vocal control became more volitional. Speech emerged once vocal control was sufficient for inventing novel calls.

  7. Speech emotion recognition based on statistical pitch model

    Institute of Scientific and Technical Information of China (English)

    WANG Zhiping; ZHAO Li; ZOU Cairong

    2006-01-01

    A modified Parzen-window method, which keeps high resolution at low frequencies and smoothness at high frequencies, is proposed to obtain the statistical model. Then, a gender classification method utilizing the statistical model is proposed, which achieves 98% accuracy in gender classification when long sentences are processed. After separating male and female voices, the means and standard deviations of speech training samples with different emotions are used to create the corresponding emotion models. Then the Bhattacharyya distances between the test sample and the statistical pitch models are utilized for emotion recognition in speech. The normalization of pitch for male and female voices is also considered, in order to map them into a uniform space. Finally, a speech emotion recognition experiment based on K Nearest Neighbors shows that a correct rate of 81% is achieved, compared with only 73.85% if the traditional parameters are utilized.
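
    For univariate Gaussian pitch models, the Bhattacharyya distance used for matching has a closed form. The Python sketch below applies it to pick the closest emotion model; the dictionary layout and the use of a single (mean, variance) pair per emotion are our simplifications of the paper's statistical pitch models.

        import numpy as np

        def bhattacharyya_gauss(mu1, var1, mu2, var2):
            # Closed-form Bhattacharyya distance between two 1-D Gaussians.
            return (0.25 * (mu1 - mu2) ** 2 / (var1 + var2)
                    + 0.5 * np.log((var1 + var2) / (2.0 * np.sqrt(var1 * var2))))

        def nearest_emotion(test_mu, test_var, emotion_models):
            # emotion_models: {"angry": (mean, var), "happy": (mean, var), ...}
            return min(emotion_models, key=lambda e: bhattacharyya_gauss(
                test_mu, test_var, *emotion_models[e]))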

  8. Perceptual evaluation of corpus-based speech synthesis techniques in under-resourced environments

    CSIR Research Space (South Africa)

    Van Niekerk, DR

    2009-11-01

    Full Text Available With the increasing prominence and maturity of corpus-based techniques for speech synthesis, the process of system development has in some ways been simplified considerably. However, the dependence on sufficient amounts of relevant speech data...

  9. Proměny v produkci a recepci řeči v pozdním věku [Changes in the production and reception of speech in old age]

    Czech Academy of Sciences Publication Activity Database

    Müllerová, Olga

    2007-01-01

    Roč. 16, - (2007), s. 107-115 ISSN 1230-2287 R&D Projects: GA ČR GA405/03/0133; GA ČR GA405/06/1468 Institutional research plan: CEZ:AV0Z90610518 Keywords : communication * production of speech * reception of speech * pronunciation * intonation Subject RIV: AI - Linguistics

  10. A Decision-Tree-Based Algorithm for Speech/Music Classification and Segmentation

    Directory of Open Access Journals (Sweden)

    Lavner Yizhar

    2009-01-01

    Full Text Available We present an efficient algorithm for segmentation of audio signals into speech or music. The central motivation for our study is consumer audio applications, where various real-time enhancements are often applied. The algorithm consists of a learning phase and a classification phase. In the learning phase, predefined training data is used for computing various time-domain and frequency-domain features, for speech and music signals separately, and for estimating the optimal speech/music thresholds, based on the probability density functions of the features. An automatic procedure is employed to select the best features for separation. In the classification phase, initial classification is performed for each segment of the audio signal, using a three-stage sieve-like approach, applying both Bayesian and rule-based methods. To avoid erroneous rapid alternations in the classification, a smoothing technique is applied, averaging the decision on each segment with past segment decisions. Extensive evaluation of the algorithm, on a database of more than 12 hours of speech and more than 22 hours of music, showed correct identification rates of 99.4% and 97.8%, respectively, and quick adjustment to alternating speech/music sections. In addition to its accuracy and robustness, the algorithm can be easily adapted to different audio types, and is suitable for real-time operation.
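
    The smoothing stage that suppresses rapid speech/music alternations is simple to illustrate: each segment's hard decision is averaged with its recent past before thresholding. The Python fragment below shows one such scheme; the window length and equal weighting are assumptions, since the paper's exact averaging rule is not reproduced here.

        import numpy as np

        def smooth_decisions(raw_decisions, window=5):
            # raw_decisions: per-segment hard labels (1 = speech, 0 = music).
            raw = np.asarray(raw_decisions, dtype=float)
            smoothed = np.empty_like(raw)
            for i in range(len(raw)):
                lo = max(0, i - window + 1)    # average with past decisions only
                smoothed[i] = raw[lo:i + 1].mean()
            return (smoothed > 0.5).astype(int)

        print(smooth_decisions([1, 1, 0, 1, 1, 1, 0, 0, 0, 0]))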

  11. [Non-speech oral motor treatment efficacy for children with developmental speech sound disorders].

    Science.gov (United States)

    Ygual-Fernandez, A; Cervera-Merida, J F

    2016-01-01

    In the treatment of speech disorders by means of speech therapy two antagonistic methodological approaches are applied: non-verbal ones, based on oral motor exercises (OME), and verbal ones, which are based on speech processing tasks with syllables, phonemes and words. In Spain, OME programmes are called 'programas de praxias', and are widely used and valued by speech therapists. To review the studies conducted on the effectiveness of OME-based treatments applied to children with speech disorders and the theoretical arguments that could justify, or not, their usefulness. Over the last few decades evidence has been gathered about the lack of efficacy of this approach to treat developmental speech disorders and pronunciation problems in populations without any neurological alteration of motor functioning. The American Speech-Language-Hearing Association has advised against its use taking into account the principles of evidence-based practice. The knowledge gathered to date on motor control shows that the pattern of mobility and its corresponding organisation in the brain are different in speech and other non-verbal functions linked to nutrition and breathing. Neither the studies on their effectiveness nor the arguments based on motor control studies recommend the use of OME-based programmes for the treatment of pronunciation problems in children with developmental language disorders.

  12. Automatic speech signal segmentation based on the innovation adaptive filter

    Directory of Open Access Journals (Sweden)

    Makowski Ryszard

    2014-06-01

    Full Text Available Speech segmentation is an essential stage in designing automatic speech recognition systems, and one can find several algorithms proposed in the literature. It is a difficult problem, as speech is immensely variable. The aim of the authors’ studies was to design an algorithm that could be employed at the stage of automatic speech recognition. This would make it possible to avoid some problems related to speech signal parametrization. Posing the problem in such a way requires the algorithm to be capable of working in real time. The only such algorithm was proposed by Tyagi et al. (2006), and it is a modified version of Brandt’s algorithm. The article presents a new algorithm for unsupervised automatic speech signal segmentation. It performs segmentation without access to information about the phonetic content of the utterances, relying exclusively on second-order statistics of the speech signal. The starting point for the proposed method is the time-varying Schur coefficients of an innovation adaptive filter. The Schur algorithm is known to be fast, precise, stable and capable of rapidly tracking changes in second-order signal statistics. A transition from one phoneme to another in the speech signal always indicates a change in signal statistics caused by vocal tract changes. In order to allow for the properties of human hearing, detection of inter-phoneme boundaries is performed based on statistics defined on the mel spectrum determined from the reflection coefficients. The paper presents the structure of the algorithm, defines its properties, lists parameter values, describes detection efficiency results, and compares them with those for another algorithm. The obtained segmentation results are satisfactory.
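
    The second-order statistics the algorithm tracks are the reflection (Schur/PARCOR) coefficients. The sketch below computes them per frame from the autocorrelation with the Levinson-Durbin recursion in Python; the Schur recursion obtains the same coefficients without forming the predictor polynomial, which is what makes it attractive for real-time tracking. Frame length and model order here are arbitrary choices, not the paper's.

        import numpy as np

        def reflection_coeffs(frame, order=10):
            # Autocorrelation of the frame up to the model order.
            r = np.correlate(frame, frame, mode="full")[len(frame) - 1:][:order + 1]
            a = np.zeros(order + 1); a[0] = 1.0
            e = r[0] + 1e-12                   # prediction error power
            k = np.zeros(order)
            for m in range(1, order + 1):
                acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
                km = -acc / e
                a_prev = a.copy()
                for j in range(1, m):          # Levinson-Durbin order update
                    a[j] = a_prev[j] + km * a_prev[m - j]
                a[m] = km
                e *= (1.0 - km * km)
                k[m - 1] = km
            return k                           # one coefficient set per frame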

  13. Patient-Reported Voice and Speech Outcomes After Whole-Neck Intensity Modulated Radiation Therapy and Chemotherapy for Oropharyngeal Cancer: Prospective Longitudinal Study

    Energy Technology Data Exchange (ETDEWEB)

    Vainshtein, Jeffrey M. [Department of Radiation Oncology, University of Michigan, Ann Arbor, Michigan (United States); Griffith, Kent A. [Center for Cancer Biostatistics, University of Michigan School of Public Health, Ann Arbor, Michigan (United States); Feng, Felix Y.; Vineberg, Karen A. [Department of Radiation Oncology, University of Michigan, Ann Arbor, Michigan (United States); Chepeha, Douglas B. [Department of Otolaryngology–Head and Neck Surgery, University of Michigan, Ann Arbor, Michigan (United States); Eisbruch, Avraham, E-mail: eisbruch@umich.edu [Department of Radiation Oncology, University of Michigan, Ann Arbor, Michigan (United States)

    2014-08-01

    Purpose: To describe voice and speech quality changes and their predictors in patients with locally advanced oropharyngeal cancer treated on prospective clinical studies of organ-preserving chemotherapy–intensity modulated radiation therapy (chemo-IMRT). Methods and Materials: Ninety-one patients with stage III/IV oropharyngeal cancer were treated on 2 consecutive prospective studies of definitive chemoradiation using whole-field IMRT from 2003 to 2011. Patient-reported voice and speech quality were longitudinally assessed from before treatment through 24 months using the Communication Domain of the Head and Neck Quality of Life (HNQOL-C) instrument and the Speech question of the University of Washington Quality of Life (UWQOL-S) instrument, respectively. Factors associated with patient-reported voice quality worsening from baseline and speech impairment were assessed. Results: Voice quality decreased maximally at 1 month, with 68% and 41% of patients reporting worse HNQOL-C and UWQOL-S scores compared with before treatment, and improved thereafter, recovering to baseline by 12-18 months on average. In contrast, observer-rated larynx toxicity was rare (7% at 3 months; 5% at 6 months). Among patients with mean glottic larynx (GL) dose ≤20 Gy, >20-30 Gy, >30-40 Gy, >40-50 Gy, and >50 Gy, 10%, 32%, 25%, 30%, and 63%, respectively, reported worse voice quality at 12 months compared with before treatment (P=.011). Results for speech impairment were similar. Glottic larynx dose, N stage, neck dissection, oral cavity dose, and time since chemo-IMRT were univariately associated with either voice worsening or speech impairment. On multivariate analysis, mean GL dose remained independently predictive for both voice quality worsening (8.1%/Gy) and speech impairment (4.3%/Gy). Conclusions: Voice quality worsening and speech impairment after chemo-IMRT for locally advanced oropharyngeal cancer were frequently reported by patients, underrecognized by clinicians, and

  14. Patient-reported voice and speech outcomes after whole-neck intensity modulated radiation therapy and chemotherapy for oropharyngeal cancer: prospective longitudinal study.

    Science.gov (United States)

    Vainshtein, Jeffrey M; Griffith, Kent A; Feng, Felix Y; Vineberg, Karen A; Chepeha, Douglas B; Eisbruch, Avraham

    2014-08-01

    To describe voice and speech quality changes and their predictors in patients with locally advanced oropharyngeal cancer treated on prospective clinical studies of organ-preserving chemotherapy-intensity modulated radiation therapy (chemo-IMRT). Ninety-one patients with stage III/IV oropharyngeal cancer were treated on 2 consecutive prospective studies of definitive chemoradiation using whole-field IMRT from 2003 to 2011. Patient-reported voice and speech quality were longitudinally assessed from before treatment through 24 months using the Communication Domain of the Head and Neck Quality of Life (HNQOL-C) instrument and the Speech question of the University of Washington Quality of Life (UWQOL-S) instrument, respectively. Factors associated with patient-reported voice quality worsening from baseline and speech impairment were assessed. Voice quality decreased maximally at 1 month, with 68% and 41% of patients reporting worse HNQOL-C and UWQOL-S scores compared with before treatment, and improved thereafter, recovering to baseline by 12-18 months on average. In contrast, observer-rated larynx toxicity was rare (7% at 3 months; 5% at 6 months). Among patients with mean glottic larynx (GL) dose ≤20 Gy, >20-30 Gy, >30-40 Gy, >40-50 Gy, and >50 Gy, 10%, 32%, 25%, 30%, and 63%, respectively, reported worse voice quality at 12 months compared with before treatment (P=.011). Results for speech impairment were similar. Glottic larynx dose, N stage, neck dissection, oral cavity dose, and time since chemo-IMRT were univariately associated with either voice worsening or speech impairment. On multivariate analysis, mean GL dose remained independently predictive for both voice quality worsening (8.1%/Gy) and speech impairment (4.3%/Gy). Voice quality worsening and speech impairment after chemo-IMRT for locally advanced oropharyngeal cancer were frequently reported by patients, underrecognized by clinicians, and independently associated with GL dose. These findings support

  15. Patient-Reported Voice and Speech Outcomes After Whole-Neck Intensity Modulated Radiation Therapy and Chemotherapy for Oropharyngeal Cancer: Prospective Longitudinal Study

    International Nuclear Information System (INIS)

    Vainshtein, Jeffrey M.; Griffith, Kent A.; Feng, Felix Y.; Vineberg, Karen A.; Chepeha, Douglas B.; Eisbruch, Avraham

    2014-01-01

    Purpose: To describe voice and speech quality changes and their predictors in patients with locally advanced oropharyngeal cancer treated on prospective clinical studies of organ-preserving chemotherapy–intensity modulated radiation therapy (chemo-IMRT). Methods and Materials: Ninety-one patients with stage III/IV oropharyngeal cancer were treated on 2 consecutive prospective studies of definitive chemoradiation using whole-field IMRT from 2003 to 2011. Patient-reported voice and speech quality were longitudinally assessed from before treatment through 24 months using the Communication Domain of the Head and Neck Quality of Life (HNQOL-C) instrument and the Speech question of the University of Washington Quality of Life (UWQOL-S) instrument, respectively. Factors associated with patient-reported voice quality worsening from baseline and speech impairment were assessed. Results: Voice quality decreased maximally at 1 month, with 68% and 41% of patients reporting worse HNQOL-C and UWQOL-S scores compared with before treatment, and improved thereafter, recovering to baseline by 12-18 months on average. In contrast, observer-rated larynx toxicity was rare (7% at 3 months; 5% at 6 months). Among patients with mean glottic larynx (GL) dose ≤20 Gy, >20-30 Gy, >30-40 Gy, >40-50 Gy, and >50 Gy, 10%, 32%, 25%, 30%, and 63%, respectively, reported worse voice quality at 12 months compared with before treatment (P=.011). Results for speech impairment were similar. Glottic larynx dose, N stage, neck dissection, oral cavity dose, and time since chemo-IMRT were univariately associated with either voice worsening or speech impairment. On multivariate analysis, mean GL dose remained independently predictive for both voice quality worsening (8.1%/Gy) and speech impairment (4.3%/Gy). Conclusions: Voice quality worsening and speech impairment after chemo-IMRT for locally advanced oropharyngeal cancer were frequently reported by patients, underrecognized by clinicians, and independently associated with GL dose.

  16. Online Speech/Music Segmentation Based on the Variance Mean of Filter Bank Energy

    Directory of Open Access Journals (Sweden)

    Zdravko Kačič

    2009-01-01

    This paper presents a novel feature for online speech/music segmentation based on the variance mean of filter bank energy (VMFBE). The idea that encouraged the feature's construction is energy variation in a narrow frequency sub-band. The energy varies more rapidly, and to a greater extent for speech than for music. Therefore, an energy variance in such a sub-band is greater for speech than for music. The radio broadcast database and the BNSI broadcast news database were used for feature discrimination and segmentation ability evaluation. The calculation procedure of the VMFBE feature has 4 out of 6 steps in common with the MFCC feature calculation procedure. Therefore, it is a very convenient speech/music discriminator for use in real-time automatic speech recognition systems based on MFCC features, because valuable processing time can be saved, and computation load is only slightly increased. Analysis of the feature's speech/music discriminative ability shows an average error rate below 10% for radio broadcast material and it outperforms other features used for comparison, by more than 8%. The proposed feature as a stand-alone speech/music discriminator in a segmentation system achieves an overall accuracy of over 94% on radio broadcast material.

  17. Online Speech/Music Segmentation Based on the Variance Mean of Filter Bank Energy

    Science.gov (United States)

    Kos, Marko; Grašič, Matej; Kačič, Zdravko

    2009-12-01

    This paper presents a novel feature for online speech/music segmentation based on the variance mean of filter bank energy (VMFBE). The idea that encouraged the feature's construction is energy variation in a narrow frequency sub-band. The energy varies more rapidly, and to a greater extent for speech than for music. Therefore, an energy variance in such a sub-band is greater for speech than for music. The radio broadcast database and the BNSI broadcast news database were used for feature discrimination and segmentation ability evaluation. The calculation procedure of the VMFBE feature has 4 out of 6 steps in common with the MFCC feature calculation procedure. Therefore, it is a very convenient speech/music discriminator for use in real-time automatic speech recognition systems based on MFCC features, because valuable processing time can be saved, and computation load is only slightly increased. Analysis of the feature's speech/music discriminative ability shows an average error rate below 10% for radio broadcast material and it outperforms other features used for comparison, by more than 8%. The proposed feature as a stand-alone speech/music discriminator in a segmentation system achieves an overall accuracy of over 94% on radio broadcast material.
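
    A minimal sketch of the VMFBE idea in Python, assuming an STFT front end with a crude linear sub-band grouping in place of the paper's filter bank; the band count, frame sizes, and context length are illustrative assumptions, not the authors' implementation:

```python
import numpy as np
from scipy.signal import stft

def vmfbe(x, fs, n_bands=24, frame_len=0.025, hop=0.010, context=50):
    """Variance mean of filter-bank energy (illustrative sketch).

    Computes per-sub-band log energies from an STFT, the variance of
    each band's energy over a sliding context of frames, and finally
    the mean of those variances across bands. Higher values suggest
    speech; lower values suggest music.
    """
    nper = int(frame_len * fs)
    nhop = int(hop * fs)
    _, _, Z = stft(x, fs, nperseg=nper, noverlap=nper - nhop)
    power = np.abs(Z) ** 2                          # (bins, frames)
    bands = np.array_split(power, n_bands, axis=0)  # crude sub-band grouping
    band_e = np.log(np.stack([b.sum(axis=0) for b in bands]) + 1e-12)
    feat = np.empty(max(band_e.shape[1] - context, 1))
    for t in range(len(feat)):
        feat[t] = band_e[:, t:t + context].var(axis=1).mean()
    return feat
```

    Framewise values of such a feature could then be thresholded, or fed to a classifier, to produce speech/music segment labels.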

  18. Enhancing Speech Recognition Using Improved Particle Swarm Optimization Based Hidden Markov Model

    Directory of Open Access Journals (Sweden)

    Lokesh Selvaraj

    2014-01-01

    Enhancing speech recognition is the primary intention of this work. In this paper a novel speech recognition method based on vector quantization and improved particle swarm optimization (IPSO) is suggested. The suggested methodology contains four stages, namely, (i) denoising, (ii) feature mining, (iii) vector quantization, and (iv) an IPSO based hidden Markov model (HMM) technique (IP-HMM). At first, the speech signals are denoised using a median filter. Next, characteristics such as peak, pitch spectrum, Mel frequency cepstral coefficients (MFCC), mean, standard deviation, and minimum and maximum of the signal are extracted from the denoised signal. Following that, to accomplish the training process, the extracted characteristics are given to genetic algorithm based codebook generation in vector quantization. The initial populations are created by selecting random code vectors from the training set for the codebooks for the genetic algorithm process, and IP-HMM performs the recognition. Novelty is introduced through the crossover genetic operation. The proposed speech recognition technique offers 97.14% accuracy.
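
    The first three stages of this pipeline can be sketched as follows, assuming librosa and scikit-learn are available; k-means stands in for the paper's genetic-algorithm codebook generation, and the IP-HMM recognition stage is omitted:

```python
import librosa
from scipy.signal import medfilt
from sklearn.cluster import KMeans

def front_end(path, n_mfcc=13, codebook_size=64):
    y, sr = librosa.load(path, sr=None)
    y = medfilt(y, kernel_size=5)        # (i) denoise with a median filter
    # (ii) feature extraction; MFCCs only, as one of the listed features
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T
    # (iii) vector quantization: k-means here stands in for the
    # paper's genetic-algorithm codebook generation
    km = KMeans(n_clusters=codebook_size, n_init=10).fit(mfcc)
    codes = km.predict(mfcc)             # discrete symbols for HMM training
    return codes, km.cluster_centers_
```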

  19. The Neural Bases of Difficult Speech Comprehension and Speech Production: Two Activation Likelihood Estimation (ALE) Meta-Analyses

    Science.gov (United States)

    Adank, Patti

    2012-01-01

    The role of speech production mechanisms in difficult speech comprehension is the subject of on-going debate in speech science. Two Activation Likelihood Estimation (ALE) analyses were conducted on neuroimaging studies investigating difficult speech comprehension or speech production. Meta-analysis 1 included 10 studies contrasting comprehension…

  20. Infant speech-sound discrimination testing: effects of stimulus intensity and procedural model on measures of performance.

    Science.gov (United States)

    Nozza, R J

    1987-06-01

    Performance of infants in a speech-sound discrimination task (/ba/ vs /da/) was measured at three stimulus intensity levels (50, 60, and 70 dB SPL) using the operant head-turn procedure. The procedure was modified so that data could be treated as though from a single-interval (yes-no) procedure, as is commonly done, as well as if from a sustained attention (vigilance) task. Discrimination performance changed significantly with increase in intensity, suggesting caution in the interpretation of results from infant discrimination studies in which only single stimulus intensity levels within this range are used. The assumptions made about the underlying methodological model did not change the performance-intensity relationships. However, infants demonstrated response decrement, typical of vigilance tasks, which supports the notion that the head-turn procedure is represented best by the vigilance model. Analysis then was done according to a method designed for tasks with undefined observation intervals [C. S. Watson and T. L. Nichols, J. Acoust. Soc. Am. 59, 655-668 (1976)]. Results reveal that, while group data are reasonably well represented across levels of difficulty by the fixed-interval model, there is a variation in performance as a function of time following trial onset that could lead to underestimation of performance in some cases.
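
    Under the single-interval (yes-no) treatment mentioned above, sensitivity is conventionally summarized as d' computed from hit and false-alarm rates. A generic signal-detection sketch with hypothetical counts, not the study's data:

```python
from scipy.stats import norm

def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' for a yes-no task, clamping rates away from 0 and 1
    with the standard 1/(2N) correction."""
    n_s = hits + misses
    n_n = false_alarms + correct_rejections
    hr = min(max(hits / n_s, 1 / (2 * n_s)), 1 - 1 / (2 * n_s))
    fa = min(max(false_alarms / n_n, 1 / (2 * n_n)), 1 - 1 / (2 * n_n))
    return norm.ppf(hr) - norm.ppf(fa)

print(d_prime(45, 15, 12, 48))   # hypothetical counts -> d' of about 1.5
```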

  1. Developing a corpus of spoken language variability

    Science.gov (United States)

    Carmichael, Lesley; Wright, Richard; Wassink, Alicia Beckford

    2003-10-01

    We are developing a novel, searchable corpus as a research tool for investigating phonetic and phonological phenomena across various speech styles. Five speech styles have been well studied independently in previous work: reduced (casual), careful (hyperarticulated), citation (reading), Lombard effect (speech in noise), and "motherese" (child-directed speech). Few studies to date have collected a wide range of styles from a single set of speakers, and fewer yet have provided publicly available corpora. The pilot corpus includes recordings of (1) a set of speakers participating in a variety of tasks designed to elicit the five speech styles, and (2) casual peer conversations and wordlists to illustrate regional vowels. The data include high-quality recordings and time-aligned transcriptions linked to text files that can be queried. Initial measures drawn from the database provide comparison across speech styles along the following acoustic dimensions: MLU (changes in unit duration); relative intra-speaker intensity changes (mean and dynamic range); and intra-speaker pitch values (minimum, maximum, mean, range). The corpus design will allow for a variety of analyses requiring control of demographic and style factors, including hyperarticulation variety, disfluencies, intonation, discourse analysis, and detailed spectral measures.
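
    The intra-speaker intensity and pitch summaries listed above correspond to standard signal measurements. A possible sketch using librosa (an assumption; the record does not specify the corpus tooling):

```python
import numpy as np
import librosa

def style_measures(path):
    y, sr = librosa.load(path, sr=None)
    # intensity: frame RMS converted to dB; report mean and dynamic range
    rms_db = librosa.amplitude_to_db(librosa.feature.rms(y=y)[0])
    # pitch: probabilistic YIN; unvoiced frames come back as NaN
    f0, _, _ = librosa.pyin(y, fmin=librosa.note_to_hz('C2'),
                            fmax=librosa.note_to_hz('C6'), sr=sr)
    f0 = f0[~np.isnan(f0)]
    return {"int_mean_db": rms_db.mean(),
            "int_range_db": rms_db.max() - rms_db.min(),
            "f0_min": f0.min(), "f0_max": f0.max(),
            "f0_mean": f0.mean(), "f0_range": f0.max() - f0.min()}
```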

  2. Temporal modulations in speech and music.

    Science.gov (United States)

    Ding, Nai; Patel, Aniruddh D; Chen, Lin; Butler, Henry; Luo, Cheng; Poeppel, David

    2017-10-01

    Speech and music have structured rhythms. Here we discuss a major acoustic correlate of spoken and musical rhythms, the slow (0.25-32 Hz) temporal modulations in sound intensity, and compare the modulation properties of speech and music. We analyze these modulations using over 25 h of speech and over 39 h of recordings of Western music. We show that the speech modulation spectrum is highly consistent across 9 languages (including languages with typologically different rhythmic characteristics). A different, but similarly consistent modulation spectrum is observed for music, including classical music played by single instruments of different types, symphonic, jazz, and rock. The temporal modulations of speech and music show broad but well-separated peaks around 5 and 2 Hz, respectively. These acoustically dominant time scales may be intrinsic features of speech and music, a possibility which should be investigated using more culturally diverse samples in each domain. Distinct modulation timescales for speech and music could facilitate their perceptual analysis and its neural processing. Copyright © 2017 Elsevier Ltd. All rights reserved.
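
    A simplified version of such a modulation analysis takes the spectrum of the broadband amplitude envelope; the study analyzes envelopes of multiple carrier bands, so this single-band sketch only illustrates the concept:

```python
import numpy as np
from scipy.signal import hilbert

def modulation_spectrum(x, fs):
    """Spectrum of the temporal envelope (simplified broadband version)."""
    env = np.abs(hilbert(x))        # amplitude envelope
    env = env - env.mean()          # remove DC before the FFT
    spec = np.abs(np.fft.rfft(env)) ** 2
    freqs = np.fft.rfftfreq(len(env), 1 / fs)
    keep = (freqs >= 0.25) & (freqs <= 32)   # modulation range of interest
    return freqs[keep], spec[keep]
```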

  3. Intelligibility of speech of children with speech and sound disorders

    OpenAIRE

    Ivetac, Tina

    2014-01-01

    The purpose of this study is to examine speech intelligibility of children with primary speech and sound disorders aged 3 to 6 years in everyday life. The research problem is based on the degree to which parents or guardians, immediate family members (sister, brother, grandparents), extended family members (aunt, uncle, cousin), child's friends, other acquaintances, child's teachers and strangers understand the speech of children with speech sound disorders. We examined whether the level ...

  4. Practical speech user interface design

    CERN Document Server

    Lewis, James R

    2010-01-01

    Although speech is the most natural form of communication between humans, most people find using speech to communicate with machines anything but natural. Drawing from psychology, human-computer interaction, linguistics, and communication theory, Practical Speech User Interface Design provides a comprehensive yet concise survey of practical speech user interface (SUI) design. It offers practice-based and research-based guidance on how to design effective, efficient, and pleasant speech applications that people can really use. Focusing on the design of speech user interfaces for IVR application

  5. Speech masking and cancelling and voice obscuration

    Science.gov (United States)

    Holzrichter, John F.

    2013-09-10

    A non-acoustic sensor is used to measure a user's speech and then broadcasts an obscuring acoustic signal, diminishing the user's vocal acoustic output intensity and/or distorting the voice sounds to make them unintelligible to persons nearby. The non-acoustic sensor is positioned proximate to, or in contact with, a user's neck or head skin tissue for sensing speech production information.

  6. Enhancement of speech signals - with a focus on voiced speech models

    DEFF Research Database (Denmark)

    Nørholm, Sidsel Marie

    This thesis deals with speech enhancement, i.e., noise reduction in speech signals. This has applications in, e.g., hearing aids and teleconference systems. We consider a signal-driven approach to speech enhancement where a model of the speech is assumed and filters are generated based on this model. The basic model used in this thesis is the harmonic model, which is a commonly used model for describing the voiced part of the speech signal. We show that it can be beneficial to extend the model to take inharmonicities or the non-stationarity of speech into account. Extending the model...

  7. Logopenic and nonfluent variants of primary progressive aphasia are differentiated by acoustic measures of speech production.

    Directory of Open Access Journals (Sweden)

    Kirrie J Ballard

    Differentiation of logopenic (lvPPA) and nonfluent/agrammatic (nfvPPA) variants of Primary Progressive Aphasia is important yet remains challenging since it hinges on expert-based evaluation of speech and language production. In this study, acoustic measures of speech in conjunction with voxel-based morphometry were used to determine the success of the measures as an adjunct to diagnosis and to explore the neural basis of apraxia of speech in nfvPPA. Forty-one patients (21 lvPPA, 20 nfvPPA) were recruited from a consecutive sample with suspected frontotemporal dementia. Patients were diagnosed using the current gold-standard of expert perceptual judgment, based on presence/absence of particular speech features during speaking tasks. Seventeen healthy age-matched adults served as controls. MRI scans were available for 11 control and 37 PPA cases; 23 of the PPA cases underwent amyloid ligand PET imaging. Measures, corresponding to perceptual features of apraxia of speech, were periods of silence during reading and relative vowel duration and intensity in polysyllable word repetition. Discriminant function analyses revealed that a measure of relative vowel duration differentiated nfvPPA cases from both control and lvPPA cases (r² = 0.47), with 88% agreement with expert judgment of presence of apraxia of speech in nfvPPA cases. VBM analysis showed that relative vowel duration covaried with grey matter intensity in areas critical for speech motor planning and programming: precentral gyrus, supplementary motor area and inferior frontal gyrus bilaterally, only affected in the nfvPPA group. This bilateral involvement of frontal speech networks in nfvPPA potentially affects access to compensatory mechanisms involving right hemisphere homologues. Measures of silences during reading also discriminated the PPA and control groups, but did not increase predictive accuracy. Findings suggest that a measure of relative vowel duration from a polysyllable word repetition task …

  8. Binaural speech enhancement using a codebook based approach

    DEFF Research Database (Denmark)

    Kavalekalam, Mathew Shaji; Christensen, Mads Græsbøll; Boldt, Jesper B.

    2016-01-01

    This paper deals with the estimation of short-term predictor (STP) parameters using a codebook based approach, when we have access to binaural noisy signals. The estimated STP parameters are subsequently used for enhancement in a dual channel scenario. Objective measures indicate that the proposed method is able to improve the speech...

  9. Speech-Based Information Retrieval for Digital Libraries

    National Research Council Canada - National Science Library

    Oard, Douglas W

    1997-01-01

    Libraries and archives collect recorded speech and multimedia objects that contain recorded speech, and such material may comprise a substantial portion of the collection in future digital libraries...

  10. Neural Spike-Train Analyses of the Speech-Based Envelope Power Spectrum Model

    Science.gov (United States)

    Rallapalli, Varsha H.

    2016-01-01

    Diagnosing and treating hearing impairment is challenging because people with similar degrees of sensorineural hearing loss (SNHL) often have different speech-recognition abilities. The speech-based envelope power spectrum model (sEPSM) has demonstrated that the signal-to-noise ratio (SNRenv) from a modulation filter bank provides a robust speech-intelligibility measure across a wider range of degraded conditions than many long-standing models. In the sEPSM, noise (N) is assumed to: (a) reduce S + N envelope power by filling in dips within clean speech (S) and (b) introduce an envelope noise floor from intrinsic fluctuations in the noise itself. While the promise of SNRenv has been demonstrated for normal-hearing listeners, it has not been thoroughly extended to hearing-impaired listeners because of limited physiological knowledge of how SNHL affects speech-in-noise envelope coding relative to noise alone. Here, envelope coding to speech-in-noise stimuli was quantified from auditory-nerve model spike trains using shuffled correlograms, which were analyzed in the modulation-frequency domain to compute modulation-band estimates of neural SNRenv. Preliminary spike-train analyses show strong similarities to the sEPSM, demonstrating feasibility of neural SNRenv computations. Results suggest that individual differences can occur based on differential degrees of outer- and inner-hair-cell dysfunction in listeners currently diagnosed into the single audiological SNHL category. The predicted acoustic-SNR dependence in individual differences suggests that the SNR-dependent rate of susceptibility could be an important metric in diagnosing individual differences. Future measurements of the neural SNRenv in animal studies with various forms of SNHL will provide valuable insight for understanding individual differences in speech-in-noise intelligibility.

  11. Development of a speech-based dialogue system for report dictation and machine control in the endoscopic laboratory.

    Science.gov (United States)

    Molnar, B; Gergely, J; Toth, G; Pronai, L; Zagoni, T; Papik, K; Tulassay, Z

    2000-01-01

    Reporting and machine control based on speech technology can enhance work efficiency in the gastrointestinal endoscopy laboratory. The status and activation of endoscopy laboratory equipment were described as a multivariate parameter and function system. Speech recognition, text evaluation and action definition engines were installed. Special programs were developed for the grammatical analysis of command sentences, and a rule-based expert system for the definition of machine answers. A speech backup engine provides feedback to the user. Techniques were applied based on the "Hidden Markov" model of discrete word, user-independent speech recognition and on phoneme-based speech synthesis. Speech samples were collected from three male low-tone investigators. The dictation module and machine control modules were incorporated in a personal computer (PC) simulation program. Altogether 100 unidentified patient records were analyzed. The sentences were grouped according to keywords, which indicate the main topics of a gastrointestinal endoscopy report. They were: "endoscope", "esophagus", "cardia", "fundus", "corpus", "antrum", "pylorus", "bulbus", and "postbulbar section", in addition to the major pathological findings: "erosion", "ulceration", and "malignancy". "Biopsy" and "diagnosis" were also included. We implemented wireless speech communication control commands for equipment including an endoscopy unit, video, monitor, printer, and PC. The recognition rate was 95%. Speech technology may soon become an integrated part of our daily routine in the endoscopy laboratory. A central speech and laboratory computer could be the most efficient alternative to having separate speech recognition units in all items of equipment.

  12. 78 FR 49693 - Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services...

    Science.gov (United States)

    2013-08-15

    Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services; Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and Speech Disabilities, Report and Order.

  13. Predicting speech intelligibility based on the signal-to-noise envelope power ratio after modulation-frequency selective processing

    DEFF Research Database (Denmark)

    Jørgensen, Søren; Dau, Torsten

    2011-01-01

    A model for predicting the intelligibility of processed noisy speech is proposed. The speech-based envelope power spectrum model has a similar structure as the model of Ewert and Dau [(2000). J. Acoust. Soc. Am. 108, 1181-1196], developed to account for modulation detection and masking data. The model estimates the speech-to-noise envelope power ratio, SNRenv, at the output of a modulation filterbank and relates this metric to speech intelligibility using the concept of an ideal observer. Predictions were compared to data on the intelligibility of speech presented in stationary speech-shaped noise … This process provides a key measure of speech intelligibility. © 2011 Acoustical Society of America.
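
    For a single modulation band, the SNRenv decision metric can be sketched as below; the full sEPSM uses a bank of modulation filters and an ideal-observer back end, and the band edges here are assumptions:

```python
import numpy as np
from scipy.signal import hilbert, butter, sosfilt

def envelope_power(x, fs, lo, hi):
    """AC envelope power in one modulation band, normalized by DC^2.
    For simplicity the envelope is kept at the audio sampling rate."""
    env = np.abs(hilbert(x))
    dc = env.mean()
    sos = butter(2, [lo, hi], btype='bandpass', fs=fs, output='sos')
    env_band = sosfilt(sos, env - dc)
    return np.mean(env_band ** 2) / dc ** 2

def snr_env(noisy, noise, fs, lo=1.0, hi=8.0):
    """(P_env,S+N - P_env,N) / P_env,N for one modulation band."""
    p_sn = envelope_power(noisy, fs, lo, hi)
    p_n = envelope_power(noise, fs, lo, hi)
    return (p_sn - p_n) / max(p_n, 1e-12)
```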

  14. The role of temporal resolution in modulation-based speech segregation

    DEFF Research Database (Denmark)

    May, Tobias; Bentsen, Thomas; Dau, Torsten

    2015-01-01

    The segregation system estimates speech and noise activity on the basis of individual time-frequency (T-F) units. One important parameter of the segregation system is the window duration of the analysis-synthesis stage, which determines the lower limit of modulation frequencies that can be represented, but also the temporal acuity with which the segregation system can manipulate individual T-F units. To clarify the consequences of this trade-off for modulation-based speech segregation performance, the influence of the window duration was systematically investigated …

  15. Intonational meaning in institutional settings: the role of syntagmatic relations

    Science.gov (United States)

    Wichmann, Anne

    2010-12-01

    This paper addresses the power of intonation to convey interpersonal or attitudinal meaning. Speakers have been shown to accommodate to each other in the course of conversation, and this convergence may be perceived as a sign of empathy. Accommodation often involves paradigmatic choices—choosing the same words, gestures, regional accent or melodic pattern, but this paper suggests that affective meaning can also be conveyed syntagmatically through the relationship between prosodic features in successive utterances. The paper also addresses the use of prosody in situations of conflict, particularly in institutional settings. The requirement of the more powerful participant to exercise control may conflict with the expression of empathy. Situations are described where divergent rather than convergent behaviour is more successful both in keeping control and in maintaining rapport.

  16. Multi-Stage Recognition of Speech Emotion Using Sequential Forward Feature Selection

    Directory of Open Access Journals (Sweden)

    Liogienė Tatjana

    2016-07-01

    The intensive research of speech emotion recognition has introduced a huge collection of speech emotion features. Large feature sets complicate the speech emotion recognition task. Among various feature selection and transformation techniques for one-stage classification, multiple classifier systems have been proposed. The main idea of multiple classifiers is to arrange the emotion classification process in stages. Besides parallel and serial cases, the hierarchical arrangement of multi-stage classification is most widely used for speech emotion recognition. In this paper, we present a sequential-forward-feature-selection-based multi-stage classification scheme. The Sequential Forward Selection (SFS) and Sequential Floating Forward Selection (SFFS) techniques were employed for every stage of the multi-stage classification scheme. Experimental testing of the proposed scheme was performed using the German and Lithuanian emotional speech datasets. Sequential-feature-selection-based multi-stage classification outperformed the single-stage scheme by 12–42% for different emotion sets. The multi-stage scheme also showed higher robustness to the growth of the emotion set: the decrease in recognition rate with increasing emotion-set size was 10–20% lower than in the single-stage case. Differences between SFS and SFFS employment for feature selection were negligible.
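
    A minimal greedy SFS loop is sketched below, with scikit-learn assumed for the classifier and cross-validation; the paper's features and classifiers are not reproduced, and SFFS would add a conditional backward-elimination step after each inclusion:

```python
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def sfs(X, y, n_select, estimator=None):
    """Greedy sequential forward selection (SFS): repeatedly add the
    feature that most improves cross-validated accuracy."""
    est = estimator or SVC()
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < n_select:
        scores = [(cross_val_score(est, X[:, selected + [j]], y, cv=5).mean(), j)
                  for j in remaining]
        best_score, best_j = max(scores)
        selected.append(best_j)
        remaining.remove(best_j)
    return selected
```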

  17. Childhood apraxia of speech: A survey of praxis and typical speech characteristics.

    Science.gov (United States)

    Malmenholt, Ann; Lohmander, Anette; McAllister, Anita

    2017-07-01

    The purpose of this study was to investigate current knowledge of the diagnosis childhood apraxia of speech (CAS) in Sweden and compare speech characteristics and symptoms to those of earlier survey findings mainly in English speakers. In a web-based questionnaire, 178 Swedish speech-language pathologists (SLPs) anonymously answered questions about their perception of typical speech characteristics for CAS. They graded their own assessment skills and estimated clinical occurrence. The seven top speech characteristics reported as typical for children with CAS were: inconsistent speech production (85%), sequencing difficulties (71%), oro-motor deficits (63%), vowel errors (62%), voicing errors (61%), consonant cluster deletions (54%), and prosodic disturbance (53%). Motor-programming deficits, described as lack of automatization of speech movements, were perceived by 82%. All listed characteristics were consistent with the American Speech-Language-Hearing Association (ASHA) consensus-based features, Strand's 10-point checklist, and the diagnostic model proposed by Ozanne. The mode for clinical occurrence was 5%. The number of suspected cases of CAS in the clinical caseload was approximately one new patient per year and SLP. The results support and add to findings from studies of CAS in English-speaking children with similar speech characteristics regarded as typical. Possibly, these findings could contribute to cross-linguistic consensus on CAS characteristics.

  18. Analysis of Acoustic Features in Speakers with Cognitive Disorders and Speech Impairments

    Science.gov (United States)

    Saz, Oscar; Simón, Javier; Rodríguez, W. Ricardo; Lleida, Eduardo; Vaquero, Carlos

    2009-12-01

    This work presents the results of the analysis of the acoustic features (formants and the three suprasegmental features: tone, intensity, and duration) of the vowel production in a group of 14 young speakers suffering from different kinds of speech impairments due to physical and cognitive disorders. A corpus with unimpaired children's speech is used to determine the reference values for these features in speakers without any kind of speech impairment within the same domain as the impaired speakers, namely 57 isolated words. The signal processing to extract the formant and pitch values is based on a Linear Prediction Coefficients (LPC) analysis of the segments considered as vowels in a Hidden Markov Model (HMM) based Viterbi forced alignment. Intensity and duration are also derived from the outcome of the automated segmentation. As the main conclusion of the work, it is shown that intelligibility of the vowel production is lowered in impaired speakers even when the vowel is perceived as correct by human labelers. The decrease in intelligibility is due to a 30% increase in confusability in the formant map, a 50% reduction in the discriminative power of energy between stressed and unstressed vowels, and a 50% increase in the standard deviation of vowel length. On the other hand, impaired speakers keep good control of tone in the production of stressed and unstressed vowels.
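
    The LPC-based formant extraction described here is conventionally implemented by taking the angles of the LPC polynomial roots. A minimal sketch (librosa assumed; the order and frequency threshold are illustrative):

```python
import numpy as np
import librosa

def formants(frame, sr, order=12):
    """Estimate formant frequencies from an LPC fit of a vowel frame."""
    a = librosa.lpc(frame, order=order)
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]     # keep one of each conjugate pair
    freqs = np.angle(roots) * sr / (2 * np.pi)
    return np.sort(freqs[freqs > 90])     # discard near-DC roots; F1, F2, ... follow
```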

  19. Factors contributing to speech perception scores in long-term pediatric cochlear implant users.

    Science.gov (United States)

    Davidson, Lisa S; Geers, Ann E; Blamey, Peter J; Tobey, Emily A; Brenner, Christine A

    2011-02-01

    The objectives of this report are to (1) describe the speech perception abilities of long-term pediatric cochlear implant (CI) recipients by comparing scores obtained at elementary school (CI-E, 8 to 9 yrs) with scores obtained at high school (CI-HS, 15 to 18 yrs); (2) evaluate speech perception abilities in demanding listening conditions (i.e., noise and lower intensity levels) at adolescence; and (3) examine the relation of speech perception scores to speech and language development over this longitudinal timeframe. All 112 teenagers were part of a previous nationwide study of 8- and 9-yr-olds (N = 181) who received a CI between 2 and 5 yrs of age. The test battery included (1) the Lexical Neighborhood Test (LNT; hard and easy word lists); (2) the Bamford Kowal Bench sentence test; (3) the Children's Auditory-Visual Enhancement Test; (4) the Test of Auditory Comprehension of Language at CI-E; (5) the Peabody Picture Vocabulary Test at CI-HS; and (6) the McGarr sentences (consonants correct) at CI-E and CI-HS. CI-HS speech perception was measured in both optimal and demanding listening conditions (i.e., background noise and low-intensity level). Speech perception scores were compared based on age at test, lexical difficulty of stimuli, listening environment (optimal and demanding), input mode (visual and auditory-visual), and language age. All group mean scores significantly increased with age across the two test sessions. Scores of adolescents significantly decreased in demanding listening conditions. The effect of lexical difficulty on the LNT scores, as evidenced by the difference in performance between easy versus hard lists, increased with age and decreased for adolescents in challenging listening conditions. Calculated curves for percent correct speech perception scores (LNT and Bamford Kowal Bench) and consonants correct on the McGarr sentences plotted against age-equivalent language scores on the Test of Auditory Comprehension of Language and Peabody Picture Vocabulary Test …

  20. Man-system interface based on automatic speech recognition: integration to a virtual control desk

    Energy Technology Data Exchange (ETDEWEB)

    Jorge, Carlos Alexandre F.; Mol, Antonio Carlos A.; Pereira, Claudio M.N.A.; Aghina, Mauricio Alves C., E-mail: calexandre@ien.gov.b, E-mail: mol@ien.gov.b, E-mail: cmnap@ien.gov.b, E-mail: mag@ien.gov.b [Instituto de Engenharia Nuclear (IEN/CNEN-RJ), Rio de Janeiro, RJ (Brazil); Nomiya, Diogo V., E-mail: diogonomiya@gmail.co [Universidade Federal do Rio de Janeiro (UFRJ), RJ (Brazil)

    2009-07-01

    This work reports the implementation of a man-system interface based on automatic speech recognition and its integration into a virtual nuclear power plant control desk. The latter aims to reproduce a real control desk using virtual reality technology for operator training and ergonomic evaluation purposes. An automatic speech recognition system was developed to serve as a new interface with users, replacing the computer keyboard and mouse. Users can operate this virtual control desk in front of a computer monitor or a projection screen through spoken commands. The automatic speech recognition interface developed is based on a well-known signal processing technique named cepstral analysis and on artificial neural networks. The speech recognition interface is described, along with its integration with the virtual control desk, and results are presented. (author)
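
    The cepstral front end named above can be illustrated generically; the windowing and coefficient selection of the reported system are not specified, so this is a textbook real-cepstrum computation rather than the authors' code:

```python
import numpy as np

def real_cepstrum(frame):
    """Real cepstrum: inverse FFT of the log magnitude spectrum.
    Low-quefrency coefficients summarize the vocal-tract envelope
    and are typical inputs to a neural-network recognizer."""
    spectrum = np.fft.fft(frame * np.hamming(len(frame)))
    return np.fft.ifft(np.log(np.abs(spectrum) + 1e-12)).real
```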

  1. Man-system interface based on automatic speech recognition: integration to a virtual control desk

    International Nuclear Information System (INIS)

    Jorge, Carlos Alexandre F.; Mol, Antonio Carlos A.; Pereira, Claudio M.N.A.; Aghina, Mauricio Alves C.; Nomiya, Diogo V.

    2009-01-01

    This work reports the implementation of a man-system interface based on automatic speech recognition and its integration into a virtual nuclear power plant control desk. The latter aims to reproduce a real control desk using virtual reality technology for operator training and ergonomic evaluation purposes. An automatic speech recognition system was developed to serve as a new interface with users, replacing the computer keyboard and mouse. Users can operate this virtual control desk in front of a computer monitor or a projection screen through spoken commands. The automatic speech recognition interface developed is based on a well-known signal processing technique named cepstral analysis and on artificial neural networks. The speech recognition interface is described, along with its integration with the virtual control desk, and results are presented. (author)

  2. Constrained versus Unconstrained Intensive Language Therapy in Two Individuals with Chronic, Moderate-to-Severe Aphasia and Apraxia of Speech: Behavioral and fMRI Outcomes

    Science.gov (United States)

    Kurland, Jacquie; Pulvermuller, Friedemann; Silva, Nicole; Burke, Katherine; Andrianopoulos, Mary

    2012-01-01

    Purpose: This Phase I study investigated behavioral and functional MRI (fMRI) outcomes of 2 intensive treatment programs to improve naming in 2 participants with chronic moderate-to-severe aphasia with comorbid apraxia of speech (AOS). Constraint-induced aphasia therapy (CIAT; Pulvermuller et al., 2001) has demonstrated positive outcomes in some…

  3. Speech-to-Speech Relay Service

    Science.gov (United States)

    Speech-to-Speech (STS) is one form of Telecommunications Relay Service (TRS). TRS is a service that allows persons with hearing and speech disabilities ...

  4. Speech Enhancement by MAP Spectral Amplitude Estimation Using a Super-Gaussian Speech Model

    Directory of Open Access Journals (Sweden)

    Lotter Thomas

    2005-01-01

    This contribution presents two spectral amplitude estimators for acoustical background noise suppression based on maximum a posteriori estimation and super-Gaussian statistical modelling of the speech DFT amplitudes. The probability density function of the speech spectral amplitude is modelled with a simple parametric function, which allows a high approximation accuracy for Laplace- or Gamma-distributed real and imaginary parts of the speech DFT coefficients. Also, the statistical model can be adapted to optimally fit the distribution of the speech spectral amplitudes for a specific noise reduction system. Based on the super-Gaussian statistical model, computationally efficient maximum a posteriori speech estimators are derived, which outperform the commonly applied Ephraim-Malah algorithm.
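
    An STFT analysis-modification-synthesis skeleton for such estimators is sketched below. The gain shown is the plain Wiener rule, standing in for the paper's MAP gain derived from a super-Gaussian amplitude prior; the noise power spectrum is assumed known:

```python
import numpy as np
from scipy.signal import stft, istft

def suppress(noisy, noise_psd, fs, nper=512):
    """Spectral amplitude estimation skeleton (Wiener gain placeholder).
    noise_psd: per-bin noise power, shape (nper // 2 + 1,), assumed known."""
    _, _, Y = stft(noisy, fs, nperseg=nper)
    # crude a priori SNR estimate from the noisy periodogram
    xi = np.maximum(np.abs(Y) ** 2 / noise_psd[:, None] - 1.0, 1e-3)
    gain = xi / (1.0 + xi)   # replace with the super-Gaussian MAP gain
    _, x_hat = istft(gain * Y, fs, nperseg=nper)
    return x_hat
```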

  5. Auditory-motor mapping training as an intervention to facilitate speech output in non-verbal children with autism: a proof of concept study.

    Directory of Open Access Journals (Sweden)

    Catherine Y Wan

    Although up to 25% of children with autism are non-verbal, there are very few interventions that can reliably produce significant improvements in speech output. Recently, a novel intervention called Auditory-Motor Mapping Training (AMMT) has been developed, which aims to promote speech production directly by training the association between sounds and articulatory actions using intonation and bimanual motor activities. AMMT capitalizes on the inherent musical strengths of children with autism, and offers activities that they intrinsically enjoy. It also engages and potentially stimulates a network of brain regions that may be dysfunctional in autism. Here, we report an initial efficacy study to provide 'proof of concept' for AMMT. Six non-verbal children with autism participated. Prior to treatment, the children had no intelligible words. They each received 40 individual sessions of AMMT 5 times per week, over an 8-week period. Probe assessments were conducted periodically during baseline, therapy, and follow-up sessions. After therapy, all children showed significant improvements in their ability to articulate words and phrases, with generalization to items that were not practiced during therapy sessions. Because these children had no or minimal vocal output prior to treatment, the acquisition of speech sounds and word approximations through AMMT represents a critical step in expressive language development in children with autism.

  6. Predicting automatic speech recognition performance over communication channels from instrumental speech quality and intelligibility scores

    NARCIS (Netherlands)

    Gallardo, L.F.; Möller, S.; Beerends, J.

    2017-01-01

    The performance of automatic speech recognition based on coded-decoded speech heavily depends on the quality of the transmitted signals, determined by channel impairments. This paper examines relationships between speech recognition performance and measurements of speech quality and intelligibility.

  7. Intensive speech and language therapy in patients with chronic aphasia after stroke: a randomised, open-label, blinded-endpoint, controlled trial in a health-care setting

    OpenAIRE

    Breitenstein, Caterina; Grewe, Tanja; Flöel, Agnes; Ziegler, Wolfram; Springer, Luise; Martus, Peter; Huber, Walter; Willmes, Klaus; Ringelstein, E. Bernd; Haeusler, Karl Georg; Abel, Steffie; Glindemann, Ralf; Domahs, Frank; Regenbrecht, Frank; Schlenck, Klaus-Jürgen

    2017-01-01

    Background: Treatment guidelines for aphasia recommend intensive speech and language therapy for chronic (≥6 months) aphasia after stroke, but large-scale, class 1 randomised controlled trials on treatment effectiveness are scarce. We aimed to examine whether 3 weeks of intensive speech and language therapy under routine clinical conditions improved verbal communication in daily-life situations in people with chronic aphasia after stroke. Methods: In this multicentre, parallel group, superiority, ...

  8. Functional characteristics of a new electrolarynx "Evada" having a force sensing resistor sensor.

    Science.gov (United States)

    Choi, H S; Park, Y J; Lee, S M; Kim, K M

    2001-12-01

    Electrolarynxes have been used as one of the rehabilitation methods for laryngectomees. Earlier electrolarynxes could not alter frequency and intensity simultaneously during conversation. Recently, we developed an electrolarynx named "Evada" (prototype so far) using a force sensing resistor (FSR) sensor that can control both frequency and intensity simultaneously during conversation. Employing three types of electrolarynxes (Evada, Servox-inton, Nu-vois), this study was undertaken to examine the functional characteristics of Evada for the normal control group and for laryngectomees. Five laryngectomees and five normal adults were asked to express three sentences (declarative sentence, "You stay here.", interrogative sentence, "You stay here?", and imperative sentence, "You! Stay here.") using three types of electrolarynxes. Frequency and intensity changes between the first and last vowels in the three sentences were calculated and analyzed statistically by paired t test. The frequency changes in the interrogative and imperative sentences were more prominent in Evada than in Servox-inton and Nu-vois. The intensity changes in the interrogative and imperative sentences were also more prominent in Evada than in Servox-inton and Nu-vois. Evada controls frequency and/or intensity by having the subject press the control button(s). Therefore, Evada appears to be better at producing intonation and contrastive stress than Nu-vois and Servox-inton.

  9. Bridging the Gap Between Speech and Language: Using Multimodal Treatment in a Child With Apraxia.

    Science.gov (United States)

    Tierney, Cheryl D; Pitterle, Kathleen; Kurtz, Marie; Nakhla, Mark; Todorow, Carlyn

    2016-09-01

    Childhood apraxia of speech is a neurologic speech sound disorder in which children have difficulty constructing words and sounds due to poor motor planning and coordination of the articulators required for speech sound production. We report the case of a 3-year-old boy strongly suspected to have childhood apraxia of speech at 18 months of age who used multimodal communication to facilitate language development throughout his work with a speech language pathologist. In 18 months of an intensive structured program, he exhibited atypical rapid improvement, progressing from having no intelligible speech to achieving age-appropriate articulation. We suspect that early introduction of sign language by family proved to be a highly effective form of language development, that when coupled with intensive oro-motor and speech sound therapy, resulted in rapid resolution of symptoms. Copyright © 2016 by the American Academy of Pediatrics.

  10. Speech characteristics of miners with black lung disease (pneumoconiosis).

    Science.gov (United States)

    Gilbert, H R

    1975-06-01

    Speech samples were obtained from 10 miners with diagnosed black lung disease and 10 nonminers who had never worked in a dusty environment and who had no history of respiratory diseases. Frequency, intensity and durational measures were used as a basis upon which to compare the two groups. Results indicated that four of the six pausal measures, together with vowel duration, vowel intensity variation, and vowel perturbation, differentiated the miners from the nonminers. The results indicate that black lung disease may affect not only respiratory physiology associated with speech production but also laryngeal physiology.

  11. Reliability of Interaural Time Difference-Based Localization Training in Elderly Individuals with Speech-in-Noise Perception Disorder.

    Science.gov (United States)

    Delphi, Maryam; Lotfi, M-Yones; Moossavi, Abdollah; Bakhshi, Enayatollah; Banimostafa, Maryam

    2017-09-01

    Previous studies have shown that interaural-time-difference (ITD) training can improve localization ability. Surprisingly little is, however, known about localization training vis-à-vis speech perception in noise based on interaural time difference in the envelope (ITD ENV). We sought to investigate the reliability of an ITD ENV-based training program in speech-in-noise perception among elderly individuals with normal hearing and speech-in-noise disorder. The present interventional study was performed during 2016. Sixteen elderly men between 55 and 65 years of age with the clinical diagnosis of normal hearing up to 2000 Hz and speech-in-noise perception disorder participated in this study. The localization training program was based on changes in ITD ENV. In order to evaluate the reliability of the training program, we performed speech-in-noise tests before the training program, immediately afterward, and then at 2 months' follow-up. The reliability of the training program was analyzed using the Friedman test and the SPSS software. Significant statistical differences were shown in the mean scores of speech-in-noise perception between the 3 time points (P=0.001). The results also indicated no difference in the mean scores of speech-in-noise perception between the 2 time points of immediately after the training program and 2 months' follow-up (P=0.212). The present study showed the reliability of an ITD ENV-based localization training in elderly individuals with speech-in-noise perception disorder.
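
    The repeated-measures comparison across the three test points uses the Friedman test, available in SciPy; the scores below are hypothetical stand-ins for the study's data:

```python
from scipy.stats import friedmanchisquare

# hypothetical speech-in-noise scores for 5 participants at 3 time points
pre       = [52, 48, 55, 50, 47]
post      = [68, 65, 70, 66, 63]
follow_up = [67, 64, 71, 65, 62]

stat, p = friedmanchisquare(pre, post, follow_up)
print(f"Friedman chi-square = {stat:.2f}, p = {p:.4f}")
```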

  12. Modeling speech intelligibility based on the signal-to-noise envelope power ratio

    DEFF Research Database (Denmark)

    Jørgensen, Søren

    The model combines the concept of modulation frequency selectivity in the auditory processing of sound with a decision metric for intelligibility that is based on the signal-to-noise envelope power ratio (SNRenv). The proposed speech-based envelope power spectrum model (sEPSM) is demonstrated to account for the effects of stationary … through three commercially available mobile phones. The model successfully accounts for the performance across the phones in conditions with a stationary speech-shaped background noise, whereas deviations were observed in conditions with "Traffic" and "Pub" noise. Overall, the results of this thesis...

  13. Structural brain aging and speech production: a surface-based brain morphometry study.

    Science.gov (United States)

    Tremblay, Pascale; Deschamps, Isabelle

    2016-07-01

    While there has been a growing number of studies examining the neurofunctional correlates of speech production over the past decade, the neurostructural correlates of this immensely important human behaviour remain less well understood, despite the fact that previous studies have established links between brain structure and behaviour, including speech and language. In the present study, we thus examined, for the first time, the relationship between surface-based cortical thickness (CT) and three different behavioural indexes of sublexical speech production: response duration, reaction times and articulatory accuracy, in healthy young and older adults during the production of simple and complex meaningless sequences of syllables (e.g., /pa-pa-pa/ vs. /pa-ta-ka/). The results show that each behavioural speech measure was sensitive to the complexity of the sequences, as indicated by slower reaction times, longer response durations and decreased articulatory accuracy in both groups for the complex sequences. Older adults produced longer speech responses, particularly during the production of complex sequences. Unique age-independent and age-dependent relationships between brain structure and each of these behavioural measures were found in several cortical and subcortical regions known for their involvement in speech production, including the bilateral anterior insula, the left primary motor area, the rostral supramarginal gyrus, the right inferior frontal sulcus, the bilateral putamen and caudate, and in some regions less typically associated with speech production, such as the posterior cingulate cortex.

  14. Subspace-Based Noise Reduction for Speech Signals via Diagonal and Triangular Matrix Decompositions

    DEFF Research Database (Denmark)

    Hansen, Per Christian; Jensen, Søren Holdt

    2007-01-01

    We survey the definitions and use of rank-revealing matrix decompositions in single-channel noise reduction algorithms for speech signals. Our algorithms are based on the rank-reduction paradigm and, in particular, signal subspace techniques. The focus is on practical working algorithms, using both diagonal and rank-revealing triangular matrix decompositions … with working Matlab code and applications in speech processing.

  15. Cortical thickness in children receiving intensive therapy for idiopathic apraxia of speech.

    Science.gov (United States)

    Kadis, Darren S; Goshulak, Debra; Namasivayam, Aravind; Pukonen, Margit; Kroll, Robert; De Nil, Luc F; Pang, Elizabeth W; Lerch, Jason P

    2014-03-01

    Children with idiopathic apraxia experience difficulties planning the movements necessary for intelligible speech. There is increasing evidence that targeted early interventions, such as Prompts for Restructuring Oral Muscular Phonetic Targets (PROMPT), can be effective in treating these disorders. In this study, we investigate possible cortical thickness correlates of idiopathic apraxia of speech in childhood, and changes associated with participation in an 8-week block of PROMPT therapy. We found that children with idiopathic apraxia (n = 11), aged 3-6 years, had significantly thicker left supramarginal gyri than a group of typically-developing age-matched controls (n = 11), t(20) = 2.84, p ≤ 0.05. Over the course of therapy, the children with apraxia (n = 9) experienced significant thinning of the left posterior superior temporal gyrus (canonical Wernicke's area), t(8) = 2.42, p ≤ 0.05. This is the first study to demonstrate experience-dependent structural plasticity in children receiving therapy for speech sound disorders.

  16. Speech and Speech-Related Quality of Life After Late Palate Repair: A Patient's Perspective.

    Science.gov (United States)

    Schönmeyr, Björn; Wendby, Lisa; Sharma, Mitali; Jacobson, Lia; Restrepo, Carolina; Campbell, Alex

    2015-07-01

    Many patients with cleft palate deformities worldwide receive treatment at a later age than is recommended for normal speech to develop. The outcomes after late palate repairs in terms of speech and quality of life (QOL) still remain largely unstudied. In the current study, questionnaires were used to assess the patients' perception of speech and QOL before and after primary palate repair. All of the patients were operated on at a cleft center in northeast India and had a cleft palate with a normal lip or with a cleft lip that had been previously repaired. A total of 134 patients (7-35 years) were interviewed preoperatively and 46 patients (7-32 years) were assessed in the postoperative survey. The survey showed that scores based on the speech handicap index, concerning speech and speech-related QOL, did not improve postoperatively. In fact, the questionnaires indicated that the speech became more unpredictable (P …), although patients reported that their self-confidence had improved after the operation. Thus, the majority of interviewed patients who underwent late primary palate repair were satisfied with the surgery. At the same time, speech and speech-related QOL did not improve according to the speech handicap index-based survey. Speech predictability may even become worse and nasal regurgitation may increase after late palate repair, according to these results.

  17. "Would you like RED or WHITE wine? - I would like red WINE" : Native speaker perception of (non-)native intonation patterns

    NARCIS (Netherlands)

    van Maastricht, L.J.; Swerts, M.G.J.; Krahmer, E.J.

    2014-01-01

    The importance of intonation in communication cannot be denied (Gussenhoven, 2004; Ladd, 2008). Since the goal of most foreign language learners is to successfully communicate in a language other than their mother tongue, more research on the way they are perceived by native speakers and the role of

  18. Effects of human fatigue on speech signals

    Science.gov (United States)

    Stamoulis, Catherine

    2004-05-01

    Cognitive performance may be significantly affected by fatigue. In the case of critical personnel, such as pilots, monitoring human fatigue is essential to ensure safety and success of a given operation. One of the modalities that may be used for this purpose is speech, which is sensitive to respiratory changes and increased muscle tension of vocal cords, induced by fatigue. Age, gender, vocal tract length, physical and emotional state may significantly alter speech intensity, duration, rhythm, and spectral characteristics. In addition to changes in speech rhythm, fatigue may also affect the quality of speech, such as articulation. In a noisy environment, detecting fatigue-related changes in speech signals, particularly subtle changes at the onset of fatigue, may be difficult. Therefore, in a performance-monitoring system, speech parameters which are significantly affected by fatigue need to be identified and extracted from input signals. For this purpose, a series of experiments was performed under slowly varying cognitive load conditions and at different times of the day. The results of the data analysis are presented here.

  19. Towards Contactless Silent Speech Recognition Based on Detection of Active and Visible Articulators Using IR-UWB Radar.

    Science.gov (United States)

    Shin, Young Hoon; Seo, Jiwon

    2016-10-29

    People with hearing or speaking disabilities are deprived of the benefits of conventional speech recognition technology because it is based on acoustic signals. Recent research has focused on silent speech recognition systems that are based on the motions of a speaker's vocal tract and articulators. Because most silent speech recognition systems use contact sensors that are very inconvenient to users or optical systems that are susceptible to environmental interference, a contactless and robust solution is hence required. Toward this objective, this paper presents a series of signal processing algorithms for a contactless silent speech recognition system using an impulse radio ultra-wide band (IR-UWB) radar. The IR-UWB radar is used to remotely and wirelessly detect motions of the lips and jaw. In order to extract the necessary features of lip and jaw motions from the received radar signals, we propose a feature extraction algorithm. The proposed algorithm noticeably improved speech recognition performance compared to the existing algorithm during our word recognition test with five speakers. We also propose a speech activity detection algorithm to automatically select speech segments from continuous input signals. Thus, speech recognition processing is performed only when speech segments are detected. Our testbed consists of commercial off-the-shelf radar products, and the proposed algorithms are readily applicable without designing specialized radar hardware for silent speech processing.

  20. Reliability of Interaural Time Difference-Based Localization Training in Elderly Individuals with Speech-in-Noise Perception Disorder

    Directory of Open Access Journals (Sweden)

    Maryam Delphi

    2017-09-01

    Background: Previous studies have shown that interaural-time-difference (ITD) training can improve localization ability. Surprisingly little is, however, known about localization training vis-à-vis speech perception in noise based on interaural time difference in the envelope (ITD ENV). We sought to investigate the reliability of an ITD ENV-based training program in speech-in-noise perception among elderly individuals with normal hearing and speech-in-noise disorder. Methods: The present interventional study was performed during 2016. Sixteen elderly men between 55 and 65 years of age with the clinical diagnosis of normal hearing up to 2000 Hz and speech-in-noise perception disorder participated in this study. The localization training program was based on changes in ITD ENV. In order to evaluate the reliability of the training program, we performed speech-in-noise tests before the training program, immediately afterward, and then at 2 months' follow-up. The reliability of the training program was analyzed using the Friedman test and the SPSS software. Results: Significant statistical differences were shown in the mean scores of speech-in-noise perception between the 3 time points (P=0.001). The results also indicated no difference in the mean scores of speech-in-noise perception between the 2 time points of immediately after the training program and 2 months' follow-up (P=0.212). Conclusion: The present study showed the reliability of an ITD ENV-based localization training in elderly individuals with speech-in-noise perception disorder.

  1. Experimental comparison between speech transmission index, rapid speech transmission index, and speech intelligibility index.

    Science.gov (United States)

    Larm, Petra; Hongisto, Valtteri

    2006-02-01

    During the acoustical design of, e.g., auditoria or open-plan offices, it is important to know how speech can be perceived in various parts of the room. Different objective methods have been developed to measure and predict speech intelligibility, and these have been extensively used in various spaces. In this study, two such methods were compared, the speech transmission index (STI) and the speech intelligibility index (SII). Also the simplification of the STI, the room acoustics speech transmission index (RASTI), was considered. These quantities are all based on determining an apparent speech-to-noise ratio on selected frequency bands and summing them using a specific weighting. For comparison, some data were needed on the possible differences of these methods resulting from the calculation scheme and also measuring equipment. Their prediction accuracy was also of interest. Measurements were made in a laboratory having adjustable noise level and absorption, and in a real auditorium. It was found that the measurement equipment, especially the selection of the loudspeaker, can greatly affect the accuracy of the results. The prediction accuracy of the RASTI was found acceptable, if the input values for the prediction are accurately known, even though the studied space was not ideally diffuse.
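
    As a rough illustration of the shared principle, the sketch below computes a band-weighted index from per-band speech and noise levels, clipping the apparent SNR to +/-15 dB and mapping it to [0, 1] as the SII does; the band weights are illustrative placeholders, not the standardized band-importance functions.

        import numpy as np

        def band_weighted_index(speech_db, noise_db, weights):
            """SII/STI-style index: per-band SNR clipped to [-15, +15] dB,
            mapped to [0, 1], then combined with band-importance weights."""
            snr = np.asarray(speech_db) - np.asarray(noise_db)      # apparent SNR per band
            audibility = (np.clip(snr, -15.0, 15.0) + 15.0) / 30.0  # 0..1 per band
            w = np.asarray(weights, dtype=float)
            w = w / w.sum()
            return float(np.sum(w * audibility))                    # 0 (poor) .. 1 (excellent)

        # Seven octave bands with flat, purely illustrative weights
        print(band_weighted_index([60, 62, 58, 55, 50, 45, 40],
                                  [50, 55, 57, 54, 52, 48, 45],
                                  np.ones(7)))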

  2. The effects of choir spacing and choir formation on the tuning accuracy and intonation tendencies of a mixed choir

    Science.gov (United States)

    Daugherty, James F.

    2005-09-01

    The tuning accuracy and intonation tendencies of a high school mixed choir (N=46) were measured from digital recordings obtained as the ensemble performed an a cappella motet under concert conditions in N=3 singer spacing configurations (close, lateral, circumambient) and N=2 choir formations (sectional and mixed). Methods of analysis were modeled on Howard's (2004) pitch-based measurements of the tuning accuracy of crowds of football fans. Results were discussed in terms of (a) previous studies on choir spacing (Daugherty, 1999, 2003) and self-to-other singer ratios (Ternström, 1995, 1999); (b) contributions of choir spacing to vocal/choral pedagogy; and (c) potential ramifications for the design and use of auditoria and portable standing risers for choral performances.
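
    On the measurement side, tuning accuracy of this kind is conventionally expressed in cents relative to a reference frequency; a minimal helper follows (the formula is standard, the application details here are assumed).

        import math

        def cents(f_measured, f_reference):
            """Deviation in cents (1200 cents per octave) of a sung pitch
            from its reference frequency."""
            return 1200.0 * math.log2(f_measured / f_reference)

        print(round(cents(442.0, 440.0), 1))  # a slightly sharp A4 -> about 7.9 cents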

  3. Major and minor music compared to excited and subdued speech.

    Science.gov (United States)

    Bowling, Daniel L; Gill, Kamraan; Choi, Jonathan D; Prinz, Joseph; Purves, Dale

    2010-01-01

    The affective impact of music arises from a variety of factors, including intensity, tempo, rhythm, and tonal relationships. The emotional coloring evoked by intensity, tempo, and rhythm appears to arise from association with the characteristics of human behavior in the corresponding condition; however, how and why particular tonal relationships in music convey distinct emotional effects are not clear. The hypothesis examined here is that major and minor tone collections elicit different affective reactions because their spectra are similar to the spectra of voiced speech uttered in different emotional states. To evaluate this possibility the spectra of the intervals that distinguish major and minor music were compared to the spectra of voiced segments in excited and subdued speech using fundamental frequency and frequency ratios as measures. Consistent with the hypothesis, the spectra of major intervals are more similar to spectra found in excited speech, whereas the spectra of particular minor intervals are more similar to the spectra of subdued speech. These results suggest that the characteristic affective impact of major and minor tone collections arises from associations routinely made between particular musical intervals and voiced speech.

  4. Pseudo-Coherence-Based MVDR Beamformer for Speech Enhancement with Ad Hoc Microphone Arrays

    DEFF Research Database (Denmark)

    Tavakoli, Vincent Mohammad; Jensen, Jesper Rindom; Christensen, Mads Græsbøll

    2015-01-01

    Speech enhancement with distributed arrays has been met with various methods. On the one hand, data independent methods require information about the position of sensors, so they are not suitable for dynamic geometries. On the other hand, Wiener-based methods cannot assure a distortionless output....... This paper proposes minimum variance distortionless response filtering based on multichannel pseudo-coherence for speech enhancement with ad hoc microphone arrays. This method requires neither position information nor control of the trade-off used in the distortion weighted methods. Furthermore, certain...
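
    For reference, the distortionless-response solution underlying any MVDR beamformer fits in a few lines; this is a textbook sketch, and the paper's pseudo-coherence-based estimation of the required statistics is not reproduced here.

        import numpy as np

        def mvdr_weights(noise_cov, steering):
            """Textbook MVDR: w = R^{-1} d / (d^H R^{-1} d), minimizing output
            power subject to a distortionless response toward the target."""
            r_inv_d = np.linalg.solve(noise_cov, steering)
            return r_inv_d / (steering.conj() @ r_inv_d)

        # Usage sketch: a 4-microphone array at a single frequency bin
        rng = np.random.default_rng(0)
        R = np.eye(4) + 0.1 * rng.standard_normal((4, 4))
        R = R @ R.T                              # Hermitian positive definite surrogate
        d = np.exp(-1j * np.pi * np.arange(4))   # hypothetical steering vector
        w = mvdr_weights(R, d)
        print(np.abs(w.conj() @ d))              # distortionless: |w^H d| = 1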

  5. Predicting speech intelligibility based on a correlation metric in the envelope power spectrum domain

    DEFF Research Database (Denmark)

    Relaño-Iborra, Helia; May, Tobias; Zaar, Johannes

    2016-01-01

    A speech intelligibility prediction model is proposed that combines the auditory processing front end of the multi-resolution speech-based envelope power spectrum model [mr-sEPSM; Jørgensen, Ewert, and Dau (2013). J. Acoust. Soc. Am. 134(1), 436–446] with a correlation back end inspired by the sh...

  6. Speech neglect: A strange educational blind spot

    Science.gov (United States)

    Harris, Katherine Safford

    2005-09-01

    Speaking is universally acknowledged as an important human talent, yet as a topic of educated common knowledge, it is peculiarly neglected. Partly, this is a consequence of the relatively recent growth of research on speech perception, production, and development, but also a function of the way that information is sliced up by undergraduate colleges. Although the basic acoustic mechanism of vowel production was known to Helmholtz, the ability to view speech production as a physiological event is evolving even now with such techniques as fMRI. Intensive research on speech perception emerged only in the early 1930s as Fletcher and the engineers at Bell Telephone Laboratories developed the transmission of speech over telephone lines. The study of speech development was revolutionized by the papers of Eimas and his colleagues on speech perception in infants in the 1970s. Dissemination of knowledge in these fields is the responsibility of no single academic discipline. It forms a center for two departments, Linguistics, and Speech and Hearing, but in the former, there is a heavy emphasis on other aspects of language than speech and, in the latter, a focus on clinical practice. For psychologists, it is a rather minor component of a very diverse assembly of topics. I will focus on these three fields in proposing possible remedies.

  7. Speech coding

    Energy Technology Data Exchange (ETDEWEB)

    Ravishankar, C., Hughes Network Systems, Germantown, MD

    1998-05-08

    Speech is the predominant means of communication between human beings, and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage of the speech signal getting corrupted by noise, cross-talk and distortion. Long-haul transmissions, which use repeaters to compensate for the loss in signal strength on transmission links, also increase the associated noise and distortion. On the other hand, digital transmission is relatively immune to noise, cross-talk and distortion, primarily because of the capability to faithfully regenerate the digital signal at each repeater purely based on a binary decision. Hence the end-to-end performance of the digital link essentially becomes independent of the length and operating frequency bands of the link, and from a transmission point of view digital transmission has been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service provision point of view as well. Modern requirements have introduced the need for robust, flexible and secure services that can carry a multitude of signal types (such as voice, data and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term Speech Coding usually refers to techniques that represent or code speech signals either directly as a waveform or as a set of parameters obtained by analyzing the speech signal. In either case, the codes are transmitted to the distant end, where speech is reconstructed or synthesized using the received set of codes. A more generic term that is often used interchangeably with speech coding is voice coding. This term is more generic in the sense that the
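
    As a concrete instance of the waveform-coding techniques referred to above, the following sketch implements mu-law companding in the style of ITU-T G.711 (mu = 255); it compresses the dynamic range before quantization so that quantization noise roughly tracks the signal level.

        import numpy as np

        MU = 255.0  # G.711 mu-law constant

        def mu_law_encode(x):
            """Compress samples in [-1, 1] prior to uniform quantization."""
            return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

        def mu_law_decode(y):
            """Inverse expansion back to the linear domain."""
            return np.sign(y) * np.expm1(np.abs(y) * np.log1p(MU)) / MU

        x = np.linspace(-1.0, 1.0, 5)
        print(np.allclose(mu_law_decode(mu_law_encode(x)), x))  # True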

  8. An analysis of machine translation and speech synthesis in speech-to-speech translation system

    OpenAIRE

    Hashimoto, K.; Yamagishi, J.; Byrne, W.; King, S.; Tokuda, K.

    2011-01-01

    This paper provides an analysis of the impacts of machine translation and speech synthesis on speech-to-speech translation systems. The speech-to-speech translation system consists of three components: speech recognition, machine translation and speech synthesis. Many techniques for integration of speech recognition and machine translation have been proposed. However, speech synthesis has not yet been considered. Therefore, in this paper, we focus on machine translation and speech synthesis, ...
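
    The three-component cascade can be sketched as a composition of stages, which makes explicit how recognition errors propagate into translation and then synthesis; all stage functions below are hypothetical placeholders supplied by the caller, not a real API.

        from typing import Callable

        def speech_to_speech(audio_in: bytes,
                             recognize: Callable[[bytes], str],
                             translate: Callable[[str], str],
                             synthesize: Callable[[str], bytes]) -> bytes:
            """Cascade ASR -> MT -> TTS; each stage consumes the previous output."""
            text_src = recognize(audio_in)    # speech recognition
            text_tgt = translate(text_src)    # machine translation
            return synthesize(text_tgt)       # speech synthesis

        # Toy stand-ins to make the sketch executable
        out = speech_to_speech(b"...",
                               recognize=lambda a: "hello",
                               translate=lambda t: t.upper(),
                               synthesize=lambda t: t.encode())
        print(out)  # b'HELLO'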

  9. End-User Recommendations on LOGOMON - a Computer Based Speech Therapy System for Romanian Language

    Directory of Open Access Journals (Sweden)

    SCHIPOR, O. A.

    2010-11-01

    Full Text Available In this paper we highlight the relations between LOGOMON - a Computer Based Speech Therapy System - and dyslalia's training steps. Dyslalia is a speech disorder that affects the pronunciation of one or more sounds. This presentation of the system is completed by a study of end-user (i.e., teachers and parents) attitudes about speech-assisted therapy in general and about the LOGOMON system in particular. The results of this research allow the improvement of our CBST system, because the information obtained can be a source of adaptability to the different expectations of the beneficiaries.

  10. Using the Speech Transmission Index for predicting non-native speech intelligibility

    NARCIS (Netherlands)

    Wijngaarden, S.J. van; Bronkhorst, A.W.; Houtgast, T.; Steeneken, H.J.M.

    2004-01-01

    While the Speech Transmission Index (STI) is widely applied for prediction of speech intelligibility in room acoustics and telecommunication engineering, it is unclear how to interpret STI values when non-native talkers or listeners are involved. Based on subjectively measured psychometric functions

  11. Filled pause refinement based on the pronunciation probability for lecture speech.

    Directory of Open Access Journals (Sweden)

    Yan-Hua Long

    Full Text Available Nowadays, although automatic speech recognition has become quite proficient in recognizing or transcribing well-prepared fluent speech, the transcription of speech that contains many disfluencies, such as spontaneous conversational and lecture speech, remains problematic. Filled pauses (FPs) are the most frequently occurring disfluencies in this type of speech. Recent studies have shown that FPs increase the error rates of state-of-the-art speech transcription, primarily because most FPs are not well annotated or provided in training data transcriptions, and because of the similarities in acoustic characteristics between FPs and some common non-content words. To enhance the speech transcription system, we propose a new automatic refinement approach to detect FPs in British English lecture speech transcription. This approach combines the pronunciation probabilities for each word in the dictionary and acoustic language model scores for FP refinement through a modified speech recognition forced-alignment framework. We evaluate the proposed approach on the Reith Lectures speech transcription task, in which only imperfect training transcriptions are available. Successful results are achieved for both the development and evaluation datasets. Acoustic models trained on different styles of speech genres have been investigated with respect to FP refinement. To further validate the effectiveness of the proposed approach, speech transcription performance has also been examined using systems built on training data transcriptions with and without FP refinement.

  12. Subspace-Based Noise Reduction for Speech Signals via Diagonal and Triangular Matrix Decompositions

    DEFF Research Database (Denmark)

    Hansen, Per Christian; Jensen, Søren Holdt

    We survey the definitions and use of rank-revealing matrix decompositions in single-channel noise reduction algorithms for speech signals. Our algorithms are based on the rank-reduction paradigm and, in particular, signal subspace techniques. The focus is on practical working algorithms, using both...... diagonal (eigenvalue and singular value) decompositions and rank-revealing triangular decompositions (ULV, URV, VSV, ULLV and ULLIV). In addition we show how the subspace-based algorithms can be evaluated and compared by means of simple FIR filter interpretations. The algorithms are illustrated...... with working Matlab code and applications in speech processing....
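
    The core signal-subspace idea can be conveyed by a minimal truncated-SVD sketch; only the plain diagonal decomposition is shown, not the ULV/URV/VSV triangular variants surveyed in the paper.

        import numpy as np

        def subspace_denoise(frames, rank):
            """Project a matrix of signal frames onto its dominant `rank`-dimensional
            signal subspace via a truncated SVD, discarding the noise subspace."""
            U, s, Vt = np.linalg.svd(frames, full_matrices=False)
            s[rank:] = 0.0
            return (U * s) @ Vt

        # Toy example: a rank-1 "signal" matrix plus additive noise
        rng = np.random.default_rng(1)
        clean = np.outer(np.sin(np.linspace(0, np.pi, 32)), np.ones(16))
        noisy = clean + 0.3 * rng.standard_normal(clean.shape)
        denoised = subspace_denoise(noisy, rank=1)
        print(np.linalg.norm(denoised - clean) < np.linalg.norm(noisy - clean))  # True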

  13. Web-Based Live Speech-Driven Lip-Sync

    OpenAIRE

    Llorach, Gerard; Evans, Alun; Blat, Josep; Grimm, Giso; Hohmann, Volker

    2016-01-01

    Virtual characters are an integral part of many games and virtual worlds. The ability to accurately synchronize lip movement to audio speech is an important aspect in the believability of the character. In this paper we propose a simple rule-based lip-syncing algorithm for virtual agents using the web browser. It works in real-time with live input, unlike most current lip-syncing proposals, which may require considerable amounts of computation, expertise and time to set up. Our method gen...

  14. The effect of viewing speech on auditory speech processing is different in the left and right hemispheres.

    Science.gov (United States)

    Davis, Chris; Kislyuk, Daniel; Kim, Jeesun; Sams, Mikko

    2008-11-25

    We used whole-head magnetoencephalography (MEG) to record changes in neuromagnetic N100m responses generated in the left and right auditory cortex as a function of the match between visual and auditory speech signals. Stimuli were auditory-only (AO) and auditory-visual (AV) presentations of /pi/, /ti/ and /vi/. Three types of intensity-matched auditory stimuli were used: intact speech (Normal), frequency band filtered speech (Band) and speech-shaped white noise (Noise). The behavioural task was to detect the /vi/ syllables which comprised 12% of stimuli. N100m responses were measured to averaged /pi/ and /ti/ stimuli. Behavioural data showed that identification of the stimuli was faster and more accurate for Normal than for Band stimuli, and for Band than for Noise stimuli. Reaction times were faster for AV than AO stimuli. MEG data showed that in the left hemisphere, N100m to both AO and AV stimuli was largest for the Normal, smaller for Band and smallest for Noise stimuli. In the right hemisphere, Normal and Band AO stimuli elicited N100m responses of quite similar amplitudes, but N100m amplitude to Noise was about half of that. There was a reduction in N100m for the AV compared to the AO conditions. The size of this reduction for each stimulus type was the same in the left hemisphere but graded in the right (being largest to the Normal, smaller to the Band and smallest to the Noise stimuli). The N100m decrease for the Normal stimuli was significantly larger in the right than in the left hemisphere. We suggest that the effect of processing visual speech seen in the right hemisphere likely reflects suppression of the auditory response based on AV cues for place of articulation.

  15. Modeling Driving Performance Using In-Vehicle Speech Data From a Naturalistic Driving Study.

    Science.gov (United States)

    Kuo, Jonny; Charlton, Judith L; Koppel, Sjaan; Rudin-Brown, Christina M; Cross, Suzanne

    2016-09-01

    We aimed to (a) describe the development and application of an automated approach for processing in-vehicle speech data from a naturalistic driving study (NDS), (b) examine the influence of child passenger presence on driving performance, and (c) model this relationship using in-vehicle speech data. Parent drivers frequently engage in child-related secondary behaviors, but the impact on driving performance is unknown. Applying automated speech-processing techniques to NDS audio data would facilitate the analysis of in-vehicle driver-child interactions and their influence on driving performance. Speech activity detection and speaker diarization algorithms were applied to audio data from a Melbourne-based NDS involving 42 families. Multilevel models were developed to evaluate the effect of speech activity and the presence of child passengers on driving performance. Speech activity was significantly associated with velocity and steering angle variability. Child passenger presence alone was not associated with changes in driving performance. However, speech activity in the presence of two child passengers was associated with the most variability in driving performance. The effects of in-vehicle speech on driving performance in the presence of child passengers appear to be heterogeneous, and multiple factors may need to be considered in evaluating their impact. This goal can potentially be achieved within large-scale NDS through the automated processing of observational data, including speech. Speech-processing algorithms enable new perspectives on driving performance to be gained from existing NDS data, and variables that were once labor-intensive to process can be readily utilized in future research. © 2016, Human Factors and Ergonomics Society.

  16. Mild developmental foreign accent syndrome and psychiatric comorbidity: Altered white matter integrity in speech and emotion regulation networks

    Directory of Open Access Journals (Sweden)

    Marcelo L Berthier

    2016-08-01

    Full Text Available Foreign accent syndrome (FAS) is a speech disorder that is defined by the emergence of a peculiar manner of articulation and intonation which is perceived as foreign. In most cases of acquired FAS (AFAS) the new accent is secondary to small focal lesions involving components of the bilaterally distributed neural network for speech production. In the past few years FAS has also been described in different psychiatric conditions (conversion disorder, bipolar disorder, schizophrenia) as well as in developmental disorders (specific language impairment, apraxia of speech). In the present study, two adult males, one with atypical phonetic production and the other one with cluttering, reported having developmental FAS (DFAS) since their adolescence. Perceptual analysis by naïve judges could not confirm the presence of foreign accent, possibly due to the mildness of the speech disorder. However, detailed linguistic analysis provided evidence of prosodic and segmental errors previously reported in AFAS cases. Cognitive testing showed reduced communication in activities of daily living and mild deficits related to psychiatric disorders. Psychiatric evaluation revealed long-lasting internalizing disorders (neuroticism, anxiety, obsessive-compulsive disorder, social phobia, depression, alexithymia, hopelessness, and apathy) in both subjects. Diffusion tensor imaging (DTI) data from each subject with DFAS were compared with data from a group of 21 age- and gender-matched healthy control subjects. Diffusion parameters (MD, AD, and RD) in predefined regions of interest showed changes of white matter microstructure in regions previously related with AFAS and psychiatric disorders. In conclusion, the present findings militate against the possibility that these two subjects have FAS of psychogenic origin. Rather, our findings provide evidence that mild DFAS occurring in the context of subtle, yet persistent, developmental speech disorders may be associated with

  17. Effectiveness of speech language therapy either alone or with add-on computer-based language therapy software (Malayalam version) for early post stroke aphasia: A feasibility study.

    Science.gov (United States)

    Kesav, Praveen; Vrinda, S L; Sukumaran, Sajith; Sarma, P S; Sylaja, P N

    2017-09-15

    This study aimed to assess the feasibility of professional-based conventional speech language therapy (SLT) either alone (Group A/less intensive) or assisted by novel computer-based local-language software (Group B/more intensive) for rehabilitation in early post-stroke aphasia. The setting was the Comprehensive Stroke Care Center of a tertiary health care institute in South India; the study design was a prospective open randomised controlled trial with blinded endpoint evaluation. The study recruited 24 right-handed first-ever acute ischemic stroke patients above 15 years of age, with stroke affecting the middle cerebral artery territory within 90 days of onset and a baseline Western Aphasia Battery (WAB) Aphasia Quotient (AQ) score indicating aphasia. Copyright © 2017 Elsevier B.V. All rights reserved.

  18. Recognizing emotional speech in Persian: a validated database of Persian emotional speech (Persian ESD).

    Science.gov (United States)

    Keshtiari, Niloofar; Kuhlmann, Michael; Eslami, Moharram; Klann-Delius, Gisela

    2015-03-01

    Research on emotional speech often requires valid stimuli for assessing perceived emotion through prosody and lexical content. To date, no comprehensive emotional speech database for Persian is officially available. The present article reports the process of designing, compiling, and evaluating a comprehensive emotional speech database for colloquial Persian. The database contains a set of 90 validated novel Persian sentences classified in five basic emotional categories (anger, disgust, fear, happiness, and sadness), as well as a neutral category. These sentences were validated in two experiments by a group of 1,126 native Persian speakers. The sentences were articulated by two native Persian speakers (one male, one female) in three conditions: (1) congruent (emotional lexical content articulated in a congruent emotional voice), (2) incongruent (neutral sentences articulated in an emotional voice), and (3) baseline (all emotional and neutral sentences articulated in neutral voice). The speech materials comprise about 470 sentences. The validity of the database was evaluated by a group of 34 native speakers in a perception test. Utterances recognized better than five times chance performance (71.4 %) were regarded as valid portrayals of the target emotions. Acoustic analysis of the valid emotional utterances revealed differences in pitch, intensity, and duration, attributes that may help listeners to correctly classify the intended emotion. The database is designed to be used as a reliable material source (for both text and speech) in future cross-cultural or cross-linguistic studies of emotional speech, and it is available for academic research purposes free of charge. To access the database, please contact the first author.

  19. Deep Learning-Based Noise Reduction Approach to Improve Speech Intelligibility for Cochlear Implant Recipients.

    Science.gov (United States)

    Lai, Ying-Hui; Tsao, Yu; Lu, Xugang; Chen, Fei; Su, Yu-Ting; Chen, Kuang-Chao; Chen, Yu-Hsuan; Chen, Li-Ching; Po-Hung Li, Lieber; Lee, Chin-Hui

    2018-01-20

    We investigate the clinical effectiveness of a novel deep learning-based noise reduction (NR) approach under noisy conditions with challenging noise types at low signal to noise ratio (SNR) levels for Mandarin-speaking cochlear implant (CI) recipients. The deep learning-based NR approach used in this study consists of two modules: noise classifier (NC) and deep denoising autoencoder (DDAE), thus termed (NC + DDAE). In a series of comprehensive experiments, we conduct qualitative and quantitative analyses on the NC module and the overall NC + DDAE approach. Moreover, we evaluate the speech recognition performance of the NC + DDAE NR and classical single-microphone NR approaches for Mandarin-speaking CI recipients under different noisy conditions. The testing set contains Mandarin sentences corrupted by two types of maskers, two-talker babble noise, and a construction jackhammer noise, at 0 and 5 dB SNR levels. Two conventional NR techniques and the proposed deep learning-based approach are used to process the noisy utterances. We qualitatively compare the NR approaches by the amplitude envelope and spectrogram plots of the processed utterances. Quantitative objective measures include (1) normalized covariance measure to test the intelligibility of the utterances processed by each of the NR approaches; and (2) speech recognition tests conducted by nine Mandarin-speaking CI recipients. These nine CI recipients use their own clinical speech processors during testing. The experimental results of objective evaluation and listening test indicate that under challenging listening conditions, the proposed NC + DDAE NR approach yields higher intelligibility scores than the two compared classical NR techniques, under both matched and mismatched training-testing conditions. When compared to the two well-known conventional NR techniques under challenging listening condition, the proposed NC + DDAE NR approach has superior noise suppression capabilities and gives less distortion

  20. Principles of speech coding

    CERN Document Server

    Ogunfunmi, Tokunbo

    2010-01-01

    It is becoming increasingly apparent that all forms of communication-including voice-will be transmitted through packet-switched networks based on the Internet Protocol (IP). Therefore, the design of modern devices that rely on speech interfaces, such as cell phones and PDAs, requires a complete and up-to-date understanding of the basics of speech coding. Outlines key signal processing algorithms used to mitigate impairments to speech quality in VoIP networks. Offering a detailed yet easily accessible introduction to the field, Principles of Speech Coding provides an in-depth examination of the

  1. Optimal pattern synthesis for speech recognition based on principal component analysis

    Science.gov (United States)

    Korsun, O. N.; Poliyev, A. V.

    2018-02-01

    The algorithm for building an optimal pattern for the purpose of automatic speech recognition, which increases the probability of correct recognition, is developed and presented in this work. The optimal pattern forming is based on the decomposition of an initial pattern into principal components, which makes it possible to reduce the dimension of the multi-parameter optimization problem. At the next step the training samples are introduced, and the optimal estimates for the principal component decomposition coefficients are obtained by a numeric parameter optimization algorithm. Finally, we consider the experiment results that show the improvement in speech recognition introduced by the proposed optimization algorithm.
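
    A minimal sketch of the decomposition step, assuming plain SVD-based principal component analysis: patterns are projected onto a few principal components, and a pattern is re-synthesized from a (possibly optimized) coefficient vector; the numeric optimization of the coefficients themselves is omitted.

        import numpy as np

        def pca_basis(patterns, n_components):
            """Principal components of an (n_samples, n_features) pattern matrix."""
            mean = patterns.mean(axis=0)
            _, _, Vt = np.linalg.svd(patterns - mean, full_matrices=False)
            return mean, Vt[:n_components]           # rows are the components

        def to_coefficients(pattern, mean, basis):
            return (pattern - mean) @ basis.T        # reduced-dimension representation

        def from_coefficients(coeffs, mean, basis):
            return mean + coeffs @ basis             # synthesized pattern

        rng = np.random.default_rng(2)
        X = rng.standard_normal((100, 20))
        mean, basis = pca_basis(X, n_components=5)
        c = to_coefficients(X[0], mean, basis)          # 5 numbers instead of 20
        print(from_coefficients(c, mean, basis).shape)  # (20,)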

  2. The neural basis of sublexical speech and corresponding nonspeech processing: a combined EEG-MEG study.

    Science.gov (United States)

    Kuuluvainen, Soila; Nevalainen, Päivi; Sorokin, Alexander; Mittag, Maria; Partanen, Eino; Putkinen, Vesa; Seppänen, Miia; Kähkönen, Seppo; Kujala, Teija

    2014-03-01

    We addressed the neural organization of speech versus nonspeech sound processing by investigating preattentive cortical auditory processing of changes in five features of a consonant-vowel syllable (consonant, vowel, sound duration, frequency, and intensity) and their acoustically matched nonspeech counterparts in a simultaneous EEG-MEG recording of mismatch negativity (MMN/MMNm). Overall, speech-sound processing was enhanced compared to nonspeech sound processing. This effect was strongest for changes which affect word meaning (consonant, vowel, and vowel duration) in the left and for the vowel identity change in the right hemisphere also. Furthermore, in the right hemisphere, speech-sound frequency and intensity changes were processed faster than their nonspeech counterparts, and there was a trend for speech-enhancement in frequency processing. In summary, the results support the proposed existence of long-term memory traces for speech sounds in the auditory cortices, and indicate at least partly distinct neural substrates for speech and nonspeech sound processing. Copyright © 2014 Elsevier Inc. All rights reserved.

  3. Multimodal analysis of quotation in oral narratives

    NARCIS (Netherlands)

    Stec, Kashmiri; Huiskes, Mike; Redeker, Gisela

    2015-01-01

    We investigate direct speech quotation in informal oral narratives by analyzing the contribution of bodily articulators (character viewpoint gestures, character facial expression, character intonation, and the meaningful use of gaze) in three quote environments, or quote sequences – single quotes,

  4. 'That doesn't translate': the role of evidence-based practice in disempowering speech pathologists in acute aphasia management.

    Science.gov (United States)

    Foster, Abby; Worrall, Linda; Rose, Miranda; O'Halloran, Robyn

    2015-07-01

    An evidence-practice gap has been identified in current acute aphasia management practice, with the provision of services to people with aphasia in the acute hospital widely considered in the literature to be inconsistent with best-practice recommendations. The reasons for this evidence-practice gap are unclear; however, speech pathologists practising in this setting have articulated a sense of dissonance regarding their limited service provision to this population. A clearer understanding of why this evidence-practice gap exists is essential in order to support and promote evidence-based approaches to the care of people with aphasia in acute care settings. To provide an understanding of speech pathologists' conceptualization of evidence-based practice for acute post-stroke aphasia, and its implementation. This study adopted a phenomenological approach, underpinned by a social constructivist paradigm. In-depth interviews were conducted with 14 Australian speech pathologists, recruited using a purposive sampling technique. An inductive thematic analysis of the data was undertaken. A single, overarching theme emerged from the data. Speech pathologists demonstrated a sense of disempowerment as a result of their relationship with evidence-based practice for acute aphasia management. Three subthemes contributed to this theme. The first described a restricted conceptualization of evidence-based practice. The second revealed speech pathologists' strained relationships with the research literature. The third elucidated a sense of professional unease over their perceived inability to enact evidence-based clinical recommendations, despite their desire to do so. Speech pathologists identified a current knowledge-practice gap in their management of aphasia in acute hospital settings. Speech pathologists place significant emphasis on the research evidence; however, their engagement with the research is limited, in part because it is perceived to lack clinical utility. A sense

  5. Automated speech quality monitoring tool based on perceptual evaluation

    OpenAIRE

    Vozňák, Miroslav; Rozhon, Jan

    2010-01-01

    The paper deals with a speech quality monitoring tool which we have developed in accordance with PESQ (Perceptual Evaluation of Speech Quality) and which runs automatically, calculating the MOS (Mean Opinion Score). Results are stored in a database and used in a research project investigating how meteorological conditions influence speech quality in a GSM network. The meteorological station, which is located in our university campus, provides information about a temperature,...
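
    For illustration, a PESQ-based MOS estimate can be obtained with the open-source `pesq` package (pip install pesq), which implements ITU-T P.862; this is only a stand-in for demonstration, the authors' monitoring tool is their own implementation, and the file names below are hypothetical.

        from scipy.io import wavfile
        from pesq import pesq

        rate_ref, ref = wavfile.read("reference.wav")   # hypothetical clean reference
        rate_deg, deg = wavfile.read("degraded.wav")    # hypothetical network-path recording
        assert rate_ref == rate_deg == 16000

        score = pesq(rate_ref, ref, deg, "wb")          # wideband PESQ -> MOS-LQO
        print(f"MOS-LQO: {score:.2f}")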

  6. Particle swarm optimization based feature enhancement and feature selection for improved emotion recognition in speech and glottal signals.

    Science.gov (United States)

    Muthusamy, Hariharan; Polat, Kemal; Yaacob, Sazali

    2015-01-01

    In recent years, many research works have been published using speech-related features for speech emotion recognition; however, recent studies show that there is a strong correlation between emotional states and glottal features. In this work, Mel-frequency cepstral coefficients (MFCCs), linear predictive cepstral coefficients (LPCCs), perceptual linear predictive (PLP) features, gammatone filter outputs, timbral texture features, stationary wavelet transform based timbral texture features and relative wavelet packet energy and entropy features were extracted from the emotional speech (ES) signals and their glottal waveforms (GW). Particle swarm optimization based clustering (PSOC) and wrapper based particle swarm optimization (WPSO) were proposed to enhance the discerning ability of the features and to select the discriminating features, respectively. Three different emotional speech databases were utilized to gauge the proposed method. Extreme learning machine (ELM) was employed to classify the different types of emotions. Different experiments were conducted, and the results show that the proposed method significantly improves the speech emotion recognition performance compared to previous works published in the literature.

  7. The design of a device for hearer and feeler differentiation, part A. [speech modulated hearing device

    Science.gov (United States)

    Creecy, R.

    1974-01-01

    A speech modulated white noise device is reported that gives the rhythmic characteristics of a speech signal for intelligible reception by deaf persons. The signal is composed of random amplitudes and frequencies as modulated by the speech envelope characteristics of rhythm and stress. Time intensity parameters of speech are conveyed through the vibro-tactile sensation stimuli.
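
    The device's core operation, as described, amounts to modulating white noise with the speech amplitude envelope; a minimal sketch follows, assuming a Hilbert-transform envelope smoothed by a low-pass filter.

        import numpy as np
        from scipy.signal import hilbert, butter, filtfilt

        def speech_modulated_noise(speech, fs, env_cutoff_hz=30.0):
            """Replace the fine structure of speech with white noise while keeping
            its rhythm/stress envelope (sketch of the described device)."""
            envelope = np.abs(hilbert(speech))             # instantaneous amplitude
            b, a = butter(2, env_cutoff_hz / (fs / 2))     # smooth the envelope
            envelope = filtfilt(b, a, envelope)
            noise = np.random.default_rng(0).standard_normal(len(speech))
            out = envelope * noise
            return out / (np.max(np.abs(out)) + 1e-12)     # normalize to [-1, 1]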

  8. Speech Intelligibility Potential of General and Specialized Deep Neural Network Based Speech Enhancement Systems

    DEFF Research Database (Denmark)

    Kolbæk, Morten; Tan, Zheng-Hua; Jensen, Jesper

    2017-01-01

    In this paper, we study aspects of single microphone speech enhancement (SE) based on deep neural networks (DNNs). Specifically, we explore the generalizability capabilities of state-of-the-art DNN-based SE systems with respect to the background noise type, the gender of the target speaker...... general. Finally, we compare how a DNN-based SE system trained to be noise type general, speaker general, and SNR general performs relative to a state-of-the-art short-time spectral amplitude minimum mean square error (STSA-MMSE) based SE algorithm. We show that DNN-based SE systems, when trained...... a state-of-the-art STSA-MMSE based SE method, when tested using a range of unseen speakers and noise types. Finally, a listening test using several DNN-based SE systems tested in unseen speaker conditions show that these systems can improve SI for some SNR and noise type configurations but degrade SI...

  9. Intra-oral pressure-based voicing control of electrolaryngeal speech with intra-oral vibrator.

    Science.gov (United States)

    Takahashi, Hirokazu; Nakao, Masayuki; Kikuchi, Yataro; Kaga, Kimitaka

    2008-07-01

    In normal speech, coordinated activities of intrinsic laryngeal muscles suspend the glottal sound during utterance of voiceless consonants, automatically realizing voicing control. In electrolaryngeal speech, however, the lack of voicing control is one of the causes of unclear voice, with voiceless consonants tending to be misheard as the corresponding voiced consonants. In the present work, we developed an intra-oral vibrator with an intra-oral pressure sensor that detected utterance of voiceless phonemes during intra-oral electrolaryngeal speech, and demonstrated that an intra-oral pressure-based voicing control could improve the intelligibility of the speech. The test voices were obtained from one electrolaryngeal speaker and one normal speaker. We first investigated, using speech analysis software, how the voice onset time (VOT) and first formant (F1) transition of the test consonant-vowel syllables contributed to voiceless/voiced contrasts, and developed an adequate voicing control strategy. We then compared the intelligibility of consonant-vowel syllables in intra-oral electrolaryngeal speech with and without online voicing control. An increase of intra-oral pressure, typically with a peak ranging from 10 to 50 gf/cm2, could reliably identify utterance of voiceless consonants. The speech analysis and intelligibility test then demonstrated that a short VOT caused misidentification of voiced consonants due to a clear F1 transition. Finally, taking these results together, the online voicing control, which suspended the prosthetic tone while the intra-oral pressure exceeded 2.5 gf/cm2 and during the 35 milliseconds that followed, proved efficient in improving the voiceless/voiced contrast.
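
    The reported control rule translates directly into a simple gating function; the sample-based realization below is an assumption, but the 2.5 gf/cm2 threshold and the 35 ms hangover follow the text.

        import numpy as np

        def voicing_gate(pressure, fs, threshold=2.5, hangover_ms=35.0):
            """True where the prosthetic tone should be suspended: whenever the
            intra-oral pressure (gf/cm^2) exceeds `threshold`, and for
            `hangover_ms` afterward."""
            hangover = int(round(hangover_ms * 1e-3 * fs))
            suspend = np.zeros(len(pressure), dtype=bool)
            countdown = 0
            for i, p in enumerate(pressure):
                if p > threshold:
                    countdown = hangover
                if countdown > 0:
                    suspend[i] = True
                    countdown -= 1
            return suspend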

  10. High-frequency energy in singing and speech

    Science.gov (United States)

    Monson, Brian Bruce

    While human speech and the human voice generate acoustical energy up to (and beyond) 20 kHz, the energy above approximately 5 kHz has been largely neglected. Evidence is accruing that this high-frequency energy contains perceptual information relevant to speech and voice, including percepts of quality, localization, and intelligibility. The present research was an initial step in the long-range goal of characterizing high-frequency energy in singing voice and speech, with particular regard for its perceptual role and its potential for modification during voice and speech production. In this study, a database of high-fidelity recordings of talkers was created and used for a broad acoustical analysis and general characterization of high-frequency energy, as well as specific characterization of phoneme category, voice and speech intensity level, and mode of production (speech versus singing) by high-frequency energy content. Directionality of radiation of high-frequency energy from the mouth was also examined. The recordings were used for perceptual experiments wherein listeners were asked to discriminate between speech and voice samples that differed only in high-frequency energy content. Listeners were also subjected to gender discrimination tasks, mode-of-production discrimination tasks, and transcription tasks with samples of speech and singing that contained only high-frequency content. The combination of these experiments has revealed that (1) human listeners are able to detect very subtle level changes in high-frequency energy, and (2) human listeners are able to extract significant perceptual information from high-frequency energy.

  11. Visual speech information: a help or hindrance in perceptual processing of dysarthric speech.

    Science.gov (United States)

    Borrie, Stephanie A

    2015-03-01

    This study investigated the influence of visual speech information on perceptual processing of neurologically degraded speech. Fifty listeners identified spastic dysarthric speech under both audio (A) and audiovisual (AV) conditions. Condition comparisons revealed that the addition of visual speech information enhanced processing of the neurologically degraded input in terms of (a) acuity (percent phonemes correct) of vowels and consonants and (b) recognition (percent words correct) of predictive and nonpredictive phrases. Listeners exploited stress-based segmentation strategies more readily in AV conditions, suggesting that the perceptual benefit associated with adding visual speech information to the auditory signal-the AV advantage-has both segmental and suprasegmental origins. Results also revealed that the magnitude of the AV advantage can be predicted, to some degree, by the extent to which an individual utilizes syllabic stress cues to inform word recognition in AV conditions. Findings inform the development of a listener-specific model of speech perception that applies to processing of dysarthric speech in everyday communication contexts.

  12. Neural Entrainment to Speech Modulates Speech Intelligibility

    NARCIS (Netherlands)

    Riecke, Lars; Formisano, Elia; Sorger, Bettina; Baskent, Deniz; Gaudrain, Etienne

    2018-01-01

    Speech is crucial for communication in everyday life. Speech-brain entrainment, the alignment of neural activity to the slow temporal fluctuations (envelope) of acoustic speech input, is a ubiquitous element of current theories of speech processing. Associations between speech-brain entrainment and

  13. Improving Understanding of Emotional Speech Acoustic Content

    Science.gov (United States)

    Tinnemore, Anna

    Children with cochlear implants show deficits in identifying emotional intent of utterances without facial or body language cues. A known limitation to cochlear implants is the inability to accurately portray the fundamental frequency contour of speech which carries the majority of information needed to identify emotional intent. Without reliable access to the fundamental frequency, other methods of identifying vocal emotion, if identifiable, could be used to guide therapies for training children with cochlear implants to better identify vocal emotion. The current study analyzed recordings of adults speaking neutral sentences with a set array of emotions in a child-directed and adult-directed manner. The goal was to identify acoustic cues that contribute to emotion identification that may be enhanced in child-directed speech, but are also present in adult-directed speech. Results of this study showed that there were significant differences in the variation of the fundamental frequency, the variation of intensity, and the rate of speech among emotions and between intended audiences.
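
    The three cues identified (variation of the fundamental frequency, variation of intensity, and rate of speech) can be approximated from a recording; a rough sketch using librosa follows, in which the pitch range, the onset-based rate proxy, and the file handling are assumptions.

        import numpy as np
        import librosa

        def emotion_cues(path):
            """Crude acoustic correlates of vocal emotion: F0 variability,
            intensity variability, and an articulation-rate proxy."""
            y, sr = librosa.load(path, sr=16000)
            f0, voiced_flag, voiced_prob = librosa.pyin(y, fmin=65.0, fmax=400.0, sr=sr)
            f0_sd = np.nanstd(f0)                               # F0 variation (Hz)
            rms = librosa.feature.rms(y=y)[0]
            intensity_sd = np.std(20 * np.log10(rms + 1e-10))   # intensity variation (dB)
            onsets = librosa.onset.onset_detect(y=y, sr=sr)
            rate = len(onsets) / (len(y) / sr)                  # onsets per second
            return f0_sd, intensity_sd, rate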

  14. Categorical speech processing in Broca's area: an fMRI study using multivariate pattern-based analysis.

    Science.gov (United States)

    Lee, Yune-Sang; Turkeltaub, Peter; Granger, Richard; Raizada, Rajeev D S

    2012-03-14

    Although much effort has been directed toward understanding the neural basis of speech processing, the neural processes involved in the categorical perception of speech have been relatively less studied, and many questions remain open. In this functional magnetic resonance imaging (fMRI) study, we probed the cortical regions mediating categorical speech perception using an advanced brain-mapping technique, whole-brain multivariate pattern-based analysis (MVPA). Normal healthy human subjects (native English speakers) were scanned while they listened to 10 consonant-vowel syllables along the /ba/-/da/ continuum. Outside of the scanner, individuals' own category boundaries were measured to divide the fMRI data into /ba/ and /da/ conditions per subject. The whole-brain MVPA revealed that Broca's area and the left pre-supplementary motor area evoked distinct neural activity patterns between the two perceptual categories (/ba/ vs /da/). Broca's area was also found when the same analysis was applied to another dataset (Raizada and Poldrack, 2007), which previously yielded the supramarginal gyrus using a univariate adaptation-fMRI paradigm. The consistent MVPA findings from two independent datasets strongly indicate that Broca's area participates in categorical speech perception, with a possible role of translating speech signals into articulatory codes. The difference in results between univariate and multivariate pattern-based analyses of the same data suggest that processes in different cortical areas along the dorsal speech perception stream are distributed on different spatial scales.

  15. The speech perception skills of children with and without speech sound disorder.

    Science.gov (United States)

    Hearnshaw, Stephanie; Baker, Elise; Munro, Natalie

    To investigate whether Australian-English speaking children with and without speech sound disorder (SSD) differ in their overall speech perception accuracy. Additionally, to investigate differences in the perception of specific phonemes and the association between speech perception and speech production skills. Twenty-five Australian-English speaking children aged 48-60 months participated in this study. The SSD group included 12 children and the typically developing (TD) group included 13 children. Children completed routine speech and language assessments in addition to an experimental Australian-English lexical and phonetic judgement task based on Rvachew's Speech Assessment and Interactive Learning System (SAILS) program (Rvachew, 2009). This task included eight words across four word-initial phonemes: /k, ɹ, ʃ, s/. Children with SSD showed significantly poorer perceptual accuracy on the lexical and phonetic judgement task compared with TD peers. The phonemes /ɹ/ and /s/ were most frequently perceived in error across both groups. Additionally, the phoneme /ɹ/ was most commonly produced in error. There was also a positive correlation between overall speech perception and speech production scores. Children with SSD perceived speech less accurately than their typically developing peers. The findings suggest that an Australian-English variation of a lexical and phonetic judgement task similar to the SAILS program is promising and worthy of a larger scale study. Copyright © 2017 Elsevier Inc. All rights reserved.

  16. Acquiring native-like intonation in Dutch and Spanish : Comparing the L1 and L2 of native speakers and second language learners

    NARCIS (Netherlands)

    van Maastricht, L.J.; Swerts, M.G.J.; Krahmer, E.J.

    2013-01-01

    Learning more about the interaction between the native language (L1) and the target language (L2) has been the aim of many studies on second language acquisition

  17. Speech-based emotion detection in a resource-scarce environment

    CSIR Research Space (South Africa)

    Martirosian, O

    2007-11-01

    Full Text Available , happiness and frustration; passive emotion encompasses sadness and disappointment, and neutral encompasses speech with a negligible amount of emotional content. Because a study on the expression of emotion in speech has not been done in the South... seconds long and the segments labelled with the dominant emotion of the speech contained in them. The fine emotional labels used were angry, frustrated, happy, friendly, neutral, sad and depressed. These fine labels were combined into three broad...

  18. Speech-Language Pathologists' and Teachers' Perceptions of Classroom-Based Interventions.

    Science.gov (United States)

    Beck, Ann R.; Dennis, Marcia

    1997-01-01

    Speech-language pathologists (N=21) and teachers (N=54) were surveyed regarding their perceptions of classroom-based interventions. The two groups agreed about the primary advantages and disadvantages of most interventions, the primary areas of difference being classroom management and ease of data collection. Other findings indicated few…

  19. 78 FR 63152 - Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and...

    Science.gov (United States)

    2013-10-23

    ...] Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and Speech Disabilities... for telecommunications relay services (TRS) by eliminating standards for Internet-based relay services... comments, identified by CG Docket No. 03-123, by any of the following methods: Electronic Filers: Comments...

  20. The speech-based envelope power spectrum model (sEPSM) family: Development, achievements, and current challenges

    DEFF Research Database (Denmark)

    Relano-Iborra, Helia; Chabot-Leclerc, Alexandre; Scheidiger, Christoph

    2017-01-01

    have extended the predictive power of the original model to a broad range of conditions. This contribution presents the most recent developments within the sEPSM “family:” (i) A binaural extension, the B-sEPSM [Chabot-Leclerc et al. (2016). J. Acoust. Soc. Am. 140(1), 192-205] which combines better......Intelligibility models provide insights regarding the effects of target speech characteristics, transmission channels and/or auditory processing on the speech perception performance of listeners. In 2011, Jørgensen and Dau proposed the speech-based envelope power spectrum model [sEPSM, Jørgensen...

  1. Hate speech

    Directory of Open Access Journals (Sweden)

    Anne Birgitta Nilsen

    2014-12-01

    Full Text Available The manifesto of the Norwegian terrorist Anders Behring Breivik is based on the “Eurabia” conspiracy theory. This theory is a key starting point for hate speech amongst many right-wing extremists in Europe, but also has ramifications beyond these environments. In brief, proponents of the Eurabia theory claim that Muslims are occupying Europe and destroying Western culture, with the assistance of the EU and European governments. By contrast, members of Al-Qaeda and other extreme Islamists promote the conspiracy theory “the Crusade” in their hate speech directed against the West. Proponents of the latter theory argue that the West is leading a crusade to eradicate Islam and Muslims, a crusade that is similarly facilitated by their governments. This article presents analyses of texts written by right-wing extremists and Muslim extremists in an effort to shed light on how hate speech promulgates conspiracy theories in order to spread hatred and intolerance.The aim of the article is to contribute to a more thorough understanding of hate speech’s nature by applying rhetorical analysis. Rhetorical analysis is chosen because it offers a means of understanding the persuasive power of speech. It is thus a suitable tool to describe how hate speech works to convince and persuade. The concepts from rhetorical theory used in this article are ethos, logos and pathos. The concept of ethos is used to pinpoint factors that contributed to Osama bin Laden's impact, namely factors that lent credibility to his promotion of the conspiracy theory of the Crusade. In particular, Bin Laden projected common sense, good morals and good will towards his audience. He seemed to have coherent and relevant arguments; he appeared to possess moral credibility; and his use of language demonstrated that he wanted the best for his audience.The concept of pathos is used to define hate speech, since hate speech targets its audience's emotions. In hate speech it is the

  2. Improving Language Models in Speech-Based Human-Machine Interaction

    Directory of Open Access Journals (Sweden)

    Raquel Justo

    2013-02-01

    Full Text Available This work focuses on speech-based human-machine interaction. Specifically, a Spoken Dialogue System (SDS) that could be integrated into a robot is considered. Since Automatic Speech Recognition is one of the most sensitive tasks that must be confronted in such systems, the goal of this work is to improve the results obtained by this specific module. In order to do so, a hierarchical Language Model (LM) is considered. Different series of experiments were carried out using the proposed models over different corpora and tasks. The results obtained show that these models provide greater accuracy in the recognition task. Additionally, the influence of the Acoustic Modelling (AM) on the improvement percentage of the Language Models has also been explored. Finally, hierarchical Language Models have been successfully employed in a language understanding task, as shown in an additional series of experiments.

  3. An exploratory study on the driving method of speech synthesis based on the human eye reading imaging data

    Science.gov (United States)

    Gao, Pei-pei; Liu, Feng

    2016-10-01

    With the development of information technology and artificial intelligence, speech synthesis plays a significant role in the field of human-computer interaction. However, a main problem of current speech synthesis techniques is their lack of naturalness and expressiveness, so that synthesized speech is not yet close to the standard of natural language. Another problem is that human-computer interaction based on speech synthesis is too monotonous to support a mechanism driven by the user's own behavior. This paper introduces the historical development of speech synthesis and summarizes the general process of the technique, pointing out that the prosody generation module is an important part of speech synthesis. On the basis of further research, using the reader's eye activity during reading to control and drive prosody generation is introduced as a new human-computer interaction method that enriches the synthesized output. The present state of speech synthesis technology is reviewed in detail. On the premise that eye gaze data can be extracted, a speech synthesis method driven in real time by the eye movement signal is proposed, capable of expressing the natural speech rhythm of the reader. That is, while the reader silently reads the corpus, the system captures reading information such as the eye gaze duration per prosodic unit and establishes a hierarchical prosodic duration model to determine the duration parameters of the synthesized speech. Finally, after analysis, the feasibility of the above method is verified.
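
    As an illustration of the final step, per-unit gaze dwell times can be mapped to duration scale factors for the synthesizer; the mapping below is hypothetical, and the normalization and clamping range are assumptions rather than the authors' model.

        def duration_scales(gaze_ms, lo=0.7, hi=1.5):
            """Map per-prosodic-unit gaze dwell times (ms) to duration scale
            factors, clamped to a plausible range (values illustrative)."""
            mean_ms = sum(gaze_ms) / len(gaze_ms)
            return [max(lo, min(hi, t / mean_ms)) for t in gaze_ms]

        print(duration_scales([180, 240, 600, 210]))  # longer dwell -> slower unit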

  4. OLIVE: Speech-Based Video Retrieval

    NARCIS (Netherlands)

    de Jong, Franciska M.G.; Gauvain, Jean-Luc; den Hartog, Jurgen; den Hartog, Jeremy; Netter, Klaus

    1999-01-01

    This paper describes the Olive project which aims to support automated indexing of video material by use of human language technologies. Olive is making use of speech recognition to automatically derive transcriptions of the sound tracks, generating time-coded linguistic elements which serve as the

  5. Magnified Neural Envelope Coding Predicts Deficits in Speech Perception in Noise.

    Science.gov (United States)

    Millman, Rebecca E; Mattys, Sven L; Gouws, André D; Prendergast, Garreth

    2017-08-09

    Verbal communication in noisy backgrounds is challenging. Understanding speech in background noise that fluctuates in intensity over time is particularly difficult for hearing-impaired listeners with a sensorineural hearing loss (SNHL). The reduction in fast-acting cochlear compression associated with SNHL exaggerates the perceived fluctuations in intensity in amplitude-modulated sounds. SNHL-induced changes in the coding of amplitude-modulated sounds may have a detrimental effect on the ability of SNHL listeners to understand speech in the presence of modulated background noise. To date, direct evidence for a link between magnified envelope coding and deficits in speech identification in modulated noise has been absent. Here, magnetoencephalography was used to quantify the effects of SNHL on phase locking to the temporal envelope of modulated noise (envelope coding) in human auditory cortex. Our results show that SNHL enhances the amplitude of envelope coding in posteromedial auditory cortex, whereas it enhances the fidelity of envelope coding in posteromedial and posterolateral auditory cortex. This dissociation was more evident in the right hemisphere, demonstrating functional lateralization in enhanced envelope coding in SNHL listeners. However, enhanced envelope coding was not perceptually beneficial. Our results also show that both hearing thresholds and, to a lesser extent, magnified cortical envelope coding in left posteromedial auditory cortex predict speech identification in modulated background noise. We propose a framework in which magnified envelope coding in posteromedial auditory cortex disrupts the segregation of speech from background noise, leading to deficits in speech perception in modulated background noise. SIGNIFICANCE STATEMENT People with hearing loss struggle to follow conversations in noisy environments. Background noise that fluctuates in intensity over time poses a particular challenge. Using magnetoencephalography, we demonstrate

  6. Commercial speech in crisis: Crisis Pregnancy Center regulations and definitions of commercial speech.

    Science.gov (United States)

    Gilbert, Kathryn E

    2013-02-01

    Recent attempts to regulate Crisis Pregnancy Centers, pseudoclinics that surreptitiously aim to dissuade pregnant women from choosing abortion, have confronted the thorny problem of how to define commercial speech. The Supreme Court has offered three potential answers to this definitional quandary. This Note uses the Crisis Pregnancy Center cases to demonstrate that courts should use one of these solutions, the factor-based approach of Bolger v. Youngs Drugs Products Corp., to define commercial speech in the Crisis Pregnancy Center cases and elsewhere. In principle and in application, the Bolger factor-based approach succeeds in structuring commercial speech analysis at the margins of the doctrine.

  7. Statistical investigations into isiZulu intonation

    CSIR Research Space (South Africa)

    Kuun, C

    2005-11-01

    Full Text Available

  8. 75 FR 67333 - Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and...

    Science.gov (United States)

    2010-11-02

    ... Docket No. 10-191; FCC 10-161] Telecommunications Relay Services and Speech-to-Speech Services for...-Based Telecommunications Relay Service Numbering AGENCY: Federal Communications Commission. ACTION... Internet-based Telecommunications Relay Service (iTRS), specifically, Video Relay Service (VRS) and IP...

  9. A multi-resolution envelope-power based model for speech intelligibility

    DEFF Research Database (Denmark)

    Jørgensen, Søren; Ewert, Stephan D.; Dau, Torsten

    2013-01-01

    The speech-based envelope power spectrum model (sEPSM) presented by Jørgensen and Dau [(2011). J. Acoust. Soc. Am. 130, 1475-1487] estimates the envelope power signal-to-noise ratio (SNRenv) after modulation-frequency selective processing. Changes in this metric were shown to account well...... to conditions with stationary interferers, due to the long-term integration of the envelope power, and cannot account for increased intelligibility typically obtained with fluctuating maskers. Here, a multi-resolution version of the sEPSM is presented where the SNRenv is estimated in temporal segments...... with a modulation-filter dependent duration. The multi-resolution sEPSM is demonstrated to account for intelligibility obtained in conditions with stationary and fluctuating interferers, and noisy speech distorted by reverberation or spectral subtraction. The results support the hypothesis that the SNRenv...
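
    The decision metric at the heart of this model family can be sketched compactly. The following minimal Python sketch is not the published sEPSM: it assumes broadband Hilbert envelopes, a small fixed set of modulation bands, and long-term integration, and it omits the peripheral filterbank and the modulation-filter-dependent segment durations that define the multi-resolution version. It only illustrates how an envelope-power SNR could be formed per band and combined across bands.

        import numpy as np
        from scipy.signal import hilbert, butter, sosfilt

        def envelope(x, fs, fc=150.0):
            # Hilbert envelope, low-pass filtered at fc Hz (kept at the
            # original rate for clarity; a real implementation would downsample)
            env = np.abs(hilbert(x))
            sos = butter(2, fc / (fs / 2), btype="lowpass", output="sos")
            return sosfilt(sos, env)

        def env_power(env, fs, f_lo, f_hi):
            # Envelope power in one modulation band, normalized by the DC (mean) power
            sos = butter(2, [f_lo / (fs / 2), f_hi / (fs / 2)],
                         btype="bandpass", output="sos")
            band = sosfilt(sos, env)
            return np.mean(band ** 2) / (np.mean(env) ** 2 + 1e-12)

        def snr_env(noisy_speech, noise, fs,
                    bands=((1, 2), (2, 4), (4, 8), (8, 16))):
            # Per-band envelope-power SNR, combined across modulation bands
            env_sn, env_n = envelope(noisy_speech, fs), envelope(noise, fs)
            snrs = []
            for f_lo, f_hi in bands:
                p_sn = env_power(env_sn, fs, f_lo, f_hi)
                p_n = env_power(env_n, fs, f_lo, f_hi)
                snrs.append(max(p_sn - p_n, 1e-4) / max(p_n, 1e-4))
            return np.sqrt(np.sum(np.square(snrs)))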

  10. Nonverbal Communication in the Contemporary Operating Environment

    Science.gov (United States)

    2009-01-01

    relationship to speech and be affected by issues of stress, intonation, and pacing (Kendon, 1981). Bilinguals generally gesture more than monolinguals…

  11. Inner Speech's Relationship With Overt Speech in Poststroke Aphasia.

    Science.gov (United States)

    Stark, Brielle C; Geva, Sharon; Warburton, Elizabeth A

    2017-09-18

    Relatively preserved inner speech alongside poor overt speech has been documented in some persons with aphasia (PWA), but the relationship of overt speech with inner speech is still largely unclear, as few studies have directly investigated these factors. The present study investigates the relationship of relatively preserved inner speech in aphasia with selected measures of language and cognition. Thirty-eight persons with chronic aphasia (27 men, 11 women; average age 64.53 ± 13.29 years, time since stroke 8-111 months) were classified as having relatively preserved inner and overt speech (n = 21), relatively preserved inner speech with poor overt speech (n = 8), or not classified due to insufficient measurements of inner and/or overt speech (n = 9). Inner speech scores (by group) were correlated with selected measures of language and cognition from the Comprehensive Aphasia Test (Swinburn, Porter, & Howard, 2004). The group with poor overt speech showed a significant relationship of inner speech with overt naming (r = .95). Relationships between inner speech and language and cognition factors were not significant for the group with relatively good overt speech. As in previous research, we show that relatively preserved inner speech is found alongside otherwise severe production deficits in PWA. PWA with poor overt speech may rely more on preserved inner speech for overt picture naming (perhaps due to shared resources with verbal working memory) and for written picture description (perhaps relying on inner speech because of perceived task difficulty). Assessments of inner speech may be useful as a standard component of aphasia screening, and therapy focused on improving and using inner speech may prove clinically worthwhile. https://doi.org/10.23641/asha.5303542.

  12. Lexical Tones in Mandarin Chinese Infant-Directed Speech: Age-Related Changes in the Second Year of Life

    Directory of Open Access Journals (Sweden)

    Mengru Han

    2018-04-01

    Full Text Available Tonal information is essential to early word learning in tone languages. Although numerous studies have investigated the intonational and segmental properties of infant-directed speech (IDS), only a few studies have explored the properties of lexical tones in IDS. These studies mostly focused on the first year of life; thus little is known about how lexical tones in IDS change as children’s vocabulary acquisition accelerates in the second year (Goldfield and Reznick, 1990; Bloom, 2001). The present study examines whether Mandarin Chinese mothers hyperarticulate lexical tones in IDS addressing 18- and 24-month-old children—at which age children are learning words at a rapid speed—vs. adult-directed speech (ADS). Thirty-nine Mandarin Chinese–speaking mothers were tested in a semi-spontaneous picture-book-reading task, in which they told the same story to their child (IDS condition) and to an adult (ADS condition). Results for the F0 measurements (minimum F0, maximum F0, and F0 range) of tone in the speech data revealed a continuum of differences among IDS addressing 18-month-olds, IDS addressing 24-month-olds, and ADS. Lexical tones in IDS addressing 18-month-old children had a higher minimum F0, higher maximum F0, and larger pitch range than lexical tones in ADS. Lexical tones in IDS addressing 24-month-old children showed more similarity to ADS tones with respect to pitch height: there were no differences in minimum F0 and maximum F0 between ADS and IDS. However, F0 range was still larger. These results suggest that lexical tones are generally hyperarticulated in Mandarin Chinese IDS addressing 18- and 24-month-old children despite the change in pitch level over time. Mandarin Chinese mothers hyperarticulate lexical tones in IDS when talking to toddlers and potentially facilitate tone acquisition and word learning.

  13. The Effect of English Verbal Songs on Connected Speech Aspects of Adult English Learners’ Speech Production

    Directory of Open Access Journals (Sweden)

    Farshid Tayari Ashtiani

    2015-02-01

    Full Text Available The present study investigated the impact of English verbal songs on connected speech aspects of adult English learners’ speech production. 40 participants were selected based on their performance on a piloted and validated version of the NELSON test given to 60 intermediate English learners at a language institute in Tehran. They were then distributed equally into control and experimental groups and received a validated pretest of reading aloud and speaking in English. The treatment was then performed over 18 sessions by singing preselected songs chosen on criteria such as popularity, familiarity, and the amount and speed of speech delivery. Finally, posttests of reading aloud and speaking in English were administered. The results revealed that the treatment had statistically significant positive effects on the connected speech aspects of English learners’ speech production at the .05 level of significance. The results also showed no significant difference between the experimental group’s mean scores on the posttests of reading aloud and speaking. It was thus concluded that providing EFL learners with English verbal songs could positively affect connected speech aspects of both modes of speech production, reading aloud and speaking. The findings of this study have pedagogical implications for language teachers, who should be aware of the benefits of verbal songs for promoting the naturalness and fluency of language learners’ speech production. Keywords: English Verbal Songs, Connected Speech, Speech Production, Reading Aloud, Speaking

  14. HMM Adaptation for child speech synthesis

    CSIR Research Space (South Africa)

    Govender, Avashna

    2015-09-01

    Full Text Available Hidden Markov Model (HMM)-based synthesis in combination with speaker adaptation has proven to be an approach that is well-suited for child speech synthesis. This paper describes the development and evaluation of different HMM-based child speech...

  15. Predicting Speech Intelligibility Decline in Amyotrophic Lateral Sclerosis Based on the Deterioration of Individual Speech Subsystems

    Science.gov (United States)

    Yunusova, Yana; Wang, Jun; Zinman, Lorne; Pattee, Gary L.; Berry, James D.; Perry, Bridget; Green, Jordan R.

    2016-01-01

    Purpose: To determine the mechanisms of speech intelligibility impairment due to neurologic impairments, intelligibility decline was modeled as a function of co-occurring changes in the articulatory, resonatory, phonatory, and respiratory subsystems. Method: Sixty-six individuals diagnosed with amyotrophic lateral sclerosis (ALS) were studied longitudinally. The disease-related changes in articulatory, resonatory, phonatory, and respiratory subsystems were quantified using multiple instrumental measures, which were subjected to a principal component analysis and mixed effects models to derive a set of speech subsystem predictors. A stepwise approach was used to select the best set of subsystem predictors to model the overall decline in intelligibility. Results: Intelligibility was modeled as a function of five predictors that corresponded to velocities of lip and jaw movements (articulatory), number of syllable repetitions in the alternating motion rate task (articulatory), nasal airflow (resonatory), maximum fundamental frequency (phonatory), and speech pauses (respiratory). The model accounted for 95.6% of the variance in intelligibility, among which the articulatory predictors showed the most substantial independent contribution (57.7%). Conclusion: Articulatory impairments characterized by reduced velocities of lip and jaw movements and resonatory impairments characterized by increased nasal airflow served as the subsystem predictors of the longitudinal decline of speech intelligibility in ALS. Declines in maximum performance tasks such as the alternating motion rate preceded declines in intelligibility, thus serving as early predictors of bulbar dysfunction. Following the rapid decline in speech intelligibility, a precipitous decline in maximum performance tasks subsequently occurred. PMID:27148967
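
    As a rough illustration of the modelling pipeline described above (a sketch on synthetic stand-in data, not the study's code, and without the mixed-effects handling of the longitudinal structure), subsystem predictors can be derived from correlated instrumental measures by PCA and then regressed against intelligibility:

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.linear_model import LinearRegression

        rng = np.random.default_rng(0)
        instrumental = rng.normal(size=(66, 12))        # 66 speakers x 12 instrumental measures
        intelligibility = rng.uniform(20, 100, size=66) # synthetic intelligibility scores

        # Collapse correlated instrumental measures into a few subsystem predictors
        predictors = PCA(n_components=5).fit_transform(instrumental)

        model = LinearRegression().fit(predictors, intelligibility)
        print("variance accounted for:", model.score(predictors, intelligibility))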

  16. Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions

    Directory of Open Access Journals (Sweden)

    M. Bashirpour

    2016-09-01

    Full Text Available Automatic recognition of speech emotional states in noisy conditions has become an important research topic in the emotional speech recognition area, in recent years. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC in a speech emotion recognition system. We investigate its performance in emotion recognition using clean and noisy speech materials and compare it with the performances of the well-known MFCC, LPCC, RASTA-PLP, and also TEMFCC features. Speech samples are extracted from the Berlin emotional speech database (Emo DB and Persian emotional speech database (Persian ESD which are corrupted with 4 different noise types under various SNR levels. The experiments are conducted in clean train/noisy test scenarios to simulate practical conditions with noise sources. Simulation results show that higher recognition rates are achieved for PNCC as compared with the conventional features under noisy conditions.
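
    The step that distinguishes PNCC from MFCC is a power-law nonlinearity (exponent of roughly 1/15) in place of the logarithm, which behaves more gracefully when background noise fills in low-energy regions. The bare-bones sketch below shows only that step; the full algorithm also includes gammatone filtering, medium-duration power-bias subtraction, and temporal masking, all omitted here.

        import numpy as np
        from scipy.fftpack import dct

        def pncc_like(power_spectrogram, n_ceps=13):
            # power_spectrogram: (n_frames, n_bands) filterbank energies.
            # Power-law nonlinearity replaces the log used by MFCC.
            nonlin = np.power(np.maximum(power_spectrogram, 1e-10), 1.0 / 15.0)
            return dct(nonlin, type=2, axis=1, norm="ortho")[:, :n_ceps]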

  17. Musician advantage for speech-on-speech perception

    NARCIS (Netherlands)

    Başkent, Deniz; Gaudrain, Etienne

    Evidence for transfer of musical training to better perception of speech in noise has been mixed. Unlike speech-in-noise, speech-on-speech perception utilizes many of the skills that musical training improves, such as better pitch perception and stream segregation, as well as use of higher-level…

  18. Evidence-based speech-language pathology practices in schools: findings from a national survey.

    Science.gov (United States)

    Hoffman, Lavae M; Ireland, Marie; Hall-Mills, Shannon; Flynn, Perry

    2013-07-01

    This study documented evidence-based practice (EBP) patterns as reported by speech-language pathologists (SLPs) employed in public schools during 2010-2011. Using an online survey, practitioners reported their EBP training experiences, resources available in their workplaces, and the frequency with which they engage in specific EBP activities, as well as their resource needs and future training format preferences. A total of 2,762 SLPs in 28 states participated in the online survey, 85% of whom reported holding the Certificate of Clinical Competence in Speech-Language Pathology credential. Results revealed that one quarter of survey respondents had no formal training in EBP, 11% of SLPs worked in school districts with official EBP procedural guidelines, and 91% had no scheduled time to support EBP activities. The majority of SLPs posed and researched 0 to 2 EBP questions per year and read 0 to 4 American Speech-Language-Hearing Association (ASHA) journal articles per year on either assessment or intervention topics. Use of ASHA online resources and engagement in EBP activities were documented to be low. However, results also revealed that school-based SLPs have high interest in additional training and resources to support scientifically based practices. Suggestions for enhancing EBP support in public schools and augmenting knowledge transfer are provided.

  19. Teaching evidence-based practice (EBP) to speech-language therapy students : are students competent and confident EBP users?

    NARCIS (Netherlands)

    N. van Dijk; B. Spek; M. Wieringa-de Waard; C. Lucas

    2013-01-01

    BACKGROUND: The importance and value of the principles of evidence-based practice (EBP) in the decision-making process is recognized by speech-language therapists (SLTs) worldwide and, as a result, curricula for speech-language therapy students incorporated EBP principles. However, the willingness…

  20. Singing in groups for Parkinson's disease (SING-PD): a pilot study of group singing therapy for PD-related voice/speech disorders.

    Science.gov (United States)

    Shih, Ludy C; Piel, Jordan; Warren, Amanda; Kraics, Lauren; Silver, Althea; Vanderhorst, Veronique; Simon, David K; Tarsy, Daniel

    2012-06-01

    Parkinson's disease-related speech and voice impairment have significant impact on quality of life measures. LSVT® LOUD voice and speech therapy (Lee Silverman Voice Therapy) has demonstrated scientific efficacy and clinical effectiveness, but musically based voice and speech therapy has been underexplored as a potentially useful method of rehabilitation. We undertook a pilot, open-label study of a group-based singing intervention, consisting of twelve 90-min weekly sessions led by a voice and speech therapist/singing instructor. The primary outcome measure of vocal loudness as measured by sound pressure level (SPL) at 50 cm during connected speech was not significantly different one week after the intervention or at 13 weeks after the intervention. A number of secondary measures reflecting pitch range, phonation time and maximum loudness also were unchanged. Voice related quality of life (VRQOL) and voice handicap index (VHI) also were unchanged. This study suggests that a group singing therapy intervention at this intensity and frequency does not result in significant improvement in objective and subject-rated measures of voice and speech impairment. Copyright © 2012 Elsevier Ltd. All rights reserved.

  1. Joint Dictionary Learning-Based Non-Negative Matrix Factorization for Voice Conversion to Improve Speech Intelligibility After Oral Surgery.

    Science.gov (United States)

    Fu, Szu-Wei; Li, Pei-Chun; Lai, Ying-Hui; Yang, Cheng-Chien; Hsieh, Li-Chun; Tsao, Yu

    2017-11-01

    Objective: This paper focuses on machine learning based voice conversion (VC) techniques for improving the speech intelligibility of surgical patients who have had parts of their articulators removed. Because of the removal of parts of the articulator, a patient's speech may be distorted and difficult to understand. To overcome this problem, VC methods can be applied to convert the distorted speech such that it is clear and more intelligible. To design an effective VC method, two key points must be considered: 1) the amount of training data may be limited (because speaking for a long time is usually difficult for postoperative patients); 2) rapid conversion is desirable (for better communication). Methods: We propose a novel joint dictionary learning based non-negative matrix factorization (JD-NMF) algorithm. Compared to conventional VC techniques, JD-NMF can perform VC efficiently and effectively with only a small amount of training data. Results: The experimental results demonstrate that the proposed JD-NMF method not only achieves notably higher short-time objective intelligibility (STOI) scores (a standardized objective intelligibility evaluation metric) than those obtained using the original unconverted speech but is also significantly more efficient and effective than a conventional exemplar-based NMF VC method. Conclusion: The proposed JD-NMF method may outperform the state-of-the-art exemplar-based NMF VC method in terms of STOI scores under the desired scenario. Significance: We confirmed the advantages of the proposed joint training criterion for the NMF-based VC. Moreover, we verified that the proposed JD-NMF can effectively improve the speech intelligibility scores of oral surgery patients.
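
    The paired-dictionary idea behind this class of methods can be sketched in a few lines. The snippet below is a schematic rendering, not the paper's JD-NMF algorithm: the two dictionaries are coupled by factorizing frame-aligned source and target spectra jointly, and conversion estimates activations against the source dictionary only before resynthesizing with the target dictionary.

        import numpy as np
        from sklearn.decomposition import NMF

        def train_joint(src, tgt, k=32):
            # src, tgt: time-aligned nonnegative magnitude spectrograms,
            # each of shape (n_frames, n_bins)
            joint = np.hstack([src, tgt])          # couple the voices frame by frame
            model = NMF(n_components=k, init="nndsvda", max_iter=500)
            model.fit(joint)
            H = model.components_                  # (k, 2 * n_bins) joint dictionary
            n_bins = src.shape[1]
            return H[:, :n_bins], H[:, n_bins:]    # source / target sub-dictionaries

        def activations(X, H, n_iter=200, eps=1e-10):
            # Solve X ~= W @ H for W >= 0 with H held fixed
            # (standard multiplicative updates, Euclidean objective)
            W = np.full((X.shape[0], H.shape[0]), 1.0 / H.shape[0])
            for _ in range(n_iter):
                W *= (X @ H.T) / (W @ (H @ H.T) + eps)
            return W

        def convert(src_frames, dict_src, dict_tgt):
            # Activations found on the source dictionary, resynthesis on the
            # paired target dictionary
            return activations(src_frames, dict_src) @ dict_tgt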

  2. Speech Synthesis Applied to Language Teaching.

    Science.gov (United States)

    Sherwood, Bruce

    1981-01-01

    The experimental addition of speech output to computer-based Esperanto lessons using speech synthesized from text is described. Because of Esperanto's phonetic spelling and simple rhythm, it is particularly easy to describe the mechanisms of Esperanto synthesis. Attention is directed to how the text-to-speech conversion is performed and the ways…

  3. Cortical activity patterns predict robust speech discrimination ability in noise

    Science.gov (United States)

    Shetake, Jai A.; Wolf, Jordan T.; Cheung, Ryan J.; Engineer, Crystal T.; Ram, Satyananda K.; Kilgard, Michael P.

    2012-01-01

    The neural mechanisms that support speech discrimination in noisy conditions are poorly understood. In quiet conditions, spike timing information appears to be used in the discrimination of speech sounds. In this study, we evaluated the hypothesis that spike timing is also used to distinguish between speech sounds in noisy conditions that significantly degrade neural responses to speech sounds. We tested speech sound discrimination in rats and recorded primary auditory cortex (A1) responses to speech sounds in background noise of different intensities and spectral compositions. Our behavioral results indicate that rats, like humans, are able to accurately discriminate consonant sounds even in the presence of background noise that is as loud as the speech signal. Our neural recordings confirm that speech sounds evoke degraded but detectable responses in noise. Finally, we developed a novel neural classifier that mimics behavioral discrimination. The classifier discriminates between speech sounds by comparing the A1 spatiotemporal activity patterns evoked on single trials with the average spatiotemporal patterns evoked by known sounds. Unlike classifiers in most previous studies, this classifier is not provided with the stimulus onset time. Neural activity analyzed with the use of relative spike timing was well correlated with behavioral speech discrimination in quiet and in noise. Spike timing information integrated over longer intervals was required to accurately predict rat behavioral speech discrimination in noisy conditions. The similarity of neural and behavioral discrimination of speech in noise suggests that humans and rats may employ similar brain mechanisms to solve this problem. PMID:22098331
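
    The decision rule of such a classifier is essentially template matching on spatiotemporal patterns. A toy version is given below (hypothetical data shapes, and assuming time-aligned patterns; the actual classifier additionally searches over onset time, since it is not given the stimulus onset):

        import numpy as np

        def classify(trial, templates):
            # trial: (n_sites, n_timebins) single-trial activity pattern.
            # templates: dict mapping sound label -> trial-averaged pattern
            # of the same shape.
            def corr(a, b):
                return np.corrcoef(a.ravel(), b.ravel())[0, 1]
            return max(templates, key=lambda label: corr(trial, templates[label]))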

  4. Role of the middle ear muscle apparatus in mechanisms of speech signal discrimination

    Science.gov (United States)

    Moroz, B. S.; Bazarov, V. G.; Sachenko, S. V.

    1980-01-01

    A method of impedance reflexometry was used to examine 101 students with hearing impairment in order to clarify the interrelation between speech discrimination and the state of the middle ear muscles. The ability to discriminate speech signals depends to some extent on the functional state of the intra-aural muscles. Speech discrimination was greatly impaired when the stapedial muscle acoustic reflex was absent, when stimulation thresholds were low, and when increases in reflex amplitude were very small. Discrimination was not impeded when the acoustic reflex was present, relative thresholds were high, and reflex amplitude increased normally in response to speech signals of increasing intensity.

  5. Teaching evidence-based practice (EBP) to speech-language therapy students: are students competent and confident EBP users?

    NARCIS (Netherlands)

    Spek, B.; Wieringa-de Waard, M.; Lucas, C.; van Dijk, N.

    2013-01-01

    The importance and value of the principles of evidence-based practice (EBP) in the decision-making process is recognized by speech-language therapists (SLTs) worldwide and, as a result, curricula for speech-language therapy students incorporated EBP principles. However, the willingness actually to use…

  6. Speech-Language Dissociations, Distractibility, and Childhood Stuttering

    Science.gov (United States)

    Conture, Edward G.; Walden, Tedra A.; Lambert, Warren E.

    2015-01-01

    Purpose: This study investigated the relation among speech-language dissociations, attentional distractibility, and childhood stuttering. Method: Participants were 82 preschool-age children who stutter (CWS) and 120 who do not stutter (CWNS). Correlation-based statistics (Bates, Appelbaum, Salcedo, Saygin, & Pizzamiglio, 2003) identified dissociations across 5 norm-based speech-language subtests. The Behavioral Style Questionnaire Distractibility subscale measured attentional distractibility. Analyses addressed (a) between-groups differences in the number of children exhibiting speech-language dissociations; (b) between-groups distractibility differences; (c) the relation between distractibility and speech-language dissociations; and (d) whether interactions between distractibility and dissociations predicted the frequency of total, stuttered, and nonstuttered disfluencies. Results: More preschool-age CWS exhibited speech-language dissociations compared with CWNS, and more boys exhibited dissociations compared with girls. In addition, male CWS were less distractible than female CWS and female CWNS. For CWS, but not CWNS, less distractibility (i.e., greater attention) was associated with more speech-language dissociations. Last, interactions between distractibility and dissociations did not predict speech disfluencies in CWS or CWNS. Conclusions: The present findings suggest that for preschool-age CWS, attentional processes are associated with speech-language dissociations. Future investigations are warranted to better understand the directionality of effect of this association (e.g., inefficient attentional processes → speech-language dissociations vs. inefficient attentional processes ← speech-language dissociations). PMID:26126203

  7. Speech endpoint detection with non-language speech sounds for generic speech processing applications

    Science.gov (United States)

    McClain, Matthew; Romanowski, Brian

    2009-05-01

    Non-language speech sounds (NLSS) are sounds produced by humans that do not carry linguistic information. Examples of these sounds are coughs, clicks, breaths, and filled pauses such as "uh" and "um" in English. NLSS are prominent in conversational speech, but can be a significant source of errors in speech processing applications. Traditionally, these sounds are ignored by speech endpoint detection algorithms, where speech regions are identified in the audio signal prior to processing. The ability to filter NLSS as a pre-processing step can significantly enhance the performance of many speech processing applications, such as speaker identification, language identification, and automatic speech recognition. In order to be used in all such applications, NLSS detection must be performed without the use of language models that provide knowledge of the phonology and lexical structure of speech. This is especially relevant to situations where the languages used in the audio are not known a priori. We present the results of preliminary experiments using data from American and British English speakers, in which segments of audio are classified as language speech sounds (LSS) or NLSS using a set of acoustic features designed for language-agnostic NLSS detection and a hidden-Markov model (HMM) to model speech generation. The results of these experiments indicate that the features and model used are capable of detecting certain types of NLSS, such as breaths and clicks, while detection of other types of NLSS such as filled pauses will require future research.

  8. Speech recognition systems on the Cell Broadband Engine

    Energy Technology Data Exchange (ETDEWEB)

    Liu, Y; Jones, H; Vaidya, S; Perrone, M; Tydlitat, B; Nanda, A

    2007-04-20

    In this paper we describe our design, implementation, and first results of a prototype connected-phoneme-based speech recognition system on the Cell Broadband Engine™ (Cell/B.E.). Automatic speech recognition decodes speech samples into plain text (other representations are possible) and must process samples at real-time rates. Fortunately, the computational tasks involved in this pipeline are highly data-parallel and can receive significant hardware acceleration from vector-streaming architectures such as the Cell/B.E. Identifying and exploiting these parallelism opportunities is challenging, but also critical to improving system performance. We observed, from our initial performance timings, that a single Cell/B.E. processor can recognize speech from thousands of simultaneous voice channels in real time, a channel density that is orders of magnitude greater than the capacity of existing software speech recognizers based on CPUs (central processing units). This result emphasizes the potential for Cell/B.E.-based speech recognition and will likely lead to the future development of production speech systems using Cell/B.E. clusters.

  9. Microscopic prediction of speech intelligibility in spatially distributed speech-shaped noise for normal-hearing listeners.

    Science.gov (United States)

    Geravanchizadeh, Masoud; Fallah, Ali

    2015-12-01

    A binaural and psychoacoustically motivated intelligibility model, based on a well-known monaural microscopic model, is proposed. This model simulates a phoneme recognition task in the presence of spatially distributed speech-shaped noise in anechoic scenarios. In the proposed model, binaural advantage effects are considered by generating a feature vector for a dynamic-time-warping speech recognizer. This vector consists of three subvectors incorporating two monaural subvectors to model the better-ear hearing, and a binaural subvector to simulate the binaural unmasking effect. The binaural unit of the model is based on equalization-cancellation theory. This model operates blindly, which means separate recordings of speech and noise are not required for the predictions. Speech intelligibility tests were conducted with 12 normal hearing listeners by collecting speech reception thresholds (SRTs) in the presence of single and multiple sources of speech-shaped noise. The comparison of the model predictions with the measured binaural SRTs, and with the predictions of a macroscopic binaural model called extended equalization-cancellation, shows that this approach predicts the intelligibility in anechoic scenarios with good precision. The square of the correlation coefficient (r²) and the mean-absolute error between the model predictions and the measurements are 0.98 and 0.62 dB, respectively.

  10. Atomoxetine administration combined with intensive speech therapy for post-stroke aphasia: evaluation by a novel SPECT method.

    Science.gov (United States)

    Yamada, Naoki; Kakuda, Wataru; Yamamoto, Kazuma; Momosaki, Ryo; Abo, Masahiro

    2016-09-01

    We assessed the safety, feasibility, and efficacy of atomoxetine administration combined with intensive speech therapy (ST) for patients with post-stroke aphasia. In addition, we investigated the effect of atomoxetine treatment on neural activity in areas surrounding the brain lesion. Four adult patients with motor-dominant aphasia and a history of left hemispheric stroke were studied. The study was registered in the clinical trials database (ID: JMA-IIA00215). Daily atomoxetine administration of 40 mg was initiated two weeks before admission and raised to 80 mg one week before admission. During the subsequent 13-day hospitalization, administration of atomoxetine was raised to 120 mg and daily intensive ST (120 min/day, one-on-one training) was provided. Language function was assessed using the Japanese version of the Western Aphasia Battery (WAB) and the Token Test two weeks prior to admission, on the day of admission, and at discharge. At two weeks prior to admission and at discharge, each patient's cortical blood flow was measured using ¹²³I-IMP single-photon emission computed tomography (SPECT). The protocol was successfully completed by all patients without any adverse effects. All four patients showed improved language function, with the median Token Test score increasing from 141 to 149 and the WAB repetition score increasing from 88 to 99. In addition, cortical blood flow surrounding lesioned brain areas was found to increase following intervention in all patients. Atomoxetine administration and intensive ST were safe and feasible for post-stroke aphasia, suggesting their potential usefulness in the treatment of this patient population.

  11. Beginning to Disentangle the Prosody-Literacy Relationship: A Multi-Component Measure of Prosodic Sensitivity

    Science.gov (United States)

    Holliman, A. J.; Williams, G. J.; Mundy, I. R.; Wood, C.; Hart, L.; Waldron, S.

    2014-01-01

    A growing number of studies now suggest that sensitivity to the rhythmic patterning of speech (prosody) is implicated in successful reading acquisition. However, recent evidence suggests that prosody is not a unitary construct and that the different components of prosody (stress, intonation, and timing) operating at different linguistic levels…

  12. Time-Frequency Feature Representation Using Multi-Resolution Texture Analysis and Acoustic Activity Detector for Real-Life Speech Emotion Recognition

    Directory of Open Access Journals (Sweden)

    Kun-Ching Wang

    2015-01-01

    Full Text Available The classification of emotional speech is mostly considered in speech-related research on human-computer interaction (HCI). The purpose of this paper is to present a novel feature extraction based on multi-resolution texture image information (MRTII). The MRTII feature set is derived from multi-resolution texture analysis for the characterization and classification of different emotions in a speech signal. The motivation is that emotions carry different intensity values in different frequency bands. In terms of human visual perception, the multi-resolution texture properties of the emotional speech spectrogram should form a good feature set for emotion classification in speech; furthermore, multi-resolution texture analysis discriminates between emotions more clearly than uniform-resolution analysis. To provide high accuracy of emotional discrimination, especially in real life, an acoustic activity detection (AAD) algorithm is applied within the MRTII-based feature extraction. Considering the presence of many blended emotions in real life, this paper makes use of two corpora of naturally occurring dialogs recorded in real-life call centers. Compared with the traditional Mel-scale Frequency Cepstral Coefficients (MFCC) and other state-of-the-art features, the MRTII features improve the correct classification rates of the proposed systems across different language databases. Experimental results show that the proposed MRTII-based feature information, inspired by human visual perception of the spectrogram image, provides significant classification performance for real-life emotion recognition in speech.

  13. SynFace—Speech-Driven Facial Animation for Virtual Speech-Reading Support

    Directory of Open Access Journals (Sweden)

    Giampiero Salvi

    2009-01-01

    Full Text Available This paper describes SynFace, a supportive technology that aims at enhancing audio-based spoken communication in adverse acoustic conditions by providing the missing visual information in the form of an animated talking head. Firstly, we describe the system architecture, consisting of a 3D animated face model controlled from the speech input by a specifically optimised phonetic recogniser. Secondly, we report on speech intelligibility experiments with focus on multilinguality and robustness to audio quality. The system, already available for Swedish, English, and Flemish, was optimised for German and for Swedish wide-band speech quality available in TV, radio, and Internet communication. Lastly, the paper covers experiments with nonverbal motions driven from the speech signal. It is shown that turn-taking gestures can be used to affect the flow of human-human dialogues. We have focused specifically on two categories of cues that may be extracted from the acoustic signal: prominence/emphasis and interactional cues (turn-taking/back-channelling).

  14. Gender classification in children based on speech characteristics: using fundamental and formant frequencies of Malay vowels.

    Science.gov (United States)

    Zourmand, Alireza; Ting, Hua-Nong; Mirhassani, Seyed Mostafa

    2013-03-01

    Speech is one of the prevalent communication mediums for humans. Identifying the gender of a child speaker based on his/her speech is crucial in telecommunication and speech therapy. This article investigates the use of fundamental and formant frequencies from sustained vowel phonation to distinguish the gender of Malay children aged between 7 and 12 years. The Euclidean minimum distance and multilayer perceptron were used to classify the gender of 360 Malay children based on different combinations of fundamental and formant frequencies (F0, F1, F2, and F3). The Euclidean minimum distance with normalized frequency data achieved a classification accuracy of 79.44%, which was higher than that of the nonnormalized frequency data. Age-dependent modeling was used to improve the accuracy of gender classification. The Euclidean distance method obtained 84.17% based on the optimal classification accuracy for all age groups. The accuracy was further increased to 99.81% using multilayer perceptron based on mel-frequency cepstral coefficients. Copyright © 2013 The Voice Foundation. Published by Mosby, Inc. All rights reserved.
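
    The Euclidean minimum-distance step is simple enough to show directly. The sketch below (synthetic shapes, not the study's code) assigns a child's normalized [F0, F1, F2, F3] vector to the class with the nearest centroid:

        import numpy as np

        def train_centroids(features, labels):
            # features: (n_children, 4) normalized [F0, F1, F2, F3] vectors
            return {lab: features[labels == lab].mean(axis=0)
                    for lab in np.unique(labels)}

        def predict(x, centroids):
            # Assign the class whose centroid is closest in Euclidean distance
            return min(centroids, key=lambda lab: np.linalg.norm(x - centroids[lab]))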

  15. Translation of incremental talk test responses to steady-state exercise training intensity.

    Science.gov (United States)

    Lyon, Ellen; Menke, Miranda; Foster, Carl; Porcari, John P; Gibson, Mark; Bubbers, Terresa

    2014-01-01

    The Talk Test (TT) is a submaximal, incremental exercise test that has been shown to be useful in prescribing exercise training intensity. It is based on a subject's ability to speak comfortably during exercise. This study defined the reduction in absolute workload intensity from an incremental exercise test using the TT that gives an appropriate absolute training intensity for cardiac rehabilitation patients. Patients in an outpatient rehabilitation program (N = 30) performed an incremental exercise test with the TT administered at every 2-minute stage. Patients rated their speech comfort after reciting a standardized paragraph. Anything other than a "yes" response was considered the "equivocal" stage, while all preceding stages were "positive" stages. The last stage with the unequivocally positive ability to speak was the Last Positive (LP) stage, and the two stages preceding it were LP-1 and LP-2. Subsequently, three 20-minute steady-state training bouts were performed in random order at the absolute workloads of the LP, LP-1, and LP-2 stages of the incremental test. Speech comfort, heart rate (HR), and rating of perceived exertion (RPE) were recorded every 5 minutes. The 20-minute exercise training bout was completed fully at LP (n = 19), LP-1 (n = 28), and LP-2 (n = 30). Heart rate, RPE, and speech comfort were similar through the LP-1 and LP-2 tests, but the LP stage was markedly more difficult. Steady-state exercise training intensity was easily and appropriately prescribed at the intensities associated with the LP-1 and LP-2 stages of the TT. The LP stage may be too difficult for patients in a cardiac rehabilitation program.

  16. Interventions for Speech Sound Disorders in Children

    Science.gov (United States)

    Williams, A. Lynn, Ed.; McLeod, Sharynne, Ed.; McCauley, Rebecca J., Ed.

    2010-01-01

    With detailed discussion and invaluable video footage of 23 treatment interventions for speech sound disorders (SSDs) in children, this textbook and DVD set should be part of every speech-language pathologist's professional preparation. Focusing on children with functional or motor-based speech disorders from early childhood through the early…

  17. Apraxia of Speech

    Science.gov (United States)

    Apraxia of speech (AOS), also known as acquired apraxia of speech…

  18. On speech recognition during anaesthesia

    DEFF Research Database (Denmark)

    Alapetite, Alexandre

    2007-01-01

    This PhD thesis in human-computer interfaces (informatics) studies the case of the anaesthesia record used during medical operations and the possibility to supplement it with speech recognition facilities. Problems and limitations have been identified with the traditional paper-based anaesthesia...... and inaccuracies in the anaesthesia record. Supplementing the electronic anaesthesia record interface with speech input facilities is proposed as one possible solution to a part of the problem. The testing of the various hypotheses has involved the development of a prototype of an electronic anaesthesia record...... interface with speech input facilities in Danish. The evaluation of the new interface was carried out in a full-scale anaesthesia simulator. This has been complemented by laboratory experiments on several aspects of speech recognition for this type of use, e.g. the effects of noise on speech recognition...

  19. Predicting the effect of spectral subtraction on the speech recognition threshold based on the signal-to-noise ratio in the envelope domain

    DEFF Research Database (Denmark)

    Jørgensen, Søren; Dau, Torsten

    2011-01-01

    rarely been evaluated perceptually in terms of speech intelligibility. This study analyzed the effects of the spectral subtraction strategy proposed by Berouti at al. [ICASSP 4 (1979), 208-211] on the speech recognition threshold (SRT) obtained with sentences presented in stationary speech-shaped noise....... The SRT was measured in five normal-hearing listeners in six conditions of spectral subtraction. The results showed an increase of the SRT after processing, i.e. a decreased speech intelligibility, in contrast to what is predicted by the Speech Transmission Index (STI). Here, another approach is proposed......, denoted the speech-based envelope power spectrum model (sEPSM) which predicts the intelligibility based on the signal-to-noise ratio in the envelope domain. In contrast to the STI, the sEPSM is sensitive to the increased amount of the noise envelope power as a consequence of the spectral subtraction...

  20. Through the magnifying glass: Underlying literacy deficits and remediation potential in childhood apraxia of speech.

    Science.gov (United States)

    Zaretsky, Elena; Velleman, Shelley L; Curro, Kristina

    2010-02-01

    Interactions among psycholinguistic deficits and literacy difficulties in childhood apraxia of speech (CAS) have been inadequately studied. Comparisons with other disorders (Specific Language Impairment (SLI) and phonological dyslexia) and the possibility of reading remediation in CAS warrant further research. This case study describes the speech, language, cognitive, and literacy deficits and therapy gains in a girl aged 11;6 with severe CAS and borderline IQ. A comprehensive assessment of literacy-related cognitive skills, including phonological memory and working memory capacity, language, speech production and reading skills, was administered. Treatment from 6;0 to 11;6 targeted speech sounds, oral sequencing, phonological awareness (PA), speech-print connections, syllabic structure, and real and non-word decoding. Phonological memory was similar to that of children with SLI, but working memory was significantly worse. Unlike children with phonological dyslexia, our participant demonstrated relative strength in letter-sound correspondence rules. Despite deficits, she made progress in literacy with intensive long-term intervention. Results suggest that the underlying cognitive-linguistic profile of children with CAS may differ from those of children with SLI or dyslexia. Our results also show that long-term intensive intervention promotes acquisition of adequate literacy skills even in a child with a severe motor speech disorder and borderline IQ.

  1. Acoustic Changes in the Speech of Children with Cerebral Palsy Following an Intensive Program of Dysarthria Therapy

    Science.gov (United States)

    Pennington, Lindsay; Lombardo, Eftychia; Steen, Nick; Miller, Nick

    2018-01-01

    Background: The speech intelligibility of children with dysarthria and cerebral palsy has been observed to increase following therapy focusing on respiration and phonation. Aims: To determine if speech intelligibility change following intervention is associated with change in acoustic measures of voice. Methods & Procedures: We recorded 16…

  2. Effects of Musicality on the Perception of Rhythmic Structure in Speech

    Directory of Open Access Journals (Sweden)

    Natalie Boll-Avetisyan

    2017-04-01

    Full Text Available Language and music share many rhythmic properties, such as variations in intensity and duration leading to repeating patterns. Perception of rhythmic properties may rely on cognitive networks that are shared between the two domains. If so, then variability in speech rhythm perception may relate to individual differences in musicality. To examine this possibility, the present study focuses on rhythmic grouping, which is assumed to be guided by a domain-general principle, the Iambic/Trochaic law, stating that sounds alternating in intensity are grouped as strong-weak, and sounds alternating in duration are grouped as weak-strong. German listeners completed a grouping task: They heard streams of syllables alternating in intensity, duration, or neither, and had to indicate whether they perceived a strong-weak or weak-strong pattern. Moreover, their music perception abilities were measured, and they filled out a questionnaire reporting their productive musical experience. Results showed that better musical rhythm perception ability was associated with more consistent rhythmic grouping of speech, while melody perception ability and productive musical experience were not. This suggests shared cognitive procedures in the perception of rhythm in music and speech. Also, the results highlight the relevance of considering individual differences in musicality when aiming to explain variability in prosody perception.

  3. Deriving prosodic structures

    NARCIS (Netherlands)

    Günes, Güliz

    2015-01-01

    When we speak, we speak in prosodic chunks. That is, in the speech flow, we produce sound strings that are systematically parsed into intonational units. The parsing procedure not only eases the speaker's production, but also provides the hearer with clues about how units of meaning interact in…

  4. Comparison of two speech privacy measurements, articulation index (AI) and speech privacy noise isolation class (NIC'), in open workplaces

    Science.gov (United States)

    Yoon, Heakyung C.; Loftness, Vivian

    2002-05-01

    Lack of speech privacy has been reported to be the main source of dissatisfaction among occupants of open workplaces, according to workplace surveys. Two speech privacy measurements, Articulation Index (AI), standardized by the American National Standards Institute in 1969, and Speech Privacy Noise Isolation Class (NIC', Noise Isolation Class Prime), adapted from Noise Isolation Class (NIC) by the U.S. General Services Administration (GSA) in 1979, have been put forward as objective tools to measure speech privacy in open offices. To evaluate which criterion, "normal privacy" for AI or "satisfied privacy" for NIC', is the better tool for assessing speech privacy in a dynamic open-office environment, measurements were taken in the field. AIs and NIC's for different partition heights and workplace configurations were measured following ASTM E1130 (Standard Test Method for Objective Measurement of Speech Privacy in Open Offices Using Articulation Index) and GSA tests PBS-C.1 (Method for the Direct Measurement of Speech-Privacy Potential (SPP) Based on Subjective Judgments) and PBS-C.2 (Public Building Service Standard Method of Test for the Sufficient Verification of Speech-Privacy Potential (SPP) Based on Objective Measurements Including Methods for the Rating of Functional Interzone Attenuation and NC-Background), respectively.
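
    For orientation, an Articulation Index-style score is a weighted sum of per-band signal-to-noise ratios clipped to a 30-dB range; for privacy purposes, lower values are better. The sketch below is schematic only (band weights are left to the caller); the normative weights and procedures are those of ANSI S3.5-1969 and ASTM E1130.

        import numpy as np

        def articulation_index(snr_db, weights):
            # snr_db: per-band speech-to-background-noise ratios in dB
            # weights: per-band importance weights, summing to 1.0
            contrib = np.clip((np.asarray(snr_db) + 12.0) / 30.0, 0.0, 1.0)
            return float(np.dot(weights, contrib))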

  5. A Research of Speech Emotion Recognition Based on Deep Belief Network and SVM

    Directory of Open Access Journals (Sweden)

    Chenchen Huang

    2014-01-01

    Full Text Available Feature extraction is a very important part of speech emotion recognition. Addressing the feature extraction problem, this paper proposes a new method that uses the DBNs of a DNN to extract emotional features from the speech signal automatically. By training a 5-layer-deep DBN, speech emotion features are extracted, and multiple consecutive frames are incorporated to form a high-dimensional feature. The features trained in the DBN are then the input of a nonlinear SVM classifier, yielding a complete multi-classifier speech emotion recognition system. The speech emotion recognition rate of the system reached 86.5%, which was 7% higher than that of the original method.
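
    A schematic of such a pipeline is sketched below, using scikit-learn's stacked Bernoulli RBMs as a stand-in for the 5-layer DBN and synthetic data in place of real emotional speech features (greedy unsupervised pretraining only; the paper's DBN training and features differ):

        import numpy as np
        from sklearn.neural_network import BernoulliRBM
        from sklearn.pipeline import Pipeline
        from sklearn.svm import SVC

        rng = np.random.default_rng(1)
        X = rng.random((200, 120))    # e.g. stacked consecutive frames of features in [0, 1]
        y = rng.integers(0, 4, 200)   # four emotion classes

        model = Pipeline([
            ("rbm1", BernoulliRBM(n_components=64, learning_rate=0.05, n_iter=20)),
            ("rbm2", BernoulliRBM(n_components=32, learning_rate=0.05, n_iter=20)),
            ("svm", SVC(kernel="rbf")),  # nonlinear SVM on the learned features
        ])
        model.fit(X, y)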

  6. Common neural substrates support speech and non-speech vocal tract gestures.

    Science.gov (United States)

    Chang, Soo-Eun; Kenney, Mary Kay; Loucks, Torrey M J; Poletto, Christopher J; Ludlow, Christy L

    2009-08-01

    The issue of whether speech is supported by the same neural substrates as non-speech vocal tract gestures has been contentious. In this fMRI study we tested whether producing non-speech vocal tract gestures in humans shares the same functional neuroanatomy as nonsense speech syllables. Production of non-speech vocal tract gestures, devoid of phonological content but similar to speech in that they had familiar acoustic and somatosensory targets, was compared to the production of speech syllables without meaning. Brain activation related to overt production was captured with BOLD fMRI using a sparse sampling design for both conditions. Speech and non-speech were compared using voxel-wise whole-brain analyses, and ROI analyses focused on frontal and temporoparietal structures previously reported to support speech production. Results showed substantial overlap in activation between speech and non-speech. Although non-speech gesture production showed greater extent and amplitude of activation in the regions examined, both speech and non-speech showed comparable left laterality in activation for both target perception and production. These findings support a more general role of the previously proposed "auditory dorsal stream" in the left hemisphere: to support the production of vocal tract gestures that are not limited to speech processing.

  7. Current trends in stroke rehabilitation. A review with focus on brain plasticity.

    Science.gov (United States)

    Johansson, B B

    2011-03-01

    Current understanding of brain plasticity has led to new approaches in ischemic stroke rehabilitation. Stroke units that combine good medical and nursing care with task-oriented intense training in an environment that provides confidence, stimulation and motivation significantly improve outcome. Repetitive trans-cranial magnetic stimulation (rTMS) and trans-cranial direct current stimulation (tDCS) are applied in rehabilitation of motor function. The long-term effect, optimal way of stimulation and possible efficacy in cognitive rehabilitation need evaluation. Methods based on multisensory integration of motor, cognitive, and perceptual processes including action observation, mental training, and virtual reality are being tested. Different approaches to intensive aphasia training are described. Recent data on intensive melodic intonation therapy indicate that even patients with very severe non-fluent aphasia can regain speech through homotopic white matter tract plasticity. Music therapy is applied in motor and cognitive rehabilitation. To avoid the confounding effect of spontaneous improvement, most trials are performed ≥3 months post stroke. Randomized controlled trials starting earlier after stroke are needed. More attention should be given to stroke heterogeneity, cognitive rehabilitation, and social adjustment, and to genetic differences, including the role of BDNF polymorphism in brain plasticity. © 2010 John Wiley & Sons A/S.

  8. Audiovisual Speech Synchrony Measure: Application to Biometrics

    Directory of Open Access Journals (Sweden)

    Gérard Chollet

    2007-01-01

    Full Text Available Speech is a means of communication which is intrinsically bimodal: the audio signal originates from the dynamics of the articulators. This paper reviews recent works in the field of audiovisual speech, and more specifically techniques developed to measure the level of correspondence between audio and visual speech. It overviews the most common audio and visual speech front-end processing, transformations performed on audio, visual, or joint audiovisual feature spaces, and the actual measure of correspondence between audio and visual speech. Finally, the use of synchrony measure for biometric identity verification based on talking faces is experimented on the BANCA database.

  9. Recognition of Emotions in Mexican Spanish Speech: An Approach Based on Acoustic Modelling of Emotion-Specific Vowels

    Directory of Open Access Journals (Sweden)

    Santiago-Omar Caballero-Morales

    2013-01-01

    Full Text Available An approach for the recognition of emotions in speech is presented. The target language is Mexican Spanish, and for this purpose a speech database was created. The approach consists of acoustic phoneme modelling of emotion-specific vowels. For this, a standard phoneme-based Automatic Speech Recognition (ASR) system was built with Hidden Markov Models (HMMs), where different phoneme HMMs were built for the consonants and for the emotion-specific vowels associated with four emotional states (anger, happiness, neutral, sadness). The emotional state of a spoken sentence is then estimated by counting the number of emotion-specific vowels found in the ASR’s output for the sentence. With this approach, accuracy of 87–100% was achieved for the recognition of the emotional state of Mexican Spanish speech.
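
    The decision stage described here reduces to counting labels in the recognizer output. A toy rendering follows, with a hypothetical labelling scheme in which, for example, an angry /a/ is recognized as "a_anger":

        from collections import Counter

        def emotion_from_asr(phoneme_sequence,
                             emotions=("anger", "happiness", "neutral", "sadness")):
            # Count emotion tags carried by emotion-specific vowel labels and
            # return the majority emotion (default to "neutral" if none found)
            counts = Counter(label.split("_")[1]
                             for label in phoneme_sequence
                             if "_" in label and label.split("_")[1] in emotions)
            return counts.most_common(1)[0][0] if counts else "neutral"

        print(emotion_from_asr(["k", "a_anger", "s", "a_anger", "o_neutral"]))  # -> "anger"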

  10. Effect of cognitive load on speech prosody in aviation: Evidence from military simulator flights.

    Science.gov (United States)

    Huttunen, Kerttu; Keränen, Heikki; Väyrynen, Eero; Pääkkönen, Rauno; Leino, Tuomo

    2011-01-01

    Mental overload directly affects safety in aviation and needs to be alleviated. Speech recordings are obtained non-invasively and as such are feasible for monitoring cognitive load. We recorded speech of 13 military pilots while they were performing a simulator task. Three types of cognitive load (load on situation awareness, information processing and decision making) were rated by a flight instructor separately for each flight phase and participant. As a function of increased cognitive load, the mean utterance-level fundamental frequency (F0) increased, on average, by 7 Hz and the mean vocal intensity increased by 1 dB. In the most intensive simulator flight phases, mean F0 increased by 12 Hz and mean intensity, by 1.5 dB. At the same time, the mean F0 range decreased by 5 Hz, on average. Our results showed that prosodic features of speech can be used to monitor speaker state and support pilot training in a simulator environment. Copyright © 2010 Elsevier Ltd and The Ergonomics Society. All rights reserved.

  11. Histogram Equalization to Model Adaptation for Robust Speech Recognition

    Directory of Open Access Journals (Sweden)

    Suh Youngjoo

    2010-01-01

    Full Text Available We propose a new model adaptation method based on the histogram equalization technique for providing robustness in noisy environments. The trained acoustic mean models of a speech recognizer are adapted into environmentally matched conditions by using the histogram equalization algorithm on a single utterance basis. For more robust speech recognition in the heavily noisy conditions, trained acoustic covariance models are efficiently adapted by the signal-to-noise ratio-dependent linear interpolation between trained covariance models and utterance-level sample covariance models. Speech recognition experiments on both the digit-based Aurora2 task and the large vocabulary-based task showed that the proposed model adaptation approach provides significant performance improvements compared to the baseline speech recognizer trained on the clean speech data.
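
    The core histogram-equalization operation maps each value through the utterance's empirical CDF and back through the inverse of a reference (training-data) CDF, x' = C_ref^-1(C_utt(x)). A generic per-dimension sketch follows (shown on float feature vectors for simplicity; the paper applies the idea to adapt acoustic model means on a single-utterance basis and adds the SNR-dependent covariance interpolation described above):

        import numpy as np

        def heq(utterance_feats, reference_feats):
            # Both arrays: (n_frames, n_dims), n_frames > 1, float dtype.
            # Match each dimension's empirical CDF in the utterance to the
            # reference (training-data) CDF.
            out = np.empty_like(utterance_feats)
            for d in range(utterance_feats.shape[1]):
                x = utterance_feats[:, d]
                ranks = x.argsort().argsort() / (len(x) - 1)   # C_utt(x) in [0, 1]
                ref_sorted = np.sort(reference_feats[:, d])
                out[:, d] = np.quantile(ref_sorted, ranks)     # C_ref^-1
            return out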

  12. Psychosocial stress based on public speech in humans: is there a real life/laboratory setting cross-adaptation?

    Science.gov (United States)

    Jezova, D; Hlavacova, N; Dicko, I; Solarikova, P; Brezina, I

    2016-07-01

    Repeated or chronic exposure to stressors is associated with changes in neuroendocrine responses depending on the type, intensity, number and frequency of stress exposures as well as previous stress experience. The aim of the study was to test the hypothesis that salivary cortisol and cardiovascular responses to real-life psychosocial stressors related to public performance can cross-adapt with responses to psychosocial stress induced by public speech in a laboratory setting. The sample consisted of 22 healthy male volunteers who were either actors (students of dramatic arts) or non-actors (students of other fields). The stress task consisted of a 15-min anticipatory preparation phase and a 15-min public speech on an emotionally charged topic. The actors, who were accustomed to public speaking, responded to the laboratory public speech with a rise in salivary cortisol as well as blood pressure. The values of salivary cortisol, systolic blood pressure and state anxiety were lower in actors than in non-actors. Unlike non-actors, subjects with experience in public speaking did not show a stress-induced rise in heart rate. Evaluation of personality traits revealed that actors scored significantly higher in extraversion than subjects in the non-actor group. In conclusion, neuroendocrine responses to real-life stressors in actors can partially cross-adapt with responses to psychosocial stress in a laboratory setting. The most evident adaptation was at the level of heart rate responses. Public speech tasks may help in evaluating artists' ability to cope with stress in real life through simple laboratory testing.

  13. On the Influence of Inharmonicities in Model-Based Speech Enhancement

    DEFF Research Database (Denmark)

    Nørholm, Sidsel Marie; Jensen, Jesper Rindom; Christensen, Mads Græsbøll

    2013-01-01

    In relation to speech enhancement, we study the influence of modifying the harmonic signal model for voiced speech to include small perturbations in the frequencies of the harmonics. A perturbed signal model is incorporated in the nonlinear least squares method, the Capon filter and the amplitude...

  14. Automated Discovery of Speech Act Categories in Educational Games

    Science.gov (United States)

    Rus, Vasile; Moldovan, Cristian; Niraula, Nobal; Graesser, Arthur C.

    2012-01-01

    In this paper we address the important task of automated discovery of speech act categories in dialogue-based, multi-party educational games. Speech acts are important in dialogue-based educational systems because they help infer the student speaker's intentions (the task of speech act classification) which in turn is crucial to providing adequate…

  15. Modelling speech intelligibility in adverse conditions

    DEFF Research Database (Denmark)

    Jørgensen, Søren; Dau, Torsten

    2013-01-01

    Jørgensen and Dau (J Acoust Soc Am 130:1475-1487, 2011) proposed the speech-based envelope power spectrum model (sEPSM) in an attempt to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII) in conditions with nonlinearly processed speech...... subjected to phase jitter, a condition in which the spectral structure of the speech signal is strongly affected, while the broadband temporal envelope is kept largely intact. In contrast, the effects of this distortion can be predicted successfully by the spectro-temporal modulation...... suggest that the SNRenv might reflect a powerful decision metric, while some explicit across-frequency analysis seems crucial in some conditions. How such across-frequency analysis is "realized" in the auditory system remains unresolved....

  16. Understanding the Linguistic Characteristics of the Great Speeches

    OpenAIRE

    Mouritzen, Kristian

    2016-01-01

    This dissertation attempts to find the common traits of great speeches. It does so by closely examining the language of some of the most well-known speeches in the world. These speeches are presented in the book Speeches that Changed the World (2006) by Simon Sebag Montefiore. The dissertation specifically looks at four variables: the beginnings and endings of the speeches, the use of passive voice, the use of personal pronouns and the difficulty of the language. These four variables are based on...

  17. Speech-enabled Computer-aided Translation

    DEFF Research Database (Denmark)

    Mesa-Lao, Bartolomé

    2014-01-01

    The present study has surveyed post-editor trainees’ views and attitudes before and after the introduction of speech technology as a front end to a computer-aided translation workbench. The aim of the survey was (i) to identify attitudes and perceptions among post-editor trainees before performing...... a post-editing task using automatic speech recognition (ASR); and (ii) to assess the degree to which post-editors’ attitudes and expectations to the use of speech technology changed after actually using it. The survey was based on two questionnaires: the first one administered before the participants...

  18. Emotion Recognition of Speech Signals Based on Filter Methods

    Directory of Open Access Journals (Sweden)

    Narjes Yazdanian

    2016-10-01

    Full Text Available Speech is the basic means of communication among human beings. With the increase of interaction between humans and machines, the need for automatic dialogue without a human factor has been considered. The aim of this study was to determine a set of affective features of the speech signal that reflect emotions. A system was designed that includes three main sections: feature extraction, feature selection and classification. After extraction of useful features such as mel-frequency cepstral coefficients (MFCC), linear prediction cepstral coefficients (LPC), perceptual linear prediction coefficients (PLP), formant frequency, zero crossing rate, cepstral coefficients and pitch frequency, as well as mean, jitter, shimmer, energy, minimum, maximum, amplitude and standard deviation, filter methods such as the Pearson correlation coefficient, t-test, relief and information gain were applied to rank and select the features effective for emotion recognition. The selected features were then given to the classification stage as a subset of the input. In the classification stage, a multiclass support vector machine was used to classify seven types of emotion. According to the results, the relief method, together with the multiclass support vector machine, achieved the highest classification accuracy, with an emotion recognition rate of 93.94%.
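
    A minimal sketch of this filter-then-classify pipeline, assuming scikit-learn: since ReliefF is not available there, mutual information stands in as the ranking filter, and the 95-dimensional feature matrix and seven-class labels are random placeholders rather than real speech features.

    ```python
    import numpy as np
    from sklearn.feature_selection import SelectKBest, mutual_info_classif
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(420, 95))        # placeholder for 95 extracted features
    y = rng.integers(0, 7, size=420)      # placeholder for seven emotion labels

    pipe = make_pipeline(
        StandardScaler(),
        SelectKBest(mutual_info_classif, k=24),   # rank and keep top features
        SVC(kernel="rbf"),                        # multiclass SVM (one-vs-one)
    )
    print(cross_val_score(pipe, X, y, cv=5).mean())
    ```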

  19. Identifying Deceptive Speech Across Cultures

    Science.gov (United States)

    2016-06-25

    enough from the truth. Subjects were then interviewed individually in a sound booth to obtain "norming" speech data, pre-interview. We also...e.g. pitch, intensity, speaking rate, voice quality), gender, ethnicity and personality information, our machine learning experiments can classify...Have you ever been in trouble with the police?" vs. open-ended (e.g. "What is the last movie you saw that you really hated?")

  20. Effect of speech therapy and pharmacological treatment in prosody of parkinsonians

    Directory of Open Access Journals (Sweden)

    Luciana Lemos de Azevedo

    2015-01-01

    Full Text Available Objective Parkinsonian patients usually present speech impairment. The aim of this study was to verify the influence of levodopa and of the adapted Lee Silverman Voice Treatment® method on prosodic parameters employed by parkinsonian patients. Method Ten patients with idiopathic Parkinson's disease using levodopa underwent recording of utterances produced in four stages: expressing attitudes of certainty and doubt, and declarative and interrogative modalities. The sentences were recorded under the effect of levodopa (on) and without the effect of levodopa (off), before and after speech therapy during the on and off periods. Results The speech therapy and its association with drug treatment promoted improvement of prosodic parameters: increased fundamental frequency measures, reduced duration measures and greater intensity. Conclusion The association of speech therapy with medication treatment is of great value in improving the communication of parkinsonian patients.

  1. Speech Production and Speech Discrimination by Hearing-Impaired Children.

    Science.gov (United States)

    Novelli-Olmstead, Tina; Ling, Daniel

    1984-01-01

    Seven hearing impaired children (five to seven years old) assigned to the Speakers group made highly significant gains in speech production and auditory discrimination of speech, while Listeners made only slight speech production gains and no gains in auditory discrimination. Combined speech and auditory training was more effective than auditory…

  2. Improving on hidden Markov models: An articulatorily constrained, maximum likelihood approach to speech recognition and speech coding

    Energy Technology Data Exchange (ETDEWEB)

    Hogden, J.

    1996-11-05

    The goal of the proposed research is to test a statistical model of speech recognition that incorporates the knowledge that speech is produced by relatively slow motions of the tongue, lips, and other speech articulators. This model is called Maximum Likelihood Continuity Mapping (Malcom). Many speech researchers believe that by using constraints imposed by articulator motions, we can improve or replace the current hidden Markov model based speech recognition algorithms. Unfortunately, previous efforts to incorporate information about articulation into speech recognition algorithms have suffered because (1) slight inaccuracies in our knowledge or the formulation of our knowledge about articulation may decrease recognition performance, (2) small changes in the assumptions underlying models of speech production can lead to large changes in the speech derived from the models, and (3) collecting measurements of human articulator positions in sufficient quantity for training a speech recognition algorithm is still impractical. The most interesting (and in fact, unique) quality of Malcom is that, even though Malcom makes use of a mapping between acoustics and articulation, Malcom can be trained to recognize speech using only acoustic data. By learning the mapping between acoustics and articulation using only acoustic data, Malcom avoids the difficulties involved in collecting articulator position measurements and does not require an articulatory synthesizer model to estimate the mapping between vocal tract shapes and speech acoustics. Preliminary experiments that demonstrate that Malcom can learn the mapping between acoustics and articulation are discussed. Potential applications of Malcom aside from speech recognition are also discussed. Finally, specific deliverables resulting from the proposed research are described.

  3. Automatic Speech Recognition from Neural Signals: A Focused Review

    Directory of Open Access Journals (Sweden)

    Christian Herff

    2016-09-01

    Full Text Available Speech interfaces have become widely accepted and are nowadays integrated in various real-life applications and devices. They have become a part of our daily life. However, speech interfaces presume the ability to produce intelligible speech, which might be impossible due to loud environments, bothering bystanders or incapability to produce speech (i.e., patients suffering from locked-in syndrome). For these reasons it would be highly desirable to not speak but to simply envision oneself saying words or sentences. Interfaces based on imagined speech would enable fast and natural communication without the need for audible speech and would give a voice to otherwise mute people. This focused review analyzes the potential of different brain imaging techniques to recognize speech from neural signals by applying Automatic Speech Recognition technology. We argue that modalities based on metabolic processes, such as functional Near Infrared Spectroscopy and functional Magnetic Resonance Imaging, are less suited for Automatic Speech Recognition from neural signals due to low temporal resolution but are very useful for the investigation of the underlying neural mechanisms involved in speech processes. In contrast, electrophysiologic activity is fast enough to capture speech processes and is therefore better suited for ASR. Our experimental results indicate the potential of these signals for speech recognition from neural data, with a focus on invasively measured brain activity (electrocorticography). As a first example of Automatic Speech Recognition techniques used on neural signals, we discuss the Brain-to-text system.

  4. Automatic analysis of slips of the tongue: Insights into the cognitive architecture of speech production.

    Science.gov (United States)

    Goldrick, Matthew; Keshet, Joseph; Gustafson, Erin; Heller, Jordana; Needle, Jeremy

    2016-04-01

    Traces of the cognitive mechanisms underlying speaking can be found within subtle variations in how we pronounce sounds. While speech errors have traditionally been seen as categorical substitutions of one sound for another, acoustic/articulatory analyses show they partially reflect the intended sound. When "pig" is mispronounced as "big," the resulting /b/ sound differs from correct productions of "big," moving towards intended "pig"-revealing the role of graded sound representations in speech production. Investigating the origins of such phenomena requires detailed estimation of speech sound distributions; this has been hampered by reliance on subjective, labor-intensive manual annotation. Computational methods can address these issues by providing for objective, automatic measurements. We develop a novel high-precision computational approach, based on a set of machine learning algorithms, for measurement of elicited speech. The algorithms are trained on existing manually labeled data to detect and locate linguistically relevant acoustic properties with high accuracy. Our approach is robust, is designed to handle mis-productions, and overall matches the performance of expert coders. It allows us to analyze a very large dataset of speech errors (containing far more errors than the total in the existing literature), illuminating properties of speech sound distributions previously impossible to reliably observe. We argue that this provides novel evidence that two sources both contribute to deviations in speech errors: planning processes specifying the targets of articulation and articulatory processes specifying the motor movements that execute this plan. These findings illustrate how a much richer picture of speech provides an opportunity to gain novel insights into language processing. Copyright © 2016 Elsevier B.V. All rights reserved.

  5. Enhancement of Non-Stationary Speech using Harmonic Chirp Filters

    DEFF Research Database (Denmark)

    Nørholm, Sidsel Marie; Jensen, Jesper Rindom; Christensen, Mads Græsbøll

    2015-01-01

    In this paper, the issue of single-channel speech enhancement of non-stationary voiced speech is addressed. The non-stationarity of speech is well known, but state-of-the-art speech enhancement methods assume stationarity within frames of 20–30 ms. We derive optimal distortionless filters that take...... the non-stationary nature of voiced speech into account via linear constraints. This is facilitated by imposing a harmonic chirp model on the speech signal. As an implicit part of the filter design, the noise statistics are also estimated based on the observed signal and parameters of the harmonic chirp...... model. Simulations on real speech show that the chirp-based filters perform better than their harmonic counterparts. Further, it is seen that the gain of using the chirp model increases when the estimated chirp parameter is large, corresponding to periods in the signal where the instantaneous fundamental...
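
    The abstract does not spell out the signal model; a common form of the harmonic chirp model consistent with this description is sketched below (the paper's exact parameterization may differ).

    ```latex
    % Harmonic chirp model: K harmonics with amplitudes a_k and phases phi_k;
    % f_0 is the fundamental at the frame start and alpha is the chirp rate,
    % so the instantaneous fundamental changes linearly within the frame.
    x(t) = \sum_{k=1}^{K} a_k \cos\!\left( 2\pi k \left( f_0 t + \tfrac{\alpha}{2} t^{2} \right) + \phi_k \right),
    \qquad
    f_{\mathrm{inst}}(t) = f_0 + \alpha t
    ```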

  6. Recognizing speech in a novel accent: the motor theory of speech perception reframed.

    Science.gov (United States)

    Moulin-Frier, Clément; Arbib, Michael A

    2013-08-01

    The motor theory of speech perception holds that we perceive the speech of another in terms of a motor representation of that speech. However, when we have learned to recognize a foreign accent, it seems plausible that recognition of a word rarely involves reconstruction of the speech gestures of the speaker rather than the listener. To better assess the motor theory and this observation, we proceed in three stages. Part 1 places the motor theory of speech perception in a larger framework based on our earlier models of the adaptive formation of mirror neurons for grasping, and for viewing extensions of that mirror system as part of a larger system for neuro-linguistic processing, augmented by the present consideration of recognizing speech in a novel accent. Part 2 then offers a novel computational model of how a listener comes to understand the speech of someone speaking the listener's native language with a foreign accent. The core tenet of the model is that the listener uses hypotheses about the word the speaker is currently uttering to update probabilities linking the sound produced by the speaker to phonemes in the native language repertoire of the listener. This, on average, improves the recognition of later words. This model is neutral regarding the nature of the representations it uses (motor vs. auditory). It serves as a reference point for the discussion in Part 3, which proposes a dual-stream neuro-linguistic architecture to revisit claims for and against the motor theory of speech perception and the relevance of mirror neurons, and extracts some implications for the reframing of the motor theory.

  7. SII-Based Speech Preprocessing for Intelligibility Improvement in Noise

    DEFF Research Database (Denmark)

    Taal, Cees H.; Jensen, Jesper

    2013-01-01

    filter sets certain frequency bands to zero when they do not contribute to intelligibility anymore. Experiments show large intelligibility improvements with the proposed method when used in stationary speech-shaped noise. However, it was also found that the method does not perform well for speech...... corrupted by a competing speaker. This is due to the fact that the SII is not a reliable intelligibility predictor for fluctuating noise sources. MATLAB code is provided....

  8. Stuttering Frequency, Speech Rate, Speech Naturalness, and Speech Effort During the Production of Voluntary Stuttering.

    Science.gov (United States)

    Davidow, Jason H; Grossman, Heather L; Edge, Robin L

    2018-05-01

    Voluntary stuttering techniques involve persons who stutter purposefully interjecting disfluencies into their speech. Little research has been conducted on the impact of these techniques on the speech pattern of persons who stutter. The present study examined whether changes in the frequency of voluntary stuttering accompanied changes in stuttering frequency, articulation rate, speech naturalness, and speech effort. In total, 12 persons who stutter aged 16-34 years participated. Participants read four 300-syllable passages during a control condition, and three voluntary stuttering conditions that involved attempting to produce purposeful, tension-free repetitions of initial sounds or syllables of a word for two or more repetitions (i.e., bouncing). The three voluntary stuttering conditions included bouncing on 5%, 10%, and 15% of syllables read. Friedman tests and follow-up Wilcoxon signed ranks tests were conducted for the statistical analyses. Stuttering frequency, articulation rate, and speech naturalness were significantly different between the voluntary stuttering conditions. Speech effort did not differ between the voluntary stuttering conditions. Stuttering frequency was significantly lower during the three voluntary stuttering conditions compared to the control condition, and speech effort was significantly lower during two of the three voluntary stuttering conditions compared to the control condition. Due to changes in articulation rate across the voluntary stuttering conditions, it is difficult to conclude, as has been suggested previously, that voluntary stuttering is the reason for stuttering reductions found when using voluntary stuttering techniques. Additionally, future investigations should examine different types of voluntary stuttering over an extended period of time to determine their impact on stuttering frequency, speech rate, speech naturalness, and speech effort.

  9. Dysphagia Management: A Survey of School-Based Speech-Language Pathologists in Vermont

    Science.gov (United States)

    Hutchins, Tiffany L.; Gerety, Katherine W.; Mulligan, Moira

    2011-01-01

    Purpose: This study (a) gathered information about the kinds of dysphagia management services school-based speech-language pathologists (SLPs) provide, (b) examined the attitudes of SLPs related to dysphagia management, (c) compared the responses of SLPs on the basis of their experience working in a medical setting, and (d) investigated the…

  10. Comprehension of synthetic speech and digitized natural speech by adults with aphasia.

    Science.gov (United States)

    Hux, Karen; Knollman-Porter, Kelly; Brown, Jessica; Wallace, Sarah E

    2017-09-01

    Using text-to-speech technology to provide simultaneous written and auditory content presentation may help compensate for chronic reading challenges if people with aphasia can understand synthetic speech output; however, inherent auditory comprehension challenges experienced by people with aphasia may make understanding synthetic speech difficult. This study's purpose was to compare the preferences and auditory comprehension accuracy of people with aphasia when listening to sentences generated with digitized natural speech, Alex synthetic speech (i.e., Macintosh platform), or David synthetic speech (i.e., Windows platform). The methodology required each of 20 participants with aphasia to select one of four images corresponding in meaning to each of 60 sentences comprising three stimulus sets. Results revealed significantly better accuracy given digitized natural speech than either synthetic speech option; however, individual participant performance analyses revealed three patterns: (a) comparable accuracy regardless of speech condition for 30% of participants, (b) comparable accuracy between digitized natural speech and one, but not both, synthetic speech option for 45% of participants, and (c) greater accuracy with digitized natural speech than with either synthetic speech option for remaining participants. Ranking and Likert-scale rating data revealed a preference for digitized natural speech and David synthetic speech over Alex synthetic speech. Results suggest many individuals with aphasia can comprehend synthetic speech options available on popular operating systems. Further examination of synthetic speech use to support reading comprehension through text-to-speech technology is thus warranted. Copyright © 2017 Elsevier Inc. All rights reserved.

  11. Accelerometer-based automatic voice onset detection in speech mapping with navigated repetitive transcranial magnetic stimulation.

    Science.gov (United States)

    Vitikainen, Anne-Mari; Mäkelä, Elina; Lioumis, Pantelis; Jousmäki, Veikko; Mäkelä, Jyrki P

    2015-09-30

    The use of navigated repetitive transcranial magnetic stimulation (rTMS) in mapping of speech-related brain areas has recently been shown to be useful in the preoperative workflow of epilepsy and tumor patients. However, substantial inter- and intraobserver variability and non-optimal replicability of the rTMS results have been reported, and a need for additional development of the methodology is recognized. In TMS motor cortex mappings the evoked responses can be quantitatively monitored by electromyographic recordings; however, no such easily available setup exists for speech mappings. We present an accelerometer-based setup for detection of vocalization-related larynx vibrations, combined with an automatic routine for voice onset detection, for rTMS speech mapping applying naming. The results produced by the automatic routine were compared with manually reviewed video recordings. The new method was applied in routine navigated rTMS speech mapping for 12 consecutive patients during preoperative workup for epilepsy or tumor surgery. The automatic routine correctly detected 96% of the voice onsets, resulting in 96% sensitivity and 71% specificity. The majority (63%) of the misdetections were related to visible throat movements, extra voices before the response, or delayed naming of the previous stimuli. The no-response errors were correctly detected in 88% of events. The proposed setup for automatic detection of voice onsets provides quantitative additional data for analysis of the rTMS-induced speech response modifications. The objectively defined speech response latencies increase the repeatability, reliability and stratification of the rTMS results. Copyright © 2015 Elsevier B.V. All rights reserved.
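
    The abstract does not describe the detection routine itself; below is a toy sketch of one plausible approach, a short-time-energy threshold on the accelerometer trace. The sampling rate, threshold, and synthetic signal are all hypothetical assumptions.

    ```python
    # Toy voice onset detection: flag the first analysis window whose RMS
    # energy exceeds a multiple of the baseline RMS from the first 100 ms.
    import numpy as np

    def detect_voice_onset(acc, fs, win_s=0.01, rel_thresh=5.0):
        win = int(win_s * fs)
        frames = acc[: len(acc) // win * win].reshape(-1, win)
        rms = np.sqrt((frames ** 2).mean(axis=1))
        baseline = rms[: int(0.1 / win_s)].mean() + 1e-12   # first 100 ms
        above = np.nonzero(rms > rel_thresh * baseline)[0]
        return above[0] * win_s if above.size else None

    fs = 2000                                    # hypothetical sensor rate
    t = np.arange(0, 1.0, 1 / fs)
    acc = 0.01 * np.random.randn(t.size)         # sensor noise
    acc[t > 0.42] += 0.3 * np.sin(2 * np.pi * 150 * t[t > 0.42])  # larynx vibration
    print(detect_voice_onset(acc, fs))           # ~0.42 s
    ```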

  12. Common neural substrates support speech and non-speech vocal tract gestures

    OpenAIRE

    Chang, Soo-Eun; Kenney, Mary Kay; Loucks, Torrey M.J.; Poletto, Christopher J.; Ludlow, Christy L.

    2009-01-01

    The issue of whether speech is supported by the same neural substrates as non-speech vocal-tract gestures has been contentious. In this fMRI study we tested whether producing non-speech vocal tract gestures in humans shares the same functional neuroanatomy as nonsense speech syllables. Production of non-speech vocal tract gestures, devoid of phonological content but similar to speech in that they had familiar acoustic and somatosensory targets, was compared to the production of speech sylla...

  13. Introductory speeches

    International Nuclear Information System (INIS)

    2001-01-01

    This CD is a multimedia presentation of the programme for safety upgrading of the Bohunice V1 NPP. This chapter consists of an introductory commentary and 4 introductory speeches (video records): (1) Introductory speech of Vincent Pillar, Board chairman and director general of Slovak electric, Plc. (SE); (2) Introductory speech of Stefan Schmidt, director of SE - Bohunice Nuclear power plants; (3) Introductory speech of Jan Korec, Board chairman and director general of VUJE Trnava, Inc. - Engineering, Design and Research Organisation, Trnava; (4) Introductory speech of Dietrich Kuschel, Senior vice-president of FRAMATOME ANP Project and Engineering

  14. Comparison of speech performance in labial and lingual orthodontic patients: A prospective study

    Science.gov (United States)

    Rai, Ambesh Kumar; Rozario, Joe E.; Ganeshkar, Sanjay V.

    2014-01-01

    Background: The intensity and duration of speech difficulty inherently associated with lingual therapy is a significant issue of concern in orthodontics. This study was designed to evaluate and compare the duration of changes in speech between labial and lingual orthodontics. Materials and Methods: A prospective longitudinal clinical study was designed to assess the speech of 24 patients undergoing labial or lingual orthodontic treatment. An objective spectrographic evaluation of the /s/ sound was done using the software PRAAT version 5.0.47, a semiobjective auditive evaluation of articulation was done by four speech pathologists, and a subjective assessment of speech was done by four laypersons. The tests were performed before (T1), within 24 h (T2), after 1 week (T3) and after 1 month (T4) of the start of therapy. The Mann-Whitney U-test for independent samples was used to assess the significance of differences between the labial and lingual appliances. A speech alteration with P < 0.05 was considered significant. Both appliance systems caused a comparable speech difficulty immediately after bonding (T2). Although the speech recovered within a week in the labial group (T3), the lingual group continued to experience discomfort even after a month (T4). PMID:25540661

  15. Increased vocal intensity due to the Lombard effect in speakers with Parkinson's disease: simultaneous laryngeal and respiratory strategies.

    Science.gov (United States)

    Stathopoulos, Elaine T; Huber, Jessica E; Richardson, Kelly; Kamphaus, Jennifer; DeCicco, Devan; Darling, Meghan; Fulcher, Katrina; Sussman, Joan E

    2014-01-01

    The objective of the present study was to investigate whether speakers with hypophonia, secondary to Parkinson's disease (PD), would increase their vocal intensity when speaking in a noisy environment (Lombard effect). The other objective was to examine the underlying laryngeal and respiratory strategies used to increase vocal intensity. Thirty-three participants with PD were included for study. Each participant was fitted with the SpeechVive™ device, which played multi-talker babble noise into one ear during speech. Using acoustic, aerodynamic and respiratory kinematic techniques, the simultaneous laryngeal and respiratory mechanisms used to regulate vocal intensity were examined. Significant group results showed that most speakers with PD (26/33) were successful at increasing their vocal intensity when speaking in the condition of multi-talker babble noise. They were able to support their increased vocal intensity and subglottal pressure with combined strategies from both the laryngeal and respiratory mechanisms. Individual speaker analysis indicated that the particular laryngeal and respiratory interactions differed among speakers. The SpeechVive™ device elicited higher vocal intensities from patients with PD. Speakers used different combinations of laryngeal and respiratory physiologic mechanisms to increase vocal intensity, suggesting that the disease process does not uniformly affect the speech subsystems. Readers will be able to: (1) identify speech characteristics of people with Parkinson's disease (PD), (2) identify typical respiratory strategies for increasing sound pressure level (SPL), (3) identify typical laryngeal strategies for increasing SPL, (4) define the Lombard effect. Copyright © 2014 Elsevier Inc. All rights reserved.

  16. Ultra low bit-rate speech coding

    CERN Document Server

    Ramasubramanian, V

    2015-01-01

    "Ultra Low Bit-Rate Speech Coding" focuses on the specialized topic of speech coding at very low bit-rates of 1 Kbits/sec and less, particularly at the lower ends of this range, down to 100 bps. The authors set forth the fundamental results and trends that form the basis for such ultra low bit-rates to be viable and provide a comprehensive overview of various techniques and systems in literature to date, with particular attention to their work in the paradigm of unit-selection based segment quantization. The book is for research students, academic faculty and researchers, and industry practitioners in the areas of speech processing and speech coding.

  17. The Feature Extraction Based on Texture Image Information for Emotion Sensing in Speech

    Directory of Open Access Journals (Sweden)

    Kun-Ching Wang

    2014-09-01

    Full Text Available In this paper, we present a novel texture image feature for Emotion Sensing in Speech (ESS). This idea is based on the fact that texture images carry emotion-related information. The feature extraction is derived from the time-frequency representation of spectrogram images. First, we transform the spectrogram into a recognizable image. Next, we use a cubic curve to enhance the image contrast. Then, the texture image information (TII) derived from the spectrogram image can be extracted by using Laws' masks to characterize the emotional state. In order to evaluate the effectiveness of the proposed emotion recognition in different languages, we use two open emotional databases, the Berlin Emotional Speech Database (EMO-DB) and the eNTERFACE corpus, and one self-recorded database (KHUSC-EmoDB), to evaluate the performance across corpora. The results of the proposed ESS system are presented using a support vector machine (SVM) as a classifier. Experimental results show that the proposed TII-based feature extraction, inspired by visual perception, can provide significant classification for ESS systems. The two-dimensional (2-D) TII feature can provide discrimination between different emotions beyond what pitch and formant tracks convey. In addition, de-noising in 2-D images can be more easily completed than de-noising in 1-D speech.
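
    For orientation, here is a minimal sketch of Laws' texture-energy features computed on a spectrogram image, assuming NumPy/SciPy; the paper's exact mask set, contrast-enhancement curve, and window sizes are not given in the abstract, and the input signal is a random placeholder.

    ```python
    # Laws' texture features on a log-spectrogram treated as an image.
    import numpy as np
    from scipy.signal import convolve2d, spectrogram

    # Laws' 1-D kernels: Level, Edge, Spot, Ripple.
    L5 = np.array([1, 4, 6, 4, 1], float)
    E5 = np.array([-1, -2, 0, 2, 1], float)
    S5 = np.array([-1, 0, 2, 0, -1], float)
    R5 = np.array([1, -4, 6, -4, 1], float)

    fs = 16000
    x = np.random.randn(fs)                       # hypothetical 1-s utterance
    _, _, S = spectrogram(x, fs=fs, nperseg=256)
    img = np.log(S + 1e-10)                       # spectrogram as an image

    features = []
    for row in (L5, E5, S5, R5):
        for col in (L5, E5, S5, R5):
            mask = np.outer(row, col)             # 5x5 2-D Laws mask
            energy = np.abs(convolve2d(img, mask, mode="valid"))
            features.append(energy.mean())        # texture-energy statistic
    print(len(features), "texture features")      # 16 per spectrogram
    ```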

  18. The benefit of combining a deep neural network architecture with ideal ratio mask estimation in computational speech segregation to improve speech intelligibility.

    Science.gov (United States)

    Bentsen, Thomas; May, Tobias; Kressner, Abigail A; Dau, Torsten

    2018-01-01

    Computational speech segregation attempts to automatically separate speech from noise. This is challenging in conditions with interfering talkers and low signal-to-noise ratios. Recent approaches have adopted deep neural networks and successfully demonstrated speech intelligibility improvements. A selection of components may be responsible for the success with these state-of-the-art approaches: the system architecture, a time frame concatenation technique and the learning objective. The aim of this study was to explore the roles and the relative contributions of these components by measuring speech intelligibility in normal-hearing listeners. A substantial improvement of 25.4 percentage points in speech intelligibility scores was found going from a subband-based architecture, in which a Gaussian Mixture Model-based classifier predicts the distributions of speech and noise for each frequency channel, to a state-of-the-art deep neural network-based architecture. Another improvement of 13.9 percentage points was obtained by changing the learning objective from the ideal binary mask, in which individual time-frequency units are labeled as either speech- or noise-dominated, to the ideal ratio mask, where the units are assigned a continuous value between zero and one. Therefore, both components play significant roles and by combining them, speech intelligibility improvements were obtained in a six-talker condition at a low signal-to-noise ratio.
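
    For reference, the two learning objectives contrasted above have standard definitions; the sketch below assumes the common formulations (a local-SNR criterion for the ideal binary mask, a beta-compressed energy ratio for the ideal ratio mask) rather than the paper's exact configuration, and the magnitudes are random placeholders.

    ```python
    # Ideal binary mask (IBM) vs. ideal ratio mask (IRM), computable only in
    # training, where the clean speech and noise of each mixture are known.
    import numpy as np

    def ideal_binary_mask(speech_mag, noise_mag, lc_db=0.0):
        """1 where the local SNR exceeds the criterion, else 0."""
        snr_db = 20 * np.log10(speech_mag / (noise_mag + 1e-12) + 1e-12)
        return (snr_db > lc_db).astype(float)

    def ideal_ratio_mask(speech_mag, noise_mag, beta=0.5):
        """Continuous value in [0, 1] per time-frequency unit."""
        s2, n2 = speech_mag ** 2, noise_mag ** 2
        return (s2 / (s2 + n2 + 1e-12)) ** beta

    S = np.abs(np.random.randn(257, 100))   # hypothetical |STFT| of clean speech
    N = np.abs(np.random.randn(257, 100))   # hypothetical |STFT| of noise
    ibm, irm = ideal_binary_mask(S, N), ideal_ratio_mask(S, N)
    ```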

  19. Individual differences in degraded speech perception

    Science.gov (United States)

    Carbonell, Kathy M.

    One of the lasting concerns in audiology is the unexplained individual differences in speech perception performance even for individuals with similar audiograms. One proposal is that there are cognitive/perceptual individual differences underlying this vulnerability and that these differences are present in normal hearing (NH) individuals but do not reveal themselves in studies that use clear speech produced in quiet (because of a ceiling effect). However, previous studies have failed to uncover cognitive/perceptual variables that explain much of the variance in NH performance on more challenging degraded speech tasks. This lack of strong correlations may be due either to examining the wrong measures (e.g., working memory capacity) or to there being no reliable differences in degraded speech performance in NH listeners (i.e., variability in performance is due to measurement noise). The proposed project has three aims: the first is to establish whether there are reliable individual differences in degraded speech performance for NH listeners that are sustained both across degradation types (speech in noise, compressed speech, noise-vocoded speech) and across multiple testing sessions. The second aim is to establish whether there are reliable differences in NH listeners' ability to adapt their phonetic categories based on short-term statistics, both across tasks and across sessions; and finally, to determine whether performance on degraded speech perception tasks is correlated with performance on phonetic adaptability tasks, thus establishing a possible explanatory variable for individual differences in speech perception for NH and hearing-impaired listeners.

  20. Least Squares Estimate of the Initial Phases in STFT based Speech Enhancement

    DEFF Research Database (Denmark)

    Nørholm, Sidsel Marie; Krawczyk-Becker, Martin; Gerkmann, Timo

    2015-01-01

    In this paper, we consider single-channel speech enhancement in the short time Fourier transform (STFT) domain. We suggest to improve an STFT phase estimate by estimating the initial phases. The method is based on the harmonic model and a model for the phase evolution over time. The initial phases...... are estimated by setting up a least squares problem between the noisy phase and the model for phase evolution. Simulations on synthetic and speech signals show a decreased error on the phase when an estimate of the initial phase is included compared to using the noisy phase as an initialisation. The error...... on the phase is decreased at input SNRs from -10 to 10 dB. Reconstructing the signal using the clean amplitude, the mean squared error is decreased and the PESQ score is increased....
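
    The abstract describes the estimator only verbally; under the harmonic model it references, a plausible form of the phase-evolution model and least-squares criterion is sketched below (with harmonic index k, frame index l, hop size H, and sampling rate f_s; the paper's exact formulation may differ).

    ```latex
    % Phase of harmonic k evolves linearly across frames at rate 2*pi*k*f0*H/fs;
    % the initial phase phi_k(0) is fit by least squares to the noisy STFT
    % phases (denoted with a tilde).
    \phi_k(l) = \phi_k(0) + 2\pi k \frac{f_0 H}{f_s}\, l,
    \qquad
    \hat{\phi}_k(0) = \arg\min_{\phi_k(0)} \sum_{l=0}^{L-1}
    \left( \tilde{\phi}_k(l) - \phi_k(l) \right)^{2}
    ```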

  1. Modifying Speech to Children based on their Perceived Phonetic Accuracy

    Science.gov (United States)

    Julien, Hannah M.; Munson, Benjamin

    2014-01-01

    Purpose We examined the relationship between adults' perception of the accuracy of children's speech, and acoustic detail in their subsequent productions to children. Methods Twenty-two adults participated in a task in which they rated the accuracy of 2- and 3-year-old children's word-initial /s/ and /∫/ using a visual analog scale (VAS), then produced a token of the same word as if they were responding to the child whose speech they had just rated. Result The duration of adults' fricatives varied as a function of their perception of the accuracy of children's speech: longer fricatives were produced following productions that they rated as inaccurate. This tendency to modify duration in response to perceived inaccurate tokens was mediated by measures of self-reported experience interacting with children. However, speakers did not increase the spectral distinctiveness of their fricatives following the perception of inaccurate tokens. Conclusion These results suggest that adults modify temporal features of their speech in response to perceiving children's inaccurate productions. These longer fricatives are potentially both enhanced input to children, and an error-corrective signal. PMID:22744140

  2. Semifragile Speech Watermarking Based on Least Significant Bit Replacement of Line Spectral Frequencies

    Directory of Open Access Journals (Sweden)

    Mohammad Ali Nematollahi

    2017-01-01

    Full Text Available There are various techniques for speech watermarking based on modifying the linear prediction coefficients (LPCs); however, the estimated and modified LPCs vary from each other even without attacks. Because line spectral frequency (LSF) has less sensitivity to watermarking than LPC, watermark bits are embedded into the maximum number of LSFs by applying the least significant bit replacement (LSBR) method. To reduce the differences between estimated and modified LPCs, a checking loop is added to minimize the watermark extraction error. Experimental results show that the proposed semifragile speech watermarking method can provide high imperceptibility and that any manipulation of the watermark signal destroys the watermark bits, since manipulation changes it to a random stream of bits.
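
    A toy sketch of the LSB-replacement step, assuming uniform 10-bit quantization of LSFs in (0, pi); the paper's LSF estimation, bit allocation, and verification loop are not reproduced, and the LSF vector and payload are random placeholders.

    ```python
    # Embed one watermark bit per LSF by overwriting the LSB of its
    # quantization index, then dequantize back to an LSF value.
    import numpy as np

    BITS = 10
    LEVELS = 2 ** BITS

    def embed_lsb(lsf, watermark_bits):
        q = np.round(lsf / np.pi * (LEVELS - 1)).astype(int)
        q = (q & ~1) | np.asarray(watermark_bits, dtype=int)   # replace LSB
        return q / (LEVELS - 1) * np.pi

    def extract_lsb(lsf):
        q = np.round(lsf / np.pi * (LEVELS - 1)).astype(int)
        return q & 1

    lsf = np.sort(np.random.uniform(0.1, 3.0, 10))  # hypothetical 10th-order LSFs
    bits = np.random.randint(0, 2, 10)              # payload
    marked = embed_lsb(lsf, bits)
    assert np.array_equal(extract_lsb(marked), bits)
    ```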

  3. Speech Emotion Feature Selection Method Based on Contribution Analysis Algorithm of Neural Network

    International Nuclear Information System (INIS)

    Wang Xiaojia; Mao Qirong; Zhan Yongzhao

    2008-01-01

    There are many emotion features. If all these features are employed to recognize emotions, redundant features may exist. Furthermore, the recognition result is unsatisfying and the cost of feature extraction is high. In this paper, a method to select speech emotion features based on the contribution analysis algorithm of a neural network (NN) is presented. The emotion features are selected using the contribution analysis algorithm of the NN from the 95 extracted features. Cluster analysis is applied to analyze the effectiveness of the selected features, and the time of feature extraction is evaluated. Finally, the 24 selected emotion features are used to recognize six speech emotions. The experiments show that this method can improve the recognition rate and reduce the time of feature extraction

  4. Modern Tools in Patient-Centred Speech Therapy for Romanian Language

    Directory of Open Access Journals (Sweden)

    Mirela Danubianu

    2016-03-01

    Full Text Available The most common way to communicate with those around us is speech. Suffering from a speech disorder can have negative social effects: from leaving individuals with low confidence and morale to problems with social interaction and the ability to live independently as adults. The speech therapy intervention is a complex process with particular objectives such as: discovery and identification of the speech disorder and directing the therapy towards correction, recovery, compensation, adaptation and social integration of patients. Computer-based speech therapy systems are a real help for therapists by creating a special learning environment. Romanian is a phonetic language with special linguistic particularities. This paper aims to present a few computer-based speech therapy systems developed for the treatment of various speech disorders specific to the Romanian language.

  5. How Prosody Marks Shifts in Footing in Classroom Discourse

    Science.gov (United States)

    Skidmore, David; Murakami, Kyoko

    2010-01-01

    Prosody refers to features of speech such as intonation, volume and pace. In this paper, we examine teacher-student dialogue in an English lesson at a secondary school in England, using Conversation Analysis notation to mark features of prosody. We also make connections with Goffman's theoretical concept of footing. We show that, within an episode…

  6. Research Paper: Investigation of Acoustic Characteristics of Speech Motor Control in Children Who Stutter and Children Who Do Not Stutter

    Directory of Open Access Journals (Sweden)

    Fatemeh Fakar Gharamaleki

    2016-11-01

    Full Text Available Objective Stuttering is a developmental disorder of speech fluency with unknown causes. One of the proposed theories in this field is deficits in speech motor control that are associated with impaired control, timing, and coordination of the speech muscles. Fundamental frequency, fundamental frequency range, intensity, intensity range, and voice onset time are the most important acoustic components that are often used for indirect evaluation of the physiological functions underlying the mechanisms of speech motor control. The purpose of this investigation was to compare some of the acoustic characteristics of speech motor control in children who stutter and children who do not stutter. Materials & Methods This research is a descriptive-analytic and cross-sectional comparative study. A total of 25 Azari-Persian bilingual boys who stutter (stutters group) and 23 Azari-Persian bilingual and 21 Persian monolingual boys who do not stutter (non-stutters group) in the age range of 6 to 10 years participated in this study. Children participated in /a/ and /i/ vowel prolongation and carrier phrase repetition tasks for the analysis of some of their acoustic characteristics, including fundamental frequency, fundamental frequency range, intensity, intensity range, and voice onset time. The PRAAT software was used for acoustic analysis. SPSS software (version 17), one-way ANOVA, and the Kruskal-Wallis test were used for analyzing the data. Results The results indicated that there were no significant differences between the stutters and non-stutters groups (P>0.05) with respect to the acoustic features of speech motor control. Conclusion No significant group differences were observed in all of the dependent variables reported in this study. Thus, the results of this research do not support the notion of aberrant speech motor control in children who stutter.

  7. Patterns of poststroke brain damage that predict speech production errors in apraxia of speech and aphasia dissociate.

    Science.gov (United States)

    Basilakos, Alexandra; Rorden, Chris; Bonilha, Leonardo; Moser, Dana; Fridriksson, Julius

    2015-06-01

    Acquired apraxia of speech (AOS) is a motor speech disorder caused by brain damage. AOS often co-occurs with aphasia, a language disorder in which patients may also demonstrate speech production errors. The overlap of speech production deficits in both disorders has raised questions on whether AOS emerges from a unique pattern of brain damage or as a subelement of the aphasic syndrome. The purpose of this study was to determine whether speech production errors in AOS and aphasia are associated with distinctive patterns of brain injury. Forty-three patients with history of a single left-hemisphere stroke underwent comprehensive speech and language testing. The AOS Rating Scale was used to rate speech errors specific to AOS versus speech errors that can also be associated with both AOS and aphasia. Localized brain damage was identified using structural magnetic resonance imaging, and voxel-based lesion-impairment mapping was used to evaluate the relationship between speech errors specific to AOS, those that can occur in AOS or aphasia, and brain damage. The pattern of brain damage associated with AOS was most strongly associated with damage to cortical motor regions, with additional involvement of somatosensory areas. Speech production deficits that could be attributed to AOS or aphasia were associated with damage to the temporal lobe and the inferior precentral frontal regions. AOS likely occurs in conjunction with aphasia because of the proximity of the brain areas supporting speech and language, but the neurobiological substrate for each disorder differs. © 2015 American Heart Association, Inc.

  8. [Improving speech comprehension using a new cochlear implant speech processor].

    Science.gov (United States)

    Müller-Deile, J; Kortmann, T; Hoppe, U; Hessel, H; Morsnowski, A

    2009-06-01

    The aim of this multicenter clinical field study was to assess the benefits of the new Freedom 24 sound processor for cochlear implant (CI) users implanted with the Nucleus 24 cochlear implant system. The study included 48 postlingually profoundly deaf experienced CI users who demonstrated speech comprehension performance with their current speech processor on the Oldenburg sentence test (OLSA) in quiet conditions of at least 80% correct scores and who were able to perform adaptive speech threshold testing using the OLSA in noisy conditions. Following baseline measures of speech comprehension performance with their current speech processor, subjects were upgraded to the Freedom 24 speech processor. After a take-home trial period of at least 2 weeks, subject performance was evaluated by measuring the speech reception threshold with the Freiburg multisyllabic word test and speech intelligibility with the Freiburg monosyllabic word test at 50 dB and 70 dB in the sound field. The results demonstrated highly significant benefits for speech comprehension with the new speech processor. Significant benefits for speech comprehension were also demonstrated with the new speech processor when tested in competing background noise.In contrast, use of the Abbreviated Profile of Hearing Aid Benefit (APHAB) did not prove to be a suitably sensitive assessment tool for comparative subjective self-assessment of hearing benefits with each processor. Use of the preprocessing algorithm known as adaptive dynamic range optimization (ADRO) in the Freedom 24 led to additional improvements over the standard upgrade map for speech comprehension in quiet and showed equivalent performance in noise. Through use of the preprocessing beam-forming algorithm BEAM, subjects demonstrated a highly significant improved signal-to-noise ratio for speech comprehension thresholds (i.e., signal-to-noise ratio for 50% speech comprehension scores) when tested with an adaptive procedure using the Oldenburg

  9. ACOUSTIC SPEECH RECOGNITION FOR MARATHI LANGUAGE USING SPHINX

    Directory of Open Access Journals (Sweden)

    Aman Ankit

    2016-09-01

    Full Text Available Speech recognition, or speech-to-text processing, is the process of recognizing human speech by computer and converting it into text. In speech recognition, transcripts are created by taking recordings of speech as audio and their text transcriptions. Speech-based applications which include Natural Language Processing (NLP) techniques are popular and an active area of research. Input to such applications is in natural language, and output is obtained in natural language. Speech recognition mostly revolves around three approaches, namely the acoustic-phonetic approach, the pattern recognition approach, and the artificial intelligence approach. Creation of an acoustic model requires a large database of speech and training algorithms. The output of an ASR system is the recognition and translation of spoken language into text by computers and computerized devices. ASR today finds enormous application in tasks that require human-machine interfaces, such as voice dialing. Our key contribution in this paper is to create corpora for the Marathi language and explore the use of the Sphinx engine for automatic speech recognition
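
    A hypothetical sketch of decoding with the Sphinx engine via the classic pocketsphinx Python bindings; the Marathi acoustic model, dictionary, and language model paths are placeholders rather than real resources, and newer pocketsphinx releases expose a different API.

    ```python
    from pocketsphinx import Decoder

    config = Decoder.default_config()
    config.set_string("-hmm", "model/marathi_acoustic")   # hypothetical path
    config.set_string("-lm", "model/marathi.lm")          # hypothetical path
    config.set_string("-dict", "model/marathi.dict")      # hypothetical path
    decoder = Decoder(config)

    with open("utterance.raw", "rb") as f:   # 16 kHz, 16-bit mono PCM
        decoder.start_utt()
        decoder.process_raw(f.read(), no_search=False, full_utt=True)
        decoder.end_utt()

    print(decoder.hyp().hypstr)              # recognized text
    ```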

  10. Understanding Political Influence in Modern-Era Conflict:A Qualitative Historical Analysis of Hassan Nasrallah’s Speeches

    Directory of Open Access Journals (Sweden)

    Reem Abu-Lughod

    2012-09-01

    Full Text Available This research examines and closely analyzes speeches delivered by Hezbollah's secretary general and spokesman, Hassan Nasrallah, from a content analysis perspective. We reveal that several significant political phenomena that occurred in Lebanon were impacted by the intensity of speeches delivered by Nasrallah, three events in particular: the 2006 War, the Doha Agreement, and the 2008 prisoner exchange. Data were collected from transcribed speeches and analyzed using a qualitative historical analysis. Furthermore, we use latent analysis to assess the underlying implications of Nasrallah's speeches and identify the themes he uses to influence his audience.

  11. Teaching evidence-based speech and language therapy: Influences from formal and informal curriculum

    NARCIS (Netherlands)

    Spek, B.

    2015-01-01

    This dissertation focuses on influences from formal and informal curriculum on the effectiveness of teaching evidence-based speech and language therapy. A study showed that while EBP knowledge and skills increase during the years of study, motivational beliefs such as EBP task value and

  12. Speech-based recognition of self-reported and observed emotion in a dimensional space

    NARCIS (Netherlands)

    Truong, Khiet Phuong; van Leeuwen, David A.; de Jong, Franciska M.G.

    2012-01-01

    The differences between self-reported and observed emotion have only marginally been investigated in the context of speech-based automatic emotion recognition. We address this issue by comparing self-reported emotion ratings to observed emotion ratings and look at how differences between these two

  13. Speech emotion recognition methods: A literature review

    Science.gov (United States)

    Basharirad, Babak; Moradhaseli, Mohammadreza

    2017-10-01

    Recently, attention to research on emotional speech signals has grown in human-machine interfaces due to the availability of high computation capability. There are many systems proposed in the literature to identify the emotional state through speech. Selection of suitable feature sets, design of proper classification methods, and preparation of an appropriate dataset are the main key issues of speech emotion recognition systems. This paper critically analyzes the currently available approaches to speech emotion recognition based on three evaluating parameters (feature set, classification of features, and accuracy). In addition, this paper also evaluates the performance and limitations of available methods. Furthermore, it highlights promising directions for the improvement of speech emotion recognition systems.

  14. CAR2 - Czech Database of Car Speech

    Directory of Open Access Journals (Sweden)

    P. Sovka

    1999-12-01

    Full Text Available This paper presents a new Czech language two-channel (stereo) speech database recorded in a car environment. The created database was designed for experiments with speech enhancement for communication purposes and for the study and design of robust speech recognition systems. Tools for automated phoneme labelling based on Baum-Welch re-estimation were realised. A noise analysis of the car background environment was done.

  15. CAR2 - Czech Database of Car Speech

    OpenAIRE

    Pollak, P.; Vopicka, J.; Hanzl, V.; Sovka, Pavel

    1999-01-01

    This paper presents a new Czech language two-channel (stereo) speech database recorded in a car environment. The created database was designed for experiments with speech enhancement for communication purposes and for the study and design of robust speech recognition systems. Tools for automated phoneme labelling based on Baum-Welch re-estimation were realised. A noise analysis of the car background environment was done.

  16. The analysis of speech acts patterns in two Egyptian inaugural speeches

    Directory of Open Access Journals (Sweden)

    Imad Hayif Sameer

    2017-09-01

    Full Text Available The theory of speech acts, which clarifies what people do when they speak, is not about individual words or sentences that form the basic elements of human communication, but rather about particular speech acts that are performed when uttering words. A speech act is the attempt at doing something purely by speaking. Many things can be done by speaking. Speech acts are studied under what is called speech act theory, and belong to the domain of pragmatics. In this paper, two Egyptian inaugural speeches from El-Sadat and El-Sisi, belonging to different periods, were analyzed to find out whether there were differences within this genre in the same culture or not. The study showed that there was a very small difference between these two speeches, which were analyzed according to Searle's theory of speech acts. In El-Sadat's speech, commissives came to occupy the first place. Meanwhile, in El-Sisi's speech, assertives occupied the first place. Within the speeches of one culture, we can find that the differences depended on the circumstances that surrounded the elections of the Presidents at the time. Speech acts were tools they used to convey what they wanted and to obtain support from their audiences.

  17. Patterns of Post-Stroke Brain Damage that Predict Speech Production Errors in Apraxia of Speech and Aphasia Dissociate

    Science.gov (United States)

    Basilakos, Alexandra; Rorden, Chris; Bonilha, Leonardo; Moser, Dana; Fridriksson, Julius

    2015-01-01

    Background and Purpose Acquired apraxia of speech (AOS) is a motor speech disorder caused by brain damage. AOS often co-occurs with aphasia, a language disorder in which patients may also demonstrate speech production errors. The overlap of speech production deficits in both disorders has raised questions regarding whether AOS emerges from a unique pattern of brain damage or as a sub-element of the aphasic syndrome. The purpose of this study was to determine whether speech production errors in AOS and aphasia are associated with distinctive patterns of brain injury. Methods Forty-three patients with history of a single left-hemisphere stroke underwent comprehensive speech and language testing. The Apraxia of Speech Rating Scale was used to rate speech errors specific to AOS versus speech errors that can also be associated with AOS and/or aphasia. Localized brain damage was identified using structural MRI, and voxel-based lesion-impairment mapping was used to evaluate the relationship between speech errors specific to AOS, those that can occur in AOS and/or aphasia, and brain damage. Results The pattern of brain damage associated with AOS was most strongly associated with damage to cortical motor regions, with additional involvement of somatosensory areas. Speech production deficits that could be attributed to AOS and/or aphasia were associated with damage to the temporal lobe and the inferior pre-central frontal regions. Conclusion AOS likely occurs in conjunction with aphasia due to the proximity of the brain areas supporting speech and language, but the neurobiological substrate for each disorder differs. PMID:25908457

  18. Speech and language support: How physicians can identify and treat speech and language delays in the office setting.

    Science.gov (United States)

    Moharir, Madhavi; Barnett, Noel; Taras, Jillian; Cole, Martha; Ford-Jones, E Lee; Levin, Leo

    2014-01-01

    Failure to recognize and intervene early in speech and language delays can lead to multifaceted and potentially severe consequences for early child development and later literacy skills. While routine evaluations of speech and language during well-child visits are recommended, there is no standardized (office) approach to facilitate this. Furthermore, extensive wait times for speech and language pathology consultation represent valuable lost time for the child and family. Using speech and language expertise and paediatric collaboration, key content for an office-based tool was developed, with three goals: early and accurate identification of speech and language delays, as well as children at risk for literacy challenges; appropriate referral to speech and language services when required; and teaching and, thus, empowering parents to create rich and responsive language environments at home. Using this tool, in combination with the Canadian Paediatric Society's Read, Speak, Sing and Grow Literacy Initiative, physicians will be better positioned to offer practical strategies to caregivers to enhance children's speech and language capabilities. The tool represents a strategy to evaluate speech and language delays. It depicts age-specific linguistic/phonetic milestones and suggests interventions. The tool represents a practical interim treatment while the family is waiting for formal speech and language therapy consultation.

  19. Speech Problems

    Science.gov (United States)

    ... a person's ability to speak clearly. Some Common Speech and Language Disorders: Stuttering is a problem that ...

  20. Alternative Speech Communication System for Persons with Severe Speech Disorders

    Science.gov (United States)

    Selouani, Sid-Ahmed; Sidi Yakoub, Mohammed; O'Shaughnessy, Douglas

    2009-12-01

    Assistive speech-enabled systems are proposed to help both French- and English-speaking persons with various speech disorders. The proposed assistive systems use automatic speech recognition (ASR) and speech synthesis in order to enhance the quality of communication. These systems aim at improving the intelligibility of pathologic speech, making it as natural as possible and close to the original voice of the speaker. The resynthesized utterances use new basic units, a new concatenating algorithm and a grafting technique to correct the poorly pronounced phonemes. The ASR responses are uttered by the new speech synthesis system in order to convey an intelligible message to listeners. Experiments involving four American speakers with severe dysarthria and two Acadian French speakers with sound substitution disorders (SSDs) are carried out to demonstrate the efficiency of the proposed methods. Improvements in the Perceptual Evaluation of Speech Quality (PESQ) value of 5% and of more than 20% are achieved by the speech synthesis systems that deal with SSDs and dysarthria, respectively.

  1. Speech and Language Therapy for Aphasia following Subacute Stroke

    NARCIS (Netherlands)

    Koyuncu, E.; Çam, P.; Altinok, N.; Çalli, D.E.; Yarbay Duman, T.; Özgirgin, N.

    2016-01-01

    The aim of this study was to investigate the time window, duration and intensity of optimal speech and language therapy applied to aphasic patients with subacute stroke in our hospital. The study consisted of 33 patients being hospitalized for stroke rehabilitation in our hospital with first stroke

  2. A Danish open-set speech corpus for competing-speech studies

    DEFF Research Database (Denmark)

    Nielsen, Jens Bo; Dau, Torsten; Neher, Tobias

    2014-01-01

    Studies investigating speech-on-speech masking effects commonly use closed-set speech materials such as the coordinate response measure [Bolia et al. (2000). J. Acoust. Soc. Am. 107, 1065-1066]. However, these studies typically result in very low (i.e., negative) speech recognition thresholds (SRTs) when the competing speech signals are spatially separated. To achieve higher SRTs that correspond more closely to natural communication situations, an open-set, low-context, multi-talker speech corpus was developed. Three sets of 268 unique Danish sentences were created, and each set was recorded with one of three professional female talkers. The intelligibility of each sentence in the presence of speech-shaped noise was measured. For each talker, 200 approximately equally intelligible sentences were then selected and systematically distributed into 10 test lists. Test list homogeneity was assessed...

  3. Speech entrainment enables patients with Broca’s aphasia to produce fluent speech

    Science.gov (United States)

    Hubbard, H. Isabel; Hudspeth, Sarah Grace; Holland, Audrey L.; Bonilha, Leonardo; Fromm, Davida; Rorden, Chris

    2012-01-01

    A distinguishing feature of Broca’s aphasia is non-fluent halting speech typically involving one to three words per utterance. Yet, despite such profound impairments, some patients can mimic audio-visual speech stimuli enabling them to produce fluent speech in real time. We call this effect ‘speech entrainment’ and reveal its neural mechanism as well as explore its usefulness as a treatment for speech production in Broca’s aphasia. In Experiment 1, 13 patients with Broca’s aphasia were tested in three conditions: (i) speech entrainment with audio-visual feedback where they attempted to mimic a speaker whose mouth was seen on an iPod screen; (ii) speech entrainment with audio-only feedback where patients mimicked heard speech; and (iii) spontaneous speech where patients spoke freely about assigned topics. The patients produced a greater variety of words using audio-visual feedback compared with audio-only feedback and spontaneous speech. No difference was found between audio-only feedback and spontaneous speech. In Experiment 2, 10 of the 13 patients included in Experiment 1 and 20 control subjects underwent functional magnetic resonance imaging to determine the neural mechanism that supports speech entrainment. Group results with patients and controls revealed greater bilateral cortical activation for speech produced during speech entrainment compared with spontaneous speech at the junction of the anterior insula and Brodmann area 47, in Brodmann area 37, and unilaterally in the left middle temporal gyrus and the dorsal portion of Broca’s area. Probabilistic white matter tracts constructed for these regions in the normal subjects revealed a structural network connected via the corpus callosum and ventral fibres through the extreme capsule. Unilateral areas were connected via the arcuate fasciculus. In Experiment 3, all patients included in Experiment 1 participated in a 6-week treatment phase using speech entrainment to improve speech production

  4. Acquired apraxia of speech: features, accounts, and treatment.

    Science.gov (United States)

    Peach, Richard K

    2004-01-01

    The features of apraxia of speech (AOS) are presented with regard to both traditional and contemporary descriptions of the disorder. Models of speech processing, including the neurological bases for apraxia of speech, are discussed. Recent findings concerning subcortical contributions to apraxia of speech and the role of the insula are presented. The key features to differentially diagnose AOS from related speech syndromes are identified. Treatment implications derived from motor accounts of AOS are presented along with a summary of current approaches designed to treat the various subcomponents of the disorder. Finally, guidelines are provided for treating the AOS patient with coexisting aphasia.

  5. Robust Transmission of Speech LSFs Using Hidden Markov Model-Based Multiple Description Index Assignments

    Directory of Open Access Journals (Sweden)

    Rondeau Paul

    2008-01-01

    Full Text Available Speech coding techniques capable of generating encoded representations which are robust against channel losses play an important role in enabling reliable voice communication over packet networks and mobile wireless systems. In this paper, we investigate the use of multiple description index assignments (MDIAs) for loss-tolerant transmission of line spectral frequency (LSF) coefficients, typically generated by state-of-the-art speech coders. We propose a simulated annealing-based approach for optimizing MDIAs for Markov-model-based decoders which exploit inter- and intraframe correlations in LSF coefficients to reconstruct the quantized LSFs from coded bit streams corrupted by channel losses. Experimental results are presented which compare the performance of a number of novel LSF transmission schemes. These results clearly demonstrate that Markov-model-based decoders, when used in conjunction with optimized MDIAs, can yield average spectral distortion much lower than that produced by methods such as interleaving/interpolation, commonly used to combat packet losses.
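
    The abstract above centers on a simulated annealing search over index assignments. As a rough, hedged illustration of that general idea (not the authors' implementation), the following sketch anneals a permutation mapping 16 quantizer indices onto a 4x4 two-description matrix so as to lower the expected distortion when one description is lost; the toy codebook, loss probability, and cooling schedule are all assumptions.

    ```python
    import math
    import random

    # Toy multiple description index assignment (MDIA) optimized by
    # simulated annealing.  Each quantizer index maps to a cell of a 4x4
    # matrix; the row is description 1 and the column is description 2.
    # If one description is lost, the decoder outputs the centroid of the
    # codewords sharing the surviving row or column.

    LEVELS = list(range(16))   # toy codebook: the codeword equals the index
    SIDE = 4                   # 4x4 matrix -> two 2-bit descriptions
    P_LOSS = 0.1               # probability of losing either description


    def expected_distortion(assign):
        """Mean squared error averaged over indices and loss events."""
        rows = {r: [] for r in range(SIDE)}
        cols = {c: [] for c in range(SIDE)}
        for i, cell in enumerate(assign):
            rows[cell // SIDE].append(LEVELS[i])
            cols[cell % SIDE].append(LEVELS[i])
        row_mean = {r: sum(v) / len(v) for r, v in rows.items()}
        col_mean = {c: sum(v) / len(v) for c, v in cols.items()}
        d = 0.0
        for i, cell in enumerate(assign):
            d += P_LOSS * (LEVELS[i] - row_mean[cell // SIDE]) ** 2
            d += P_LOSS * (LEVELS[i] - col_mean[cell % SIDE]) ** 2
        return d / len(assign)


    def anneal(steps=20000, t0=5.0, alpha=0.9995):
        assign = list(range(len(LEVELS)))
        random.shuffle(assign)
        cost, t = expected_distortion(assign), t0
        for _ in range(steps):
            i, j = random.sample(range(len(assign)), 2)
            assign[i], assign[j] = assign[j], assign[i]      # propose a swap
            new_cost = expected_distortion(assign)
            if new_cost < cost or random.random() < math.exp((cost - new_cost) / t):
                cost = new_cost                              # accept the move
            else:
                assign[i], assign[j] = assign[j], assign[i]  # undo the swap
            t *= alpha                                       # cool down
        return assign, cost
    ```

    In the paper the cost function would instead model a Markov decoder exploiting inter- and intraframe LSF correlations; only the annealing loop carries over unchanged.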

  7. Multimodal Speech Capture System for Speech Rehabilitation and Learning.

    Science.gov (United States)

    Sebkhi, Nordine; Desai, Dhyey; Islam, Mohammad; Lu, Jun; Wilson, Kimberly; Ghovanloo, Maysam

    2017-11-01

    Speech-language pathologists (SLPs) are trained to correct articulation of people diagnosed with motor speech disorders by analyzing articulators' motion and assessing speech outcome while patients speak. To assist SLPs in this task, we present the multimodal speech capture system (MSCS) that records and displays kinematics of key speech articulators, the tongue and lips, along with voice, using unobtrusive methods. The collected speech modalities (tongue motion, lip gestures, and voice) are visualized not only in real-time to provide patients with instant feedback but also offline to allow SLPs to perform post-analysis of articulators' motion, particularly the tongue, with its prominent but hardly visible role in articulation. We describe the MSCS hardware and software components, and demonstrate its basic visualization capabilities by a healthy individual repeating the words "Hello World." A proof-of-concept prototype has been successfully developed for this purpose, and will be used in future clinical studies to evaluate its potential impact on accelerating speech rehabilitation by enabling patients to speak naturally. Pattern matching algorithms to be applied to the collected data can provide patients with quantitative and objective feedback on their speech performance, unlike current methods that are mostly subjective and may vary from one SLP to another.

  8. Speech Motor Control in Fluent and Dysfluent Speech Production of an Individual with Apraxia of Speech and Broca's Aphasia

    Science.gov (United States)

    van Lieshout, Pascal H. H. M.; Bose, Arpita; Square, Paula A.; Steele, Catriona M.

    2007-01-01

    Apraxia of speech (AOS) is typically described as a motor-speech disorder with clinically well-defined symptoms, but without a clear understanding of the underlying problems in motor control. A number of studies have compared the speech of subjects with AOS to the fluent speech of controls, but only a few have included speech movement data and if…

  9. Acoustic properties of naturally produced clear speech at normal speaking rates

    Science.gov (United States)

    Krause, Jean C.; Braida, Louis D.

    2004-01-01

    Sentences spoken ``clearly'' are significantly more intelligible than those spoken ``conversationally'' for hearing-impaired listeners in a variety of backgrounds [Picheny et al., J. Speech Hear. Res. 28, 96-103 (1985); Uchanski et al., ibid. 39, 494-509 (1996); Payton et al., J. Acoust. Soc. Am. 95, 1581-1592 (1994)]. While producing clear speech, however, talkers often reduce their speaking rate significantly [Picheny et al., J. Speech Hear. Res. 29, 434-446 (1986); Uchanski et al., ibid. 39, 494-509 (1996)]. Yet speaking slowly is not solely responsible for the intelligibility benefit of clear speech (over conversational speech), since a recent study [Krause and Braida, J. Acoust. Soc. Am. 112, 2165-2172 (2002)] showed that talkers can produce clear speech at normal rates with training. This finding suggests that clear speech has inherent acoustic properties, independent of rate, that contribute to improved intelligibility. Identifying these acoustic properties could lead to improved signal processing schemes for hearing aids. To gain insight into these acoustical properties, conversational and clear speech produced at normal speaking rates were analyzed at three levels of detail (global, phonological, and phonetic). Although results suggest that talkers may have employed different strategies to achieve clear speech at normal rates, two global-level properties were identified that appear likely to be linked to the improvements in intelligibility provided by clear/normal speech: increased energy in the 1000-3000-Hz range of long-term spectra and increased modulation depth of low frequency modulations of the intensity envelope. Other phonological and phonetic differences associated with clear/normal speech include changes in (1) frequency of stop burst releases, (2) VOT of word-initial voiceless stop consonants, and (3) short-term vowel spectra.
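
    Both global-level properties named above are straightforward to estimate with standard tools. The sketch below is a rough illustration (not the authors' analysis code): the 1000-3000 Hz band edges come from the abstract, while the Welch settings, the Hilbert-envelope choice, and the 10 Hz cutoff defining "low-frequency" modulations are assumptions.

    ```python
    import numpy as np
    from scipy.signal import welch, hilbert, butter, filtfilt

    def band_energy_ratio(x, fs, lo=1000.0, hi=3000.0):
        """Fraction of long-term spectral energy in the [lo, hi] Hz band."""
        f, pxx = welch(x, fs=fs, nperseg=2048)
        band = (f >= lo) & (f <= hi)
        return pxx[band].sum() / pxx.sum()

    def low_freq_modulation_depth(x, fs, mod_hi=10.0):
        """Depth of slow (< mod_hi Hz) modulations of the intensity envelope."""
        env = np.abs(hilbert(x))                    # intensity envelope
        b, a = butter(2, mod_hi / (fs / 2), "low")  # keep slow modulations only
        slow = filtfilt(b, a, env)
        return slow.std() / slow.mean()             # normalized modulation depth
    ```

    On this view, clear/normal speech should score higher than conversational speech on both measures.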

  10. The Effects of Speech and Language Therapy Intervention on Children with Pragmatic Language Impairments in Mainstream School

    Science.gov (United States)

    Adams, Catherine; Lloyd, Julian

    2007-01-01

    In this article, Catherine Adams, clinical senior lecturer in speech and language therapy at the University of Manchester, and Julian Lloyd, senior lecturer in psychology at Newman College, Birmingham, describe the implementation and effects of an intensive programme of speech and language therapy for children who have pragmatic language…

  11. The role of across-frequency envelope processing for speech intelligibility

    DEFF Research Database (Denmark)

    Chabot-Leclerc, Alexandre; Jørgensen, Søren; Dau, Torsten

    2013-01-01

    Speech intelligibility models consist of a preprocessing part that transforms the stimuli into some internal (auditory) representation, and a decision metric that quantifies effects of transmission channel, speech interferers, and auditory processing on the speech intelligibility. Here, two recent speech intelligibility models, the spectro-temporal modulation index [STMI; Elhilali et al. (2003)] and the speech-based envelope power spectrum model [sEPSM; Jørgensen and Dau (2011)] were evaluated in conditions of noisy speech subjected to reverberation, and to nonlinear distortions through either...

  13. Modeling speech intelligibility in adverse conditions

    DEFF Research Database (Denmark)

    Dau, Torsten

    2012-01-01

    ... understanding speech when more than one person is talking, even when reduced audibility has been fully compensated for by a hearing aid. The reasons for these difficulties are not well understood. This presentation highlights recent concepts of the monaural and binaural signal processing strategies employed by the normal as well as impaired auditory system. Jørgensen and Dau [(2011). J. Acoust. Soc. Am. 130, 1475-1487] proposed the speech-based envelope power spectrum model (sEPSM) in an attempt to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII) in conditions with nonlinearly processed speech. Instead of considering the reduction of the temporal modulation energy as the intelligibility metric, as assumed in the STI, the sEPSM applies the signal-to-noise ratio in the envelope domain (SNRenv). This metric was shown to be the key for predicting...
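
    As a minimal, hedged sketch of the SNRenv idea (the full sEPSM uses a gammatone filterbank followed by a modulation filterbank; the single-band version below, the 1-64 Hz modulation range, and the DC normalization are simplifying assumptions):

    ```python
    import numpy as np
    from scipy.signal import hilbert

    def envelope_power(x, fs, fmin=1.0, fmax=64.0):
        """AC envelope power in a modulation band, normalized by DC power."""
        env = np.abs(hilbert(x))
        spec = np.abs(np.fft.rfft(env - env.mean())) ** 2
        freqs = np.fft.rfftfreq(len(env), 1 / fs)
        band = (freqs >= fmin) & (freqs <= fmax)
        return spec[band].sum() / (env.mean() ** 2 * len(env))

    def snr_env_db(noisy_speech, noise, fs):
        """SNRenv = (P_env(S+N) - P_env(N)) / P_env(N), expressed in dB."""
        p_sn = envelope_power(noisy_speech, fs)
        p_n = envelope_power(noise, fs)
        return 10 * np.log10(max(p_sn - p_n, 1e-12) / p_n)
    ```

    Unlike the STI's modulation-reduction index, this metric compares the envelope power of the noisy speech against that of the noise alone, which is what lets it handle nonlinearly processed speech.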

  14. Detecting Nasal Vowels in Speech Interfaces Based on Surface Electromyography.

    Directory of Open Access Journals (Sweden)

    João Freitas

    Full Text Available Nasality is a very important characteristic of several languages, European Portuguese being one of them. This paper addresses the challenge of nasality detection in surface electromyography (EMG) based speech interfaces. We explore the existence of useful information about the velum movement and also assess if muscles deeper down in the face and neck region can be measured using surface electrodes, and the best electrode location to do so. The procedure we adopted uses Real-Time Magnetic Resonance Imaging (RT-MRI), collected from a set of speakers, providing a method to interpret EMG data. By ensuring compatible data recording conditions, and proper time alignment between the EMG and the RT-MRI data, we are able to accurately estimate the time when the velum moves and the type of movement when a nasal vowel occurs. The combination of these two sources revealed interesting and distinct characteristics in the EMG signal when a nasal vowel is uttered, which motivated a classification experiment. Overall results of this experiment provide evidence that it is possible to detect velum movement using sensors positioned below the ear, between the mastoid process and the mandible, in the upper neck region. In a frame-based classification scenario, error rates as low as 32.5% for all speakers and 23.4% for the best speaker have been achieved for nasal vowel detection. This outcome stands as an encouraging result, fostering the grounds for deeper exploration of the proposed approach as a promising route to the development of an EMG-based speech interface for languages with strong nasal characteristics.

  15. When Arguments and Sounds of Words Begin to Speak - About Learning rhetorics

    Directory of Open Access Journals (Sweden)

    Tatjana Zidar

    1995-12-01

    Full Text Available There has been a growing awareness of the need for knowledge to improve public speaking and interpersonal communication. When limiting our consideration to two essential elements contributing to efficient public speaking, i.e. the strength of argumentation and voice modulation of the text, one becomes aware of how important argumentation is. It is necessary for a speaker to form efficient arguments by taking into account his or her ethical features. Further, there are at least five types of argumentation (deduction, induction, analogous, causative and referential concluding). There is a need, on the other hand, to avoid sly and dishonest argumentation (evasion, dangerous speculations, etc.). Nevertheless, argumentation alone does not suffice. Speakers should develop a feeling enabling them to tell when a speech is good, i.e. how to catch the attention of the audience, how to conclude efficiently, how to be dramatic, how to order arguments, and what language to use. To this purpose three basic elements of persuasion are used: ethos, pathos and logos. Other elements can help forward the message, such as intonation, language register, colour and intensity of speech, tempo and pronunciation. To enhance public speaking, the content and the speech form are to be linked.

  16. Sector-Based Detection for Hands-Free Speech Enhancement in Cars

    Directory of Open Access Journals (Sweden)

    Bourgeois Julien

    2006-01-01

    Full Text Available Adaptation control of beamforming interference cancellation techniques is investigated for in-car speech acquisition. Two efficient adaptation control methods are proposed that avoid target cancellation. The "implicit" method varies the step-size continuously, based on the filtered output signal. The "explicit" method decides in a binary manner whether to adapt or not, based on a novel estimate of target and interference energies. It estimates the average delay-sum power within a volume of space, for the same cost as the classical delay-sum. Experiments on real in-car data validate both methods, including a case with km/h background road noise.
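
    The "explicit" method hinges on the average delay-and-sum power within a sector of space. A bare-bones frequency-domain version of that quantity for a single candidate point is sketched below (the array geometry, STFT framing, and steering-phase convention are assumptions; the sector estimate would average this over several points sampled inside the volume):

    ```python
    import numpy as np

    C = 343.0  # speed of sound in m/s

    def delay_sum_power(frames_fft, freqs, mic_pos, point):
        """Average delay-and-sum output power steered at `point`.

        frames_fft: (n_frames, n_mics, n_bins) STFT of the microphone signals
        freqs:      (n_bins,) bin frequencies in Hz
        mic_pos:    (n_mics, 3) microphone coordinates in meters
        point:      (3,) candidate source location in meters
        """
        delays = np.linalg.norm(mic_pos - point, axis=1) / C     # (n_mics,)
        # Phase-align every microphone to the candidate point, then sum.
        steer = np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
        aligned = frames_fft * steer[None, :, :]
        output = aligned.mean(axis=1)                            # (n_frames, n_bins)
        return float(np.mean(np.abs(output) ** 2))
    ```

    Comparing this power for the target sector against the interference sectors then yields the binary adapt/do-not-adapt decision the abstract describes.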

  17. Evidence-Based Speech-Language Pathology Practices in Schools: Findings from a National Survey

    Science.gov (United States)

    Hoffman, LaVae M.; Ireland, Marie; Hall-Mills, Shannon; Flynn, Perry

    2013-01-01

    Purpose: This study documented evidence-based practice (EBP) patterns as reported by speech-language pathologists (SLPs) employed in public schools during 2010-2011. Method: Using an online survey, practitioners reported their EBP training experiences, resources available in their workplaces, and the frequency with which they engage in specific EBP…

  18. Acquirement and enhancement of remote speech signals

    Science.gov (United States)

    Lü, Tao; Guo, Jin; Zhang, He-yong; Yan, Chun-hui; Wang, Can-jin

    2017-07-01

    To address the challenges of non-cooperative and remote acoustic detection, an all-fiber laser Doppler vibrometer (LDV) is developed. The all-fiber LDV system offers the advantages of smaller size, lightweight design and robust structure, hence it is a better fit for remote speech detection. In order to improve the performance and the efficiency of the LDV for long-range hearing, speech enhancement technology based on the optimally modified log-spectral amplitude (OM-LSA) algorithm is used. The experimental results show that comprehensible speech signals within a range of 150 m can be obtained by the proposed LDV. The signal-to-noise ratio (SNR) and mean opinion score (MOS) of the LDV speech signal can be increased by 100% and 27%, respectively, by using the speech enhancement technology. This all-fiber LDV, which combines the speech enhancement technology, can meet practical demands in engineering.

  19. Identification of speech transients using variable frame rate analysis and wavelet packets.

    Science.gov (United States)

    Rasetshwane, Daniel M; Boston, J Robert; Li, Ching-Chung

    2006-01-01

    Speech transients are important cues for identifying and discriminating speech sounds. Yoo et al. and Tantibundhit et al. were successful in identifying speech transients and, by emphasizing them, improving the intelligibility of speech in noise. However, their methods are computationally intensive and unsuitable for real-time applications. This paper presents a method to identify and emphasize speech transients that combines subband decomposition by the wavelet packet transform with variable frame rate (VFR) analysis and unvoiced consonant detection. The VFR analysis is applied to each wavelet packet to define a transitivity function that describes the extent to which the wavelet coefficients of that packet are changing. Unvoiced consonant detection is used to identify unvoiced consonant intervals, and the transitivity function is amplified during these intervals. The wavelet coefficients are multiplied by the transitivity function for that packet, amplifying the coefficients localized at times when they are changing and attenuating coefficients at times when they are steady. Inverse transform of the modified wavelet packet coefficients produces a signal corresponding to speech transients similar to the transients identified by Yoo et al. and Tantibundhit et al. A preliminary implementation of the algorithm runs more efficiently than these earlier methods.
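
    A rough, hedged sketch of the coefficient-weighting step is given below using PyWavelets. The transitivity measure here (normalized frame-to-frame change of the packet coefficients) is a stand-in assumption for the paper's VFR-derived function, and unvoiced consonant detection is omitted.

    ```python
    import numpy as np
    import pywt

    def emphasize_transients(x, wavelet="db4", level=4, frame=32, gain=2.0):
        """Scale wavelet-packet coefficients by how fast they are changing."""
        wp = pywt.WaveletPacket(data=x, wavelet=wavelet, maxlevel=level)
        for node in wp.get_level(level, order="natural"):
            c = np.asarray(node.data, dtype=float)
            trans = np.ones(len(c))
            prev = None
            for k in range(max(len(c) // frame, 1)):
                seg = c[k * frame:(k + 1) * frame]
                if prev is not None and np.linalg.norm(prev) > 0:
                    change = (np.linalg.norm(seg - prev[:len(seg)])
                              / np.linalg.norm(prev))
                    # Amplify frames whose coefficients are changing fast.
                    trans[k * frame:(k + 1) * frame] = 1.0 + gain * min(change, 1.0)
                prev = seg
            node.data = c * trans
        return wp.reconstruct(update=False)
    ```

    Steady (mostly vowel-like) regions keep unit weight, while rapidly changing regions, where the transients live, are boosted before the inverse transform.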

  20. Speech and language support: How physicians can identify and treat speech and language delays in the office setting

    Science.gov (United States)

    Moharir, Madhavi; Barnett, Noel; Taras, Jillian; Cole, Martha; Ford-Jones, E Lee; Levin, Leo

    2014-01-01

    Failure to recognize and intervene early in speech and language delays can lead to multifaceted and potentially severe consequences for early child development and later literacy skills. While routine evaluations of speech and language during well-child visits are recommended, there is no standardized (office) approach to facilitate this. Furthermore, extensive wait times for speech and language pathology consultation represent valuable lost time for the child and family. Using speech and language expertise, and paediatric collaboration, key content for an office-based tool was developed. The tool aimed to help physicians achieve three main goals: early and accurate identification of speech and language delays as well as children at risk for literacy challenges; appropriate referral to speech and language services when required; and teaching and, thus, empowering parents to create rich and responsive language environments at home. Using this tool, in combination with the Canadian Paediatric Society’s Read, Speak, Sing and Grow Literacy Initiative, physicians will be better positioned to offer practical strategies to caregivers to enhance children’s speech and language capabilities. The tool represents a strategy to evaluate speech and language delays. It depicts age-specific linguistic/phonetic milestones and suggests interventions. The tool represents a practical interim treatment while the family is waiting for formal speech and language therapy consultation. PMID:24627648

  1. Variable Rate Characteristic Waveform Interpolation Speech Coder Based on Phonetic Classification

    Institute of Scientific and Technical Information of China (English)

    WANG Jing; KUANG Jing-ming; ZHAO Sheng-hui

    2007-01-01

    A variable-bit-rate characteristic waveform interpolation (VBR-CWI) speech codec with about 1.8 kbit/s average bit rate, which integrates phonetic classification into characteristic waveform (CW) decomposition, is proposed. Each input frame is classified into one of 4 phonetic classes. Non-speech frames are represented with a Bark-band noise model. The extracted CWs become rapidly evolving waveforms (REWs) or slowly evolving waveforms (SEWs) in the cases of unvoiced or stationary voiced frames respectively, while mixed voiced frames use the same CW decomposition as that in the conventional CWI. Experimental results show that the proposed codec can eliminate most buzzy and noisy artifacts existing in the fixed-bit-rate characteristic waveform interpolation (FBR-CWI) speech codec, its average bit rate can be much lower, and its reconstructed speech quality is much better than FS 1016 CELP at 4.8 kbit/s and similar to G.723.1 ACELP at 5.3 kbit/s.

  2. Speech-Based Human and Service Robot Interaction: An Application for Mexican Dysarthric People

    Directory of Open Access Journals (Sweden)

    Santiago Omar Caballero Morales

    2013-01-01

    Full Text Available Dysarthria is a motor speech disorder due to weakness or poor coordination of the speech muscles. This condition can be caused by a stroke, traumatic brain injury, or by a degenerative neurological disease. Commonly, people with this disorder also have muscular dystrophy, which restricts their use of switches or keyboards for communication or control of assistive devices (i.e., an electric wheelchair or a service robot). In this case, speech recognition is an attractive alternative for interaction and control of service robots, despite the difficulty of achieving robust recognition performance. In this paper we present a speech recognition system for human and service robot interaction for Mexican Spanish dysarthric speakers. The core of the system consisted of a Speaker Adaptive (SA) recognition system trained with normal speech. Features such as on-line control of the language model perplexity and the adding of vocabulary contribute to high recognition performance. Others, such as assessment and text-to-speech (TTS) synthesis, contribute to a more complete interaction with a service robot. Live tests were performed with two mild dysarthric speakers, achieving recognition accuracies of 90–95% for spontaneous speech and 95–100% of accomplished simulated service robot tasks.

  3. Brain networks engaged in audiovisual integration during speech perception revealed by persistent homology-based network filtration.

    Science.gov (United States)

    Kim, Heejung; Hahm, Jarang; Lee, Hyekyoung; Kang, Eunjoo; Kang, Hyejin; Lee, Dong Soo

    2015-05-01

    The human brain naturally integrates audiovisual information to improve speech perception. However, in noisy environments, understanding speech is difficult and may require much effort. Although the brain network is supposed to be engaged in speech perception, it is unclear how speech-related brain regions are connected during natural bimodal audiovisual or unimodal speech perception with counterpart irrelevant noise. To investigate the topological changes of speech-related brain networks at all possible thresholds, we used a persistent homological framework through hierarchical clustering, such as single linkage distance, to analyze the connected components of the functional network during speech perception using functional magnetic resonance imaging. For speech perception, bimodal (audio-visual) speech cues or unimodal speech cues with counterpart irrelevant noise (auditory white noise or visual gum-chewing) were delivered to 15 subjects. In terms of positive relationships, similar connected components were observed in the bimodal and unimodal speech conditions during filtration. However, during speech perception of congruent audiovisual stimuli, tighter couplings of the left anterior temporal gyrus-anterior insula component and of the right premotor-visual component were observed than in the auditory-only or visual-only speech cue conditions, respectively. Interestingly, under white noise, visual speech is perceived through tight negative coupling among the left inferior frontal region, the right anterior cingulate, the left anterior insula, and bilateral visual regions, including the right middle temporal gyrus and right fusiform components. In conclusion, the speech brain network is tightly positively or negatively connected, and can reflect efficient or effortful processes during natural audiovisual integration or lip-reading, respectively, in speech perception.

  4. Human phoneme recognition depending on speech-intrinsic variability.

    Science.gov (United States)

    Meyer, Bernd T; Jürgens, Tim; Wesker, Thorsten; Brand, Thomas; Kollmeier, Birger

    2010-11-01

    The influence of different sources of speech-intrinsic variation (speaking rate, effort, style and dialect or accent) on human speech perception was investigated. In listening experiments with 16 listeners, confusions of consonant-vowel-consonant (CVC) and vowel-consonant-vowel (VCV) sounds in speech-weighted noise were analyzed. Experiments were based on the OLLO logatome speech database, which was designed for a man-machine comparison. It contains utterances spoken by 50 speakers from five dialect/accent regions and covers several intrinsic variations. By comparing results depending on intrinsic and extrinsic variations (i.e., different levels of masking noise), the degradation induced by variabilities can be expressed in terms of the SNR. The spectral level distance between the respective speech segment and the long-term spectrum of the masking noise was found to be a good predictor for recognition rates, while phoneme confusions were influenced by the distance to spectrally close phonemes. An analysis based on transmitted information of articulatory features showed that voicing and manner of articulation are comparatively robust cues in the presence of intrinsic variations, whereas the coding of place is more degraded. The database and detailed results have been made available for comparisons between human speech recognition (HSR) and automatic speech recognizers (ASR).

  5. Robust digital processing of speech signals

    CERN Document Server

    Kovacevic, Branko; Veinović, Mladen; Marković, Milan

    2017-01-01

    This book focuses on speech signal phenomena, presenting a robustification of the usual speech generation models with regard to the presumed types of excitation signals, which is equivalent to the introduction of a class of nonlinear models and the corresponding criterion functions for parameter estimation. Compared to the general class of nonlinear models, such as various neural networks, these models possess good properties of controlled complexity, the option of working in “online” mode, as well as a low information volume for efficient speech encoding and transmission. Providing comprehensive insights, the book is based on the authors’ research, which has already been published, supplemented by additional texts discussing general considerations of speech modeling, linear predictive analysis and robust parameter estimation.

  6. Speech Denoising in White Noise Based on Signal Subspace Low-rank Plus Sparse Decomposition

    Directory of Open Access Journals (Sweden)

    Yuan Shuai

    2017-01-01

    Full Text Available In this paper, a new subspace speech enhancement method using low-rank and sparse decomposition is presented. In the proposed method, we first structure the corrupted data as a Toeplitz matrix and estimate its effective rank for the underlying human speech signal. Then the low-rank and sparse decomposition is performed, guided by the estimated speech rank, to remove the noise. Extensive experiments have been carried out in white Gaussian noise conditions, and the experimental results show that the proposed method performs better than conventional speech enhancement methods, in terms of yielding less residual noise and lower speech distortion.
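
    As a hedged illustration of the core decomposition (not the authors' algorithm), the sketch below arranges a noisy frame into a Hankel-structured data matrix, alternates a rank-truncated SVD step with a soft-thresholding step to split it into low-rank (speech) and sparse parts, and maps the low-rank part back to a signal by anti-diagonal averaging. The iteration count and threshold are assumptions.

    ```python
    import numpy as np
    from scipy.linalg import hankel

    def lowrank_sparse_denoise(frame, rank, sparse_thresh=0.1, iters=10):
        """Low-rank plus sparse split of a structured speech-frame matrix."""
        n = len(frame)
        m = n // 2 + 1
        X = hankel(frame[:m], frame[m - 1:])   # X[i, j] = frame[i + j]
        L = np.zeros_like(X)
        S = np.zeros_like(X)
        for _ in range(iters):
            # Low-rank step: keep the top `rank` singular values of X - S.
            U, s, Vt = np.linalg.svd(X - S, full_matrices=False)
            L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
            # Sparse step: soft-threshold the residual X - L.
            R = X - L
            S = np.sign(R) * np.maximum(np.abs(R) - sparse_thresh, 0.0)
        # Back to a 1-D signal by averaging the anti-diagonals of L.
        out = np.zeros(n)
        counts = np.zeros(n)
        for i in range(L.shape[0]):
            for j in range(L.shape[1]):
                out[i + j] += L[i, j]
                counts[i + j] += 1
        return out / counts
    ```

    The `rank` argument plays the role of the estimated effective rank of the underlying speech signal described in the abstract.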

  7. Perceptions of University Instructors When Listening to International Student Speech

    Science.gov (United States)

    Sheppard, Beth; Elliott, Nancy; Baese-Berk, Melissa

    2017-01-01

    Intensive English Program (IEP) Instructors and content faculty both listen to international students at the university. For these two groups of instructors, this study compared perceptions of international student speech by collecting comprehensibility ratings and transcription samples for intelligibility scores. No significant differences were…

  8. Speech Recognition for the iCub Platform

    Directory of Open Access Journals (Sweden)

    Bertrand Higy

    2018-02-01

    Full Text Available This paper describes open source software (available at https://github.com/robotology/natural-speech) to build automatic speech recognition (ASR) systems and run them within the YARP platform. The toolkit is designed (i) to allow non-ASR experts to easily create their own ASR system and run it on iCub and (ii) to build deep learning-based models specifically addressing the main challenges an ASR system faces in the context of verbal human–iCub interactions. The toolkit mostly consists of Python, C++ code and shell scripts integrated in YARP. As an additional contribution, a second codebase (written in Matlab) is provided for more expert ASR users who want to experiment with bio-inspired and developmental learning-inspired ASR systems. Specifically, we provide code for two distinct kinds of speech recognition: “articulatory” and “unsupervised” speech recognition. The first is largely inspired by influential neurobiological theories of speech perception which assume speech perception to be mediated by brain motor cortex activities. Our articulatory systems have been shown to outperform strong deep learning-based baselines. The second type of recognition systems, the “unsupervised” systems, do not use any supervised information (contrary to most ASR systems, including our articulatory systems). To some extent, they mimic an infant who has to discover the basic speech units of a language by herself. In addition, we provide resources consisting of pre-trained deep learning models for ASR, and a 2.5-h speech dataset of spoken commands, the VoCub dataset, which can be used to adapt an ASR system to the typical acoustic environments in which iCub operates.

  9. Listening to Yourself Is like Listening to Others: External, but Not Internal, Verbal Self-Monitoring Is Based on Speech Perception

    Science.gov (United States)

    Huettig, Falk; Hartsuiker, Robert J.

    2010-01-01

    Theories of verbal self-monitoring generally assume an internal (pre-articulatory) monitoring channel, but there is debate about whether this channel relies on speech perception or on production-internal mechanisms. Perception-based theories predict that listening to one's own inner speech has similar behavioural consequences as listening to…

  10. Intelligibility for Binaural Speech with Discarded Low-SNR Speech Components.

    Science.gov (United States)

    Schoenmaker, Esther; van de Par, Steven

    2016-01-01

    Speech intelligibility in multitalker settings improves when the target speaker is spatially separated from the interfering speakers. A factor that may contribute to this improvement is the improved detectability of target-speech components due to binaural interaction in analogy to the Binaural Masking Level Difference (BMLD). This would allow listeners to hear target speech components within specific time-frequency intervals that have a negative SNR, similar to the improvement in the detectability of a tone in noise when tone and noise contain disparate interaural difference cues. To investigate whether these negative-SNR target-speech components indeed contribute to speech intelligibility, a stimulus manipulation was performed where all target components were removed when local SNRs were smaller than a certain criterion value. It can be expected that for sufficiently high criterion values target speech components will be removed that do contribute to speech intelligibility. For spatially separated speakers, assuming that a BMLD-like detection advantage contributes to intelligibility, degradation in intelligibility is expected already at criterion values below 0 dB SNR. However, for collocated speakers it is expected that higher criterion values can be applied without impairing speech intelligibility. Results show that degradation of intelligibility for separated speakers is only seen for criterion values of 0 dB and above, indicating a negligible contribution of a BMLD-like detection advantage in multitalker settings. These results show that the spatial benefit is related to a spatial separation of speech components at positive local SNRs rather than to a BMLD-like detection improvement for speech components at negative local SNRs.
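
    The stimulus manipulation described here is easy to picture in the short-time Fourier domain. The sketch below is a hedged recreation (not the authors' stimulus-generation code): target STFT bins whose local SNR against the interferer falls below the criterion are simply zeroed; the window length and criterion default are assumptions.

    ```python
    import numpy as np
    from scipy.signal import stft, istft

    def discard_low_snr_components(target, interferer, fs, criterion_db=0.0):
        """Remove target time-frequency components below a local-SNR criterion."""
        _, _, T = stft(target, fs=fs, nperseg=512)
        _, _, I = stft(interferer, fs=fs, nperseg=512)
        local_snr_db = 20 * np.log10(np.abs(T) / (np.abs(I) + 1e-12) + 1e-12)
        T_kept = np.where(local_snr_db >= criterion_db, T, 0.0)
        _, processed = istft(T_kept, fs=fs, nperseg=512)
        return processed
    ```

    Raising `criterion_db` removes progressively stronger target components; the finding above is that intelligibility for separated speakers only degrades once the criterion reaches 0 dB.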

  11. An experimental Dutch keyboard-to-speech system for the speech impaired

    NARCIS (Netherlands)

    Deliege, R.J.H.

    1989-01-01

    An experimental Dutch keyboard-to-speech system has been developed to explore the possibilities and limitations of Dutch speech synthesis in a communication aid for the speech impaired. The system uses diphones and a formant synthesizer chip for speech synthesis. Input to the system is in

  12. Freedom of racist speech: Ego and expressive threats.

    Science.gov (United States)

    White, Mark H; Crandall, Christian S

    2017-09-01

    Do claims of "free speech" provide cover for prejudice? We investigate whether this defense of racist or hate speech serves as a justification for prejudice. In a series of 8 studies (N = 1,624), we found that explicit racial prejudice is a reliable predictor of the "free speech defense" of racist expression. Participants endorsed free speech values for singing racists songs or posting racist comments on social media; people high in prejudice endorsed free speech more than people low in prejudice (meta-analytic r = .43). This endorsement was not principled-high levels of prejudice did not predict endorsement of free speech values when identical speech was directed at coworkers or the police. Participants low in explicit racial prejudice actively avoided endorsing free speech values in racialized conditions compared to nonracial conditions, but participants high in racial prejudice increased their endorsement of free speech values in racialized conditions. Three experiments failed to find evidence that defense of racist speech by the highly prejudiced was based in self-relevant or self-protective motives. Two experiments found evidence that the free speech argument protected participants' own freedom to express their attitudes; the defense of other's racist speech seems motivated more by threats to autonomy than threats to self-regard. These studies serve as an elaboration of the Justification-Suppression Model (Crandall & Eshleman, 2003) of prejudice expression. The justification of racist speech by endorsing fundamental political values can serve to buffer racial and hate speech from normative disapproval. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  13. Planning community-based intervention for speech for children with cleft lip and palate from rural South India: A needs assessment

    Directory of Open Access Journals (Sweden)

    Subramaniyan Balasubramaniyan

    2017-01-01

    Full Text Available Background and Aim: A community-based rehabilitation programme, the Sri Ramachandra University-Transforming Faces project, was initiated to provide comprehensive management of communication disorders in individuals with CLP in two districts in Tamil Nadu, India. This community-based programme aims to integrate hospital-based services with community-based initiatives and to enable long-term care. The programme was initiated in the Thiruvannamalai district (2005) and extended to Cuddalore (2011). The aim of this study was to identify needs related to speech among children with CLP enrolled in the above community-based programme in two districts in Tamil Nadu, India. Design: This was a cross-sectional study. Participants and Setting: Ten camps were conducted specifically for speech assessments in the two districts over a 12-month period. Two hundred and seventeen individuals (116 males and 101 females) > 3 years of age reported to the camps. Methods: The investigator (an SLP) collected data using the speech protocol of the cleft and craniofacial centre. Descriptive analysis and profiling of speech samples were carried out and reported using the universal protocol for reporting speech outcomes. Fleiss' Kappa test was used to estimate inter-rater reliability. Results: In this study, inter-rater reliability between three evaluators revealed good agreement for the parameters resonance, articulatory errors and voice disorder. About 83.8% (n = 151/180) of the participants demonstrated errors in articulation and 69% (n = 124/180) exhibited abnormal resonance. Velopharyngeal port functioning assessment was completed for 55/124 participants. Conclusion: This study allows us to capture a “snapshot” of children with CLP living in a specific geographical location, and will assist in planning intervention programmes.

  14. Speech Function and Speech Role in Carl Fredricksen's Dialogue on Up Movie

    OpenAIRE

    Rehana, Ridha; Silitonga, Sortha

    2013-01-01

    One aim of this article is to show, through a concrete example, how speech function and speech role are used in a movie. The illustrative example is taken from the dialogue of the movie Up. Central to the analysis is the form of dialogue in the movie that contains speech functions and speech roles, i.e. statement, offer, question, command, giving, and demanding. 269 dialogues were interpreted by actor, and the use of speech function and speech role was identified in them.

  15. Using Web Speech Technology with Language Learning Applications

    Science.gov (United States)

    Daniels, Paul

    2015-01-01

    In this article, the author presents the history of human-to-computer interaction based upon the design of sophisticated computerized speech recognition algorithms. Advancements such as the arrival of cloud-based computing and software like Google's Web Speech API allows anyone with an Internet connection and Chrome browser to take advantage of…

  16. Applying Corpus-Based Findings to Form-Focused Instruction: The Case of Reported Speech

    Science.gov (United States)

    Barbieri, Federica; Eckhardt, Suzanne E. B.

    2007-01-01

    Arguing that the introduction of corpus linguistics in teaching materials and the language classroom should be informed by theories and principles of SLA, this paper presents a case study illustrating how corpus-based findings on reported speech can be integrated into a form-focused model of instruction. After overviewing previous work which…

  17. Speech Telepractice: Installing a Speech Therapy Upgrade for the 21st Century

    Directory of Open Access Journals (Sweden)

    Michael P. Towey

    2012-12-01

    Full Text Available Much of speech therapy involves the clinician guiding the therapeutic process (e.g., presenting stimuli and eliciting client responses; however, this Brief Communication describes a different approach to speech therapy delivery. Clinicians at Waldo County General Hospital (WCGH use high definition audio and video to engage clients in telepractice using interactive web-based virtual environments. This technology enables clients and their clinicians to co-create salient treatment activities using authentic materials captured via digital cameras, video and/or curricular materials.  Both therapists and clients manipulate the materials and interact online in real-time. The web-based technology engenders highly personalized and engaging activities, such that clients’ interactions with these high interest tasks often continue well beyond the therapy sessions.

  18. The Hierarchical Cortical Organization of Human Speech Processing.

    Science.gov (United States)

    de Heer, Wendy A; Huth, Alexander G; Griffiths, Thomas L; Gallant, Jack L; Theunissen, Frédéric E

    2017-07-05

    Speech comprehension requires that the brain extract semantic meaning from the spectral features represented at the cochlea. To investigate this process, we performed an fMRI experiment in which five men and two women passively listened to several hours of natural narrative speech. We then used voxelwise modeling to predict BOLD responses based on three different feature spaces that represent the spectral, articulatory, and semantic properties of speech. The amount of variance explained by each feature space was then assessed using a separate validation dataset. Because some responses might be explained equally well by more than one feature space, we used a variance partitioning analysis to determine the fraction of the variance that was uniquely explained by each feature space. Consistent with previous studies, we found that speech comprehension involves hierarchical representations starting in primary auditory areas and moving laterally on the temporal lobe: spectral features are found in the core of A1, mixtures of spectral and articulatory in STG, mixtures of articulatory and semantic in STS, and semantic in STS and beyond. Our data also show that both hemispheres are equally and actively involved in speech perception and interpretation. Further, responses as early in the auditory hierarchy as in STS are more correlated with semantic than spectral representations. These results illustrate the importance of using natural speech in neurolinguistic research. Our methodology also provides an efficient way to simultaneously test multiple specific hypotheses about the representations of speech without using block designs and segmented or synthetic speech. SIGNIFICANCE STATEMENT To investigate the processing steps performed by the human brain to transform natural speech sound into meaningful language, we used models based on a hierarchical set of speech features to predict BOLD responses of individual voxels recorded in an fMRI experiment while subjects listened to
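
    The variance-partitioning logic can be captured in a few lines. The sketch below is a hedged, generic stand-in (not the authors' voxelwise pipeline): each feature space's unique contribution is estimated as the validation R² of the full model minus the R² of a model fitted without that space; ridge regression and its penalty are assumptions.

    ```python
    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.metrics import r2_score

    def unique_variance(train, val, spaces, y_train, y_val, alpha=1.0):
        """Unique R^2 per feature space: R2(all) - R2(all minus that space).

        `train` and `val` map a space name to an (n_samples, n_features) array.
        """
        def fit_r2(names):
            Xtr = np.hstack([train[n] for n in names])
            Xva = np.hstack([val[n] for n in names])
            model = Ridge(alpha=alpha).fit(Xtr, y_train)
            return r2_score(y_val, model.predict(Xva))

        full = fit_r2(spaces)
        return {s: full - fit_r2([t for t in spaces if t != s]) for s in spaces}
    ```

    Applied per voxel with spectral, articulatory, and semantic feature spaces, this is the kind of decomposition that yields the hierarchical map the abstract describes.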

  19. Commencement Speech as a Hybrid Polydiscursive Practice

    Directory of Open Access Journals (Sweden)

    Светлана Викторовна Иванова

    2017-12-01

    Full Text Available Discourse and media communication researchers pay attention to the fact that popular discursive and communicative practices have a tendency toward hybridization and convergence. Discourse, which is understood as language in use, is flexible. Consequently, it turns out that one and the same text can represent several types of discourse. A vivid example of this tendency is revealed in the American commencement speech / commencement address / graduation speech. A commencement speech is a speech addressed to university graduates, which, in compliance with the modern trend, is delivered by outstanding media personalities (politicians, athletes, actors, etc.). The objective of this study is to define the specificity of the realization of polydiscursive practices within commencement speech. The research involves discursive, contextual, stylistic and definitive analyses. Methodologically the study is based on discourse analysis theory; in particular, the notion of a discursive practice as a verbalized social practice makes up the conceptual basis of the research. This research draws upon a hundred commencement speeches delivered by prominent representatives of American society from the 1980s to the present. In brief, commencement speech belongs to the institutional discourse that public speech embodies. Commencement speech institutional parameters are well represented in speeches delivered by people in power, like American and university presidents. Nevertheless, as the results of the research indicate, the institutional character of commencement speech is not its only feature. Conceptual information analysis enables us to refer commencement speech to didactic discourse, as it is aimed at teaching university graduates how to deal with the challenges life is rich in. Discursive practices of personal discourse are also actively integrated into the commencement speech discourse. More than that, existential discursive practices also find their way into the discourse under study. Commencement

  20. Music training improves speech-in-noise perception: Longitudinal evidence from a community-based music program.

    Science.gov (United States)

    Slater, Jessica; Skoe, Erika; Strait, Dana L; O'Connell, Samantha; Thompson, Elaine; Kraus, Nina

    2015-09-15

    Music training may strengthen auditory skills that help children not only in musical performance but in everyday communication. Comparisons of musicians and non-musicians across the lifespan have provided some evidence for a "musician advantage" in understanding speech in noise, although reports have been mixed. Controlled longitudinal studies are essential to disentangle effects of training from pre-existing differences, and to determine how much music training is necessary to confer benefits. We followed a cohort of elementary school children for 2 years, assessing their ability to perceive speech in noise before and after musical training. After the initial assessment, participants were randomly assigned to one of two groups: one group began music training right away and completed 2 years of training, while the second group waited a year and then received 1 year of music training. Outcomes provide the first longitudinal evidence that speech-in-noise perception improves after 2 years of group music training. The children were enrolled in an established and successful community-based music program and followed the standard curriculum; therefore, these findings provide an important link between laboratory-based research and real-world assessment of the impact of music training on everyday communication skills. Copyright © 2015 Elsevier B.V. All rights reserved.

  1. Integrated Phoneme Subspace Method for Speech Feature Extraction

    Directory of Open Access Journals (Sweden)

    Park Hyunsin

    2009-01-01

    Full Text Available Speech feature extraction has been a key focus in robust speech recognition research. In this work, we discuss data-driven linear feature transformations applied to feature vectors in the logarithmic mel-frequency filter bank domain. Transformations are based on principal component analysis (PCA), independent component analysis (ICA), and linear discriminant analysis (LDA). Furthermore, this paper introduces a new feature extraction technique that collects the correlation information among phoneme subspaces and reconstructs the feature space for representing phonemic information efficiently. The proposed speech feature vector is generated by projecting an observed vector onto an integrated phoneme subspace (IPS) based on PCA or ICA. The performance of the new feature was evaluated for isolated word speech recognition. The proposed method provided higher recognition accuracy than conventional methods in clean and reverberant environments.
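
    A hedged sketch of the projection idea (the per-class PCA, the number of retained directions, and the orthonormalization step are assumptions standing in for the paper's exact construction):

    ```python
    import numpy as np

    def integrated_phoneme_subspace(features_by_phoneme, dims_per_phoneme=3):
        """Stack the leading PCA directions of each phoneme class into one basis."""
        axes = []
        for feats in features_by_phoneme.values():   # each: (n_frames, n_dims)
            centered = feats - feats.mean(axis=0)
            _, _, Vt = np.linalg.svd(centered, full_matrices=False)
            axes.append(Vt[:dims_per_phoneme])       # top principal directions
        basis = np.vstack(axes)
        # Orthonormalize the combined basis so projections are well behaved.
        q, _ = np.linalg.qr(basis.T)
        return q                                     # (n_dims, subspace_dim)

    def project(feature_vector, basis):
        """New speech feature: coordinates of a log-mel frame in the IPS."""
        return basis.T @ feature_vector
    ```

    Replacing the per-class SVD with ICA gives the ICA variant mentioned in the abstract.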

  2. Robust Speech/Non-Speech Classification in Heterogeneous Multimedia Content

    NARCIS (Netherlands)

    Huijbregts, M.A.H.; de Jong, Franciska M.G.

    In this paper we present a speech/non-speech classification method that allows high quality classification without the need to know in advance what kinds of audible non-speech events are present in an audio recording and that does not require a single parameter to be tuned on in-domain data. Because

  3. Speech Recognition with the Advanced Combination Encoder and Transient Emphasis Spectral Maxima Strategies in Nucleus 24 Recipients

    Science.gov (United States)

    Holden, Laura K.; Vandali, Andrew E.; Skinner, Margaret W.; Fourakis, Marios S.; Holden, Timothy A.

    2005-01-01

    One of the difficulties faced by cochlear implant (CI) recipients is perception of low-intensity speech cues. A. E. Vandali (2001) has developed the transient emphasis spectral maxima (TESM) strategy to amplify short-duration, low-level sounds. The aim of the present study was to determine whether speech scores would be significantly higher with…

  4. Speech-Language Pathologists' Knowledge of Genetics: Perceived Confidence, Attitudes, Knowledge Acquisition and Practice-Based Variables

    Science.gov (United States)

    Tramontana, G. Michael; Blood, Ingrid M.; Blood, Gordon W.

    2013-01-01

    The purpose of this study was to determine (a) the general knowledge bases demonstrated by school-based speech-language pathologists (SLPs) in the area of genetics, (b) the confidence levels of SLPs in providing services to children and their families with genetic disorders/syndromes, (c) the attitudes of SLPs regarding genetics and communication…

  5. AGGLOMERATIVE CLUSTERING OF SOUND RECORD SPEECH SEGMENTS BASED ON BAYESIAN INFORMATION CRITERION

    Directory of Open Access Journals (Sweden)

    O. Yu. Kydashev

    2013-01-01

    Full Text Available This paper presents a detailed description of an agglomerative clustering system implementation for speech segments based on the Bayesian information criterion. Numerical experiment results with different acoustic features, as well as with full and diagonal covariance matrices, are given. A diarization error rate (DER) of 6.4% was achieved by the designed system on audio records of Radio «Svoboda».
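
    The merge decision in BIC-based agglomerative clustering is typically the classic delta-BIC test between full-covariance Gaussian models; a hedged sketch is below (the penalty weight `lam` and the regularization term are assumptions).

    ```python
    import numpy as np

    def delta_bic(seg1, seg2, lam=1.0):
        """Delta-BIC for two segments of acoustic feature frames.

        Negative values favor merging, i.e. one Gaussian models both
        segments better than two separate ones after the BIC penalty.
        seg1, seg2: (n_frames, n_dims) arrays.
        """
        n1, d = seg1.shape
        n2, _ = seg2.shape
        n = n1 + n2

        def logdet_cov(x):
            cov = np.cov(x, rowvar=False) + 1e-6 * np.eye(d)  # regularized
            return np.linalg.slogdet(cov)[1]

        penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(n)
        return (0.5 * n * logdet_cov(np.vstack([seg1, seg2]))
                - 0.5 * n1 * logdet_cov(seg1)
                - 0.5 * n2 * logdet_cov(seg2)
                - penalty)
    ```

    Agglomerative clustering then repeatedly merges the segment pair with the lowest delta-BIC and stops once no pair scores below zero.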

  6. An evaluation of speech production in two boys with neurodevelopmental disorders who received communication intervention with a speech-generating device.

    Science.gov (United States)

    Roche, Laura; Sigafoos, Jeff; Lancioni, Giulio E; O'Reilly, Mark F; Schlosser, Ralf W; Stevens, Michelle; van der Meer, Larah; Achmadi, Donna; Kagohara, Debora; James, Ruth; Carnett, Amarie; Hodis, Flaviu; Green, Vanessa A; Sutherland, Dean; Lang, Russell; Rispoli, Mandy; Machalicek, Wendy; Marschik, Peter B

    2014-11-01

    Children with neurodevelopmental disorders often present with little or no speech. Augmentative and alternative communication (AAC) aims to promote functional communication using non-speech modes, but it might also influence natural speech production. To investigate this possibility, we provided AAC intervention to two boys with neurodevelopmental disorders and severe communication impairment. Intervention focused on teaching the boys to use a tablet computer-based speech-generating device (SGD) to request preferred stimuli. During SGD intervention, both boys began to utter relevant single words. In an effort to induce more speech, and investigate the relation between SGD availability and natural speech production, the SGD was removed during some requesting opportunities. With intervention, both participants learned to use the SGD to request preferred stimuli. After learning to use the SGD, both participants began to respond more frequently with natural speech when the SGD was removed. The results suggest that a rehabilitation program involving initial SGD intervention, followed by subsequent withdrawal of the SGD, might increase the frequency of natural speech production in some children with neurodevelopmental disorders. This effect could be an example of response generalization. Copyright © 2014 ISDN. Published by Elsevier Ltd. All rights reserved.

  7. Speech networks at rest and in action: interactions between functional brain networks controlling speech production

    Science.gov (United States)

    Fuertinger, Stefan

    2015-01-01

    Speech production is one of the most complex human behaviors. Although brain activation during speaking has been well investigated, our understanding of interactions between the brain regions and neural networks remains scarce. We combined seed-based interregional correlation analysis with graph theoretical analysis of functional MRI data during the resting state and sentence production in healthy subjects to investigate the interface and topology of functional networks originating from the key brain regions controlling speech, i.e., the laryngeal/orofacial motor cortex, inferior frontal and superior temporal gyri, supplementary motor area, cingulate cortex, putamen, and thalamus. During both resting and speaking, the interactions between these networks were bilaterally distributed and centered on the sensorimotor brain regions. However, speech production preferentially recruited the inferior parietal lobule (IPL) and cerebellum into the large-scale network, suggesting the importance of these regions in facilitation of the transition from the resting state to speaking. Furthermore, the cerebellum (lobule VI) was the most prominent region showing functional influences on speech-network integration and segregation. Although networks were bilaterally distributed, interregional connectivity during speaking was stronger in the left vs. right hemisphere, which may have underlined a more homogeneous overlap between the examined networks in the left hemisphere. Among these, the laryngeal motor cortex (LMC) established a core network that fully overlapped with all other speech-related networks, determining the extent of network interactions. Our data demonstrate complex interactions of large-scale brain networks controlling speech production and point to the critical role of the LMC, IPL, and cerebellum in the formation of speech production network. PMID:25673742

  8. Prediction and constraint in audiovisual speech perception

    Science.gov (United States)

    Peelle, Jonathan E.; Sommers, Mitchell S.

    2015-01-01

    During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing precision of prediction. Electrophysiological studies demonstrate oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to auditory information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration, supported

  9. Prediction and constraint in audiovisual speech perception.

    Science.gov (United States)

    Peelle, Jonathan E; Sommers, Mitchell S

    2015-07-01

    During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing the precision of prediction. Electrophysiological studies demonstrate that oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to acoustic information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration

  10. Speech disorders - children

    Science.gov (United States)

    ... disorder; Voice disorders; Vocal disorders; Disfluency; Communication disorder - speech disorder; Speech disorder - stuttering ... evaluation tools that can help identify and diagnose speech disorders: Denver Articulation Screening Examination Goldman-Fristoe Test of ...

  11. Šalutiniai pietų aukštaičių kirčiai

    Directory of Open Access Journals (Sweden)

    Asta Leskauskaitė

    2011-11-01

    SECONDARY STRESS IN THE SOUTHERN AUKŠTAITISH DIALECT OF LITHUANIAN. Summary: This paper sums up the empirical observations of the rhythmic, distinctive, and non-distinctive secondary stresses in Southern Aukštaitish. The analysis has revealed the prevalence of the rhythmic stress and a merely optional use of the other two stresses. The occurrence and place of the stresses are conditioned by various factors: the syllabic structure of a word and the manner of speech (its tempo, the character of intonation, etc.). The proof of the existence of the distinctive morpheme stress is based on the use of instrumental and auditory methods.

  12. Neurophysiology of speech differences in childhood apraxia of speech.

    Science.gov (United States)

    Preston, Jonathan L; Molfese, Peter J; Gumkowski, Nina; Sorcinelli, Andrea; Harwood, Vanessa; Irwin, Julia R; Landi, Nicole

    2014-01-01

    Event-related potentials (ERPs) were recorded during a picture naming task of simple and complex words in children with typical speech and with childhood apraxia of speech (CAS). Results reveal reduced amplitude prior to speaking complex (multisyllabic) words relative to simple (monosyllabic) words for the CAS group over the right hemisphere during a time window thought to reflect phonological encoding of word forms. Group differences were also observed prior to production of spoken tokens regardless of word complexity during a time window just prior to speech onset (thought to reflect motor planning/programming). Results suggest differences in pre-speech neurolinguistic processes.

  13. Application of artificial intelligence principles to the analysis of "crazy" speech.

    Science.gov (United States)

    Garfield, D A; Rapp, C

    1994-04-01

    Artificial intelligence computer simulation methods can be used to investigate psychotic or "crazy" speech. Here, symbolic reasoning algorithms establish semantic networks that schematize speech. These semantic networks consist of two main structures: case frames and object taxonomies. Node-based reasoning rules apply to object taxonomies and pathway-based reasoning rules apply to case frames. Normal listeners may recognize speech as "crazy talk" based on violations of node- and pathway-based reasoning rules. In this article, three separate segments of schizophrenic speech illustrate violations of these rules. This artificial intelligence approach is compared and contrasted with other neurolinguistic approaches and is discussed as a conceptual link between neurobiological and psychodynamic understandings of psychopathology.
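
    As a toy illustration of a node-based rule over an object taxonomy (our own hypothetical example, not the authors' system), a case frame whose agent slot is filled by a node that does not inherit from ANIMATE violates a selectional rule, the kind of violation a listener might register as "crazy talk":

        # Hypothetical object taxonomy expressed as ISA links
        ISA = {"doctor": "human", "human": "animate", "dog": "animate",
               "table": "artifact", "artifact": "inanimate"}

        def inherits(node, ancestor):
            # walk up the taxonomy until the ancestor is found or the chain ends
            while node is not None:
                if node == ancestor:
                    return True
                node = ISA.get(node)
            return False

        def agent_rule_ok(case_frame):
            # node-based rule: the agent of an action verb must be animate
            return inherits(case_frame["agent"], "animate")

        print(agent_rule_ok({"verb": "eat", "agent": "doctor"}))  # True
        print(agent_rule_ok({"verb": "eat", "agent": "table"}))   # False (rule violation)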

  14. Speech Segregation based on Binary Classification

    Science.gov (United States)

    2016-07-15

    speaker is roughly 5 minutes long. Large chunks of silence in the excerpt are removed. Then we divide the recording into 5-second pieces. Two pieces... shown in Fig. 5(c), the smearing effect caused by reverberation is largely removed or attenuated, and the boundaries between voiced and unvoiced... Audience is a provider of audio and noise suppression processors for mobile equipment, including Android phones. The PI advises the company on speech

  15. Prevalence of speech and language disorders in children in northern Kosovo and Metohija

    Directory of Open Access Journals (Sweden)

    Nešić Blagoje V.

    2011-01-01

    On the territory of the northern part of Kosovo and Metohija (the Kosovo municipalities of Mitrovica, Zvecan, Leposavic, and Zubin Potok), a study was conducted in primary schools to determine the presence of speech-language disorders in children of early school age. Data were collected from the teachers of the third and fourth grades of primary schools in these municipalities (n = 36), covering a total of 641 students. The results show that the number of children with speech and language disorders varies across the municipalities (the largest in Leposavic, the smallest in Zvecan), and that three quarters of the children with speech and language disorders are boys. Speech-language disorders usually appear from the very beginning of schooling, and the surveyed teachers recognized 12 types of speech-language disorders in their students. Teachers regarded dyslexia as the most common speech-language disorder, and dysphasia and distortion as the least common. The results also show that the children are generally accepted by their peers, but only during schooling; there is, however, a difference in school success between children with and without speech-language disorders. The teachers' work is generally not affected by the children with speech and language disorders, and there is generally intensive cooperation between teachers and the parents of these children. These results on the prevalence of speech-language disorders in children in northern Kosovo and Metohija can serve as important guidelines for future work.

  16. A Case Study of a Collaborative Speech-Language Pathologist

    Science.gov (United States)

    Ritzman, Mitzi J.; Sanger, Dixie; Coufal, Kathy L.

    2006-01-01

    This study explored how a school-based speech-language pathologist implemented a classroom-based service delivery model that focused on collaborative practices in classroom settings. The study used ethnographic observations and interviews with 1 speech-language pathologist to provide insights into how she implemented collaborative consultation and…

  17. New Ways in Teaching Connected Speech. New Ways Series

    Science.gov (United States)

    Brown, James Dean, Ed.

    2012-01-01

    Connected speech is based on a set of rules used to modify pronunciations so that words connect and flow more smoothly in natural speech (hafta versus have to). Native speakers of English tend to feel that connected speech is friendlier, more natural, more sympathetic, and more personal. Is there any reason why learners of English would prefer to…

  18. The politeness prosody of the Javanese directive speech

    Directory of Open Access Journals (Sweden)

    F.X. Rahyono

    2009-10-01

    This experimental phonetic research deals with the prosody of directive speech in Javanese. The research procedures were: (1) speech production, (2) acoustic analysis, and (3) perception test. The data investigated are three directive utterances, in the form of statements, commands, and questions. The data were obtained by recording dialogues that present polite as well as impolite speech. Three acoustic experiments were conducted for statements, commands, and questions in directive speech: (1) modifications of duration, (2) modifications of contour, and (3) modifications of fundamental frequency. The results of the subsequent perception tests, with 90 stimuli and 24 subjects, were analysed statistically with ANOVA (Analysis of Variance). Based on this statistical analysis, the prosodic characteristics of polite and impolite speech were identified.

  19. Multi-thread Parallel Speech Recognition for Mobile Applications

    Directory of Open Access Journals (Sweden)

    LOJKA Martin

    2014-05-01

    In this paper, a server-based solution for a multi-thread large-vocabulary automatic speech recognition engine is described, along with practical application examples for Android OS and HTML5. The basic idea was to make speech recognition available to a full variety of applications for computers and especially for mobile devices. The speech recognition engine should be independent of commercial products and services (where the dictionary cannot be modified). Use of third-party services can also pose a security and privacy problem in specific applications, where unsecured audio data must not be sent to uncontrolled environments (voice data transferred to servers around the globe). Using our experience with speech recognition applications, we constructed a multi-thread, server-based speech recognition solution designed with a simple application programming interface (API) to a speech recognition engine that can be modified to the specific needs of a particular application.
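
    The engine itself is the authors' own, but the worker-pool pattern behind such a multi-threaded server can be sketched generically (the decoder call below is a placeholder, not their API):

        import queue, threading

        def recognize(audio_chunk):
            # stand-in for the engine-specific decoder call
            return "transcript(%d bytes)" % len(audio_chunk)

        jobs, results = queue.Queue(), queue.Queue()

        def worker():
            while True:
                job_id, chunk = jobs.get()
                results.put((job_id, recognize(chunk)))
                jobs.task_done()

        # a few decoding threads keep several clients' requests in flight
        for _ in range(4):
            threading.Thread(target=worker, daemon=True).start()

        jobs.put((1, b"\x00" * 32000))
        jobs.join()
        print(results.get())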

  20. Enhancement and Noise Statistics Estimation for Non-Stationary Voiced Speech

    DEFF Research Database (Denmark)

    Nørholm, Sidsel Marie; Jensen, Jesper Rindom; Christensen, Mads Græsbøll

    2016-01-01

    In this paper, single channel speech enhancement in the time domain is considered. We address the problem of modelling non-stationary speech by describing the voiced speech parts by a harmonic linear chirp model instead of using the traditional harmonic model. This means that the speech signal...... through simulations on synthetic and speech signals, that the chirp versions of the filters perform better than their harmonic counterparts in terms of output signal-to-noise ratio (SNR) and signal reduction factor. For synthetic signals, the output SNR for the harmonic chirp APES based filter...... is increased 3 dB compared to the harmonic APES based filter at an input SNR of 10 dB, and at the same time the signal reduction factor is decreased. For speech signals, the increase is 1.5 dB along with a decrease in the signal reduction factor of 0.7. As an implicit part of the APES filter, a noise...
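
    The contrast between the two signal models can be written out in their usual forms (notation ours; the paper's may differ). Where the harmonic model fixes each harmonic's frequency over the segment, the harmonic linear chirp model lets it drift linearly:

        \[
          x_{\mathrm{harm}}(t) = \sum_{l=1}^{L} a_l \cos\bigl(2\pi l f_0 t + \phi_l\bigr),
          \qquad
          x_{\mathrm{chirp}}(t) = \sum_{l=1}^{L} a_l \cos\Bigl(2\pi l \bigl(f_0 t + \tfrac{1}{2}\alpha t^2\bigr) + \phi_l\Bigr),
        \]

    where f0 is the fundamental frequency and alpha the chirp rate, so the l-th harmonic's instantaneous frequency l(f0 + alpha t) varies linearly within the analysis segment.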

  1. Gender and Speech in a Disney Princess Movie

    Directory of Open Access Journals (Sweden)

    Azmi N.J.

    2016-11-01

    One of the latest Disney princess movies is Frozen, released in 2013. Female characters in Frozen differ from those in previous Disney movies, such as The Little Mermaid and Tangled: they are portrayed as having more heroic values and norms, which makes their speech characteristics interesting to examine. Do they use typical female speech despite having more heroic characteristics? This paper aims to provide insights into the female speech characteristics in this movie based on Lakoff's (1975) model of female speech. Data analysis shows that female and male characters in the movie used an almost equal number of female speech elements in their dialogues. Interestingly, although the female characters in the movie do not behave stereotypically, their speech still contains elements of female speech, such as the use of empty adjectives, questions, hedges, and intensifiers. This paper argues that the blurring of boundaries between male and female speech characteristics in this movie is an attempt to break gender stereotyping by showing that female characters share similar characteristics with heroic male characters and thus should not be seen as inferior to the male characters.

  2. Speech and neurology-chemical impairment correlates

    Science.gov (United States)

    Hayre, Harb S.

    2002-05-01

    Speech correlates of alcohol/drug impairment and their neurological basis are presented, with suggestions for further research on impairment from poly drug/medicine/inhalant/chew use/abuse and on the prediagnosis of many neuro- and endocrine-related disorders. Nerve cells all over the body detect chemical entry by smoking, injection, drinking, chewing, or skin absorption, and transmit neurosignals to their corresponding cerebral subsystems, which in turn affect the speech centers (Broca's and Wernicke's areas) and the motor cortex. For instance, gustatory cells in the mouth, cranial and spinal nerve cells in the skin, and cilia/olfactory neurons in the nose are the intake-sensing nerve cells. Alcohol depression and brain cell damage were detected from telephone speech using IMPAIRLYZER-TM, and the results of these studies were presented at the 1996 ASA meeting in Indianapolis and the 2001 German Acoustical Society (DEGA) conference in Hamburg, Germany, respectively. Speech-based chemical impairment measure results were presented at the 2001 meeting of the ASA in Chicago. New data on neurotolerance-based chemical impairment for alcohol, drugs, and medicine shall be presented and shown not to fully support the NIDA-SAMHSA drug and alcohol thresholds used in the drug-testing domain.

  3. Analysis of Documentation Speed Using Web-Based Medical Speech Recognition Technology: Randomized Controlled Trial.

    Science.gov (United States)

    Vogel, Markus; Kaisers, Wolfgang; Wassmuth, Ralf; Mayatepek, Ertan

    2015-11-03

    Clinical documentation has undergone a change due to the usage of electronic health records. The core element is to capture clinical findings and document therapy electronically. Health care personnel spend a significant portion of their time on the computer. Alternatives to self-typing, such as speech recognition, are currently believed to increase documentation efficiency and quality, as well as satisfaction of health professionals while accomplishing clinical documentation, but few studies in this area have been published to date. This study describes the effects of using a Web-based medical speech recognition system for clinical documentation in a university hospital on (1) documentation speed, (2) document length, and (3) physician satisfaction. Reports of 28 physicians were randomized to be created with (intervention) or without (control) the assistance of a Web-based system of medical automatic speech recognition (ASR) in the German language. The documentation was entered into a browser's text area and the time to complete the documentation including all necessary corrections, correction effort, number of characters, and mood of participant were stored in a database. The underlying time comprised text entering, text correction, and finalization of the documentation event. Participants self-assessed their moods on a scale of 1-3 (1=good, 2=moderate, 3=bad). Statistical analysis was done using permutation tests. The number of clinical reports eligible for further analysis stood at 1455. Out of 1455 reports, 718 (49.35%) were assisted by ASR and 737 (50.65%) were not assisted by ASR. Average documentation speed without ASR was 173 (SD 101) characters per minute, while it was 217 (SD 120) characters per minute using ASR. The overall increase in documentation speed through Web-based ASR assistance was 26% (P=.04). Participants documented an average of 356 (SD 388) characters per report when not assisted by ASR and 649 (SD 561) characters per report when assisted
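
    The permutation test the authors mention is straightforward to sketch. Below, a two-sided two-sample permutation test on documentation speeds; the generated numbers merely mimic the reported group means and SDs and are not the study's raw data:

        import numpy as np

        rng = np.random.default_rng(0)

        def perm_test(a, b, n_perm=10000):
            # two-sided permutation test for a difference in means
            observed = np.mean(b) - np.mean(a)
            pooled = np.concatenate([a, b])
            count = 0
            for _ in range(n_perm):
                rng.shuffle(pooled)
                diff = np.mean(pooled[len(a):]) - np.mean(pooled[:len(a)])
                if abs(diff) >= abs(observed):
                    count += 1
            return count / n_perm          # permutation p-value

        no_asr = rng.normal(173, 101, 737)  # illustrative draws, not raw data
        asr = rng.normal(217, 120, 718)
        print(perm_test(no_asr, asr))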

  4. Robust Speaker Authentication Based on Combined Speech and Voiceprint Recognition

    Science.gov (United States)

    Malcangi, Mario

    2009-08-01

    Personal authentication is becoming increasingly important in many applications that have to protect proprietary data. Passwords and personal identification numbers (PINs) prove not to be robust enough to ensure that unauthorized people do not use them. Biometric authentication technology may offer a secure, convenient, accurate solution but sometimes fails due to its intrinsically fuzzy nature. This research aims to demonstrate that combining two basic speech processing methods, voiceprint identification and speech recognition, can provide a very high degree of robustness, especially if fuzzy decision logic is used.
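
    The paper's decision logic is its own, but the flavour of fuzzy score fusion can be sketched with hypothetical membership functions and a single rule: accept only if both the voiceprint match and the spoken-phrase recognition confidence are "high":

        def trapezoid(x, a, b, c, d):
            # trapezoidal fuzzy membership function
            if x <= a or x >= d:
                return 0.0
            if b <= x <= c:
                return 1.0
            return (x - a) / (b - a) if x < b else (d - x) / (d - c)

        def authenticate(voiceprint_score, asr_confidence):
            # rule and breakpoints are illustrative, not the paper's
            vp_high = trapezoid(voiceprint_score, 0.5, 0.7, 1.0, 1.01)
            asr_high = trapezoid(asr_confidence, 0.5, 0.7, 1.0, 1.01)
            return min(vp_high, asr_high) >= 0.5   # fuzzy AND, then threshold

        print(authenticate(0.82, 0.91))  # True: both scores clearly "high"
        print(authenticate(0.82, 0.40))  # False: recognition confidence too low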

  5. Attention mechanisms and the mosaic evolution of speech

    Directory of Open Access Journals (Sweden)

    Pedro Tiago Martins

    2014-12-01

    There is still no categorical answer for why humans, and no other species, have speech, or why speech is the way it is. Several purely anatomical arguments have been put forward, but they have been shown to be false, biologically implausible, or of limited scope. This perspective paper supports the idea that evolutionary theories of speech could benefit from a focus on the cognitive mechanisms that make speech possible, for which antecedents in evolutionary history and brain correlates can be found. This type of approach is part of a very recent but rapidly growing tradition, which has provided crucial insights into the nature of human speech by focusing on the biological bases of vocal learning. Here, we call attention to what might be an important ingredient for speech. We contend that a general mechanism of attention, which manifests itself not only in the visual but also in the auditory (and possibly other) modalities, might be one of the key pieces of human speech, in addition to the mechanisms underlying vocal learning and the pairing of facial gestures with vocalic units.

  6. Speech networks at rest and in action: interactions between functional brain networks controlling speech production.

    Science.gov (United States)

    Simonyan, Kristina; Fuertinger, Stefan

    2015-04-01

    Speech production is one of the most complex human behaviors. Although brain activation during speaking has been well investigated, our understanding of interactions between the brain regions and neural networks remains scarce. We combined seed-based interregional correlation analysis with graph theoretical analysis of functional MRI data during the resting state and sentence production in healthy subjects to investigate the interface and topology of functional networks originating from the key brain regions controlling speech, i.e., the laryngeal/orofacial motor cortex, inferior frontal and superior temporal gyri, supplementary motor area, cingulate cortex, putamen, and thalamus. During both resting and speaking, the interactions between these networks were bilaterally distributed and centered on the sensorimotor brain regions. However, speech production preferentially recruited the inferior parietal lobule (IPL) and cerebellum into the large-scale network, suggesting the importance of these regions in facilitation of the transition from the resting state to speaking. Furthermore, the cerebellum (lobule VI) was the most prominent region showing functional influences on speech-network integration and segregation. Although networks were bilaterally distributed, interregional connectivity during speaking was stronger in the left vs. right hemisphere, which may have underlined a more homogeneous overlap between the examined networks in the left hemisphere. Among these, the laryngeal motor cortex (LMC) established a core network that fully overlapped with all other speech-related networks, determining the extent of network interactions. Our data demonstrate complex interactions of large-scale brain networks controlling speech production and point to the critical role of the LMC, IPL, and cerebellum in the formation of speech production network. Copyright © 2015 the American Physiological Society.

  7. Listeners Experience Linguistic Masking Release in Noise-Vocoded Speech-in-Speech Recognition

    Science.gov (United States)

    Viswanathan, Navin; Kokkinakis, Kostas; Williams, Brittany T.

    2018-01-01

    Purpose: The purpose of this study was to evaluate whether listeners with normal hearing perceiving noise-vocoded speech-in-speech demonstrate better intelligibility of target speech when the background speech was mismatched in language (linguistic release from masking [LRM]) and/or location (spatial release from masking [SRM]) relative to the…

  8. A retrospective study of long-term treatment outcomes for reduced vocal intensity in hypokinetic dysarthria

    Directory of Open Access Journals (Sweden)

    Christopher R. Watts

    2016-02-01

    Background: Reduced vocal intensity is a core impairment of hypokinetic dysarthria in Parkinson's disease (PD). Speech treatments have been developed to rehabilitate the vocal subsystems underlying this impairment. Intensive treatment programs requiring high-intensity voice and speech exercises with clinician-guided prompting and feedback have been established as effective for improving vocal function. Less is known, however, regarding long-term outcomes of clinical benefit in speakers with PD who receive these treatments. Methods: A retrospective cohort design was utilized. Data from 78 patient files across a three-year period were analyzed. All patients received a structured, intensive program of voice therapy focusing on speaking intent and loudness. The dependent variable for all analyses was vocal intensity in decibels (dB SPL). Vocal intensity during sustained vowel production, reading, and novel conversational speech was compared at pre-treatment, post-treatment, six-month follow-up, and twelve-month follow-up periods. Results: Statistically significant increases in vocal intensity were found at the post-treatment, 6-month, and 12-month follow-up periods, with intensity gains ranging from 5 to 17 dB depending on speaking condition and measurement period. Significant treatment effects were found in all three speaking conditions. Effect sizes for all outcome measures were large, suggesting a strong degree of practical significance. Conclusions: Significant increases in vocal intensity measured at the 6- and 12-month follow-up periods suggest that the sample of patients maintained treatment benefit for up to a year. These findings are supported by outcome studies reporting treatment outcomes within a few months post-treatment, in addition to prior studies that have reported long-term outcome results. The positive treatment outcomes experienced by the PD cohort in this study are consistent with treatment responses subsequent to other treatment

  9. Hints About Some Baseful but Indispensable Elements in Speech Recognition and Reconstruction

    Directory of Open Access Journals (Sweden)

    Mihaela Costin

    2002-07-01

    The cochlear implant (CI) is a device used to reconstruct the hearing capabilities of a person diagnosed with total cophosis. This impairment may occur after accidents, chemotherapy, etc., with the person still having an intact hearing nerve. The cochlear implant has two parts: a programmable external part, the digital signal processing (DSP) device, which processes and transforms the speech signal; and a surgically implanted part with a certain number of electrodes (depending on the brand) used to stimulate the hearing nerve. The speech signal is fully processed in the external DSP device, resulting in the "coded" information on speech. This is modulated with the support of the fundamental frequency F0, and the energy impulses are inductively sent to the hearing nerve. The correct detection of this frequency is very important, as it determines the manner of hearing and makes the difference between a "computer" voice and a natural one. The results are applicable not only in the medical domain, but also in Romanian speech synthesis.
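
    F0 detection itself can be illustrated with the classic autocorrelation method on a short voiced frame (a textbook sketch; real CI processors use far more robust schemes):

        import numpy as np

        def estimate_f0(frame, sr, fmin=60, fmax=400):
            frame = frame - np.mean(frame)
            ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
            lo, hi = int(sr / fmax), int(sr / fmin)
            lag = lo + np.argmax(ac[lo:hi])     # strongest periodicity peak
            return sr / lag

        sr = 16000
        t = np.arange(int(0.03 * sr)) / sr
        frame = np.sin(2 * np.pi * 120 * t)     # synthetic 120 Hz voiced frame
        print(round(estimate_f0(frame, sr)))    # ~120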

  10. Instantaneous Fundamental Frequency Estimation with Optimal Segmentation for Nonstationary Voiced Speech

    DEFF Research Database (Denmark)

    Nørholm, Sidsel Marie; Jensen, Jesper Rindom; Christensen, Mads Græsbøll

    2016-01-01

    In speech processing, the speech is often considered stationary within segments of 20–30 ms even though it is well known not to be true. In this paper, we take the non-stationarity of voiced speech into account by using a linear chirp model to describe the speech signal. We propose a maximum...... likelihood estimator of the fundamental frequency and chirp rate of this model, and show that it reaches the Cramer-Rao bound. Since the speech varies over time, a fixed segment length is not optimal, and we propose to make a segmentation of the signal based on the maximum a posteriori (MAP) criterion. Using...... of the chirp model than the harmonic model to the speech signal. The methods are based on an assumption of white Gaussian noise, and, therefore, two prewhitening filters are also proposed....
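
    Under white Gaussian noise, maximum likelihood estimation of such model parameters amounts to maximizing the energy of the signal's projection onto the harmonic chirp basis. A brute-force grid-search sketch of that idea (illustrative; the paper's estimator is more refined):

        import numpy as np

        def chirp_ml(x, sr, L=3, f0s=np.arange(80, 300, 1.0),
                     alphas=np.arange(-200, 201, 20)):
            t = np.arange(len(x)) / sr
            best, best_score = None, -np.inf
            for f0 in f0s:
                for a in alphas:
                    phase = 2 * np.pi * (f0 * t + 0.5 * a * t ** 2)
                    # complex harmonic chirp basis, harmonics 1..L
                    Z = np.column_stack([np.exp(1j * l * phase)
                                         for l in range(1, L + 1)])
                    amps = np.linalg.lstsq(Z, x, rcond=None)[0]
                    score = np.linalg.norm(Z @ amps) ** 2   # projection energy
                    if score > best_score:
                        best, best_score = (f0, a), score
            return best   # (fundamental frequency in Hz, chirp rate in Hz/s)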

  11. Treating speech subsystems in childhood apraxia of speech with tactual input: the PROMPT approach.

    Science.gov (United States)

    Dale, Philip S; Hayden, Deborah A

    2013-11-01

    Prompts for Restructuring Oral Muscular Phonetic Targets (PROMPT; Hayden, 2004; Hayden, Eigen, Walker, & Olsen, 2010), a treatment approach for the improvement of speech sound disorders in children, uses tactile-kinesthetic-proprioceptive (TKP) cues to support and shape movements of the oral articulators. No research to date has systematically examined the efficacy of PROMPT for children with childhood apraxia of speech (CAS). Four children (ages 3;6 [years;months] to 4;8), all meeting the American Speech-Language-Hearing Association (2007) criteria for CAS, were treated using PROMPT. All children received 8 weeks of 2 × per week treatment, including at least 4 weeks of full PROMPT treatment that included TKP cues. During the first 4 weeks, 2 of the 4 children received treatment that included all PROMPT components except TKP cues. This design permitted both between-subjects and within-subjects comparisons to evaluate the effect of TKP cues. Gains in treatment were measured by standardized tests and by criterion-referenced measures based on the production of untreated probe words, reflecting change in speech movements and auditory perceptual accuracy. All 4 children made significant gains during treatment, but measures of motor speech control and untreated word probes provided evidence for more gain when TKP cues were included. PROMPT as a whole appears to be effective for treating children with CAS, and the inclusion of TKP cues appears to facilitate greater effect.

  12. The Functional Connectome of Speech Control.

    Directory of Open Access Journals (Sweden)

    Stefan Fuertinger

    2015-07-01

    In the past few years, several studies have been directed at understanding the complexity of functional interactions between different brain regions during various human behaviors. Among these, neuroimaging research established the notion that speech and language require an orchestration of brain regions for comprehension, planning, and integration of a heard sound with a spoken word. However, these studies have been largely limited to mapping the neural correlates of separate speech elements and examining distinct cortical or subcortical circuits involved in different aspects of speech control. As a result, the complexity of the brain network machinery controlling speech and language remained largely unknown. Using graph theoretical analysis of functional MRI (fMRI) data in healthy subjects, we quantified the large-scale speech network topology by constructing functional brain networks of increasing hierarchy, from the resting state, to motor output of meaningless syllables, to complex production of real-life speech, and compared them to non-speech-related sequential finger tapping and pure tone discrimination networks. We identified a segregated network of highly connected local neural communities (hubs) in the primary sensorimotor and parietal regions, which formed a commonly shared core hub network across the examined conditions, with the left area 4p playing an important role in speech network organization. These sensorimotor core hubs exhibited features of flexible hubs based on their participation in several functional domains across different networks and their ability to adaptively switch long-range functional connectivity depending on task content, resulting in a distinct community structure for each examined network. Specifically, compared to other tasks, speech production was characterized by the formation of six distinct neural communities with specialized recruitment of the prefrontal cortex, insula, putamen, and thalamus, which collectively

  13. Speech Perception and Short-Term Memory Deficits in Persistent Developmental Speech Disorder

    Science.gov (United States)

    Kenney, Mary Kay; Barac-Cikoja, Dragana; Finnegan, Kimberly; Jeffries, Neal; Ludlow, Christy L.

    2006-01-01

    Children with developmental speech disorders may have additional deficits in speech perception and/or short-term memory. To determine whether these are only transient developmental delays that can accompany the disorder in childhood or persist as part of the speech disorder, adults with a persistent familial speech disorder were tested on speech…

  14. Hemispheric asymmetries in speech perception: sense, nonsense and modulations.

    Directory of Open Access Journals (Sweden)

    Stuart Rosen

    The well-established left hemisphere specialisation for language processing has long been claimed to be based on a low-level auditory specialisation for specific acoustic features in speech, particularly regarding 'rapid temporal processing'. A novel analysis/synthesis technique was used to construct a variety of sounds based on simple sentences which could be manipulated in spectro-temporal complexity, and in whether they were intelligible or not. All sounds consisted of two noise-excited spectral prominences (based on the lower two formants in the original speech) which could be static or varying in frequency and/or amplitude independently. Dynamically varying both acoustic features based on the same sentence led to intelligible speech, but when either or both acoustic features were static, the stimuli were not intelligible. Using the frequency dynamics from one sentence with the amplitude dynamics of another led to unintelligible sounds of comparable spectro-temporal complexity to the intelligible ones. Positron emission tomography (PET) was used to compare which brain regions were active when participants listened to the different sounds. Neural activity to spectral and amplitude modulations sufficient to support speech intelligibility (without actually being intelligible) was seen bilaterally, with a right temporal lobe dominance. A left-dominant response was seen only to intelligible sounds. It thus appears that the left hemisphere specialisation for speech is based on the linguistic properties of utterances, not on particular acoustic features.

  15. Segmentation, Diarization and Speech Transcription: Surprise Data Unraveled

    NARCIS (Netherlands)

    Huijbregts, M.A.H.

    2008-01-01

    In this thesis, research on large vocabulary continuous speech recognition for unknown audio conditions is presented. For automatic speech recognition systems based on statistical methods, it is important that the conditions of the audio used for training the statistical models match the conditions

  16. DFT-Domain Based Single-Microphone Noise Reduction for Speech Enhancement

    DEFF Research Database (Denmark)

    C. Hendriks, Richard; Gerkmann, Timo; Jensen, Jesper

    As speech processing devices like mobile phones, voice controlled devices, and hearing aids have increased in popularity, people expect them to work anywhere and at any time without user intervention. However, the presence of acoustical disturbances limits the use of these applications, degrades...... their performance, or causes the user difficulties in understanding the conversation or appreciating the device. A common way to reduce the effects of such disturbances is through the use of single-microphone noise reduction algorithms for speech enhancement. The field of single-microphone noise reduction...
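
    The gain-based DFT-domain processing that such algorithms share can be sketched in a few lines: estimate the noise spectrum from frames assumed speech-free, then attenuate each frequency bin according to its estimated SNR (a textbook sketch, not any specific algorithm from the book):

        import numpy as np

        def enhance(noisy, frame=512, hop=256, noise_frames=10, floor=0.1):
            win = np.hanning(frame)
            n_psd, out = None, np.zeros(len(noisy))
            for i, start in enumerate(range(0, len(noisy) - frame, hop)):
                X = np.fft.rfft(win * noisy[start:start + frame])
                p = np.abs(X) ** 2
                if i < noise_frames:        # assume first frames are noise-only
                    n_psd = p if n_psd is None else 0.9 * n_psd + 0.1 * p
                # Wiener-style gain per bin, with a spectral floor against
                # musical-noise artifacts
                gain = np.maximum(1 - n_psd / np.maximum(p, 1e-12), floor)
                out[start:start + frame] += win * np.fft.irfft(gain * X)
            return out   # overlap-added enhanced signal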

  17. Speech and Language Delay

    Science.gov (United States)

    ... OTC Relief for Diarrhea Home Diseases and Conditions Speech and Language Delay Condition Speech and Language Delay Share Print Table of Contents1. ... Treatment6. Everyday Life7. Questions8. Resources What is a speech and language delay? A speech and language delay ...

  18. The Beginnings of Danish Speech Perception

    DEFF Research Database (Denmark)

    Østerbye, Torkil

    Little is known about the perception of speech sounds by native Danish listeners. However, the Danish sound system differs in several interesting ways from the sound systems of other languages. For instance, Danish is characterized, among other features, by a rich vowel inventory and by different reductions of speech sounds evident in the pronunciation of the language. This book (originally a PhD thesis) consists of three studies based on the results of two experiments. The experiments were designed to provide knowledge of the perception of Danish speech sounds by Danish adults and infants, in the light of the rich and complex Danish sound system. The first two studies report on native adults' perception of Danish speech sounds in quiet and noise. The third study examined the development of language-specific perception in native Danish infants at 6, 9 and 12 months of age. The book points…

  19. Plasticity in the Human Speech Motor System Drives Changes in Speech Perception

    Science.gov (United States)

    Lametti, Daniel R.; Rochet-Capellan, Amélie; Neufeld, Emily; Shiller, Douglas M.

    2014-01-01

    Recent studies of human speech motor learning suggest that learning is accompanied by changes in auditory perception. But what drives the perceptual change? Is it a consequence of changes in the motor system? Or is it a result of sensory inflow during learning? Here, subjects participated in a speech motor-learning task involving adaptation to altered auditory feedback and they were subsequently tested for perceptual change. In two separate experiments, involving two different auditory perceptual continua, we show that changes in the speech motor system that accompany learning drive changes in auditory speech perception. Specifically, we obtained changes in speech perception when adaptation to altered auditory feedback led to speech production that fell into the phonetic range of the speech perceptual tests. However, a similar change in perception was not observed when the auditory feedback that subjects received during learning fell into the phonetic range of the perceptual tests. This indicates that the central motor outflow associated with vocal sensorimotor adaptation drives changes to the perceptual classification of speech sounds. PMID:25080594

  20. Charisma in business speeches -- A contrastive acoustic-prosodic analysis of Steve Jobs and Mark Zuckerberg

    OpenAIRE

    Niebuhr, Oliver; Brem, Alexander; Novák-Tót, Eszter; Voße, Jana

    2016-01-01

    Charisma is a key component of spoken language interaction; and it is probably for this reason that charismatic speech has been the subject of intensive research for centuries. However, what is still largely missing is a quantitative and objective line of research that, firstly, involves analyses of the acoustic-prosodic signal, secondly, focuses on business speeches like product presentations, and, thirdly, in doing so, advances the still fairly fragmentary evidence on the prosodic correlate...

  1. Telephone based speech interfaces in the developing world, from the perspective of human-human communication

    CSIR Research Space (South Africa)

    Naidoo, S

    2005-07-01

    Full Text Available recently, before computers systems were able to synthesize or recognize speech, speech was a capability unique to humans. The human brain has developed to differentiate between human speech and other audio occurrences. Therefore, the slowly- evolving... human brain reacts in certain ways to voice stimuli, and has certain expectations regarding communication by voice. Nass affirms that the human brain operates using the same mechanisms when interacting with speech interfaces as when conversing...

  2. Educational Applications for Blind and Partially Sighted Pupils Based on Speech Technologies for Serbian

    Science.gov (United States)

    Lučić, Branko; Ostrogonac, Stevan; Vujnović Sedlar, Nataša; Sečujski, Milan

    2015-01-01

    The inclusion of persons with disabilities has always represented an important issue. Advancements within the field of computer science have enabled the development of different types of aids, which have significantly improved the quality of life of the disabled. However, for some disabilities, such as visual impairment, the purpose of these aids is to establish an alternative communication channel and thus overcome the user's disability. Speech technologies play the crucial role in this process. This paper presents the ongoing efforts to create a set of educational applications based on speech technologies for Serbian for the early stages of education of blind and partially sighted children. Two educational applications dealing with memory exercises and comprehension of geometrical shapes are presented, along with the initial tests results obtained from research including visually impaired pupils. PMID:26171422

  3. Educational Applications for Blind and Partially Sighted Pupils Based on Speech Technologies for Serbian.

    Science.gov (United States)

    Lučić, Branko; Ostrogonac, Stevan; Vujnović Sedlar, Nataša; Sečujski, Milan

    2015-01-01

    The inclusion of persons with disabilities has always represented an important issue. Advancements within the field of computer science have enabled the development of different types of aids, which have significantly improved the quality of life of the disabled. However, for some disabilities, such as visual impairment, the purpose of these aids is to establish an alternative communication channel and thus overcome the user's disability. Speech technologies play the crucial role in this process. This paper presents the ongoing efforts to create a set of educational applications based on speech technologies for Serbian for the early stages of education of blind and partially sighted children. Two educational applications dealing with memory exercises and comprehension of geometrical shapes are presented, along with the initial tests results obtained from research including visually impaired pupils.

  4. Deep Complementary Bottleneck Features for Visual Speech Recognition

    NARCIS (Netherlands)

    Petridis, Stavros; Pantic, Maja

    Deep bottleneck features (DBNFs) have been used successfully in the past for acoustic speech recognition from audio. However, research on extracting DBNFs for visual speech recognition is very limited. In this work, we present an approach to extract deep bottleneck visual features based on deep

  5. Assessment of speech intelligibility in background noise and reverberation

    DEFF Research Database (Denmark)

    Nielsen, Jens Bo

    Reliable methods for assessing speech intelligibility are essential within hearing research, audiology, and related areas. Such methods can be used for obtaining a better understanding of how speech intelligibility is affected by, e.g., various environmental factors or different types of hearing...... impairment. In this thesis, two sentence-based tests for speech intelligibility in Danish were developed. The first test is the Conversational Language Understanding Evaluation (CLUE), which is based on the principles of the original American-English Hearing in Noise Test (HINT). The second test...... is a modified version of CLUE where the speech material and the scoring rules have been reconsidered. An extensive validation of the modified test was conducted with both normal-hearing and hearing-impaired listeners. The validation showed that the test produces reliable results for both groups of listeners...

  6. Common cues to emotion in the dynamic facial expressions of speech and song.

    Science.gov (United States)

    Livingstone, Steven R; Thompson, William F; Wanderley, Marcelo M; Palmer, Caroline

    2015-01-01

    Speech and song are universal forms of vocalization that may share aspects of emotional expression. Research has focused on parallels in acoustic features, overlooking facial cues to emotion. In three experiments, we compared moving facial expressions in speech and song. In Experiment 1, vocalists spoke and sang statements each with five emotions. Vocalists exhibited emotion-dependent movements of the eyebrows and lip corners that transcended speech-song differences. Vocalists' jaw movements were coupled to their acoustic intensity, exhibiting differences across emotion and speech-song. Vocalists' emotional movements extended beyond vocal sound to include large sustained expressions, suggesting a communicative function. In Experiment 2, viewers judged silent videos of vocalists' facial expressions prior to, during, and following vocalization. Emotional intentions were identified accurately for movements during and after vocalization, suggesting that these movements support the acoustic message. Experiment 3 compared emotional identification in voice-only, face-only, and face-and-voice recordings. Emotions in voice-only singing were poorly identified, yet were identified accurately in all other conditions, confirming that facial expressions conveyed emotion more accurately than the voice in song but equivalently in speech. Collectively, these findings highlight broad commonalities in the facial cues to emotion in speech and song, while also revealing differences in perception and acoustic-motor production.

  7. Speech-specific audiovisual perception affects identification but not detection of speech

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Andersen, Tobias

    Speech perception is audiovisual, as evidenced by the McGurk effect, in which watching incongruent articulatory mouth movements can change the phonetic auditory speech percept. This type of audiovisual integration may be specific to speech or be applied to all stimuli in general. To investigate...... of audiovisual integration specific to speech perception. However, the results of Tuomainen et al. might have been influenced by another effect. When observers were naïve, they had little motivation to look at the face. When informed, they knew that the face was relevant for the task and this could increase...... visual detection task. In our first experiment, observers presented with congruent and incongruent audiovisual sine-wave speech stimuli only showed a McGurk effect when informed of the speech nature of the stimulus. Performance on the secondary visual task was very good, thus supporting the finding...

  8. Speech-Language Therapy (For Parents)

    Science.gov (United States)

    ... Staying Safe Videos for Educators Search English Español Speech-Language Therapy KidsHealth / For Parents / Speech-Language Therapy ... most kids with speech and/or language disorders. Speech Disorders, Language Disorders, and Feeding Disorders A speech ...

  9. Digital speech processing using Matlab

    CERN Document Server

    Gopi, E S

    2014-01-01

    Digital Speech Processing Using Matlab deals with digital speech pattern recognition, speech production modeling, speech feature extraction, and speech compression. The book is written in a manner that is suitable for beginners pursuing basic research in digital speech processing. Matlab illustrations are provided for most topics to enable better understanding of concepts. This book also deals with the basic pattern recognition techniques (illustrated with speech signals using Matlab) such as PCA, LDA, ICA, SVM, HMM, GMM, BPN, and KSOM.

  10. Self-Administered Computer Therapy for Apraxia of Speech: Two-Period Randomized Control Trial With Crossover.

    Science.gov (United States)

    Varley, Rosemary; Cowell, Patricia E; Dyson, Lucy; Inglis, Lesley; Roper, Abigail; Whiteside, Sandra P

    2016-03-01

    There is currently little evidence on effective interventions for poststroke apraxia of speech. We report outcomes of a trial of self-administered computer therapy for apraxia of speech. Effects of speech intervention on naming and repetition of treated and untreated words were compared with those of a visuospatial sham program. The study used a parallel-group, 2-period, crossover design, with participants receiving 2 interventions. Fifty participants with chronic and stable apraxia of speech were randomly allocated to 1 of 2 order conditions: speech-first condition versus sham-first condition. Period 1 design was equivalent to a randomized controlled trial. We report results for this period and profile the effect of the period 2 crossover. Period 1 results revealed significant improvement in naming and repetition only in the speech-first group. The sham-first group displayed improvement in speech production after speech intervention in period 2. Significant improvement of treated words was found in both naming and repetition, with little generalization to structurally similar and dissimilar untreated words. Speech gains were largely maintained after withdrawal of intervention. There was a significant relationship between treatment dose and response. However, average self-administered dose was modest for both groups. Future software design would benefit from incorporation of social and gaming components to boost motivation. Single-word production can be improved in chronic apraxia of speech with behavioral intervention. Self-administered computerized therapy is a promising method for delivering high-intensity speech/language rehabilitation. URL: http://orcid.org/0000-0002-1278-0601. Unique identifier: ISRCTN88245643. © 2016 American Heart Association, Inc.

  11. Appropriate baseline values for HMM-based speech recognition

    CSIR Research Space (South Africa)

    Barnard, E

    2004-11-01

    A number of issues related to the development of speech-recognition systems with hidden Markov models (HMMs) are discussed. A set of systematic experiments using the HTK toolkit and the TIMIT database is used to elucidate matters such as the number...
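
    The experiments in the record use HTK, but the whole-word HMM baseline setup can be sketched with the hmmlearn package (assumed installed; HTK itself is a separate toolkit): train one Gaussian HMM per vocabulary word on its MFCC sequences, then recognize by maximum likelihood:

        import numpy as np
        from hmmlearn import hmm   # assumed available

        def train_word_model(mfcc_seqs, n_states=5):
            # mfcc_seqs: list of (frames x coefficients) arrays for one word
            X = np.vstack(mfcc_seqs)
            lengths = [len(s) for s in mfcc_seqs]
            m = hmm.GaussianHMM(n_components=n_states,
                                covariance_type="diag", n_iter=20)
            m.fit(X, lengths)
            return m

        def recognize(mfcc_seq, models):
            # pick the word whose HMM assigns the highest log-likelihood
            return max(models, key=lambda w: models[w].score(mfcc_seq))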

  12. Developmental apraxia of speech in children. Quantitative assessment of speech characteristics

    NARCIS (Netherlands)

    Thoonen, G.H.J.

    1998-01-01

    Developmental apraxia of speech (DAS) in children is a speech disorder, supposed to have a neurological origin, which is commonly considered to result from particular deficits in speech processing (i.e., phonological planning, motor programming). However, the label DAS has often been used as

  13. Speech and language therapy for aphasia following stroke.

    Science.gov (United States)

    Brady, Marian C; Kelly, Helen; Godwin, Jon; Enderby, Pam; Campbell, Pauline

    2016-06-01

    Aphasia is an acquired language impairment following brain damage that affects some or all language modalities: expression and understanding of speech, reading, and writing. Approximately one third of people who have a stroke experience aphasia. To assess the effects of speech and language therapy (SLT) for aphasia following stroke. We searched the Cochrane Stroke Group Trials Register (last searched 9 September 2015), CENTRAL (2015, Issue 5) and other Cochrane Library Databases (CDSR, DARE, HTA, to 22 September 2015), MEDLINE (1946 to September 2015), EMBASE (1980 to September 2015), CINAHL (1982 to September 2015), AMED (1985 to September 2015), LLBA (1973 to September 2015), and SpeechBITE (2008 to September 2015). We also searched major trials registers for ongoing trials including ClinicalTrials.gov (to 21 September 2015), the Stroke Trials Registry (to 21 September 2015), Current Controlled Trials (to 22 September 2015), and WHO ICTRP (to 22 September 2015). In an effort to identify further published, unpublished, and ongoing trials we also handsearched the International Journal of Language and Communication Disorders (1969 to 2005) and reference lists of relevant articles, and we contacted academic institutions and other researchers. There were no language restrictions. Randomised controlled trials (RCTs) comparing SLT (a formal intervention that aims to improve language and communication abilities, activity and participation) versus no SLT; social support or stimulation (an intervention that provides social support and communication stimulation but does not include targeted therapeutic interventions); or another SLT intervention (differing in duration, intensity, frequency, intervention methodology or theoretical approach). We independently extracted the data and assessed the quality of included trials. We sought missing data from investigators. We included 57 RCTs (74 randomised comparisons) involving 3002 participants in this review (some appearing in

  14. Writing and Speech Recognition : Observing Error Correction Strategies of Professional Writers

    NARCIS (Netherlands)

    Leijten, M.A.J.C.

    2007-01-01

    In this thesis we describe the organization of speech recognition based writing processes. Writing can be seen as a visual representation of spoken language: a combination that speech recognition takes full advantage of. In the field of writing research, speech recognition is a new writing

  15. Effects of ethanol intoxication on speech suprasegmentals

    Science.gov (United States)

    Hollien, Harry; Dejong, Gea; Martin, Camilo A.; Schwartz, Reva; Liljegren, Kristen

    2001-12-01

    The effects of ingesting ethanol have been shown to be somewhat variable in humans. To date, there appear to be but few universals. Yet, the question often arises: is it possible to determine if a person is intoxicated by observing them in some manner? A closely related question is: can speech be used for this purpose and, if so, can the degree of intoxication be determined? One of the many issues associated with these questions involves the relationships between a person's paralinguistic characteristics and the presence and level of inebriation. To this end, young, healthy speakers of both sexes were carefully selected and sorted into roughly equal groups of light, moderate, and heavy drinkers. They were asked to produce four types of utterances during a learning phase, when sober and at four strictly controlled levels of intoxication (three ascending and one descending). The primary motor speech measures employed were speaking fundamental frequency, speech intensity, speaking rate and nonfluencies. Several statistically significant changes were found for increasing intoxication; the primary ones included rises in F0, in task duration and for nonfluencies. Minor gender differences were found but they lacked statistical significance. So did the small differences among the drinking category subgroups and the subject groupings related to levels of perceived intoxication. Finally, although it may be concluded that certain changes in speech suprasegmentals will occur as a function of increasing intoxication, these patterns cannot be viewed as universal since a few subjects (about 20%) exhibited no (or negative) changes.

  16. REPETITION AS A SPECIAL TYPE OF SYNTACTIC RELATIONS IN REPRESENTED SPEECH: BASED ON THE PROSE BY MARINA TSVETAEVA

    Directory of Open Access Journals (Sweden)

    Olga Pavlovna Puchinina

    2017-03-01

    The author studies the peculiarities of repetition, one of the ways to build syntactic relations in the represented-speech structure, based on the prose of Marina Tsvetaeva. The article describes the different types of repetition, their distinctive features, and their frequency of usage in the studied texts, and characterizes the repetitions found in the studied cases of represented speech from the point of view of their position. Repetition occurs at the beginning or end of sentences, at the beginning and end of a statement or paragraph, or at the end of one statement and the beginning of the next; lexical items may also be repeated in the middle of a statement. The morphology of repetitions, i.e., which parts of speech the repeated words and structures are expressed by, is of interest from the point of view of functional grammar. The author notes that Tsvetaeva repeats different parts of speech: conjunctions, prepositions, particles, nouns, pronouns, adverbs, numerals, verbs, modal words, or combinations of two words. Moreover, due to her special intention, Tsvetaeva intensifies repetition through particular phonetic devices, such as alliteration, rhyme, and rhythm, which make her prosaic works sound poetic. Purpose. The article is devoted to the topic of rendering another person's speech, as it continues to be one of the most important issues of modern linguistics. The subject of analysis is repetition and its different types in the structure of represented speech, on the material of prose texts by Marina Tsvetaeva. The author's aim is to reveal how these types of repetition (lexical, syntactic, and semantic) function in the structure of represented speech and what effect is achieved with their help. Methodology. The research has been conducted using the continuous sampling method and the quantitative estimation method, aimed to identify the frequency of using different types of repetition and repeated parts of speech and constructions in the

  17. Eigennoise Speech Recovery in Adverse Environments with Joint Compensation of Additive and Convolutive Noise

    Directory of Open Access Journals (Sweden)

    Trung-Nghia Phung

    2015-01-01

    The learning-based speech recovery approach using statistical spectral conversion has been used for some kinds of distorted speech, such as alaryngeal speech and body-conducted (bone-conducted) speech. This approach attempts to recover clean (undistorted) speech from noisy (distorted) speech by converting the statistical models of noisy speech into those of clean speech, without prior knowledge of the characteristics and distributions of the noise source. At present, this approach has not attracted many researchers to apply it to general noisy speech enhancement because of two major problems: the difficulty of noise adaptation and the lack of noise-robust synthesizable features in different noisy environments. In this paper, we adapted methods from state-of-the-art voice conversion and speaker adaptation in speech recognition to the proposed speech recovery approach, applied in different kinds of noisy environments, especially in adverse environments with joint compensation of additive and convolutive noise. We proposed to use decorrelated wavelet packet coefficients as a low-dimensional, noise-robust synthesizable feature under noisy environments, and we proposed a noise adaptation for speech recovery based on an eigennoise, analogous to the eigenvoice in voice conversion. The experimental results showed that the proposed approach substantially outperformed traditional nonlearning-based approaches.
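
    As a rough illustration of the feature idea named above, the sketch below, assuming the PyWavelets and scikit-learn packages, computes wavelet-packet subband log-energies for short frames and decorrelates them with PCA; the wavelet, depth, and component count are guesses, not the paper's settings.

        # Hedged sketch: decorrelated wavelet-packet features for speech frames.
        # Wavelet, depth, and PCA size are assumptions; the frames are synthetic.
        import numpy as np
        import pywt
        from sklearn.decomposition import PCA

        def wp_features(frame, wavelet="db4", level=4):
            wp = pywt.WaveletPacket(frame, wavelet, maxlevel=level)
            leaves = wp.get_level(level, order="natural")
            # One log-energy per subband -> a 2**level-dimensional vector.
            return np.array([np.log(np.sum(n.data ** 2) + 1e-12) for n in leaves])

        rng = np.random.default_rng(0)
        frames = rng.standard_normal((200, 512))   # 200 frames, 512 samples each
        feats = np.stack([wp_features(f) for f in frames])
        decorrelated = PCA(n_components=8).fit_transform(feats)
        print(decorrelated.shape)                  # (200, 8)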

  18. Cross-Cultural Register Differences in Infant-Directed Speech: An Initial Study.

    Directory of Open Access Journals (Sweden)

    Lama K Farran

    Infant-directed speech (IDS) provides an environment that appears to play a significant role in the origins of language in the human infant. Differences have been reported in the use of IDS across cultures, suggesting different styles of infant language-learning. Importantly, both cross-cultural and intra-cultural research suggest there may be a positive relationship between the use of IDS and rates of language development, underscoring the need to investigate cultural differences more deeply. The majority of studies, however, have conceptualized IDS monolithically, granting little attention to a potentially key distinction in how IDS manifests across cultures during the first two years. This study examines and quantifies for the first time differences within IDS in the use of baby register (IDS/BR), an acoustically identifiable type of IDS that includes features such as high pitch, long duration, and smooth intonation (the register that is usually assumed to occur in IDS), and adult register (IDS/AR), the type of IDS that does not include such features and thus sounds as if it could have been addressed to an adult. We studied IDS across 19 American and 19 Lebanese mother-infant dyads, with particular focus on the differential use of registers within IDS as mothers interacted with their infants ages 0-24 months. Our results showed considerable usage of IDS/AR (>30% of utterances) and a tendency for Lebanese mothers to use more IDS than American mothers. Implications for future research on IDS and its role in elucidating how language evolves across cultures are explored.

  19. Cross-Cultural Register Differences in Infant-Directed Speech: An Initial Study.

    Science.gov (United States)

    Farran, Lama K; Lee, Chia-Cheng; Yoo, Hyunjoo; Oller, D Kimbrough

    2016-01-01

    Infant-directed speech (IDS) provides an environment that appears to play a significant role in the origins of language in the human infant. Differences have been reported in the use of IDS across cultures, suggesting different styles of infant language-learning. Importantly, both cross-cultural and intra-cultural research suggest there may be a positive relationship between the use of IDS and rates of language development, underscoring the need to investigate cultural differences more deeply. The majority of studies, however, have conceptualized IDS monolithically, granting little attention to a potentially key distinction in how IDS manifests across cultures during the first two years. This study examines and quantifies for the first time differences within IDS in the use of baby register (IDS/BR), an acoustically identifiable type of IDS that includes features such as high pitch, long duration, and smooth intonation (the register that is usually assumed to occur in IDS), and adult register (IDS/AR), the type of IDS that does not include such features and thus sounds as if it could have been addressed to an adult. We studied IDS across 19 American and 19 Lebanese mother-infant dyads, with particular focus on the differential use of registers within IDS as mothers interacted with their infants ages 0-24 months. Our results showed considerable usage of IDS/AR (>30% of utterances) and a tendency for Lebanese mothers to use more IDS than American mothers. Implications for future research on IDS and its role in elucidating how language evolves across cultures are explored.
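
    Purely as an illustration of the register distinction, the toy function below tags an utterance as IDS/BR when it shows the high pitch and long duration described above; the thresholds and feature set are invented for the example and are not the authors' coding scheme.

        # Hypothetical rule of thumb for the IDS/BR vs. IDS/AR distinction.
        from dataclasses import dataclass

        @dataclass
        class Utterance:
            mean_f0_hz: float      # mean pitch of the utterance
            duration_s: float      # utterance length in seconds

        def register(u: Utterance) -> str:
            # Invented thresholds: high pitch plus long duration -> baby register.
            return "IDS/BR" if u.mean_f0_hz > 300 and u.duration_s > 1.0 else "IDS/AR"

        sample = [Utterance(340, 1.4), Utterance(210, 0.6), Utterance(320, 1.8)]
        share_br = sum(register(u) == "IDS/BR" for u in sample) / len(sample)
        print(f"IDS/BR share: {share_br:.0%}")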

  20. The Impact of Activity-Based Oral Expression Course on Speech Self-Efficacy of Students

    Science.gov (United States)

    Altunkaya, Hatice

    2018-01-01

    The objective of the present study was to examine the effect of the activity-based oral expression course on the speech self-efficacy of psychological counseling and guidance students. The study group included 80 freshmen students in the Psychological Counseling and Guidance Department in the Faculty of Education of a university located in western…

  1. Phonological and Executive Working Memory in L2 Task-Based Speech Planning and Performance

    Science.gov (United States)

    Wen, Zhisheng

    2016-01-01

    The present study sets out to explore the distinctive roles played by two working memory (WM) components in various aspects of L2 task-based speech planning and performance. A group of 40 post-intermediate proficiency level Chinese EFL learners took part in the empirical study. Following the tenets and basic principles of the…

  2. Perception of synthetic speech produced automatically by rule: Intelligibility of eight text-to-speech systems.

    Science.gov (United States)

    Greene, Beth G; Logan, John S; Pisoni, David B

    1986-03-01

    We present the results of studies designed to measure the segmental intelligibility of eight text-to-speech systems and a natural speech control, using the Modified Rhyme Test (MRT). Results indicated that the voices tested could be grouped into four categories: natural speech, high-quality synthetic speech, moderate-quality synthetic speech, and low-quality synthetic speech. The overall performance of the best synthesis system, DECtalk-Paul, was equivalent to natural speech only in terms of performance on initial consonants. The findings are discussed in terms of recent work investigating the perception of synthetic speech under more severe conditions. Suggestions for future research on improving the quality of synthetic speech are also considered.

  3. Perception of synthetic speech produced automatically by rule: Intelligibility of eight text-to-speech systems

    Science.gov (United States)

    GREENE, BETH G.; LOGAN, JOHN S.; PISONI, DAVID B.

    2012-01-01

    We present the results of studies designed to measure the segmental intelligibility of eight text-to-speech systems and a natural speech control, using the Modified Rhyme Test (MRT). Results indicated that the voices tested could be grouped into four categories: natural speech, high-quality synthetic speech, moderate-quality synthetic speech, and low-quality synthetic speech. The overall performance of the best synthesis system, DECtalk-Paul, was equivalent to natural speech only in terms of performance on initial consonants. The findings are discussed in terms of recent work investigating the perception of synthetic speech under more severe conditions. Suggestions for future research on improving the quality of synthetic speech are also considered. PMID:23225916

  4. Dual silent communication system development based on subvocal speech and Raspberry Pi

    Directory of Open Access Journals (Sweden)

    José Daniel Ramírez-Corzo

    2016-09-01

    Additionally, in this article we present the implementation of the subvocal speech signal recording system. The average classification accuracy was 72.5%, over 50 signals per class (200 signals in total). Finally, it was demonstrated that, using the Raspberry Pi, it is possible to set up a silent communication system based on subvocal speech signals.

  5. Speech-like rhythm in a voiced and voiceless orangutan call.

    Directory of Open Access Journals (Sweden)

    Adriano R Lameira

    The evolutionary origins of speech remain obscure. Recently, it was proposed that speech derived from monkey facial signals which exhibit a speech-like rhythm of ∼5 open-close lip cycles per second. In monkeys, these signals may also be vocalized, offering a plausible evolutionary stepping stone towards speech. Three essential predictions remain, however, to be tested to assess this hypothesis' validity: (i) great apes, our closest relatives, should likewise produce 5 Hz-rhythm signals; (ii) speech-like rhythm should involve calls articulatorily similar to consonants and vowels, given that speech rhythm is the direct product of stringing together these two basic elements; and (iii) speech-like rhythm should be experience-based. Via cinematic analyses we demonstrate that an ex-entertainment orangutan produces two calls at a speech-like rhythm, coined "clicks" and "faux-speech." Like voiceless consonants, clicks required no vocal fold action, but did involve independent manoeuvring over lips and tongue. In parallel to vowels, faux-speech showed harmonic and formant modulations, implying vocal fold and supralaryngeal action. This rhythm was several times faster than orangutan chewing rates, as observed in monkeys and humans. Critically, this rhythm was seven-fold faster, and contextually distinct, than any other known rhythmic calls described to date in the largest database of the orangutan repertoire ever assembled. The first two predictions advanced by this study are validated and, based on parsimony and exclusion of potential alternative explanations, initial support is given to the third prediction. Irrespective of the putative origins of these calls and underlying mechanisms, our findings demonstrate irrevocably that great apes are not respiratorily, articulatorily, or neurologically constrained for the production of consonant- and vowel-like calls at speech rhythm. Orangutan clicks and faux-speech confirm the importance of rhythmic speech
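
    As an illustration of the kind of rhythm measurement the cinematic analyses rely on, the sketch below estimates the dominant open-close frequency of a mouth-aperture trace sampled from video; the 24 fps trace here is synthetic, not data from the study.

        # Hedged sketch: dominant open-close rhythm (Hz) of a lip-aperture trace.
        import numpy as np

        fps = 24.0                                     # assumed video frame rate
        t = np.arange(0, 10, 1 / fps)                  # 10 s of frames
        aperture = 1 + np.sin(2 * np.pi * 5 * t)       # synthetic 5 Hz cycle
        aperture += 0.2 * np.random.default_rng(0).standard_normal(t.size)

        # Peak of the magnitude spectrum of the mean-removed trace.
        spec = np.abs(np.fft.rfft(aperture - aperture.mean()))
        freqs = np.fft.rfftfreq(t.size, 1 / fps)
        print(f"dominant rhythm: {freqs[spec.argmax()]:.1f} Hz")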

  6. Personality in speech assessment and automatic classification

    CERN Document Server

    Polzehl, Tim

    2015-01-01

    This work combines interdisciplinary knowledge and experience from research fields of psychology, linguistics, audio-processing, machine learning, and computer science. The work systematically explores a novel research topic devoted to automated modeling of personality expression from speech. For this aim, it introduces a novel personality assessment questionnaire and presents the results of extensive labeling sessions to annotate the speech data with personality assessments. It provides estimates of the Big 5 personality traits, i.e. openness, conscientiousness, extroversion, agreeableness, and neuroticism. Based on a database built on the questionnaire, the book presents models to tell apart different personality types or classes from speech automatically.

  7. Speech Matters

    DEFF Research Database (Denmark)

    Hasse Jørgensen, Stina

    2011-01-01

    About Speech Matters - Katarina Gregos, the Greek curator's exhibition at the Danish Pavilion, the Venice Biennale 2011.

  8. Speech Inconsistency in Children with Childhood Apraxia of Speech, Language Impairment, and Speech Delay: Depends on the Stimuli

    Science.gov (United States)

    Iuzzini-Seigel, Jenya; Hogan, Tiffany P.; Green, Jordan R.

    2017-01-01

    Purpose: The current research sought to determine (a) if speech inconsistency is a core feature of childhood apraxia of speech (CAS) or if it is driven by comorbid language impairment that affects a large subset of children with CAS and (b) if speech inconsistency is a sensitive and specific diagnostic marker that can differentiate between CAS and…

  9. EVOLUTION OF SPEECH: A NEW HYPOTHESIS

    Directory of Open Access Journals (Sweden)

    Shishir

    2016-03-01

    BACKGROUND: The first and foremost characteristic of speech is that it is human. Speech is one characteristic feature that has evolved in humans and is by far the most powerful form of communication in the kingdom Animalia. Today, humans have established themselves as an alpha species, and the evolution of speech and language has made this possible. But how is speech possible? What anatomical changes have made it possible for us to speak? A sincere effort has been put into this paper to establish a possible anatomical answer to the riddle. METHODS: The prototypes of the cranial skeletons of all the major classes of the phylum Vertebrata were studied. The materials were studied in the museums of Wayanad and Karwar and in the Museum of Natural History, Imphal. The skeleton of a mammal was studied in the Department of Anatomy, K. S. Hegde Medical Academy, Mangalore. RESULTS: The curve formed in the base of the skull due to flexion of the splanchnocranium with the neurocranium holds the key to the answer of how humans became able to speak. CONCLUSION: This may not be the only factor that participated in the evolution of speech; the brain also had to evolve, and as a matter of fact the occipital lobes are more prominent in humans than in lower mammals. Although not the only criterion, it is one of the most important things that happened in the course of evolution and enabled us to speak. This small space at the base of the brain is the difference that made us the dominant alpha species.

  10. Steganography based on pixel intensity value decomposition

    Science.gov (United States)

    Abdulla, Alan Anwar; Sellahewa, Harin; Jassim, Sabah A.

    2014-05-01

    This paper focuses on steganography based on pixel intensity value decomposition. A number of existing schemes such as binary, Fibonacci, Prime, Natural, Lucas, and Catalan-Fibonacci (CF) are evaluated in terms of payload capacity and stego quality. A new technique based on a specific representation is proposed to decompose pixel intensity values into 16 (virtual) bit-planes suitable for embedding purposes. The proposed decomposition has a desirable property whereby the sum of all bit-planes does not exceed the maximum pixel intensity value, i.e. 255. Experimental results demonstrate that the proposed technique offers an effective compromise between the payload capacity and stego quality of existing embedding techniques based on pixel intensity value decomposition. Its capacity is equal to that of binary and Lucas, while it offers a higher capacity than Fibonacci, Prime, Natural, and CF when the secret bits are embedded in the 1st least significant bit (LSB). When the secret bits are embedded in higher bit-planes, i.e., 2nd LSB to 8th most significant bit (MSB), the proposed scheme has more capacity than Natural-numbers-based embedding. However, from the 6th bit-plane onwards, the proposed scheme offers better stego quality. In general, the proposed decomposition scheme has less effect on pixel values, and hence on quality, than most existing pixel intensity value decomposition techniques when embedding messages in higher bit-planes.
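
    To make the decomposition idea concrete, here is a small sketch of the Fibonacci (Zeckendorf) variant listed among the compared schemes: a pixel value is decomposed into Fibonacci bit-planes, a secret bit replaces the 1st LSB plane, and the pixel is re-composed. The skip rule for pixels that cannot carry the bit is a simplification, and the paper's own 16-plane scheme is not reproduced here.

        # Hedged sketch of Fibonacci bit-plane (Zeckendorf) LSB embedding.
        FIBS = [1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233]   # covers 0..255

        def to_fib_planes(v):
            # Greedy Zeckendorf digits, least-significant plane first.
            digits = [0] * len(FIBS)
            for i in range(len(FIBS) - 1, -1, -1):
                if FIBS[i] <= v:
                    digits[i], v = 1, v - FIBS[i]
            return digits

        def from_fib_planes(digits):
            return sum(f * d for f, d in zip(FIBS, digits))

        def embed_lsb(pixel, bit):
            d = to_fib_planes(pixel)
            d[0] = bit                        # overwrite the 1st LSB plane
            out = from_fib_planes(d)
            # Keep only pixels where the bit survives a decode round trip;
            # real schemes handle the remaining pixels explicitly.
            if 0 <= out <= 255 and to_fib_planes(out)[0] == bit:
                return out
            return pixel

        stego = embed_lsb(200, 1)
        print(stego, to_fib_planes(stego)[0])  # extracted bit read off plane 1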

  11. Speech graphs provide a quantitative measure of thought disorder in psychosis.

    Science.gov (United States)

    Mota, Natalia B; Vasconcelos, Nivaldo A P; Lemos, Nathalia; Pieretti, Ana C; Kinouchi, Osame; Cecchi, Guillermo A; Copelli, Mauro; Ribeiro, Sidarta

    2012-01-01

    Psychosis has various causes, including mania and schizophrenia. Since the differential diagnosis of psychosis is exclusively based on subjective assessments of oral interviews with patients, an objective quantification of the speech disturbances that characterize mania and schizophrenia is in order. In principle, such quantification could be achieved by the analysis of speech graphs. A graph represents a network with nodes connected by edges; in speech graphs, nodes correspond to words and edges correspond to semantic and grammatical relationships. To quantify speech differences related to psychosis, interviews with schizophrenics, manics and normal subjects were recorded and represented as graphs. Manics scored significantly higher than schizophrenics in ten graph measures. Psychopathological symptoms such as logorrhea, poor speech, and flight of thoughts were grasped by the analysis even when verbosity differences were discounted. Binary classifiers based on speech graph measures sorted schizophrenics from manics with up to 93.8% sensitivity and 93.7% specificity. In contrast, sorting based on the scores of two standard psychiatric scales (BPRS and PANSS) reached only 62.5% sensitivity and specificity. The results demonstrate that alterations of the thought process manifested in the speech of psychotic patients can be objectively measured using graph-theoretical tools, developed to capture specific features of the normal and dysfunctional flow of thought, such as divergence and recurrence. The quantitative analysis of speech graphs is not redundant with standard psychometric scales but rather complementary, as it yields a very accurate sorting of schizophrenics and manics. Overall, the results point to automated psychiatric diagnosis based not on what is said, but on how it is said.
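
    The graph construction can be sketched in a few lines; the version below, assuming the networkx package, uses simple word adjacency for edges (a simplification of the semantic and grammatical relationships described above) and reads off a few representative graph measures.

        # Hedged sketch: a word-trajectory speech graph and some measures.
        import networkx as nx

        transcript = "i was at home then i was at the hospital then home again"
        words = transcript.split()

        g = nx.MultiDiGraph()
        g.add_edges_from(zip(words, words[1:]))   # one edge per consecutive pair

        measures = {
            "nodes": g.number_of_nodes(),
            "edges": g.number_of_edges(),
            "avg_total_degree": sum(d for _, d in g.degree()) / g.number_of_nodes(),
            "largest_scc": len(max(nx.strongly_connected_components(g), key=len)),
        }
        print(measures)   # recurrence shows up as parallel edges and cycles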

  12. Speech graphs provide a quantitative measure of thought disorder in psychosis.

    Directory of Open Access Journals (Sweden)

    Natalia B Mota

    BACKGROUND: Psychosis has various causes, including mania and schizophrenia. Since the differential diagnosis of psychosis is exclusively based on subjective assessments of oral interviews with patients, an objective quantification of the speech disturbances that characterize mania and schizophrenia is in order. In principle, such quantification could be achieved by the analysis of speech graphs. A graph represents a network with nodes connected by edges; in speech graphs, nodes correspond to words and edges correspond to semantic and grammatical relationships. METHODOLOGY/PRINCIPAL FINDINGS: To quantify speech differences related to psychosis, interviews with schizophrenics, manics and normal subjects were recorded and represented as graphs. Manics scored significantly higher than schizophrenics in ten graph measures. Psychopathological symptoms such as logorrhea, poor speech, and flight of thoughts were grasped by the analysis even when verbosity differences were discounted. Binary classifiers based on speech graph measures sorted schizophrenics from manics with up to 93.8% sensitivity and 93.7% specificity. In contrast, sorting based on the scores of two standard psychiatric scales (BPRS and PANSS) reached only 62.5% sensitivity and specificity. CONCLUSIONS/SIGNIFICANCE: The results demonstrate that alterations of the thought process manifested in the speech of psychotic patients can be objectively measured using graph-theoretical tools, developed to capture specific features of the normal and dysfunctional flow of thought, such as divergence and recurrence. The quantitative analysis of speech graphs is not redundant with standard psychometric scales but rather complementary, as it yields a very accurate sorting of schizophrenics and manics. Overall, the results point to automated psychiatric diagnosis based not on what is said, but on how it is said.

  13. Evaluation of Short-Term Cepstral Based Features for Detection of Parkinson’s Disease Severity Levels through Speech signals

    Science.gov (United States)

    Oung, Qi Wei; Nisha Basah, Shafriza; Muthusamy, Hariharan; Vijean, Vikneswaran; Lee, Hoileong

    2018-03-01

    Parkinson’s disease (PD) is a progressive neurodegenerative disease known as a motor system syndrome, due to the death of dopamine-generating cells in the substantia nigra, a region of the human midbrain. PD normally affects people over 60 years of age and at present affects a large part of the worldwide population in that age group. Lately, many researchers have shown interest in the connection between PD and speech disorders. Research has revealed that speech signals may be a suitable biomarker for distinguishing people with Parkinson’s (PWP) from healthy subjects. Therefore, early diagnosis of PD through speech signals can be considered for this aim. In this research, speech data are acquired, with speech behaviour as the biomarker, for differentiating PD severity levels (mild and moderate) from healthy subjects. The feature extraction algorithms applied are Mel-Frequency Cepstral Coefficients (MFCC), Linear Predictive Coefficients (LPC), Linear Prediction Cepstral Coefficients (LPCC), and Weighted Linear Prediction Cepstral Coefficients (WLPCC). For classification, two types of classifiers are used: k-Nearest Neighbour (KNN) and Probabilistic Neural Network (PNN). The experimental results demonstrated that the PNN and KNN classifiers achieve the best average classification performance of 92.63% and 88.56%, respectively, through 10-fold cross-validation. Favourably, the suggested techniques show promise as new tools for PD detection.
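
    A minimal sketch of the MFCC-plus-KNN pairing named above, assuming librosa and scikit-learn; the audio, labels, and hyperparameters are placeholders rather than the study's data or settings.

        # Hedged sketch: MFCC features + KNN with 10-fold cross-validation.
        import numpy as np
        import librosa
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.model_selection import cross_val_score

        def mfcc_vector(y, sr):
            m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
            return m.mean(axis=1)          # one 13-dim vector per recording

        rng = np.random.default_rng(1)
        sr = 16000
        # Stand-in recordings (1 s of noise each); real input would be speech.
        X = np.stack([mfcc_vector(rng.standard_normal(sr), sr) for _ in range(60)])
        labels = np.repeat([0, 1, 2], 20)  # 0 = healthy, 1 = mild, 2 = moderate

        scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, labels, cv=10)
        print(f"10-fold accuracy: {scores.mean():.2%}")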

  14. Clear Speech - Mere Speech? How segmental and prosodic speech reduction shape the impression that speakers create on listeners

    DEFF Research Database (Denmark)

    Niebuhr, Oliver

    2017-01-01

    This study investigates whether variation in the degree of reduction also has a systematic effect on the attributes we ascribe to the speaker who produces the speech signal. A perception experiment was carried out for German in which 46 listeners judged whether or not speakers showing 3 different combinations of segmental and prosodic reduction levels (unreduced, moderately reduced, strongly reduced) are appropriately described by 13 physical, social, and cognitive attributes. The experiment shows that clear speech is not mere speech, and less clear speech is not just reduced either. Rather, results revealed a complex interplay of reduction levels and perceived speaker attributes in which moderate reduction can make a better impression on listeners than no reduction. In addition to its relevance in reduction models and theories, this interplay is instructive for various fields of speech application from social robotics to charisma…

  15. Audiovisual Temporal Recalibration for Speech in Synchrony Perception and Speech Identification

    Science.gov (United States)

    Asakawa, Kaori; Tanaka, Akihiro; Imai, Hisato

    We investigated whether audiovisual synchrony perception for speech could change after observation of the audiovisual temporal mismatch. Previous studies have revealed that audiovisual synchrony perception is re-calibrated after exposure to a constant timing difference between auditory and visual signals in non-speech. In the present study, we examined whether this audiovisual temporal recalibration occurs at the perceptual level even for speech (monosyllables). In Experiment 1, participants performed an audiovisual simultaneity judgment task (i.e., a direct measurement of the audiovisual synchrony perception) in terms of the speech signal after observation of the speech stimuli which had a constant audiovisual lag. The results showed that the “simultaneous” responses (i.e., proportion of responses for which participants judged the auditory and visual stimuli to be synchronous) at least partly depended on exposure lag. In Experiment 2, we adopted the McGurk identification task (i.e., an indirect measurement of the audiovisual synchrony perception) to exclude the possibility that this modulation of synchrony perception was solely attributable to the response strategy using stimuli identical to those of Experiment 1. The characteristics of the McGurk effect reported by participants depended on exposure lag. Thus, it was shown that audiovisual synchrony perception for speech could be modulated following exposure to constant lag both in direct and indirect measurement. Our results suggest that temporal recalibration occurs not only in non-speech signals but also in monosyllabic speech at the perceptual level.

  16. Using DEDICOM for completely unsupervised part-of-speech tagging.

    Energy Technology Data Exchange (ETDEWEB)

    Chew, Peter A.; Bader, Brett William; Rozovskaya, Alla (University of Illinois, Urbana, IL)

    2009-02-01

    A standard and widespread approach to part-of-speech tagging is based on Hidden Markov Models (HMMs). An alternative approach, pioneered by Schuetze (1993), induces parts of speech from scratch using singular value decomposition (SVD). We introduce DEDICOM as an alternative to SVD for part-of-speech induction. DEDICOM retains the advantages of SVD in that it is completely unsupervised: no prior knowledge is required to induce either the tagset or the associations of terms with tags. However, unlike SVD, it is also fully compatible with the HMM framework, in that it can be used to estimate emission- and transition-probability matrices which can then be used as the input for an HMM. We apply the DEDICOM method to the CONLL corpus (CONLL 2000) and compare the output of DEDICOM to the part-of-speech tags given in the corpus, and find that the correlation (almost 0.5) is quite high. Using DEDICOM, we also estimate part-of-speech ambiguity for each term, and find that these estimates correlate highly with part-of-speech ambiguity as measured in the original corpus (around 0.88). Finally, we show how the output of DEDICOM can be evaluated and compared against the more familiar output of supervised HMM-based tagging.
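
    The decomposition itself is compact: DEDICOM factors a square matrix X as A R Aᵀ, with one loading matrix A on both sides and an asymmetric core R. The toy fit below, assuming NumPy and SciPy, minimizes the squared residual with L-BFGS; it is not the fitting algorithm of the paper, only a way to see the model's shape.

        # Hedged sketch: toy DEDICOM fit X ~ A R A^T by L-BFGS.
        import numpy as np
        from scipy.optimize import minimize

        rng = np.random.default_rng(0)
        n, k = 20, 4                       # e.g., 20 terms, 4 induced "tags"
        X = rng.random((n, n))             # stand-in term-by-term co-occurrence

        def loss(theta):
            A = theta[: n * k].reshape(n, k)
            R = theta[n * k :].reshape(k, k)
            E = X - A @ R @ A.T            # residual
            gA = -2 * (E @ A @ R.T + E.T @ A @ R)
            gR = -2 * (A.T @ E @ A)
            return (E ** 2).sum(), np.concatenate([gA.ravel(), gR.ravel()])

        res = minimize(loss, rng.standard_normal(n * k + k * k) * 0.1,
                       jac=True, method="L-BFGS-B")
        A = res.x[: n * k].reshape(n, k)
        R = res.x[n * k :].reshape(k, k)
        print("relative error:", np.linalg.norm(X - A @ R @ A.T) / np.linalg.norm(X))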

  17. Modeling Speech Intelligibility in Hearing Impaired Listeners

    DEFF Research Database (Denmark)

    Scheidiger, Christoph; Jørgensen, Søren; Dau, Torsten

    2014-01-01

    Recent studies predict speech intelligibility (SI) for normal-hearing (NH) listeners based on a signal-to-noise ratio measure in the envelope domain (SNRenv), in the framework of the speech-based envelope power spectrum model (sEPSM, [20, 21]). These models have shown good agreement with measured data under a broad range of conditions, including stationary and modulated interferers, reverberation, and nonlinear distortions of speech, e.g. phase jitter or spectral subtraction. Despite the advances in modeling intelligibility in NH listeners, a broadly applicable model that can predict SI in hearing-impaired (HI) listeners is not yet available. As a first step towards such a model, this study investigates to what extent effects of hearing impairment on SI can be modeled in the sEPSM framework. Preliminary results show that, by only modeling the loss of audibility, the model cannot account for the higher speech reception
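
    The core SNRenv idea can be illustrated in a few lines: compare the normalized envelope power of the noisy speech with that of the noise alone. The sketch below, assuming NumPy and SciPy, omits the gammatone and modulation filterbanks of the full sEPSM and uses synthetic signals.

        # Hedged sketch of the SNRenv idea (no auditory filterbanks).
        import numpy as np
        from scipy.signal import hilbert

        def env_power(x):
            env = np.abs(hilbert(x))                   # temporal envelope
            ac = env - env.mean()
            return np.mean(ac ** 2) / env.mean() ** 2  # DC-normalized power

        rng = np.random.default_rng(0)
        fs = 8000
        t = np.arange(fs) / fs                         # 1 s of signal
        speech_like = (1 + 0.8 * np.sin(2 * np.pi * 4 * t)) * rng.standard_normal(fs)
        noise = rng.standard_normal(fs)

        p_sn, p_n = env_power(speech_like + noise), env_power(noise)
        snr_env_db = 10 * np.log10(max(p_sn - p_n, 1e-6) / p_n)
        print(f"SNRenv ~ {snr_env_db:.1f} dB")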

  18. PRACTICING SPEECH THERAPY INTERVENTION FOR SOCIAL INTEGRATION OF CHILDREN WITH SPEECH DISORDERS

    Directory of Open Access Journals (Sweden)

    Martin Ofelia POPESCU

    2016-11-01

    The article presents a concise speech-correction intervention program for dyslalia, combined with the development of intrapersonal, interpersonal, and social integration capacities in children with speech disorders. The program's main objectives are: increasing the potential for individual social integration by correcting speech disorders in conjunction with intra- and interpersonal capacities, and increasing the potential of children and community groups for social integration by optimizing the socio-relational context of children with speech disorders. The program included 60 children/students with dyslalia (monomorphic and polymorphic) from 11 educational institutions (6 kindergartens and 5 schools/secondary schools) attached to the inter-school logopedic centre (CLI) in Targu Jiu city and areas of Gorj district. The program was implemented under the assumption that a therapeutic-formative intervention to correct speech disorders and facilitate social integration would, in combination with the correction of pronunciation disorders, optimize the social integration of children with speech disorders. The results confirm the hypothesis and provide evidence of the intervention program's efficiency.

  19. Measuring Speech Comprehensibility in Students with Down Syndrome

    Science.gov (United States)

    Yoder, Paul J.; Woynaroski, Tiffany; Camarata, Stephen

    2016-01-01

    Purpose: There is an ongoing need to develop assessments of spontaneous speech that focus on whether the child's utterances are comprehensible to listeners. This study sought to identify the attributes of a stable ratings-based measure of speech comprehensibility, which enabled examining the criterion-related validity of an orthography-based…
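
    One way to examine criterion-related validity of the kind described above is to correlate the ratings-based score (mean across raters) with an orthography-based score such as the percentage of words transcribed correctly; the sketch below, assuming NumPy and SciPy, uses invented numbers purely for illustration.

        # Hedged sketch: ratings-based vs. orthography-based comprehensibility.
        import numpy as np
        from scipy.stats import pearsonr

        ratings = np.array([[3, 4, 3],                 # rows: children
                            [5, 5, 4],                 # cols: raters
                            [2, 2, 3],
                            [4, 4, 5]])
        pct_words_correct = np.array([48.0, 80.0, 30.0, 71.0])  # invented

        mean_rating = ratings.mean(axis=1)
        r, p = pearsonr(mean_rating, pct_words_correct)
        print(f"criterion validity: r = {r:.2f} (p = {p:.3f})")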

  20. Relative Contributions of the Dorsal vs. Ventral Speech Streams to Speech Perception are Context Dependent: a lesion study

    Directory of Open Access Journals (Sweden)

    Corianne Rogalsky

    2014-04-01

    The neural basis of speech perception has been debated for over a century. While it is generally agreed that the superior temporal lobes are critical for the perceptual analysis of speech, a major current topic is whether the motor system contributes to speech perception, with several conflicting findings attested. In a dorsal-ventral speech stream framework (Hickok & Poeppel 2007), this debate is essentially about the roles of the dorsal versus ventral speech processing streams. A major roadblock in characterizing the neuroanatomy of speech perception is task-specific effects. For example, much of the evidence for dorsal stream involvement comes from syllable discrimination type tasks, which have been found to behaviorally doubly dissociate from auditory comprehension tasks (Baker et al. 1981). Discrimination task deficits could be a result of difficulty perceiving the sounds themselves, which is the typical assumption, or a result of failures in temporary maintenance of the sensory traces, or in the comparison and/or decision process. Similar complications arise in perceiving sentences: the extent of inferior frontal (i.e. dorsal stream) activation during listening to sentences increases as a function of increased task demands (Love et al. 2006). Another complication is the stimulus: much evidence for dorsal stream involvement uses speech samples lacking semantic context (CVs, non-words). The present study addresses these issues in a large-scale lesion-symptom mapping study. 158 patients with focal cerebral lesions from the Multi-site Aphasia Research Consortium underwent a structural MRI or CT scan, as well as an extensive psycholinguistic battery. Voxel-based lesion-symptom mapping was used to compare the neuroanatomy involved in the following speech perception tasks with varying phonological, semantic, and task loads: (i) two discrimination tasks of syllables (non-words and words, respectively), and (ii) two auditory comprehension tasks