WorldWideScience

Sample records for accurate automatic speech

  1. Why has (reasonably accurate) Automatic Speech Recognition been so hard to achieve?

    CERN Document Server

    Wegmann, Steven

    2010-01-01

    Hidden Markov models (HMMs) have been successfully applied to automatic speech recognition for more than 35 years in spite of the fact that a key HMM assumption -- the statistical independence of frames -- is obviously violated by speech data. In fact, this data/model mismatch has inspired many attempts to modify or replace HMMs with alternative models that are better able to take into account the statistical dependence of frames. However, it is fair to say that in 2010 the HMM is the consensus model of choice for speech recognition and that HMMs are at the heart of both commercially available products and contemporary research systems. In this paper we present a preliminary exploration aimed at understanding how speech data depart from HMMs and what effect this departure has on the accuracy of HMM-based speech recognition. Our analysis uses standard diagnostic tools from the field of statistics -- hypothesis testing, simulation and resampling -- which are rarely used in the field of speech recognition. Our ma...

  2. Brain-inspired speech segmentation for automatic speech recognition using the speech envelope as a temporal reference

    Science.gov (United States)

    Lee, Byeongwook; Cho, Kwang-Hyun

    2016-11-01

    Speech segmentation is a crucial step in automatic speech recognition because additional speech analyses are performed for each framed speech segment. Conventional segmentation techniques primarily segment speech using a fixed frame size for computational simplicity. However, this approach is insufficient for capturing the quasi-regular structure of speech, which causes substantial recognition failure in noisy environments. How does the brain handle quasi-regular structured speech and maintain high recognition performance under any circumstance? Recent neurophysiological studies have suggested that the phase of neuronal oscillations in the auditory cortex contributes to accurate speech recognition by guiding speech segmentation into smaller units at different timescales. A phase-locked relationship between neuronal oscillation and the speech envelope has recently been obtained, which suggests that the speech envelope provides a foundation for multi-timescale speech segmental information. In this study, we quantitatively investigated the role of the speech envelope as a potential temporal reference to segment speech using its instantaneous phase information. We evaluated the proposed approach by the achieved information gain and recognition performance in various noisy environments. The results indicate that the proposed segmentation scheme not only extracts more information from speech but also provides greater robustness in a recognition test.
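
    A rough sense of the envelope-phase idea can be given in a few lines of Python. This is only a sketch under assumed parameters (a mono 16 kHz signal, a 4-10 Hz syllabic-rate band), not the authors' method: the amplitude envelope is taken from a Hilbert transform, band-passed, and segment boundaries are placed where the envelope's instantaneous phase wraps.

      import numpy as np
      from scipy.signal import hilbert, butter, filtfilt

      def envelope_phase_boundaries(x, fs, band=(4.0, 10.0)):
          """Illustrative sketch: segment speech at phase landmarks of its
          low-frequency envelope (band limits are assumptions, not the
          paper's values). One boundary is placed per envelope cycle,
          where the instantaneous phase wraps from +pi to -pi."""
          env = np.abs(hilbert(x))                      # amplitude envelope
          b, a = butter(2, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
          env_band = filtfilt(b, a, env)                # syllabic-rate band
          phase = np.angle(hilbert(env_band))           # instantaneous phase
          wraps = np.where(np.diff(phase) < -np.pi)[0]  # phase-wrap indices
          return wraps / fs                             # boundary times (s)

      # toy usage: 1 s of noise modulated at ~5 Hz yields ~5 boundaries
      fs = 16000
      t = np.arange(fs) / fs
      x = np.random.randn(fs) * (1 + np.sin(2 * np.pi * 5 * t))
      print(envelope_phase_boundaries(x, fs))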

  3. Phoneme vs Grapheme Based Automatic Speech Recognition

    OpenAIRE

    Magimai.-Doss, Mathew; Dines, John; Bourlard, Hervé; Hermansky, Hynek

    2004-01-01

    In recent literature, different approaches have been proposed to use graphemes as subword units with implicit source of phoneme information for automatic speech recognition. The major advantage of using graphemes as subword units is that the definition of lexicon is easy. In previous studies, results comparable to phoneme-based automatic speech recognition systems have been reported using context-independent graphemes or context-dependent graphemes with decision trees. In this paper, we study...

  4. Automatic speech recognition a deep learning approach

    CERN Document Server

    Yu, Dong

    2015-01-01

    This book summarizes recent advances in the field of automatic speech recognition with a focus on discriminative and hierarchical models. It will be the first automatic speech recognition book to include comprehensive coverage of recent developments such as conditional random fields and deep learning techniques. It presents the insights and theoretical foundations of a series of recent models, including the conditional random field, semi-Markov and hidden conditional random fields, deep neural networks, deep belief networks, and deep stacking models for sequential learning. It also discusses practical considerations of using these models in both acoustic and language modeling for continuous speech recognition.

  5. Automatic Speech Segmentation Based on HMM

    Directory of Open Access Journals (Sweden)

    M. Kroul

    2007-06-01

    Full Text Available This contribution deals with the problem of automatic phoneme segmentation using HMMs. Automation of the speech segmentation task is important for applications where a large amount of data must be processed, so that manual segmentation is out of the question. In this paper we focus on the automatic segmentation of recordings that will be used to create a triphone synthesis unit database. For speech synthesis, speech unit quality is a crucial aspect, so maximal segmentation accuracy is needed here. In this work, different kinds of HMMs with various parameters have been trained and their usefulness for automatic segmentation is discussed. At the end of this work, segmentation accuracy tests of all models are presented.

  6. Automatic speech recognition (ASR) based approach for speech therapy of aphasic patients: A review

    Science.gov (United States)

    Jamal, Norezmi; Shanta, Shahnoor; Mahmud, Farhanahani; Sha'abani, MNAH

    2017-09-01

    This paper reviews the state of the art in automatic speech recognition (ASR) based approaches for the speech therapy of aphasic patients. Aphasia is a condition in which the affected person suffers from a speech and language disorder resulting from a stroke or brain injury. Since there is a growing body of evidence indicating the possibility of improving the symptoms at an early stage, ASR based solutions are increasingly being researched for speech and language therapy. ASR is a technology that converts human speech into transcribed text by matching it against the system's library. This is particularly useful in speech rehabilitation therapies, as it provides an accurate, real-time evaluation of speech input from an individual with a speech disorder. ASR based approaches for speech therapy recognize the speech input from the aphasic patient and provide real-time feedback on their mistakes. However, the accuracy of ASR depends on many factors such as phoneme recognition, speech continuity, speaker and environmental differences, as well as our depth of knowledge of human language understanding. Hence, the review examines recent developments in ASR technologies and their performance for individuals with speech and language disorders.

  7. A Statistical Approach to Automatic Speech Summarization

    Science.gov (United States)

    Hori, Chiori; Furui, Sadaoki; Malkin, Rob; Yu, Hua; Waibel, Alex

    2003-12-01

    This paper proposes a statistical approach to automatic speech summarization. In our method, a set of words maximizing a summarization score indicating the appropriateness of summarization is extracted from automatically transcribed speech and then concatenated to create a summary. The extraction process is performed using a dynamic programming (DP) technique based on a target compression ratio. In this paper, we demonstrate how an English news broadcast transcribed by a speech recognizer is automatically summarized. We adapted our method, which was originally proposed for Japanese, to English by modifying the model for estimating word concatenation probabilities based on a dependency structure in the original speech given by a stochastic dependency context free grammar (SDCFG). We also propose a method of summarizing multiple utterances using a two-level DP technique. The automatically summarized sentences are evaluated by summarization accuracy based on a comparison with a manual summary of speech that has been correctly transcribed by human subjects. Our experimental results indicate that the method we propose can effectively extract relatively important information and remove redundant and irrelevant information from English news broadcasts.
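
    The extraction step described above can be illustrated with a small dynamic-programming sketch. The scoring here is deliberately simplified (a per-word importance term plus a word-concatenation log-probability); the paper's actual summarization score combines further linguistic and confidence components, so this is illustrative only.

      import math

      def summarize(words, importance, bigram_logp, ratio=0.5):
          """Pick an in-order subsequence of words maximizing importance
          plus concatenation log-probability, at a target compression
          ratio. Simplified stand-in for the paper's summarization score."""
          n = len(words)
          k = max(1, round(ratio * n))
          NEG = float("-inf")
          # best[j][i]: best score of a j-word summary ending with word i
          best = [[NEG] * n for _ in range(k + 1)]
          back = [[None] * n for _ in range(k + 1)]
          for i in range(n):
              best[1][i] = importance[i]
          for j in range(2, k + 1):
              for i in range(j - 1, n):
                  for p in range(j - 2, i):
                      cand = (best[j - 1][p] + importance[i]
                              + bigram_logp(words[p], words[i]))
                      if cand > best[j][i]:
                          best[j][i], back[j][i] = cand, p
          # trace back from the best-scoring final word
          i = max(range(n), key=lambda i: best[k][i])
          out, j = [], k
          while i is not None:
              out.append(words[i])
              i = back[j][i]
              j -= 1
          return list(reversed(out))

      # toy usage with a uniform bigram model
      words = "the company said profits rose sharply last quarter".split()
      imp = [0.1, 0.9, 0.8, 1.0, 0.9, 0.3, 0.6, 0.7]
      print(summarize(words, imp, lambda a, b: math.log(0.1), ratio=0.5))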

  9. Personality in speech assessment and automatic classification

    CERN Document Server

    Polzehl, Tim

    2015-01-01

    This work combines interdisciplinary knowledge and experience from research fields of psychology, linguistics, audio-processing, machine learning, and computer science. The work systematically explores a novel research topic devoted to automated modeling of personality expression from speech. For this aim, it introduces a novel personality assessment questionnaire and presents the results of extensive labeling sessions to annotate the speech data with personality assessments. It provides estimates of the Big 5 personality traits, i.e. openness, conscientiousness, extroversion, agreeableness, and neuroticism. Based on a database built on the questionnaire, the book presents models to tell apart different personality types or classes from speech automatically.

  10. Issues in acoustic modeling of speech for automatic speech recognition

    OpenAIRE

    Gong, Yifan; Haton, Jean-Paul; Mari, Jean-François

    1994-01-01

    Projet RFIA; Stochastic modeling is a flexible method for handling the large variability in speech for recognition applications. In contrast to dynamic time warping where heuristic training methods for estimating word templates are used, stochastic modeling allows a probabilistic and automatic training for estimating models. This paper deals with the improvement of stochastic techniques, especially for a better representation of time varying phenomena.

  11. Industrial Applications of Automatic Speech Recognition Systems

    Directory of Open Access Journals (Sweden)

    Dr. Jayashri Vajpai

    2016-03-01

    Full Text Available Current trends in developing technologies form important bridges to the future, fortified by the early and productive use of technology for enriching human life. Speech signal processing, which includes automatic speech recognition, synthetic speech, and natural language processing, is beginning to have a significant impact on business, industry and the ease of operation of personal computers. Beyond this, it facilitates a deeper understanding of the complex mechanisms of the human brain. Advances in speech recognition technology over the past five decades have enabled a wide range of industrial applications. Yet today's applications provide only a small preview of a rich future for speech and voice interface technology, which will eventually replace keyboards with microphones in human-machine interfaces that provide easy access to increasingly intelligent machines. The paper shows how the capabilities of speech recognition systems in industrial applications are evolving over time to usher in the next generation of voice-enabled services. It aims to present an effective survey of the speech recognition technology described in the available literature and to integrate the insights gained from the study of individual research and development efforts. The current applications of speech recognition in the real world and in industry are also outlined, with special reference to the areas of medicine, industrial robotics, forensics, defence and aviation.

  12. Hidden Markov models in automatic speech recognition

    Science.gov (United States)

    Wrzoskowicz, Adam

    1993-11-01

    This article describes a method for constructing an automatic speech recognition system based on hidden Markov models (HMMs). The author discusses the basic concepts of HMM theory and the application of these models to the analysis and recognition of speech signals. The author provides algorithms which make it possible to train the ASR system and recognize signals on the basis of distinct stochastic models of selected speech sound classes. The author describes the specific components of the system and the procedures used to model and recognize speech. The author discusses problems associated with the choice of optimal signal detection and parameterization characteristics and their effect on the performance of the system. The author presents different options for the choice of speech signal segments and their consequences for the ASR process. The author gives special attention to the use of lexical, syntactic, and semantic information for the purpose of improving the quality and efficiency of the system. The author also describes an ASR system developed by the Speech Acoustics Laboratory of the IBPT PAS. The author discusses the results of experiments on the effect of noise on the performance of the ASR system and describes methods of constructing HMMs designed to operate in a noisy environment. The author also describes a language for human-robot communication, defined as a complex multilevel network built from HMM models of speech sounds and geared towards Polish inflections. The author also added mandatory lexical and syntactic rules to the system for its communications vocabulary.
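
    As a concrete anchor for the HMM recognition step mentioned above, the following is a generic Viterbi decoder: a textbook implementation, not the system described in the article.

      import numpy as np

      def viterbi(log_A, log_pi, log_B):
          """Most likely HMM state path for one utterance.

          log_A : (S, S) state-transition log-probabilities
          log_pi: (S,)   initial-state log-probabilities
          log_B : (T, S) per-frame emission log-likelihoods
          A minimal sketch of the decoding step an HMM recognizer
          performs, not a full ASR decoder."""
          T, S = log_B.shape
          delta = log_pi + log_B[0]
          psi = np.zeros((T, S), dtype=int)
          for t in range(1, T):
              scores = delta[:, None] + log_A      # (from-state, to-state)
              psi[t] = scores.argmax(axis=0)       # best predecessor
              delta = scores.max(axis=0) + log_B[t]
          path = [int(delta.argmax())]
          for t in range(T - 1, 0, -1):
              path.append(int(psi[t][path[-1]]))
          return path[::-1]

      # toy usage: 2 states, 5 frames of fake emission likelihoods
      log_A = np.log(np.array([[0.9, 0.1], [0.2, 0.8]]))
      log_pi = np.log(np.array([0.7, 0.3]))
      log_B = np.log(np.random.dirichlet([1, 1], size=5))
      print(viterbi(log_A, log_pi, log_B))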

  13. Automatic Phonetic Transcription for Danish Speech Recognition

    DEFF Research Database (Denmark)

    Kirkedal, Andreas Søeborg

    Automatic speech recognition (ASR) uses dictionaries that map orthographic words to their phonetic representation. To minimize the occurrence of out-of-vocabulary words, ASR requires large phonetic dictionaries to model pronunciation. Hand-crafted high-quality phonetic dictionaries are difficult......, like Danish, the graphemic and phonetic representations are very dissimilar and more complex rewriting rules must be applied to create the correct phonetic representation. Automatic phonetic transcribers use different strategies, from deep analysis to shallow rewriting rules, to produce phonetic......, syllabication, stød and several other suprasegmental features (Kirkedal, 2013). Simplifying the transcriptions by filtering out the symbols for suprasegmental features in a post-processing step produces a format that is suitable for ASR purposes. eSpeak is an open source speech synthesizer originally created...

  14. Multilingual Vocabularies in Automatic Speech Recognition

    Science.gov (United States)

    2000-08-01

    UNCLASSIFIED. Defense Technical Information Center Compilation Part Notice ADP010389. TITLE: Multilingual Vocabularies in Automatic Speech Recognition. The surviving abstract fragments indicate that the size of monolingual vocabularies (a few thousand words) is an obstacle to a full generalization of the inventories, that the approach then moved from the monolingual to the multilingual case, and that multilingual models were compared against monolingual models, with effects specifically observed in a test with Spanish utterances.

  15. Is markerless acquisition of speech production accurate ?

    OpenAIRE

    Ouni, Slim; Dahmani, Sara

    2016-01-01

    International audience; In this study, the precision of markerless acquisition techniques has been assessed when they are used to acquire articulatory data for speech production studies. Two different markerless systems have been evaluated and compared to a marker-based one. The main finding is that both markerless systems provide reasonable results during normal speech, while the quality is uneven during fast articulated speech. The quality of the data is dependent on the temporal resolution of the mark...

  16. Automatic Smoker Detection from Telephone Speech Signals

    DEFF Research Database (Denmark)

    Alavijeh, Amir Hossein Poorjam; Hesaraki, Soheila; Safavi, Saeid

    2017-01-01

    This paper proposes automatic smoking habit detection from spontaneous telephone speech signals. In this method, each utterance is modeled using i-vector and non-negative factor analysis (NFA) frameworks, which yield low-dimensional representations of utterances by applying factor analysis on Gaussian mixture model means and weights respectively. Each framework is evaluated using different classification algorithms to detect the smoker speakers. Finally, score-level fusion of the i-vector-based and the NFA-based recognizers is considered to improve the classification accuracy. The proposed method is evaluated on telephone speech signals of speakers whose smoking habits are known, drawn from the National Institute of Standards and Technology (NIST) 2008 and 2010 Speaker Recognition Evaluation databases. Experimental results over 1194 utterances show the effectiveness of the proposed approach...

  18. Indonesian Automatic Speech Recognition For Command Speech Controller Multimedia Player

    Directory of Open Access Journals (Sweden)

    Vivien Arief Wardhany

    2014-12-01

    Full Text Available The purpose of this multimedia device development is control through voice. Nowadays, voice can be recognized only in English. To overcome this issue, recognition uses an Indonesian language model, acoustic model and dictionary. The automatic speech recognizer is built using the CMU Sphinx engine, with the English language database modified to an Indonesian one, and XBMC is used as the multimedia player. The experiment uses 10 volunteers testing items based on 7 commands. The volunteers are classified by gender, 5 male and 5 female. 10 samples are taken for each command, and each volunteer performs 10 test commands, covering all 7 commands provided. Based on the classification table, the word "Kanan" had the highest recognition percentage at 83%, while "pilih" had the lowest. The word with the most wrong classifications was "kembali" at 67%, while "kanan" had the fewest. The recognition rates (RR) for male speakers show that several commands such as "Kembali", "Utama", "Atas" and "Bawah" have low recognition rates. In particular, "kembali" could not be recognized at all in the female voices, and in the male voices that command reached only 4% RR; this is because the command has no similar-sounding English word close to "kembali", so the system fails to recognize it. Also, the command "Pilih" reached 80% RR with female voices but only 4% with male voices. This is mostly because of the different voice characteristics of adult males and females: males have lower voice frequencies (from 85 to 180 Hz) than women (165 to 255 Hz). The results of the experiment showed that each speaker had a different recognition rate, caused by differences in tone, pronunciation, and speed of speech. Further work is needed to improve the accuracy of the Indonesian Automatic Speech Recognition system.

  19. Automatic Speech Recognition from Neural Signals: A Focused Review

    Directory of Open Access Journals (Sweden)

    Christian Herff

    2016-09-01

    Full Text Available Speech interfaces have become widely accepted and are nowadays integrated in various real-life applications and devices. They have become a part of our daily life. However, speech interfaces presume the ability to produce intelligible speech, which might be impossible due to loud environments, the need not to bother bystanders, or an inability to produce speech (i.e., patients suffering from locked-in syndrome). For these reasons it would be highly desirable not to speak but to simply envision oneself saying words or sentences. Interfaces based on imagined speech would enable fast and natural communication without the need for audible speech and would give a voice to otherwise mute people. This focused review analyzes the potential of different brain imaging techniques to recognize speech from neural signals by applying Automatic Speech Recognition technology. We argue that modalities based on metabolic processes, such as functional Near Infrared Spectroscopy and functional Magnetic Resonance Imaging, are less suited for Automatic Speech Recognition from neural signals due to their low temporal resolution, but are very useful for investigating the underlying neural mechanisms involved in speech processes. In contrast, electrophysiologic activity is fast enough to capture speech processes and is therefore better suited for ASR. Our experimental results indicate the potential of these signals for speech recognition from neural data, with a focus on invasively measured brain activity (electrocorticography). As a first example of Automatic Speech Recognition techniques applied to neural signals, we discuss the "Brain-to-text" system.

  20. Automatic speech recognition An evaluation of Google Speech

    OpenAIRE

    Stenman, Magnus

    2015-01-01

    The use of speech recognition is increasing rapidly, and it is now available in smart TVs, desktop computers, every new smartphone, etc., allowing us to talk to computers naturally. With its use in home appliances, education and even surgical procedures, accuracy and speed become very important. This thesis aims to give an introduction to speech recognition and discuss its use in robotics. An evaluation of Google Speech, using Google's speech API, in regards to word error rate and translation ...

  1. Confidence and rejection in automatic speech recognition

    Science.gov (United States)

    Colton, Larry Don

    Automatic speech recognition (ASR) is performed imperfectly by computers. For some designated part (e.g., word or phrase) of the ASR output, rejection is deciding (yes or no) whether it is correct, and confidence is the probability (0.0 to 1.0) of it being correct. This thesis presents new methods of rejecting errors and estimating confidence for telephone speech. These are also called word or utterance verification and can be used in wordspotting or voice-response systems. Open-set or out-of-vocabulary situations are a primary focus. Language models are not considered. In vocabulary-dependent rejection all words in the target vocabulary are known in advance and a strategy can be developed for confirming each word. A word-specific artificial neural network (ANN) is shown to discriminate well, and scores from such ANNs are shown on a closed-set recognition task to reorder the N-best hypothesis list (N=3) for improved recognition performance. Segment-based duration and perceptual linear prediction (PLP) features are shown to perform well for such ANNs. The majority of the thesis concerns vocabulary- and task-independent confidence and rejection based on phonetic word models. These can be computed for words even when no training examples of those words have been seen. New techniques are developed using phoneme ranks instead of probabilities in each frame. These are shown to perform as well as the best other methods examined despite the data reduction involved. Certain new weighted averaging schemes are studied but found to give no performance benefit. Hierarchical averaging is shown to improve performance significantly: frame scores combine to make segment (phoneme state) scores, which combine to make phoneme scores, which combine to make word scores. Use of intermediate syllable scores is shown to not affect performance. Normalizing frame scores by an average of the top probabilities in each frame is shown to improve performance significantly. Perplexity of the wrong
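
    The hierarchical averaging and top-probability normalization described above reduce to a short computation. The sketch below assumes per-frame log-probabilities are already available; the segment and phoneme groupings, names and toy values are illustrative, not the thesis' setup.

      import numpy as np

      def word_confidence(frame_logp, top_logp, segments, phonemes):
          """Hierarchical confidence: frames -> segments -> phonemes -> word.

          frame_logp: per-frame log-probability of the hypothesized model
          top_logp  : per-frame log-probability of the best-scoring model,
                      used to normalize each frame score
          segments  : list of (start, end) frame ranges, one per HMM state
          phonemes  : list of segment-index lists, one per phoneme
          A sketch of the averaging scheme described above; thresholds and
          model details are not from the thesis."""
          norm = np.asarray(frame_logp) - np.asarray(top_logp)  # <= 0
          seg_scores = [norm[s:e].mean() for s, e in segments]
          ph_scores = [np.mean([seg_scores[i] for i in idx]) for idx in phonemes]
          return float(np.mean(ph_scores))  # closer to 0 = more confident

      # toy usage: 10 frames, 4 segments, 2 phonemes of 2 segments each
      frame_logp = np.random.uniform(-5, -1, 10)
      top_logp = frame_logp + np.random.uniform(0, 1, 10)
      segs = [(0, 3), (3, 5), (5, 8), (8, 10)]
      print(word_confidence(frame_logp, top_logp, segs, [[0, 1], [2, 3]]))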

  2. Disordered Speech Assessment Using Automatic Methods Based on Quantitative Measures

    Directory of Open Access Journals (Sweden)

    Christine Sapienza

    2005-06-01

    Full Text Available Speech quality assessment methods are necessary for evaluating and documenting treatment outcomes of patients suffering from degraded speech due to Parkinson's disease, stroke, or other disease processes. Subjective methods of speech quality assessment are more accurate and more robust than objective methods but are time-consuming and costly. We propose a novel objective measure of speech quality assessment that builds on traditional speech processing techniques such as dynamic time warping (DTW) and the Itakura-Saito (IS) distortion measure. Initial results show that our objective measure correlates well with the more expensive subjective methods.
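
    To make the DTW/IS combination concrete, here is a minimal generic pairing of the two classic components; the proposed measure builds further machinery on top of this baseline.

      import numpy as np

      def itakura_saito(p, q, eps=1e-10):
          """Itakura-Saito distortion between two power spectra."""
          r = (p + eps) / (q + eps)
          return float(np.mean(r - np.log(r) - 1.0))

      def dtw(ref, test, dist=itakura_saito):
          """Dynamic time warping between two spectrogram-like sequences
          (frames x bins), using the IS distortion as the local cost. A
          minimal textbook sketch, not the paper's full measure."""
          n, m = len(ref), len(test)
          D = np.full((n + 1, m + 1), np.inf)
          D[0, 0] = 0.0
          for i in range(1, n + 1):
              for j in range(1, m + 1):
                  c = dist(ref[i - 1], test[j - 1])
                  D[i, j] = c + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
          return D[n, m] / (n + m)  # path-length-normalized distortion

      # toy usage: compare a sequence against a time-stretched copy
      ref = np.abs(np.random.randn(20, 64)) ** 2
      test = np.repeat(ref, 2, axis=0)  # same content, twice as slow
      print(dtw(ref, test))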

  3. A Review on Speech Corpus Development for Automatic Speech Recognition in Indian Languages

    Directory of Open Access Journals (Sweden)

    Cini kurian

    2015-05-01

    Full Text Available Corpus development has gained much attention due to the recent rise of statistics-based natural language processing. It has new applications in language technology, linguistic research, language education and information exchange. Corpus-based language research has an innovative outlook that leaves behind dated linguistic theories. A speech corpus is an essential resource for building a speech recognizer, and one of the main challenges faced by speech scientists is the unavailability of such resources. Compared to English, very few efforts have been made in Indian languages to make these resources available to the public. In this paper we review the efforts made in Indian languages to develop speech corpora for automatic speech recognition.

  4. Modelling context in automatic speech recognition

    NARCIS (Netherlands)

    Wiggers, P.

    2008-01-01

    Speech is at the core of human communication. Speaking and listening come so naturally to us that we do not have to think about them at all. The underlying cognitive processes are very rapid and almost completely subconscious. It is hard, if not impossible, not to understand speech. For computers on the o

  5. Automatic Blind Syllable Segmentation for Continuous Speech

    OpenAIRE

    Villing, Rudi; Timoney, Joseph; Ward, Tomas

    2004-01-01

    In this paper a simple practical method for blind segmentation of continuous speech into its constituent syllables is presented. This technique, which uses amplitude onset velocity and coarse spectral makeup to identify syllable boundaries, is tested on a corpus of continuous speech and compared with an established segmentation algorithm. The results show a substantial performance benefit for the proposed algorithm.

  6. Modelling Human Speech Recognition using Automatic Speech Recognition Paradigms in SpeM

    NARCIS (Netherlands)

    Scharenborg, O.E.; McQueen, J.M.; Bosch, L.F.M. ten; Norris, D.

    2003-01-01

    We have recently developed a new model of human speech recognition, based on automatic speech recognition techniques [1]. The present paper has two goals. First, we show that the new model performs well in the recognition of lexically ambiguous input. These demonstrations suggest that the model is

  7. Matrix sentence intelligibility prediction using an automatic speech recognition system.

    Science.gov (United States)

    Schädler, Marc René; Warzybok, Anna; Hochmuth, Sabine; Kollmeier, Birger

    2015-01-01

    The feasibility of predicting the outcome of the German matrix sentence test for different types of stationary background noise using an automatic speech recognition (ASR) system was studied. Speech reception thresholds (SRT) of 50% intelligibility were predicted in seven noise conditions. The ASR system used Mel-frequency cepstral coefficients as a front-end and employed whole-word Hidden Markov models on the back-end side. The ASR system was trained and tested with noisy matrix sentences on a broad range of signal-to-noise ratios. The ASR-based predictions were compared to data from the literature (Hochmuth et al., 2015) obtained with 10 native German listeners with normal hearing and to predictions of the speech intelligibility index (SII). The ASR-based predictions showed a high and significant correlation (R² = 0.95) with the observed SRTs, while requiring only the speech and noise signals. Minimum assumptions were made about human speech processing beyond what is already incorporated in a reference-free ordinary ASR system.
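
    The outer evaluation loop implied above (find the SNR at which recognizer intelligibility crosses 50%) can be sketched as a simple interpolation; this assumes per-SNR word accuracies have already been measured and is not the study's exact procedure.

      import numpy as np

      def srt_from_psychometric(snrs, accuracies, target=0.5):
          """Estimate the SRT (SNR of 50% word accuracy) by linear
          interpolation of recognizer accuracy measured over a range of
          SNRs. Sketch of the outer evaluation loop only; np.interp
          assumes accuracy increases monotonically with SNR."""
          snrs = np.asarray(snrs, dtype=float)
          acc = np.asarray(accuracies, dtype=float)
          order = np.argsort(snrs)
          return float(np.interp(target, acc[order], snrs[order]))

      # toy usage: accuracy rises with SNR; SRT falls between -6 and -3 dB
      snrs = [-12, -9, -6, -3, 0, 3]
      acc = [0.05, 0.15, 0.38, 0.62, 0.85, 0.95]
      print(srt_from_psychometric(snrs, acc))  # ~ -4.5 dB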

  8. Automatic speech signal segmentation based on the innovation adaptive filter

    Directory of Open Access Journals (Sweden)

    Makowski Ryszard

    2014-06-01

    Full Text Available Speech segmentation is an essential stage in designing automatic speech recognition systems, and several algorithms have been proposed in the literature. It is a difficult problem, as speech is immensely variable. The aim of the authors' studies was to design an algorithm that could be employed at the stage of automatic speech recognition. This would make it possible to avoid some problems related to speech signal parametrization. Posing the problem in such a way requires the algorithm to be capable of working in real time. The only such algorithm was proposed by Tyagi et al. (2006), and it is a modified version of Brandt's algorithm. The article presents a new algorithm for unsupervised automatic speech signal segmentation. It performs segmentation without access to information about the phonetic content of the utterances, relying exclusively on second-order statistics of the speech signal. The starting point for the proposed method is the time-varying Schur coefficients of an innovation adaptive filter. The Schur algorithm is known to be fast, precise, stable and capable of rapidly tracking changes in second-order signal statistics. A transition from one phoneme to another in the speech signal always indicates a change in signal statistics caused by vocal tract changes. In order to allow for the properties of human hearing, detection of inter-phoneme boundaries is performed based on statistics defined on the mel spectrum determined from the reflection coefficients. The paper presents the structure of the algorithm, defines its properties, lists parameter values, describes detection efficiency results, and compares them with those of another algorithm. The obtained segmentation results are satisfactory.

  9. Automatic discrimination between laughter and speech

    NARCIS (Netherlands)

    Truong, K.; Leeuwen, D. van

    2007-01-01

    Emotions can be recognized by audible paralinguistic cues in speech. By detecting these paralinguistic cues that can consist of laughter, a trembling voice, coughs, changes in the intonation contour etc., information about the speaker’s state and emotion can be revealed. This paper describes the

  11. Automatic Speech Recognition: Reliability and Pedagogical Implications for Teaching Pronunciation

    Science.gov (United States)

    Kim, In-Seok

    2006-01-01

    This study examines the reliability of automatic speech recognition (ASR) software used to teach English pronunciation, focusing on one particular piece of software, "FluSpeak," as a typical example. Thirty-six Korean English as a Foreign Language (EFL) college students participated in an experiment in which they listened to 15 sentences…

  12. Using Automatic Speech Recognition Technology with Elicited Oral Response Testing

    Science.gov (United States)

    Cox, Troy L.; Davies, Randall S.

    2012-01-01

    This study examined the use of automatic speech recognition (ASR) scored elicited oral response (EOR) tests to assess the speaking ability of English language learners. It also examined the relationship between ASR-scored EOR and other language proficiency measures and the ability of the ASR to rate speakers without bias to gender or native…

  13. Jackson's Parrot: Samuel Beckett, Aphasic Speech Automatisms, and Psychosomatic Language.

    Science.gov (United States)

    Salisbury, Laura; Code, Chris

    2016-06-01

    This article explores the relationship between automatic and involuntary language in the work of Samuel Beckett and late nineteenth-century neurological conceptions of language that emerged from aphasiology. Using the work of John Hughlings Jackson alongside contemporary neuroscientific research, we explore the significance of the lexical and affective symmetries between Beckett's compulsive and profoundly embodied language and aphasic speech automatisms. The interdisciplinary work in this article explores the paradox of how and why Beckett was able to search out a longed-for language of feeling that might disarticulate the classical bond between the language, intention, rationality and the human, in forms of expression that seem automatic and "readymade".

  14. Automatic Emotion Recognition in Speech: Possibilities and Significance

    Directory of Open Access Journals (Sweden)

    Milana Bojanić

    2009-12-01

    Full Text Available Automatic speech recognition and spoken language understanding are crucial steps towards natural human-machine interaction. The main task of the speech communication process is the recognition of the word sequence, but the recognition of prosody, emotion and stress tags may be of particular importance as well. This paper discusses the possibilities of recognizing emotion from the speech signal in order to improve ASR, and also provides an analysis of acoustic features that can be used for the detection of the speaker's emotion and stress. The paper also provides a short overview of emotion and stress classification techniques. The importance and place of emotional speech recognition is shown in the domain of human-computer interactive systems and the transaction communication model. Directions for future work are given at the end of this work.

  15. Automatic speech segmentation using throat-acoustic correlation coefficients

    Science.gov (United States)

    Mussabayev, Rustam Rafikovich; Kalimoldayev, Maksat N.; Amirgaliyev, Yedilkhan N.; Mussabayev, Timur R.

    2016-11-01

    This work considers one of the approaches to the solution of the task of discrete speech signal automatic segmentation. The aim of this work is to construct such an algorithm which should meet the following requirements: segmentation of a signal into acoustically homogeneous segments, high accuracy and segmentation speed, unambiguity and reproducibility of segmentation results, lack of necessity of preliminary training with the use of a special set consisting of manually segmented signals. Development of the algorithm which corresponds to the given requirements was conditioned by the necessity of formation of automatically segmented speech databases that have a large volume. One of the new approaches to the solution of this task is viewed in this article. For this purpose we use the new type of informative features named TAC-coefficients (Throat-Acoustic Correlation coefficients) which provide sufficient segmentation accuracy and efficiency.

  16. Leveraging Automatic Speech Recognition Errors to Detect Challenging Speech Segments in TED Talks

    Science.gov (United States)

    Mirzaei, Maryam Sadat; Meshgi, Kourosh; Kawahara, Tatsuya

    2016-01-01

    This study investigates the use of Automatic Speech Recognition (ASR) systems to epitomize second language (L2) listeners' problems in perception of TED talks. ASR-generated transcripts of videos often involve recognition errors, which may indicate difficult segments for L2 listeners. This paper aims to discover the root-causes of the ASR errors…

  17. The influence of age, hearing, and working memory on the speech comprehension benefit derived from an automatic speech recognition system

    NARCIS (Netherlands)

    Zekveld, A.A.; Kramer, S.E.; Kessens, J.M.; Vlaming, M.S.M.G.; Houtgast, T.

    2009-01-01

    Objective: The aim of the current study was to examine whether partly incorrect subtitles that are automatically generated by an Automatic Speech Recognition (ASR) system improve speech comprehension by listeners with hearing impairment. In an earlier study (Zekveld et al., 2008), we showed that spe

  18. Automatic Speech Signal Analysis for Clinical Diagnosis and Assessment of Speech Disorders

    CERN Document Server

    Baghai-Ravary, Ladan

    2013-01-01

    Automatic Speech Signal Analysis for Clinical Diagnosis and Assessment of Speech Disorders provides a survey of methods designed to aid clinicians in the diagnosis and monitoring of speech disorders such as dysarthria and dyspraxia, with an emphasis on the signal processing techniques, statistical validity of the results presented in the literature, and the appropriateness of methods that do not require specialized equipment, rigorously controlled recording procedures or highly skilled personnel to interpret results. Such techniques offer the promise of a simple and cost-effective, yet objective, assessment of a range of medical conditions, which would be of great value to clinicians. The ideal scenario would begin with the collection of examples of the clients’ speech, either over the phone or using portable recording devices operated by non-specialist nursing staff. The recordings could then be analyzed initially to aid diagnosis of conditions, and subsequently to monitor the clients’ progress and res...

  19. Automatic Speech Acquisition and Recognition for Spacesuit Audio Systems

    Science.gov (United States)

    Ye, Sherry

    2015-01-01

    NASA has a widely recognized but unmet need for novel human-machine interface technologies that can facilitate communication during astronaut extravehicular activities (EVAs), when loud noises and strong reverberations inside spacesuits make communication challenging. WeVoice, Inc., has developed a multichannel signal-processing method for speech acquisition in noisy and reverberant environments that enables automatic speech recognition (ASR) technology inside spacesuits. The technology reduces noise by exploiting differences between the statistical nature of signals (i.e., speech) and noise that exists in the spatial and temporal domains. As a result, ASR accuracy can be improved to the level at which crewmembers will find the speech interface useful. System components and features include beam forming/multichannel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, and ASR decoding. Arithmetic complexity models were developed and will help designers of real-time ASR systems select proper tasks when confronted with constraints in computational resources. In Phase I of the project, WeVoice validated the technology. The company further refined the technology in Phase II and developed a prototype for testing and use by suited astronauts.

  20. The benefit obtained from visually displayed text from an automatic speech recognizer during listening to speech presented in noise

    NARCIS (Netherlands)

    Zekveld, A.A.; Kramer, S.E.; Kessens, J.M.; Vlaming, M.S.M.G.; Houtgast, T.

    2008-01-01

    OBJECTIVES: The aim of this study was to evaluate the benefit that listeners obtain from visually presented output from an automatic speech recognition (ASR) system during listening to speech in noise. DESIGN: Auditory-alone and audiovisual speech reception thresholds (SRTs) were measured. The SRT

  1. Robust, accurate and fast automatic segmentation of the spinal cord.

    Science.gov (United States)

    De Leener, Benjamin; Kadoury, Samuel; Cohen-Adad, Julien

    2014-09-01

    Spinal cord segmentation provides measures of atrophy and facilitates group analysis via inter-subject correspondence. Automatizing this procedure enables studies with large throughput and minimizes user bias. Although several automatic segmentation methods exist, they are often restricted in terms of image contrast and field-of-view. This paper presents a new automatic segmentation method (PropSeg) optimized for robustness, accuracy and speed. The algorithm is based on the propagation of a deformable model and is divided into three parts: firstly, an initialization step detects the spinal cord position and orientation using a circular Hough transform on multiple axial slices rostral and caudal to the starting plane and builds an initial elliptical tubular mesh. Secondly, a low-resolution deformable model is propagated along the spinal cord. To deal with highly variable contrast levels between the spinal cord and the cerebrospinal fluid, the deformation is coupled with a local contrast-to-noise adaptation at each iteration. Thirdly, a refinement process and a global deformation are applied on the propagated mesh to provide an accurate segmentation of the spinal cord. Validation was performed in 15 healthy subjects and two patients with spinal cord injury, using T1- and T2-weighted images of the entire spinal cord and on multiecho T2*-weighted images. Our method was compared against manual segmentation and against an active surface method. Results show high precision for all the MR sequences. Dice coefficients were 0.9 for the T1- and T2-weighted cohorts and 0.86 for the T2*-weighted images. The proposed method runs in less than 1 min on a normal computer and can be used to quantify morphological features such as cross-sectional area along the whole spinal cord.
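
    The Dice coefficient quoted as the validation metric is easy to state in code; the snippet below is a generic implementation, not taken from PropSeg.

      import numpy as np

      def dice(seg_a, seg_b):
          """Dice overlap between two binary segmentations
          (1.0 = perfect agreement)."""
          a, b = np.asarray(seg_a, bool), np.asarray(seg_b, bool)
          denom = a.sum() + b.sum()
          return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

      # toy usage: two overlapping 4x4 squares on a 10x10 grid
      auto = np.zeros((10, 10), bool); auto[3:7, 3:7] = True
      manual = np.zeros((10, 10), bool); manual[3:7, 4:8] = True
      print(round(dice(auto, manual), 2))  # 0.75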

  2. Robust Automatic Speech Recognition in Impulsive Noise Environment

    Institute of Scientific and Technical Information of China (English)

    Ding, Pei; Cao, Zhigang

    2005-01-01

    This paper presents an efficient method to directly suppress the effect of impulsive noise for robust automatic speech recognition (ASR). In this method, according to the noise sensitivity of each feature dimension, the observation vectors are divided into several parts, each of which is assigned a proper threshold. In the recognition stage, the unreliable probability advantage of an incorrect competing path caused by impulsive noise is eliminated by flooring the observation probability (FOP) of each feature sub-vector at the Gaussian mixture level, so that the correct path recovers its priority of being chosen in decoding. Experimental results demonstrate that the proposed method can significantly improve recognition accuracy in both machine-gun noise and simulated impulsive noise environments, while maintaining high performance for clean speech recognition.
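
    The flooring idea (FOP) can be illustrated with a toy computation. The thresholds and sub-vector scores below are invented for illustration; the paper applies the floor at the Gaussian mixture level inside an HMM decoder.

      import numpy as np

      def floored_loglik(sub_logliks, floors):
          """Flooring of observation probabilities per feature sub-vector:
          log-likelihoods are not allowed to fall below a floor, so one
          impulse-corrupted sub-vector cannot dominate the frame score.
          Threshold values here are illustrative, not the paper's."""
          return float(np.sum(np.maximum(sub_logliks, floors)))

      # toy usage: an impulse makes sub-vector 3 wildly unlikely (-80);
      # flooring caps its influence on the combined frame score
      floors = np.array([-10.0, -10.0, -10.0])
      clean = floored_loglik(np.array([-3.0, -2.5, -4.0]), floors)
      hit = floored_loglik(np.array([-3.0, -2.5, -80.0]), floors)
      print(clean, hit)  # -9.5 vs -15.5 instead of -85.5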

  3. Automatic correction of part-of-speech corpora

    OpenAIRE

    Reichel, Uwe D.; Bucar Shigemori, Lia Saki

    2008-01-01

    In this study a simple method for automatic correction of part-of-speech corpora is presented, which works as follows: initially, two or more already available part-of-speech taggers are applied to the data. Then a sample of differing outputs is taken to train a classifier to predict, for each difference, which of the taggers (if any) delivered the correct output. As classifiers we employed instance-based learning, a C4.5 decision tree and a Bayesian classifier. Their performances ranged fr...

  4. Studies on inter-speaker variability in speech and its application in automatic speech recognition

    Indian Academy of Sciences (India)

    S Umesh

    2011-10-01

    In this paper, we give an overview of the problem of inter-speaker variability and its study in many diverse areas of speech signal processing. We first give an overview of vowel-normalization studies that minimize variations in the acoustic representation of vowel realizations by different speakers. We then describe the universal-warping approach to speaker normalization which unifies many of the vowel normalization approaches and also shows the relation between speech production, perception and auditory processing. We then address the problem of inter-speaker variability in automatic speech recognition (ASR) and describe techniques that are used to reduce these effects and thereby improve the performance of speaker-independent ASR systems.

  5. Speech Acquisition and Automatic Speech Recognition for Integrated Spacesuit Audio Systems

    Science.gov (United States)

    Huang, Yiteng; Chen, Jingdong; Chen, Shaoyan

    2010-01-01

    A voice-command human-machine interface system has been developed for spacesuit extravehicular activity (EVA) missions. A multichannel acoustic signal processing method has been created for distant speech acquisition in noisy and reverberant environments. This technology reduces noise by exploiting differences in the statistical nature of signal (i.e., speech) and noise that exists in the spatial and temporal domains. As a result, the automatic speech recognition (ASR) accuracy can be improved to the level at which crewmembers would find the speech interface useful. The developed speech human/machine interface will enable both crewmember usability and operational efficiency. It can enjoy a fast rate of data/text entry, small overall size, and can be lightweight. In addition, this design will free the hands and eyes of a suited crewmember. The system components and steps include beam forming/multi-channel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, model adaption, ASR HMM (Hidden Markov Model) training, and ASR decoding. A state-of-the-art phoneme recognizer can obtain an accuracy rate of 65 percent when the training and testing data are free of noise. When it is used in spacesuits, the rate drops to about 33 percent. With the developed microphone array speech-processing technologies, the performance is improved and the phoneme recognition accuracy rate rises to 44 percent. The recognizer can be further improved by combining the microphone array and HMM model adaptation techniques and using speech samples collected from inside spacesuits. In addition, arithmetic complexity models for the major HMM-based ASR components were developed. They can help real-time ASR system designers select proper tasks when faced with constraints on computational resources.
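
    The beam-forming/multichannel noise-reduction component mentioned above is commonly built on delay-and-sum processing. The sketch below shows only that generic technique, with assumed steering delays, and is not the system's actual implementation.

      import numpy as np

      def delay_and_sum(channels, delays_samples):
          """Delay-and-sum beamformer: time-align each microphone channel
          toward the talker and average, reinforcing speech while averaging
          down uncorrelated noise. Steering delays are assumed known here;
          in practice they come from array geometry or TDOA estimation."""
          out = np.zeros(max(len(c) for c in channels))
          for sig, d in zip(channels, delays_samples):
              out[: len(sig) - d] += sig[d:]       # advance by d samples
          return out / len(channels)

      # toy usage: 3 mics hear the same pulse at different delays
      n = 1600
      src = np.zeros(n); src[100] = 1.0
      mics = [np.roll(src, d) + 0.1 * np.random.randn(n) for d in (0, 3, 7)]
      y = delay_and_sum(mics, [0, 3, 7])
      print(np.argmax(np.abs(y)))  # pulse re-aligned near sample 100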

  6. Automatic Speech Recognition Systems for the Evaluation of Voice and Speech Disorders in Head and Neck Cancer

    Directory of Open Access Journals (Sweden)

    Andreas Maier

    2010-01-01

    Full Text Available In patients suffering from head and neck cancer, speech intelligibility is often restricted. For assessment and outcome measurements, automatic speech recognition systems have previously been shown to be appropriate for objective and quick evaluation of intelligibility. In this study we investigate the applicability of the method to speech disorders caused by head and neck cancer. Intelligibility was quantified by speech recognition on recordings of a standard text read by 41 German laryngectomized patients with cancer of the larynx or hypopharynx and 49 German patients who had suffered from oral cancer. The speech recognition provides the percentage of correctly recognized words of a sequence, that is, the word recognition rate. Automatic evaluation was compared to perceptual ratings by a panel of experts and to an age-matched control group. Both patient groups showed significantly lower word recognition rates than the control group. Automatic speech recognition yielded word recognition rates which agreed with the experts' evaluation of intelligibility at a significant level. Automatic speech recognition thus provides a low-effort means to objectify and quantify the most important aspect of pathologic speech: intelligibility. The system was successfully applied to voice and speech disorders.
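
    The word recognition rate used here is conventionally computed from a Levenshtein alignment of the reference and recognized word sequences. Below is a generic sketch that approximates the count of correct words as N minus the edit distance; the study's actual scoring tool may differ.

      def word_recognition_rate(reference, hypothesis):
          """Percentage of reference words recognized, approximated from a
          standard Levenshtein alignment of the two word sequences. A
          generic illustration, not the recognizer's own scoring code."""
          r, h = reference.split(), hypothesis.split()
          # d[i][j]: edit distance between first i ref words, j hyp words
          d = [[max(i, j) if 0 in (i, j) else 0 for j in range(len(h) + 1)]
               for i in range(len(r) + 1)]
          for i in range(1, len(r) + 1):
              for j in range(1, len(h) + 1):
                  sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
                  d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
          errors = d[len(r)][len(h)]
          return 100.0 * max(0, len(r) - errors) / len(r)

      print(word_recognition_rate("the north wind and the sun",
                                  "the north wind on the sun"))  # ~83%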

  7. Post-error Correction in Automatic Speech Recognition Using Discourse Information

    OpenAIRE

    Kang, S.; Kim, J.-H.; Seo, J.

    2014-01-01

    Overcoming speech recognition errors in the field of human-computer interaction is important in ensuring a consistent user experience. This paper proposes a semantic-oriented post-processing approach for the correction of errors in speech recognition. The novelty of the model proposed here is that it re-ranks the n-best hypotheses of speech recognition based on the user's intention, which is analyzed from previous discourse information, while conventional automatic speech reco...

  8. Difficulties in Automatic Speech Recognition of Dysarthric Speakers and Implications for Speech-Based Applications Used by the Elderly: A Literature Review

    Science.gov (United States)

    Young, Victoria; Mihailidis, Alex

    2010-01-01

    Despite their growing presence in home computer applications and various telephony services, commercial automatic speech recognition technologies are still not easily employed by everyone; especially individuals with speech disorders. In addition, relatively little research has been conducted on automatic speech recognition performance with older…

  10. Behavioral and Electrophysiological Evidence for Early and Automatic Detection of Phonological Equivalence in Variable Speech Inputs

    Science.gov (United States)

    Kharlamov, Viktor; Campbell, Kenneth; Kazanina, Nina

    2011-01-01

    Speech sounds are not always perceived in accordance with their acoustic-phonetic content. For example, an early and automatic process of perceptual repair, which ensures conformity of speech inputs to the listener's native language phonology, applies to individual input segments that do not exist in the native inventory or to sound sequences that…

  11. Developing and Evaluating an Oral Skills Training Website Supported by Automatic Speech Recognition Technology

    Science.gov (United States)

    Chen, Howard Hao-Jan

    2011-01-01

    Oral communication ability has become increasingly important to many EFL students. Several commercial software programs based on automatic speech recognition (ASR) technologies are available but their prices are not affordable for many students. This paper will demonstrate how the Microsoft Speech Application Software Development Kit (SASDK), a…

  13. Using Automatic Speech Recognition to Dictate Mathematical Expressions: The Development of the "TalkMaths" Application at Kingston University

    Science.gov (United States)

    Wigmore, Angela; Hunter, Gordon; Pflugel, Eckhard; Denholm-Price, James; Binelli, Vincent

    2009-01-01

    Speech technology--especially automatic speech recognition--has now advanced to a level where it can be of great benefit both to able-bodied people and those with various disabilities. In this paper we describe an application "TalkMaths" which, using the output from a commonly-used conventional automatic speech recognition system,…

  14. Reference-free automatic quality assessment of tracheoesophageal speech.

    Science.gov (United States)

    Huang, Andy; Falk, Tiago H; Chan, Wai-Yip; Parsa, Vijay; Doyle, Philip

    2009-01-01

    Evaluation of the quality of tracheoesophageal (TE) speech using machines instead of human experts can enhance the voice rehabilitation process for patients who have undergone total laryngectomy and voice restoration. Towards the goal of devising a reference-free TE speech quality estimation algorithm, we investigate the efficacy of speech signal features that are used in standard telephone-speech quality assessment algorithms, in conjunction with a recently introduced speech modulation spectrum measure. Tests performed on two TE speech databases demonstrate that the modulation spectral measure and a subset of features in the standard ITU-T P.563 algorithm estimate TE speech quality with better correlation (up to 0.9) than previously proposed features.
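
    A crude stand-in for a modulation-spectrum feature can be computed by taking the spectrum of the short-time energy trajectory. The band limits and framing below are assumptions for illustration; the paper's modulation spectral measure is defined differently in detail.

      import numpy as np

      def modulation_energy(x, fs, frame=256, hop=128):
          """Crude modulation-spectrum statistic: track short-time spectral
          energy over time, then take the spectrum OF that trajectory and
          report the fraction of its energy in the 2-20 Hz speech range
          (band limits are an assumption). Illustrative only."""
          frames = np.array([x[i:i + frame] * np.hanning(frame)
                             for i in range(0, len(x) - frame, hop)])
          spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (time, freq)
          env = spec.sum(axis=1)                            # energy trajectory
          env = (env - env.mean()) / (env.std() + 1e-12)
          mod = np.abs(np.fft.rfft(env)) ** 2               # modulation spectrum
          mod_rate = fs / hop                               # envelope sample rate
          freqs = np.fft.rfftfreq(len(env), d=1.0 / mod_rate)
          band = (freqs >= 2) & (freqs <= 20)
          return float(mod[band].sum() / (mod.sum() + 1e-12))

      # toy usage: noise with speech-like 4 Hz modulation scores near 1
      fs = 8000
      t = np.arange(2 * fs) / fs
      x = np.random.randn(len(t)) * (1 + np.sin(2 * np.pi * 4 * t))
      print(modulation_energy(x, fs))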

  15. Man-system interface based on automatic speech recognition: integration to a virtual control desk

    Energy Technology Data Exchange (ETDEWEB)

    Jorge, Carlos Alexandre F.; Mol, Antonio Carlos A.; Pereira, Claudio M.N.A.; Aghina, Mauricio Alves C., E-mail: calexandre@ien.gov.b, E-mail: mol@ien.gov.b, E-mail: cmnap@ien.gov.b, E-mail: mag@ien.gov.b [Instituto de Engenharia Nuclear (IEN/CNEN-RJ), Rio de Janeiro, RJ (Brazil); Nomiya, Diogo V., E-mail: diogonomiya@gmail.co [Universidade Federal do Rio de Janeiro (UFRJ), RJ (Brazil)

    2009-07-01

    This work reports the implementation of a man-system interface based on automatic speech recognition, and its integration into a virtual nuclear power plant control desk. The latter aims to reproduce a real control desk using virtual reality technology, for operator training and ergonomic evaluation purposes. An automatic speech recognition system was developed to serve as a new interface with users, substituting for the computer keyboard and mouse. Users can operate this virtual control desk in front of a computer monitor or a projection screen through spoken commands. The automatic speech recognition interface developed is based on a well-known signal processing technique named cepstral analysis, and on artificial neural networks. The speech recognition interface is described, along with its integration with the virtual control desk, and results are presented. (author)
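
    The cepstral front-end named above has a compact textbook form: the real cepstrum of a windowed frame, whose low-quefrency coefficients would feed the neural network. The sketch below shows only that generic front-end, with an assumed frame size; the network and the system's actual parameters are omitted.

      import numpy as np

      def real_cepstrum(frame, n_coeffs=13):
          """Real cepstrum of one windowed speech frame: ifft(log|fft|).
          The low-quefrency coefficients summarize the vocal-tract
          envelope; frame size and coefficient count are assumptions."""
          spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
          return np.fft.irfft(np.log(spectrum + 1e-12))[:n_coeffs]

      # toy usage: a 256-sample frame yields a 13-dimensional feature
      frame = np.random.randn(256)
      print(real_cepstrum(frame).shape)  # (13,)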

  16. Machine learning based sample extraction for automatic speech recognition using dialectal Assamese speech.

    Science.gov (United States)

    Agarwalla, Swapna; Sarma, Kandarpa Kumar

    2016-06-01

    Automatic Speaker Recognition (ASR) and related issues are continuously evolving as inseparable elements of Human Computer Interaction (HCI). With the assimilation of emerging concepts like big data and the Internet of Things (IoT) as extended elements of HCI, ASR techniques are passing through a paradigm shift. Of late, learning based techniques have started to receive greater attention from research communities related to ASR, owing to the fact that the former possess a natural ability to mimic biological behavior, which aids ASR modeling and processing. The current learning based ASR techniques are evolving further with the incorporation of big data and IoT like concepts. Here, in this paper, we report certain approaches based on machine learning (ML) used for the extraction of relevant samples from a big data space, and apply them to ASR using certain soft computing techniques for Assamese speech with dialectal variations. A class of ML techniques comprising the basic Artificial Neural Network (ANN) in feedforward (FF) and Deep Neural Network (DNN) forms, using raw speech, extracted features and frequency domain forms, is considered. The Multi Layer Perceptron (MLP) is configured with inputs in several forms to learn class information obtained using clustering and manual labeling. DNNs are also used to extract specific sentence types. Initially, relevant samples are selected and assimilated from a large storage. Next, a few conventional methods are used for feature extraction of a few selected types. The features comprise both spectral and prosodic types. These are applied to Recurrent Neural Network (RNN) and Fully Focused Time Delay Neural Network (FFTDNN) structures to evaluate their performance in recognizing mood, dialect, speaker and gender variations in dialectal Assamese speech. The system is tested under several background noise conditions by considering the recognition rates (obtained using confusion matrices and manually) and computation time

  17. Noise robust automatic speech recognition with adaptive quantile based noise estimation and speech band emphasizing filter bank

    DEFF Research Database (Denmark)

    Bonde, Casper Stork; Graversen, Carina; Gregersen, Andreas Gregers;

    2005-01-01

    An important topic in Automatic Speech Recognition (ASR) is to reduce the effect of noise, in particular when a mismatch exists between the training and application conditions. Many noise robustness schemes in the feature processing domain require a noise estimate prior to the appearance of the speech signal, which in turn requires noise-robust voice activity detection and an assumption of stationary noise. However, both of these requirements are often not met, and it is therefore of particular interest to investigate methods like Quantile Based Noise Estimation (QBNE), which estimates the noise during speech and non-speech sections without the use of a voice activity detector. While the standard QBNE method uses a fixed pre-defined quantile across all frequency bands, this paper suggests adaptive QBNE (AQBNE), which adapts the quantile individually to each frequency band…
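    The core of QBNE is simple enough to sketch: the noise power in each frequency band is taken as a quantile of that band's power trajectory over time, so no voice activity detector is needed. A minimal NumPy sketch follows, with an illustrative quantile value; the adaptive variant (AQBNE) would tune this per band.

        import numpy as np

        def qbne(power_spectrogram, quantile=0.5):
            """power_spectrogram: (n_frames, n_bins) magnitude-squared STFT.
            Returns one noise-power estimate per frequency bin, no VAD required."""
            return np.quantile(power_spectrogram, quantile, axis=0)

        # Example: simple spectral subtraction with a QBNE noise floor.
        spec = np.abs(np.random.randn(200, 129)) ** 2      # stand-in power spectrogram
        noise = qbne(spec, quantile=0.5)
        denoised = np.maximum(spec - noise[None, :], 0.0)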

  18. Developing a broadband automatic speech recognition system for Afrikaans

    CSIR Research Space (South Africa)

    De Wet, Febe

    2011-08-01

    Full Text Available Afrikaans is one of the eleven official languages of South Africa. It is classified as an under-resourced language. No annotated broadband speech corpora currently exist for Afrikaans. This article reports on the development of speech resources...

  19. Automatic classification and accurate size measurement of blank mask defects

    Science.gov (United States)

    Bhamidipati, Samir; Paninjath, Sankaranarayanan; Pereira, Mark; Buck, Peter

    2015-07-01

    A blank mask and its preparation stages, such as cleaning or resist coating, play an important role in the eventual yield obtained by using it. The impact analysis of blank mask defects depends directly on the amount of available information, such as the number of defects observed and their accurate locations and sizes. Mask usability qualification at the start of the preparation process is crudely based on the number of defects. Similarly, defect information such as size is sought to estimate eventual defect printability on the wafer. Tracking defect characteristics, specifically size and shape, across multiple stages can further be indicative of process-related information such as cleaning or coating process efficiency. At the first level, inspection machines address the requirement of defect characterization by detecting and reporting relevant defect information. The analysis of this information, though, is still largely a manual process. With advancing technology nodes and shrinking half-pitch sizes, a large number of defects are observed, and the detailed knowledge required makes the manual defect review process arduous and sensitive to human error. In cases where the defect information reported by the inspection machine is not sufficient, mask shops rely on other tools; use of CDSEM tools is one such option. However, these additional steps translate into increased costs. The Calibre NxDAT based MDPAutoClassify tool provides an automated software alternative to the manual defect review process. Working on defect images generated by inspection machines, the tool extracts and reports additional information such as defect location, useful for defect avoidance [4][5]; defect size, useful in estimating defect printability; and defect nature, e.g., particle, scratch, or resist void, useful for process monitoring. The tool makes use of smart and elaborate post-processing algorithms to achieve this. Their elaborateness is a consequence of the variety and…

  20. Multiclassifier fusion of an ultrasonic lip reader in automatic speech recognition

    Science.gov (United States)

    Jennings, David L.

    1994-12-01

    This thesis investigates the use of two active ultrasonic devices in collecting lip information for performing and enhancing automatic speech recognition. The two devices explored are called the 'Ultrasonic Mike' and the 'Lip Lock Loop.' The devices are tested in a speaker dependent isolated word recognition task with a vocabulary consisting of the spoken digits from zero to nine. Two automatic lip readers are designed and tested based on the output of the ultrasonic devices. The automatic lip readers use template matching and dynamic time warping to determine the best candidate for a given test utterance. The automatic lip readers alone achieve accuracies of 65-89%, depending on the number of reference templates used. Next, the automatic lip reader is combined with a conventional automatic speech recognizer. Both classifier level fusion and feature level fusion are investigated. Feature fusion is based on combining the feature vectors prior to dynamic time warping. Classifier fusion is based on a pseudo probability mass function derived from the dynamic time warping distances. The combined systems are tested with various levels of acoustic noise added. In one typical test, at a signal-to-noise ratio of 0 dB, the acoustic recognizer's accuracy alone was 78%, the automatic lip reader's accuracy was 69%, but the combined accuracy was 93%. This experiment demonstrates that a simple ultrasonic lip motion detector, which has an output data rate 12,500 times less than a typical video camera, can significantly improve the accuracy of automatic speech recognition in noise.
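    Both fusion levels described above can be illustrated in a few lines. The sketch below implements a plain DTW distance for template matching and a classifier-level fusion that converts per-word DTW distances into a pseudo probability mass function; the inverse-distance scoring and the fusion weight are assumptions for illustration, not the thesis' exact formulation.

        import numpy as np

        def dtw_distance(a, b):
            """Dynamic time warping distance between two feature sequences."""
            n, m = len(a), len(b)
            D = np.full((n + 1, m + 1), np.inf)
            D[0, 0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    cost = np.linalg.norm(a[i - 1] - b[j - 1])
                    D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
            return D[n, m]

        def distances_to_pmf(distances):
            """Turn per-word DTW distances into a pseudo probability mass function."""
            scores = 1.0 / (np.asarray(distances) + 1e-9)   # smaller distance -> higher score
            return scores / scores.sum()

        def fuse(pmf_acoustic, pmf_lip, w=0.7):
            """Classifier-level fusion: weighted product of the two PMFs."""
            combined = (pmf_acoustic ** w) * (pmf_lip ** (1.0 - w))
            return combined / combined.sum()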

  1. Automatic initial and final segmentation in cleft palate speech of Mandarin speakers

    Science.gov (United States)

    Liu, Yin; Yin, Heng; Zhang, Junpeng; Zhang, Jing; Zhang, Jiang

    2017-01-01

    The speech unit segmentation is an important pre-processing step in the analysis of cleft palate speech. In Mandarin, one syllable is composed of two parts: initial and final. In cleft palate speech, resonance disorders occur at the finals and the voiced initials, while articulation disorders occur at the unvoiced initials. Thus, the initials and finals are the minimum speech units that can reflect the characteristics of cleft palate speech disorders. In this work, an automatic initial/final segmentation method is proposed as an important preprocessing step in cleft palate speech signal processing. The tested cleft palate speech utterances were collected from the Cleft Palate Speech Treatment Center in the Hospital of Stomatology, Sichuan University, which treats the largest number of cleft palate patients in China. The cleft palate speech data includes 824 speech segments, and the control samples contain 228 speech segments. First, the syllables are extracted from the speech utterances. The proposed syllable extraction method avoids a training stage and achieves good performance for both voiced and unvoiced speech. The syllables are then classified as having “quasi-unvoiced” or “quasi-voiced” initials, and respective initial/final segmentation methods are proposed for these two types of syllables. Moreover, a two-step segmentation method is proposed, in which the rough locations of the syllable and initial/final boundaries are refined in the second step, in order to improve the robustness of the segmentation accuracy. The experiments show that the initial/final segmentation accuracies for syllables with quasi-unvoiced initials are higher than for those with quasi-voiced initials. For the cleft palate speech, the mean time error is 4.4 ms for syllables with quasi-unvoiced initials and 25.7 ms for syllables with quasi-voiced initials, and the correct segmentation accuracy P30 for all syllables is 91.69%. For the control samples, P30 for all syllables is 91…

  2. Sensitivity Based Segmentation and Identification in Automatic Speech Recognition.

    Science.gov (United States)

    1984-03-30

    …by a network constructed from phonemic, phonetic, and phonological rules. Regardless of the speech processing system used, Klatt has described… analysis, and its use in the segmentation and identification of the phonetic units of speech, that was initiated during the 1982 Summer Faculty Research… practicable framework for incorporation of acoustic-phonetic variance as well as time and talker normalization.

  3. Fusing Eye-gaze and Speech Recognition for Tracking in an Automatic Reading Tutor

    DEFF Research Database (Denmark)

    Rasmussen, Morten Højfeldt; Tan, Zheng-Hua

    2013-01-01

    In this paper we present a novel approach for automatically tracking the reading progress using a combination of eye-gaze tracking and speech recognition. The two are fused by first generating word probabilities based on eye-gaze information and then using these probabilities to augment the language model…

  4. A Review on Speech Corpus Development for Automatic Speech Recognition in Indian Languages

    OpenAIRE

    Cini kurian

    2015-01-01

    Corpus development has gained much attention due to recent statistics-based natural language processing. It has new applications in language technology, linguistic research, language education and information exchange. Corpus-based language research has an innovative outlook which will displace aged linguistic theories. A speech corpus is an essential resource for building a speech recognizer, and one of the main challenges faced by speech scientists is the unavailability of these resources. Very f…

  5. An open-set detection evaluation methodology for automatic emotion recognition in speech

    NARCIS (Netherlands)

    Truong, K.P.; Leeuwen, D.A. van

    2007-01-01

    In this paper, we present a detection approach and an ‘open-set’ detection evaluation methodology for automatic emotion recognition in speech. The traditional classification approach does not seem to be suitable and flexible enough for typical emotion recognition tasks. For example, classification

  6. Errors in Automatic Speech Recognition versus Difficulties in Second Language Listening

    Science.gov (United States)

    Mirzaei, Maryam Sadat; Meshgi, Kourosh; Akita, Yuya; Kawahara, Tatsuya

    2015-01-01

    Automatic Speech Recognition (ASR) technology has become a part of contemporary Computer-Assisted Language Learning (CALL) systems. ASR systems however are being criticized for their erroneous performance, especially when utilized as a means to develop skills in a Second Language (L2), where errors are not tolerated. Nevertheless, these errors can…

  7. Automatic Speech Recognition Technology as an Effective Means for Teaching Pronunciation

    Science.gov (United States)

    Elimat, Amal Khalil; AbuSeileek, Ali Farhan

    2014-01-01

    This study aimed to explore the effect of using automatic speech recognition technology (ASR) on the third grade EFL students' performance in pronunciation, whether teaching pronunciation through ASR is better than regular instruction, and the most effective teaching technique (individual work, pair work, or group work) in teaching pronunciation…

  9. AUTOMATIC SPEECH RECOGNITION SYSTEM CONCERNING THE MOROCCAN DIALECTE (Darija and Tamazight)

    OpenAIRE

    A. EL GHAZI; Daoui, C.; Idrissi, N

    2012-01-01

    In this work we present an automatic speech recognition system for Moroccan dialects, mainly Darija (an Arabic dialect) and Tamazight. Many approaches have been used to model the Arabic and Tamazight phonetic units. In this paper, we propose to use the hidden Markov model (HMM) for modeling these phonetic units. Experimental results show that the proposed approach further improves recognition.

  10. Sparse coding of the modulation spectrum for noise-robust automatic speech recognition

    NARCIS (Netherlands)

    Ahmadi, S.; Ahadi, S.M.; Cranen, B.; Boves, L.W.J.

    2014-01-01

    The full modulation spectrum is a high-dimensional representation of one-dimensional audio signals. Most previous research in automatic speech recognition converted this very rich representation into the equivalent of a sequence of short-time power spectra, mainly to simplify the computation of the

  11. Evaluating Automatic Speech Recognition-Based Language Learning Systems: A Case Study

    Science.gov (United States)

    van Doremalen, Joost; Boves, Lou; Colpaert, Jozef; Cucchiarini, Catia; Strik, Helmer

    2016-01-01

    The purpose of this research was to evaluate a prototype of an automatic speech recognition (ASR)-based language learning system that provides feedback on different aspects of speaking performance (pronunciation, morphology and syntax) to students of Dutch as a second language. We carried out usability reviews, expert reviews and user tests to…

  12. The Effect of Automatic Speech Recognition Eyespeak Software on Iraqi Students' English Pronunciation: A Pilot Study

    Science.gov (United States)

    Sidgi, Lina Fathi Sidig; Shaari, Ahmad Jelani

    2017-01-01

    Technology such as computer-assisted language learning (CALL) is used in teaching and learning in the foreign language classrooms where it is most needed. One promising emerging technology that supports language learning is automatic speech recognition (ASR). Integrating such technology, especially in the instruction of pronunciation…

  13. The Usefulness of Automatic Speech Recognition (ASR) Eyespeak Software in Improving Iraqi EFL Students' Pronunciation

    Science.gov (United States)

    Sidgi, Lina Fathi Sidig; Shaari, Ahmad Jelani

    2017-01-01

    The present study focuses on determining whether automatic speech recognition (ASR) technology is reliable for improving English pronunciation to Iraqi EFL students. Non-native learners of English are generally concerned about improving their pronunciation skills, and Iraqi students face difficulties in pronouncing English sounds that are not…

  15. Speech-based Annotation of Heterogeneous Multimedia Content Using Automatic Speech Recognition

    NARCIS (Netherlands)

    Huijbregts, M.A.H.; Ordelman, Roeland J.F.; de Jong, Franciska M.G.

    2007-01-01

    This paper reports on the setup and evaluation of robust speech recognition system parts, geared towards transcript generation for heterogeneous, real-life media collections. The system is deployed for generating speech transcripts for the NIST/TRECVID-2007 test collection, part of a Dutch real-life

  16. Cross-modal enhancement of the MMN to speech-sounds indicates early and automatic integration of letters and speech-sounds.

    Science.gov (United States)

    Froyen, Dries; Van Atteveldt, Nienke; Bonte, Milene; Blomert, Leo

    2008-01-01

    Recently, brain imaging evidence has indicated that letter/speech-sound integration, necessary for establishing fluent reading, takes place in auditory association areas and that the integration is influenced by the stimulus onset asynchrony (SOA) between the letter and the speech-sound. In the present study, we used a specific ERP measure known for its automatic character, the mismatch negativity (MMN), to investigate the time course and automaticity of letter/speech-sound integration. We studied the effect of visual letters and SOA on the MMN elicited by a deviant speech-sound. We found a clear enhancement of the MMN by simultaneously presenting a letter, but without changing the auditory stimulation. This enhancement diminishes linearly with increasing SOA. These results suggest that letters and speech-sounds are processed as compound stimuli early and automatically in the auditory association cortex of fluent readers and that this processing is strongly dependent on timing.

  17. Preliminary Analysis of Automatic Speech Recognition and Synthesis Technology.

    Science.gov (United States)

    1983-05-01

    …but also suprasegmental (prosodic) information, such as appropriate stress levels and intonation patterns, to improve their output. This is a definite… Society of America Paper, 1975. "Suprasegmentals." Cambridge, Mass.: MIT Press, 1970. Makhoul, J., Viswanathan, R., and Huggins, W. F. "A Mixed…" …the "suprasegmentals," or influences of duration, fundamental frequency, and speech production power upon basic phonemes. These influences…

  18. Native language shapes automatic neural processing of speech.

    Science.gov (United States)

    Intartaglia, Bastien; White-Schwoch, Travis; Meunier, Christine; Roman, Stéphane; Kraus, Nina; Schön, Daniele

    2016-08-01

    The development of the phoneme inventory is driven by the acoustic-phonetic properties of one's native language. Neural representation of speech is known to be shaped by language experience, as indexed by cortical responses, and recent studies suggest that subcortical processing also exhibits this attunement to native language. However, most work to date has focused on the differences between tonal and non-tonal languages that use pitch variations to convey phonemic categories. The aim of this cross-language study is to determine whether subcortical encoding of speech sounds is sensitive to language experience by comparing native speakers of two non-tonal languages (French and English). We hypothesized that neural representations would be more robust and fine-grained for speech sounds that belong to the native phonemic inventory of the listener, and especially for the dimensions that are phonetically relevant to the listener such as high frequency components. We recorded neural responses of American English and French native speakers, listening to natural syllables of both languages. Results showed that, independently of the stimulus, American participants exhibited greater neural representation of the fundamental frequency compared to French participants, consistent with the importance of the fundamental frequency to convey stress patterns in English. Furthermore, participants showed more robust encoding and more precise spectral representations of the first formant when listening to the syllable of their native language as compared to non-native language. These results align with the hypothesis that language experience shapes sensory processing of speech and that this plasticity occurs as a function of what is meaningful to a listener.

  19. Post-error Correction in Automatic Speech Recognition Using Discourse Information

    Directory of Open Access Journals (Sweden)

    KANG, S.

    2014-05-01

    Full Text Available Overcoming speech recognition errors in the field of human-computer interaction is important for ensuring a consistent user experience. This paper proposes a semantic-oriented post-processing approach for the correction of errors in speech recognition. The novelty of the proposed model is that it re-ranks the n-best hypotheses of speech recognition based on the user's intention, which is analyzed from previous discourse information, while conventional automatic speech recognition systems focus only on acoustic and language model scores for the current sentence. The proposed model successfully reduces the word error rate and semantic error rate by 3.65% and 8.61%, respectively.
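    The re-ranking idea can be sketched as a weighted combination of the recognizer's score and a discourse-consistency score for each n-best hypothesis. In the illustrative Python sketch below, the discourse score is a placeholder keyword-overlap measure, not the paper's intention-analysis model, and the weight alpha is an assumption.

        def rerank_nbest(nbest, discourse_keywords, alpha=0.8):
            """nbest: list of (hypothesis, asr_score); returns hypotheses re-ranked
            by a weighted sum of ASR score and discourse-consistency score."""
            def discourse_score(text):
                words = set(text.lower().split())
                return len(words & discourse_keywords) / (len(words) or 1)
            rescored = [(text, alpha * score + (1 - alpha) * discourse_score(text))
                        for text, score in nbest]
            return sorted(rescored, key=lambda pair: pair[1], reverse=True)

        # Example: a flight-booking discourse pushes the matching hypothesis to the top.
        nbest = [("book a fright to boston", 0.52), ("book a flight to boston", 0.50)]
        print(rerank_nbest(nbest, {"flight", "boston", "airline"})[0][0])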

  20. An Automatic Prolongation Detection Approach in Continuous Speech With Robustness Against Speaking Rate Variations

    Science.gov (United States)

    Esmaili, Iman; Dabanloo, Nader Jafarnia; Vali, Mansour

    2017-01-01

    In recent years, many methods have been introduced to support the diagnosis of stuttering through automatic detection of prolongations in the speech of people who stutter. However, less attention has been paid to treatment processes in which clients learn to speak more slowly. The aim of this study was to develop a method to help speech-language pathologists (SLPs) during diagnosis and treatment sessions. To this end, speech signals were initially parameterized as perceptual linear predictive (PLP) features. To detect the prolonged segments, the similarities between successive frames of the speech signal were calculated using correlation similarity measures. Segments were labeled as prolongations when the duration of highly similar successive frames exceeded a threshold specified by the speaking rate. The proposed method was evaluated on the UCLASS and a self-recorded Persian speech database. The results were also compared with three high-performance studies in automatic prolongation detection. The best prolongation detection accuracies were 99% and 97.1% for the UCLASS and Persian databases, respectively. The proposed method also showed promising robustness against artificial variation of speaking rate from 70 to 130% of the normal speaking rate. PMID:28487827
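    A minimal sketch of the frame-similarity test described above: successive feature frames that remain highly correlated for longer than a minimum duration are labeled as a prolongation. The threshold values below are illustrative; in the paper the minimum duration is tied to the estimated speaking rate.

        import numpy as np

        def detect_prolongations(frames, sim_threshold=0.95, min_run=30):
            """frames: (n_frames, n_dims) features such as PLP coefficients.
            Returns (start, end) frame-index pairs of prolonged segments."""
            sims = [np.corrcoef(frames[i], frames[i + 1])[0, 1]
                    for i in range(len(frames) - 1)]
            segments, run_start = [], None
            for i, s in enumerate(sims):
                if s >= sim_threshold:
                    if run_start is None:
                        run_start = i
                else:
                    if run_start is not None and i - run_start >= min_run:
                        segments.append((run_start, i))
                    run_start = None
            if run_start is not None and len(sims) - run_start >= min_run:
                segments.append((run_start, len(sims)))
            return segments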

  1. Automatic speech recognition (ASR) and its use as a tool for assessment or therapy of voice, speech, and language disorders.

    Science.gov (United States)

    Kitzing, Peter; Maier, Andreas; Ahlander, Viveka Lyberg

    2009-01-01

    In general opinion, computerized automatic speech recognition (ASR) seems to be regarded merely as a method for transcribing spoken language into written text, and as such quite insecure and rather cumbersome. However, due to great advances in computer technology and informatics methodology, ASR has nowadays become quite dependable and easier to handle, and the number of applications has increased considerably. After some introductory background information on ASR, a number of applications of great interest to professionals in voice, speech, and language therapy are pointed out. In the foreseeable future, the keyboard and mouse will in many functions be replaced, by means of ASR technology, by a microphone as the human-computer interface, and the computer will talk back via its loudspeaker. It seems important that professionals engaged in the care of oral communication disorders take part in this development so their clients may get the optimal benefit from this new technology.

  2. Arabic Language Learning Assisted by Computer, based on Automatic Speech Recognition

    CERN Document Server

    Terbeh, Naim

    2012-01-01

    This work consists of creating a Computer Assisted Language Learning (CALL) system based on Automatic Speech Recognition (ASR) for the Arabic language, using the CMU Sphinx3 tool [1] and the HMM approach. For this work, we constructed a corpus of six hours of speech recordings from nine speakers. Robustness to noise was a key reason for choosing the HMM approach [2]. The results achieved are encouraging given that our corpus contains only nine speakers, and they open the door for further improvement work.

  3. How Accurately Can the Google Web Speech API Recognize and Transcribe Japanese L2 English Learners' Oral Production?

    Science.gov (United States)

    Ashwell, Tim; Elam, Jesse R.

    2017-01-01

    The ultimate aim of our research project was to use the Google Web Speech API to automate scoring of elicited imitation (EI) tests. However, in order to achieve this goal, we had to take a number of preparatory steps. We needed to assess how accurate this speech recognition tool is in recognizing native speakers' production of the test items; we…

  4. Deformable meshes for medical image segmentation accurate automatic segmentation of anatomical structures

    CERN Document Server

    Kainmueller, Dagmar

    2014-01-01

    Segmentation of anatomical structures in medical image data is an essential task in clinical practice. Dagmar Kainmueller introduces methods for accurate fully automatic segmentation of anatomical structures in 3D medical image data. The author's core methodological contribution is a novel deformation model that overcomes limitations of state-of-the-art Deformable Surface approaches, hence allowing for accurate segmentation of tip- and ridge-shaped features of anatomical structures. As for practical contributions, she proposes application-specific segmentation pipelines for a range of anatomical structures.

  5. Development an Automatic Speech to Facial Animation Conversion for Improve Deaf Lives

    Directory of Open Access Journals (Sweden)

    S. Hamidreza Kasaei

    2011-05-01

    Full Text Available In this paper, we propose the design and initial implementation of a robust system which automatically translates voice into text and text into sign language animations. Sign language translation systems could significantly improve deaf lives, especially in communication, exchange of information, and employment, by using a machine to translate conversations from one language to another. Considering these points, it seems necessary to study speech recognition. Voice recognition algorithms usually address three major challenges: the first is extracting features from speech; the second is recognition when only a limited sound gallery is available; and the final challenge is to move from speaker-dependent to speaker-independent voice recognition. Extracting features from speech is an important stage in our method. Different procedures are available for this purpose; one of the commonest used in speech recognition systems is Mel-Frequency Cepstral Coefficients (MFCCs). The algorithm starts with preprocessing and signal conditioning. Next, features are extracted from the speech using cepstral coefficients, and the result of this process is sent to the segmentation part. Finally, the recognition part recognizes the words and converts the recognized words to facial animation. The project is still in progress and some new interesting methods are described in the current report.

  6. Can prosody aid the automatic classification of dialog acts in conversational speech?

    Science.gov (United States)

    Shriberg, E; Bates, R; Stolcke, A; Taylor, P; Jurafsky, D; Ries, K; Coccaro, N; Martin, R; Meteer, M; van Ess-Dykema, C

    1998-01-01

    Identifying whether an utterance is a statement, question, greeting, and so forth is integral to effective automatic understanding of natural dialog. Little is known, however, about how such dialog acts (DAs) can be automatically classified in truly natural conversation. This study asks whether current approaches, which use mainly word information, could be improved by adding prosodic information. The study is based on more than 1000 conversations from the Switchboard corpus. DAs were hand-annotated, and prosodic features (duration, pause, F0, energy, and speaking rate) were automatically extracted for each DA. In training, decision trees based on these features were inferred; trees were then applied to unseen test data to evaluate performance. Performance was evaluated for prosody models alone, and after combining the prosody models with word information--either from true words or from the output of an automatic speech recognizer. For an overall classification task, as well as three subtasks, prosody made significant contributions to classification. Feature-specific analyses further revealed that although canonical features (such as F0 for questions) were important, less obvious features could compensate if canonical features were removed. Finally, in each task, integrating the prosodic model with a DA-specific statistical language model improved performance over that of the language model alone, especially for the case of recognized words. Results suggest that DAs are redundantly marked in natural conversation, and that a variety of automatically extractable prosodic features could aid dialog processing in speech applications.
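    The modeling setup lends itself to a compact sketch: per-utterance prosodic summaries feed a decision tree that predicts the dialog act. The Python sketch below uses scikit-learn and synthetic data; the feature set is a simplification of the paper's duration, pause, F0, energy, and speaking-rate features, and all numbers are illustrative.

        import numpy as np
        from sklearn.tree import DecisionTreeClassifier

        def prosodic_features(duration_s, f0_track, energy_track):
            """Summarize one utterance: duration plus F0 and energy statistics."""
            return [duration_s,
                    np.mean(f0_track), np.std(f0_track),
                    f0_track[-1] - f0_track[0],        # final rise/fall, a crude question cue
                    np.mean(energy_track), np.std(energy_track)]

        # Toy data: rising final F0 for 'question' (1), falling for 'statement' (0).
        rng = np.random.default_rng(0)
        X = [prosodic_features(rng.uniform(0.5, 3.0),
                               rng.normal(120, 10, 50)
                               + (30 if label else -10) * np.linspace(0, 1, 50),
                               rng.uniform(0.2, 1.0, 50))
             for label in (0, 1) for _ in range(50)]
        y = [label for label in (0, 1) for _ in range(50)]
        tree = DecisionTreeClassifier(max_depth=4).fit(X, y)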

  7. [Repetitive phenomenona in the spontaneous speech of aphasic patients: perseveration, stereotypy, echolalia, automatism and recurring utterance].

    Science.gov (United States)

    Wallesch, C W; Brunner, R J; Seemüller, E

    1983-12-01

    Repetitive phenomena in spontaneous speech were investigated in 30 patients with chronic infarctions of the left hemisphere which included Broca's and/or Wernicke's area and/or the basal ganglia. Perseverations, stereotypies, and echolalias occurred with all types of brain lesions; automatisms and recurring utterances occurred only in those patients whose infarctions involved Wernicke's area and the basal ganglia. These patients also showed more echolalic responses. The results are discussed in view of the role of the basal ganglia as motor program generators.

  8. Speech intelligibility enhancement through maxillary dental rehabilitation with telescopic prostheses and complete dentures: a prospective study using automatic, computer-based speech analysis.

    Science.gov (United States)

    Knipfer, Christian; Bocklet, Tobias; Noeth, Elmar; Schuster, Maria; Sokol, Biljana; Eitner, Stephan; Nkenke, Emeka; Stelzle, Florian

    2012-01-01

    A completely edentulous or partially edentulous maxilla involving missing anterior teeth may impact speech production and lead to reduced speech intelligibility. The aim of this study was to prospectively evaluate the effect of a dental prosthetic rehabilitation on speech intelligibility in patients with a toothless or interrupted maxillary arch by means of an automatic, standardized speech recognition system. The speech intelligibility of 45 patients with complete tooth loss or a loss including missing anterior teeth in the maxilla was evaluated by means of a polyphone-based automatic speech recognition system that assessed the percentage of correctly recognized words (word accuracy). To replace inadequate maxillary removable dentures, 20 patients from the overall sample had been rehabilitated with complete dentures and 25 patients with telescopic prostheses. Speech recordings were made in four recording sessions (with and without existing prostheses and then at 1 week and 6 months after placement of newly fabricated prostheses). Significantly higher speech intelligibility was observed in both patient groups compared to the original results without the dentures inserted. After 6 months of adaptation, both groups had reached a level of speech quality that was comparable to the healthy control group. However, patients receiving new telescopic prostheses showed significantly higher levels of speech intelligibility compared to those receiving new complete dentures. Within 6 months, speech intelligibility did not significantly improve from the level found 1 week after insertion of new prostheses for both groups. Patients benefit from the fabrication of new dentures in terms of speech intelligibility, regardless of the type of prosthesis. However, telescopic crown prostheses yield significantly better speech quality compared to complete dentures.

  9. Effect of speech-intrinsic variations on human and automatic recognition of spoken phonemes.

    Science.gov (United States)

    Meyer, Bernd T; Brand, Thomas; Kollmeier, Birger

    2011-01-01

    The aim of this study is to quantify the gap between the recognition performance of human listeners and an automatic speech recognition (ASR) system with special focus on intrinsic variations of speech, such as speaking rate and effort, altered pitch, and the presence of dialect and accent. Second, it is investigated if the most common ASR features contain all information required to recognize speech in noisy environments by using resynthesized ASR features in listening experiments. For the phoneme recognition task, the ASR system achieved the human performance level only when the signal-to-noise ratio (SNR) was increased by 15 dB, which is an estimate for the human-machine gap in terms of the SNR. The major part of this gap is attributed to the feature extraction stage, since human listeners achieve comparable recognition scores when the SNR difference between unaltered and resynthesized utterances is 10 dB. Intrinsic variabilities result in strong increases of error rates, both in human speech recognition (HSR) and ASR (with a relative increase of up to 120%). An analysis of phoneme duration and recognition rates indicates that human listeners are better able to identify temporal cues than the machine at low SNRs, which suggests incorporating information about the temporal dynamics of speech into ASR systems.

  10. Automatic evaluation of speech rhythm instability and acceleration in dysarthrias associated with basal ganglia dysfunction

    Directory of Open Access Journals (Sweden)

    Jan Rusz

    2015-07-01

    Full Text Available Speech rhythm abnormalities are commonly present in patients with different neurodegenerative disorders. These alterations are hypothesized to be a consequence of disruption to the basal ganglia circuitry involving dysfunction of motor planning, programming and execution, which can be detected by a syllable repetition paradigm. Therefore, the aim of the present study was to design a robust signal processing technique that allows the automatic detection of spectrally-distinctive nuclei of syllable vocalizations and to determine speech features that represent rhythm instability and acceleration. A further aim was to elucidate specific patterns of dysrhythmia across various neurodegenerative disorders that share disruption of basal ganglia function. Speech samples based on repetition of the syllable /pa/ at a self-determined steady pace were acquired from 109 subjects, including 22 with Parkinson's disease (PD), 11 with progressive supranuclear palsy (PSP), 9 with multiple system atrophy (MSA), 24 with ephedrone-induced parkinsonism (EP), 20 with Huntington's disease (HD), and 23 healthy controls. Subsequently, an algorithm for the automatic detection of syllables as well as features representing rhythm instability and rhythm acceleration were designed. The proposed detection algorithm was able to correctly identify syllables and remove erroneous detections due to excessive inspiration and nonspeech sounds with a very high accuracy of 99.6%. Instability of vocal pace performance was observed in the PSP, MSA, EP and HD groups. Significantly increased pace acceleration was observed only in the PD group. Although not significant, a tendency for pace acceleration was observed also in the PSP and MSA groups. Our findings underline the crucial role of the basal ganglia in the execution and maintenance of automatic speech motor sequences. We envisage the current approach to become the first step towards the development of acoustic technologies allowing automated assessment of rhythm…
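    Given detected syllable onset times, the two rhythm measures can be sketched directly: instability as the coefficient of variation of inter-syllable intervals, and acceleration as the slope of interval length across the utterance, where a negative slope means speeding up. These exact definitions are assumptions for illustration, not the paper's published formulas.

        import numpy as np

        def rhythm_metrics(onset_times_s):
            """Instability = coefficient of variation of inter-syllable intervals;
            acceleration = slope of interval length over the utterance."""
            intervals = np.diff(np.asarray(onset_times_s))
            instability = np.std(intervals) / np.mean(intervals)
            slope = np.polyfit(np.arange(len(intervals)), intervals, 1)[0]
            return instability, slope

        # Example: /pa/ onsets that accelerate from 400 ms to 300 ms spacing.
        onsets = np.cumsum(np.linspace(0.40, 0.30, 30))
        print(rhythm_metrics(onsets))   # low instability, negative slope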

  11. An exploration of the potential of Automatic Speech Recognition to assist and enable receptive communication in higher education

    Directory of Open Access Journals (Sweden)

    Mike Wald

    2006-12-01

    Full Text Available The potential use of Automatic Speech Recognition to assist receptive communication is explored. The opportunities and challenges that this technology presents to students and staff are discussed and evaluated: providing captioning of speech online or in classrooms for deaf or hard of hearing students, and assisting blind, visually impaired or dyslexic learners to read and search learning material more readily by augmenting synthetic speech with naturally recorded real speech. The automatic provision of online lecture notes, synchronised with speech, enables staff and students to focus on learning and teaching issues, while also benefiting learners unable to attend the lecture or who find it difficult or impossible to take notes at the same time as listening, watching and thinking.

  12. AUTOMATIC SPEECH RECOGNITION – THE MAIN STAGES OVER LAST 50 YEARS

    Directory of Open Access Journals (Sweden)

    I. B. Tampel

    2015-11-01

    Full Text Available The main stages of automatic speech recognition systems over the last 50 years are reviewed. An attempt is made to evaluate different methods in the context of how closely they approach the functioning of biological systems. A method implementation based on the dynamic programming algorithm, dating from 1968, is taken as a benchmark, and the shortcomings that restrict it to command recognition are considered. The next method considered is based on the formalism of Markov chains. Based on the notion of coarticulation, the necessity of applying context-dependent triphones and biphones instead of context-independent phonemes is shown. The problem of insufficient speech databases for triphone training, which leads to state-tying methods, is explained. The importance of model adaptation and feature normalization methods, which provide better invariance to speakers, communication channels and additive noise, is shown. Deep Neural Networks and Recurrent Networks are considered as the most up-to-date methods, and the similarity of deep (multilayer) neural networks and biological systems is noted. In conclusion, the problems and drawbacks of modern automatic speech recognition systems are described and a prognosis of their development is given.

  13. Automatic transcription of continuous speech into syllable-like units for Indian languages

    Indian Academy of Sciences (India)

    G Lakshmi Sarada; A Lakshmi; Hema A Murthy; T Nagarajan

    2009-04-01

    The focus of this paper is to automatically segment and label continuous speech signals into syllable-like units for Indian languages. In this approach, the continuous speech signal is first automatically segmented into syllable-like units using a group delay based algorithm. Similar syllable segments are then grouped together using an unsupervised and incremental training (UIT) technique. Isolated-style HMM models are generated for each of the clusters during training. During testing, the speech signal is segmented into syllable-like units which are then tested against the HMMs obtained during training. This results in a syllable recognition performance of 42.6% and 39.94% for Tamil and Telugu, respectively. A new feature extraction technique that uses features extracted from multiple frame sizes and frame rates during both training and testing is explored for the syllable recognition task. This results in a recognition performance of 48.7% and 45.36% for Tamil and Telugu, respectively. The performance of segmentation followed by labelling is superior to that of a flat-start syllable recogniser (27.8% and 28.8% for Tamil and Telugu, respectively).

  14. A HYBRID METHOD FOR AUTOMATIC SPEECH RECOGNITION PERFORMANCE IMPROVEMENT IN REAL WORLD NOISY ENVIRONMENT

    Directory of Open Access Journals (Sweden)

    Urmila Shrawankar

    2013-01-01

    Full Text Available It is a well known fact that speech recognition systems perform well when used in conditions similar to those under which the acoustic models were trained, and that mismatches degrade performance. In adverse environments it is very difficult to predict the category of noise in advance, and environmental robustness is hard to achieve with real-world noise. After a rigorous experimental study, it was observed that no single method can clean speech corrupted by real, mixed environmental noise while preserving its quality. It was also observed that back-end techniques alone are not sufficient to improve the performance of a speech recognition system; performance improvement techniques must be implemented at every step of both the back end and the front end of the Automatic Speech Recognition (ASR) model. Current recognition systems address this problem using a technique called adaptation. This study presents an experimental investigation with two aims. The first is to implement a hybrid method that clarifies the speech signal as much as possible using combinations of filters and enhancement techniques. The second is to develop a method for training on all categories of noise that can adapt the acoustic models to a new environment, helping to improve recognizer performance under real-world mismatched conditions. The experiments confirm that hybrid adaptation methods improve ASR performance on both levels: signal-to-noise ratio (SNR) improvement as well as word recognition accuracy in real-world noisy environments.

  15. Automatic Speech Segmentation Based On Audio and Optical Flow Visual Classification

    Directory of Open Access Journals (Sweden)

    Behnam Torabi

    2014-10-01

    Full Text Available Automatic speech segmentation, an important part of a speech recognition system (ASR), is highly noise dependent. Noise arises from changes in the communication channel, background, speaking level, etc. In recent years, many researchers have proposed noise cancelation techniques and have added visual features from the speaker's face to reduce the effect of noise on ASR systems. Removing noise from audio signals depends on the type of the noise, so it cannot serve as a general solution. Adding visual features addresses this lack of generality, but advanced methods of this type require manual extraction of the visual features. In this paper we propose a completely automatic system which uses optical flow vectors from the speaker's image sequence to obtain visual features. Hidden Markov Models are then trained to segment audio signals from image sequences and audio features based on the extracted optical flow. The resulting segmentation system is totally automatic and more robust to noise.

  16. The role of binary mask patterns in automatic speech recognition in background noise.

    Science.gov (United States)

    Narayanan, Arun; Wang, DeLiang

    2013-05-01

    Processing noisy signals using the ideal binary mask improves automatic speech recognition (ASR) performance. This paper presents the first study that investigates the role of binary mask patterns in ASR under various noises, signal-to-noise ratios (SNRs), and vocabulary sizes. Binary masks are computed either by comparing the SNR within a time-frequency unit of a mixture signal with a local criterion (LC), or by comparing the local target energy with the long-term average spectral energy of speech. ASR results show that (1) akin to human speech recognition, binary masking significantly improves ASR performance even when the SNR is as low as -60 dB; (2) the ASR performance profiles are qualitatively similar to those obtained in human intelligibility experiments; (3) the difference between the LC and mixture SNR is more correlated to the recognition accuracy than LC; (4) LC at which the performance peaks is lower than 0 dB, which is the threshold that maximizes the SNR gain of processed signals. This broad agreement with human performance is rather surprising. The results also indicate that maximizing the SNR gain is probably not an appropriate goal for improving either human or machine recognition of noisy speech.
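    The mask construction itself is a one-liner once the target and noise power spectrograms are available, as in the oracle experiments above: a time-frequency unit is kept when its local SNR exceeds the local criterion (LC). A minimal NumPy sketch, with an illustrative LC value:

        import numpy as np

        def ideal_binary_mask(target_power, noise_power, lc_db=-6.0):
            """target_power, noise_power: (n_frames, n_bins) power spectrograms.
            Keep a time-frequency unit when its local SNR exceeds LC (in dB)."""
            local_snr_db = 10.0 * np.log10((target_power + 1e-12) / (noise_power + 1e-12))
            return (local_snr_db > lc_db).astype(float)

        # The mask is applied to the mixture spectrogram before feature extraction.
        target = np.abs(np.random.randn(100, 64)) ** 2
        noise = np.abs(np.random.randn(100, 64)) ** 2
        masked = (target + noise) * ideal_binary_mask(target, noise)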

  17. An FFT-Based Companding Front End for Noise-Robust Automatic Speech Recognition

    Directory of Open Access Journals (Sweden)

    Turicchia Lorenzo

    2007-01-01

    Full Text Available We describe an FFT-based companding algorithm for preprocessing speech before recognition. The algorithm mimics tone-to-tone suppression and masking in the auditory system to improve automatic speech recognition performance in noise. Moreover, it is also very computationally efficient and suited to digital implementations due to its use of the FFT. In an automotive digits recognition task with the CU-Move database recorded in real environmental noise, the algorithm improves the relative word error by 12.5% at -5 dB signal-to-noise ratio (SNR) and by 6.2% across all SNRs (-5 dB SNR to +5 dB SNR). In the Aurora-2 database recorded with artificially added noise in several environments, the algorithm improves the relative word error rate in almost all situations.
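    The compress-compare-expand structure of a companding front end can be sketched as follows. This reproduces the idea of tone-to-tone suppression across neighboring frequency channels, not the paper's exact filter design; the compression exponent, spreading kernel, and suppression factor are all illustrative choices.

        import numpy as np

        def compand_spectrogram(power_spec, compress=0.3):
            """Compress each band, let strong bands suppress weaker neighbors,
            then expand back. power_spec: (n_frames, n_bins)."""
            compressed = power_spec ** compress
            kernel = np.array([0.25, 0.5, 0.25])            # crude cross-band spread
            spread = np.apply_along_axis(
                lambda row: np.convolve(row, kernel, mode="same"), 1, compressed)
            # Suppress bands that fall below their local neighborhood average.
            suppressed = np.where(compressed >= spread, compressed, 0.2 * compressed)
            return suppressed ** (1.0 / compress)           # expand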

  18. An FFT-Based Companding Front End for Noise-Robust Automatic Speech Recognition

    Directory of Open Access Journals (Sweden)

    Bhiksha Raj

    2007-06-01

    Full Text Available We describe an FFT-based companding algorithm for preprocessing speech before recognition. The algorithm mimics tone-to-tone suppression and masking in the auditory system to improve automatic speech recognition performance in noise. Moreover, it is also very computationally efficient and suited to digital implementations due to its use of the FFT. In an automotive digits recognition task with the CU-Move database recorded in real environmental noise, the algorithm improves the relative word error by 12.5% at −5 dB signal-to-noise ratio (SNR) and by 6.2% across all SNRs (−5 dB SNR to +15 dB SNR). In the Aurora-2 database recorded with artificially added noise in several environments, the algorithm improves the relative word error rate in almost all situations.

  19. Language modeling for automatic speech recognition of inflective languages an applications-oriented approach using lexical data

    CERN Document Server

    Donaj, Gregor

    2017-01-01

    This book covers language modeling and automatic speech recognition for inflective languages (e.g. Slavic languages), which represent roughly half of the languages spoken in Europe. These languages do not perform as well as English in speech recognition systems and it is therefore harder to develop an application with sufficient quality for the end user. The authors describe the most important language features for the development of a speech recognition system. This is then presented through the analysis of errors in the system and the development of language models and their inclusion in speech recognition systems, which specifically address the errors that are relevant for targeted applications. The error analysis is done with regard to morphological characteristics of the word in the recognized sentences. The book is oriented towards speech recognition with large vocabularies and continuous and even spontaneous speech. Today such applications work with a rather small number of languages compared to the nu...

  20. Peripheral Nonlinear Time Spectrum Features Algorithm for Large Vocabulary Mandarin Automatic Speech Recognition

    Institute of Scientific and Technical Information of China (English)

    Fadhil H. T. Al-dulaimy; WANG Zuoying

    2005-01-01

    This work describes an improved feature extractor algorithm to extract the peripheral features of a point x(ti,fj) using a nonlinear algorithm to compute the nonlinear time spectrum (NL-TS) pattern. The algorithm observes n×n neighborhoods of the point in all directions, and then incorporates the peripheral features using the Mel frequency cepstrum components (MFCCs)-based feature extractor of the Tsinghua electronic engineering speech processing (THEESP) Mandarin automatic speech recognition (MASR) system as replacements for the dynamic features, with different feature combinations. In this algorithm, the orthogonal bases are extracted directly from the speech data using the discrete cosine transformation (DCT) with 3×3 blocks on an NL-TS pattern as the peripheral features. The new primal bases are then selected and simplified in the form of an operator in the time direction and an operator in the frequency direction. The algorithm yields a 23.29% relative error rate improvement in comparison with the standard MFCC feature set and the dynamic features in tests using THEESP with the duration distribution-based hidden Markov model (DDBHMM) based MASR system.
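    The peripheral-feature extraction step, reduced to its essentials, applies a 2-D DCT to each 3×3 neighborhood of the time-spectrum pattern. The sketch below (NumPy/SciPy) omits the nonlinearity and the basis selection and simplification stages described above, so it should be read as a simplified illustration rather than the published algorithm.

        import numpy as np
        from scipy.fft import dctn

        def block_dct_features(ts_pattern):
            """ts_pattern: (n_frames, n_bins) time-spectrum pattern.
            Returns the 9 DCT coefficients of each interior 3x3 neighborhood."""
            n_t, n_f = ts_pattern.shape
            feats = np.zeros((n_t - 2, n_f - 2, 9))
            for i in range(1, n_t - 1):
                for j in range(1, n_f - 1):
                    block = ts_pattern[i - 1:i + 2, j - 1:j + 2]
                    feats[i - 1, j - 1] = dctn(block, norm="ortho").ravel()
            return feats

        feats = block_dct_features(np.abs(np.random.randn(50, 40)))   # (48, 38, 9)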

  1. An HMM-Like Dynamic Time Warping Scheme for Automatic Speech Recognition

    Directory of Open Access Journals (Sweden)

    Ing-Jr Ding

    2014-01-01

    Full Text Available In the past, the kernel of automatic speech recognition (ASR) was dynamic time warping (DTW), a feature-based template matching technique that belongs to the category of dynamic programming (DP). Although DTW is an early ASR technique, it remains popular in many applications and plays an important role in the well-known Kinect-based gesture recognition application. This paper proposes an intelligent speech recognition system using an improved DTW approach for multimedia and home automation services. The improved DTW presented in this work, called HMM-like DTW, is essentially a hidden Markov model (HMM)-like method in which the concept of the typical HMM statistical model is brought into the design of DTW. The developed HMM-like DTW method, transforming feature-based DTW recognition into model-based DTW recognition, behaves like the HMM recognition technique, and the proposed HMM-like recognition model therefore has the capability to further perform model adaptation (also known as speaker adaptation). A series of experimental results in home automation-based multimedia access service environments demonstrated the superiority and effectiveness of the developed smart speech recognition system based on HMM-like DTW.

  2. Speaker diarization and speech recognition in the semi-automatization of audio description: An exploratory study on future possibilities?

    Directory of Open Access Journals (Sweden)

    Héctor Delgado

    2015-12-01

    Full Text Available This article presents an overview of the technological components used in the process of audio description, and suggests a new scenario in which speech recognition, machine translation, and text-to-speech, with the corresponding human revision, could be used to increase audio description provision. The article focuses on a process in which both speaker diarization and speech recognition are used in order to obtain a semi-automatic transcription of the audio description track. The technical process is presented and experimental results are summarized.

  3. Automatic lung segmentation in CT images with accurate handling of the hilar region.

    Science.gov (United States)

    De Nunzio, Giorgio; Tommasi, Eleonora; Agrusti, Antonella; Cataldo, Rosella; De Mitri, Ivan; Favetta, Marco; Maglio, Silvio; Massafra, Andrea; Quarta, Maurizio; Torsello, Massimo; Zecca, Ilaria; Bellotti, Roberto; Tangaro, Sabina; Calvini, Piero; Camarlinghi, Niccolò; Falaschi, Fabio; Cerello, Piergiorgio; Oliva, Piernicola

    2011-02-01

    A fully automated and three-dimensional (3D) segmentation method for the identification of the pulmonary parenchyma in thorax X-ray computed tomography (CT) datasets is proposed. It is meant to be used as a pre-processing step in the computer-assisted detection (CAD) system for malignant lung nodule detection that is being developed by the Medical Applications in a Grid Infrastructure Connection (MAGIC-5) Project. In this new approach the segmentation of the external airways (trachea and bronchi) is obtained by 3D region growing with wavefront simulation and suitable stop conditions, thus allowing an accurate handling of the hilar region, notoriously difficult to segment. Particular attention was also devoted to checking and solving the problem of the apparent 'fusion' between the lungs, caused by partial-volume effects, while 3D morphology operations ensure the accurate inclusion of all the nodules (internal, pleural, and vascular) in the segmented volume. The new algorithm was initially developed and tested on a dataset of 130 CT scans from the Italung-CT trial, and was then applied to the ANODE09-competition images (55 scans) and to the LIDC database (84 scans), giving very satisfactory results. In particular, the lung contour was adequately located in 96% of the CT scans, with incorrect segmentation of the external airways in the remaining cases. Segmentation metrics were calculated that quantitatively express the consistency between automatic and manual segmentations: the mean overlap degree of the segmentation masks is 0.96 ± 0.02, and the mean and the maximum distance between the mask borders (averaged on the whole dataset) are 0.74 ± 0.05 and 4.5 ± 1.5, respectively, which confirms that the automatic segmentations quite correctly reproduce the borders traced by the radiologist. Moreover, no tissue containing internal and pleural nodules was removed in the segmentation process, so that this method proved to be fit for use in the…
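    The basic operation behind the airway segmentation step, 3-D region growing with an intensity stop condition, can be sketched as follows. The wavefront simulation and the paper's specific stop conditions are not reproduced, and the Hounsfield threshold is an illustrative choice.

        import numpy as np
        from collections import deque

        def region_grow_3d(volume, seed, max_hu=-900):
            """Grow from seed over 6-connected voxels whose intensity stays below
            max_hu (air in Hounsfield units). Returns a boolean mask."""
            mask = np.zeros(volume.shape, dtype=bool)
            queue = deque([seed])
            mask[seed] = True
            offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
            while queue:
                z, y, x = queue.popleft()
                for dz, dy, dx in offsets:
                    n = (z + dz, y + dy, x + dx)
                    if (all(0 <= n[k] < volume.shape[k] for k in range(3))
                            and not mask[n] and volume[n] < max_hu):
                        mask[n] = True
                        queue.append(n)
            return mask

        # Example: a uniform 'air' volume is fully captured from a central seed.
        volume = np.full((32, 32, 32), -1000.0)
        airway = region_grow_3d(volume, (16, 16, 16))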

  4. Suprasegmental speech cues are automatically processed by the human brain: a mismatch negativity study.

    Science.gov (United States)

    Honbolygó, Ferenc; Csépe, Valéria; Ragó, Anett

    2004-06-03

    This study investigates the electrical brain activity correlates of the automatic detection of suprasegmental and local speech cues by using a passive oddball paradigm, in which the standard Hungarian word 'banán' ('banana' in English) was contrasted with two deviants: a voiceless phoneme deviant ('panán'), and a stress deviant, where the stress was on the second syllable, instead of the obligatory first one. As a result, we obtained the mismatch negativity component (MMN) of event-related brain potentials in each condition. The stress deviant elicited two MMNs: one as a response to the lack of stress as compared to the standard stimulus, and another to the additional stress. Our results support that the MMN is as valuable in investigating processing characteristics of suprasegmental features as in that of phonemic features. MMN data may provide further insight into pre-attentive processes contributing to spoken word recognition.

  5. Long term Suboxone™ emotional reactivity as measured by automatic detection in speech.

    Science.gov (United States)

    Hill, Edward; Han, David; Dumouchel, Pierre; Dehak, Najim; Quatieri, Thomas; Moehs, Charles; Oscar-Berman, Marlene; Giordano, John; Simpatico, Thomas; Barh, Debmalya; Blum, Kenneth

    2013-01-01

    Addictions to illicit drugs are among the nation's most critical public health and societal problems. The current opioid prescription epidemic and the need for buprenorphine/naloxone (Suboxone®; SUBX) as an opioid maintenance substance, and its growing street diversion provided impetus to determine affective states ("true ground emotionality") in long-term SUBX patients. Toward the goal of effective monitoring, we utilized emotion-detection in speech as a measure of "true" emotionality in 36 SUBX patients compared to 44 individuals from the general population (GP) and 33 members of Alcoholics Anonymous (AA). Other less objective studies have investigated emotional reactivity of heroin, methadone and opioid abstinent patients. These studies indicate that current opioid users have abnormal emotional experience, characterized by heightened response to unpleasant stimuli and blunted response to pleasant stimuli. However, this is the first study to our knowledge to evaluate "true ground" emotionality in long-term buprenorphine/naloxone combination (Suboxone™). We found in long-term SUBX patients a significantly flat affect (p<0.01), and they had less self-awareness of being happy, sad, and anxious compared to both the GP and AA groups. We caution definitive interpretation of these seemingly important results until we compare the emotional reactivity of an opioid abstinent control using automatic detection in speech. These findings encourage continued research strategies in SUBX patients to target the specific brain regions responsible for relapse prevention of opioid addiction.

  6. Exploiting independent filter bandwidth of human factor cepstral coefficients in automatic speech recognition

    Science.gov (United States)

    Skowronski, Mark D.; Harris, John G.

    2004-09-01

    Mel frequency cepstral coefficients (MFCC) are the most widely used speech features in automatic speech recognition systems, primarily because the coefficients fit well with the assumptions used in hidden Markov models and because of the superior noise robustness of MFCC over alternative feature sets such as linear prediction-based coefficients. The authors have recently introduced human factor cepstral coefficients (HFCC), a modification of MFCC that uses the known relationship between center frequency and critical bandwidth from human psychoacoustics to decouple filter bandwidth from filter spacing. In this work, the authors introduce a variation of HFCC called HFCC-E in which filter bandwidth is linearly scaled in order to investigate the effects of wider filter bandwidth on noise robustness. Experimental results show an increase in signal-to-noise ratio of 7 dB over traditional MFCC algorithms when filter bandwidth increases in HFCC-E. An important attribute of both HFCC and HFCC-E is that the algorithms only differ from MFCC in the filter bank coefficients: increased noise robustness using wider filters is achieved with no additional computational cost.
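    The defining feature of HFCC and HFCC-E, bandwidth decoupled from filter spacing, is easy to see in code: the centers stay mel-spaced, while each triangle's width follows a critical-band (ERB) rule scaled by an independent factor E. The sketch below is a simplified construction under those assumptions, not the authors' exact filter bank; the frequency range and filter count are illustrative.

        import numpy as np

        def erb_hz(fc_hz):
            """Equivalent rectangular bandwidth (Moore & Glasberg) at fc, in Hz."""
            return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

        def hfcc_e_filterbank(n_filters=24, n_fft=512, sr=16000, e_factor=1.0):
            """Triangular filters with mel-spaced centers whose bandwidths follow
            an ERB rule scaled by e_factor, independently of the spacing."""
            mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
            imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
            centers = imel(np.linspace(mel(100.0), mel(sr / 2.0 - 500.0), n_filters))
            freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
            bank = np.zeros((n_filters, len(freqs)))
            for i, fc in enumerate(centers):
                half_bw = e_factor * erb_hz(fc)     # bandwidth decoupled from spacing
                lo, hi = fc - half_bw, fc + half_bw
                rising = (freqs - lo) / (fc - lo)
                falling = (hi - freqs) / (hi - fc)
                bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
            return bank

        wide = hfcc_e_filterbank(e_factor=1.5)      # wider filters, same spacing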

  7. Towards an Intelligent Acoustic Front End for Automatic Speech Recognition: Built-in Speaker Normalization

    Directory of Open Access Journals (Sweden)

    Umit H. Yapanel

    2008-08-01

    Full Text Available A proven method for compensating for speaker differences in automatic speech recognition (ASR) is to perform acoustic feature speaker normalization. More effective speaker normalization methods are needed which require limited computing resources for real-time performance. The most popular speaker normalization technique is vocal-tract length normalization (VTLN), despite the fact that it is computationally expensive. In this study, we propose a novel online VTLN algorithm entitled built-in speaker normalization (BISN), where normalization is performed on the fly within a newly proposed PMVDR acoustic front end. The novel aspect is that conventional front-end processing with PMVDR and VTLN requires two separate warping phases, whereas the proposed BISN method uses a single speaker-dependent warp to achieve both the PMVDR perceptual warp and the VTLN warp simultaneously. This improved integration unifies the nonlinear warping performed in the front end and reduces computational requirements, thereby offering advantages for real-time ASR systems. Evaluations are performed on (i) an in-car extended digit recognition task, where an on-the-fly BISN implementation reduces the relative word error rate (WER) by 24%, and (ii) a diverse noisy speech task (SPINE 2), where the relative WER improvement was 9%, both relative to the baseline speaker normalization method.
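
    For reference, conventional VTLN warps the frequency axis with a single speaker-specific parameter before filter-bank analysis; a common choice is a piecewise-linear warp that maps the Nyquist frequency onto itself. The sketch below shows only that baseline warp, not the BISN integration; the 0.85 break-point ratio is a typical but illustrative value.

      import numpy as np

      def vtln_warp(freqs, alpha, f_nyq, f_break_ratio=0.85):
          # Piecewise-linear VTLN warp: scale by alpha below the break
          # point, then interpolate linearly so f_nyq maps onto itself.
          f_break = f_break_ratio * f_nyq
          return np.where(
              freqs <= f_break,
              alpha * freqs,
              alpha * f_break
              + (f_nyq - alpha * f_break) * (freqs - f_break) / (f_nyq - f_break),
          )

    The warp factor alpha is usually chosen per speaker by maximizing the likelihood of the warped features under the current acoustic model; BISN's contribution is to fold this warp into the PMVDR perceptual warp so that only one warping pass is needed.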

  10. Automatically high accurate and efficient photomask defects management solution for advanced lithography manufacture

    Science.gov (United States)

    Zhu, Jun; Chen, Lijun; Ma, Lantao; Li, Dejian; Jiang, Wei; Pan, Lihong; Shen, Huiting; Jia, Hongmin; Hsiang, Chingyun; Cheng, Guojie; Ling, Li; Chen, Shijie; Wang, Jun; Liao, Wenkui; Zhang, Gary

    2014-04-01

    Defect review is a time-consuming job, and human error makes the results inconsistent. Defects located in don't-care areas, such as dark areas, do not hurt yield and need not be reviewed, whereas critical-area defects, such as those in clear areas, can impact yield dramatically and deserve closer attention. As integrated-circuit dimensions shrink, inspection routinely detects thousands of mask defects or more, and traditional manual or simple classification approaches cannot meet efficiency and accuracy requirements. This paper focuses on an automatic defect management and classification solution using the image output of Lasertec inspection equipment and Anchor pattern-centric image processing technology. The system can handle large numbers of defects with quick and accurate classification results. Our experiments include Die-to-Die and Single-Die modes, in which the classification accuracy reaches 87.4% and 93.3%, respectively. No critical or printable defects were missed in our test cases; the misclassification rates were 0.25% in Die-to-Die mode and 0.24% in Single-Die mode. This miss rate is encouraging and acceptable for application on a production line. Results can be exported and loaded back into the inspection machine for further review, which helps users validate uncertain defects with clear, magnified images when the captured images do not provide enough information for a judgment. The system effectively reduces expensive inline defect review time. As a fully inline automated defect management solution, it is compatible with the current inspection approach and can be integrated with optical simulation and scoring functions to guide wafer-level defect inspection.

  11. Fitmunk: improving protein structures by accurate, automatic modeling of side-chain conformations.

    Science.gov (United States)

    Porebski, Przemyslaw Jerzy; Cymborowski, Marcin; Pasenkiewicz-Gierula, Marta; Minor, Wladek

    2016-02-01

    Improvements in crystallographic hardware and software have allowed automated structure-solution pipelines to approach a near 'one-click' experience for the initial determination of macromolecular structures. However, in many cases the resulting initial model requires a laborious, iterative process of refinement and validation. A new method has been developed for the automatic modeling of side-chain conformations that takes advantage of rotamer-prediction methods in a crystallographic context. The algorithm, which is based on deterministic dead-end elimination (DEE) theory, uses new dense conformer libraries and a hybrid energy function derived from experimental data and prior information about rotamer frequencies to find the optimal conformation of each side chain. In contrast to existing methods, which incorporate the electron-density term into protein-modeling frameworks, the proposed algorithm is designed to take advantage of the highly discriminatory nature of electron-density maps. This method has been implemented in the program Fitmunk, which uses extensive conformational sampling. This improves the accuracy of the modeling and makes it a versatile tool for crystallographic model building, refinement and validation. Fitmunk was extensively tested on over 115 new structures, as well as a subset of 1100 structures from the PDB. It is demonstrated that the ability of Fitmunk to model more than 95% of side chains accurately is beneficial for improving the quality of crystallographic protein models, especially at medium and low resolutions. Fitmunk can be used for model validation of existing structures and as a tool to assess whether side chains are modeled optimally or could be better fitted into electron density. Fitmunk is available as a web service at http://kniahini.med.virginia.edu/fitmunk/server/ or at http://fitmunk.bitbucket.org/.
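
    For context, the deterministic DEE theory underlying the algorithm rests on a simple dominance criterion: rotamer r of residue i cannot belong to the global minimum-energy conformation if some competing rotamer t does better even in the worst case. In its classic form (stated here for concreteness; Fitmunk's hybrid energy function supplies the terms),

      E(i_r) + \sum_{j \neq i} \min_s E(i_r, j_s) > E(i_t) + \sum_{j \neq i} \max_s E(i_t, j_s)

    where E(i_r) is the self-energy of rotamer r at residue i and E(i_r, j_s) its pairwise energy with rotamer s at residue j; any rotamer satisfying the inequality for some t can be safely eliminated from the search.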

  12. An automatic and accurate method of full heart segmentation from CT image based on linear gradient model

    Science.gov (United States)

    Yang, Zili

    2017-07-01

    Heart segmentation is an important auxiliary method in the diagnosis of many heart diseases, such as coronary heart disease and atrial fibrillation, and in the planning of tumor radiotherapy. Most existing methods for full heart segmentation treat the heart as a whole and cannot accurately extract the bottom of the heart. In this paper, we propose a new method based on a linear gradient model to segment the whole heart from CT images automatically and accurately. Twelve cases were used to test the method; accurate segmentation results were achieved and confirmed by clinical experts. The results can provide reliable clinical support.

  13. Segmenting into Adequate Units for Automatic Recognition of Emotion-Related Episodes: A Speech-Based Approach

    Directory of Open Access Journals (Sweden)

    Anton Batliner

    2010-01-01

    Full Text Available We deal with the topic of segmenting emotion-related (emotional/affective) episodes into adequate units for analysis and automatic processing/classification—a topic that has not been addressed adequately so far. We concentrate on speech and illustrate promising approaches by using a database with children's emotional speech. We argue in favour of the word as the basic unit and map sequences of words onto both syntactic and "emotionally consistent" chunks, and we report classification performances for an exhaustive modelling of our data by mapping word-based paralinguistic emotion labels onto three classes representing valence (positive, neutral, negative) and onto a fourth rest (garbage) class.

  14. The Effect of Automatic Speech Recognition EyeSpeak Software on Iraqi Students’ English Pronunciation: A Pilot Study

    OpenAIRE

    Lina Fathi Sidig Sidgi; Ahmad Jelani Shaari

    2017-01-01

    Technology such as computer-assisted language learning (CALL) is used in teaching and learning in foreign language classrooms where it is most needed. One promising emerging technology that supports language learning is automatic speech recognition (ASR). Integrating such technology, especially in the instruction of pronunciation in the classroom, is important in helping students to achieve correct pronunciation. In Iraq, English is a foreign language, and it is not surprisin...

  15. Robust Automatic Speech Recognition Features using Complex Wavelet Packet Transform Coefficients

    Directory of Open Access Journals (Sweden)

    Tjong Wan Sen

    2013-09-01

    Full Text Available To improve the performance of phoneme-based Automatic Speech Recognition (ASR) in noisy environments, we developed a new technique that adds robustness to clean phoneme features. These robust features are obtained from Complex Wavelet Packet Transform (CWPT) coefficients. Since the CWPT coefficients represent all the different frequency bands of the input signal, decomposing the input signal into a complete CWPT tree covers all frequencies involved in the recognition process. For time-overlapping signals with different frequency contents, e.g., a phoneme signal with noise, the CWPT coefficients are the combination of the CWPT coefficients of the phoneme signal and those of the noise, and the phoneme's coefficients change according to the frequency components contained in the noise. Since the number of phonemes in every language is relatively small (limited) and already well known, one can easily derive principal component vectors from a clean training dataset using Principal Component Analysis (PCA). These principal component vectors can then be used to add robustness and minimize noise effects in the testing phase. Simulation results, using Alpha Numeric 4 (AN4) from Carnegie Mellon University and NOISEX-92 examples from Rice University, showed that this new technique can be used as a feature extractor that improves the robustness of phoneme-based ASR systems in various adverse noisy conditions while preserving performance in clean environments.
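
    The PCA step can be sketched independently of the wavelet front end: learn a principal subspace from clean training features, then project test features onto it, discarding the off-subspace components where noise energy tends to live. A minimal numpy sketch under that reading (the CWPT feature extraction itself is assumed to happen elsewhere):

      import numpy as np

      def fit_pca_basis(clean_features, k):
          # clean_features: (n_samples, n_dims) matrix of CWPT-derived vectors.
          mean = clean_features.mean(axis=0)
          _, _, vt = np.linalg.svd(clean_features - mean, full_matrices=False)
          return mean, vt[:k]  # mean and top-k principal directions

      def project_to_clean_subspace(features, mean, basis):
          # Reconstruct test features from the clean principal subspace only.
          coeffs = (features - mean) @ basis.T
          return coeffs @ basis + mean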

  17. The Usefulness of Automatic Speech Recognition (ASR) Eyespeak Software in Improving Iraqi EFL Students’ Pronunciation

    Directory of Open Access Journals (Sweden)

    Lina Fathi Sidig Sidgi

    2017-02-01

    Full Text Available The present study focuses on determining whether automatic speech recognition (ASR) technology is reliable for improving the English pronunciation of Iraqi EFL students. Non-native learners of English are generally concerned about improving their pronunciation skills, and Iraqi students face difficulties in pronouncing English sounds that are not found in their native language (Arabic). This study is concerned with ASR and its effectiveness in overcoming this difficulty. The data were obtained from twenty participants randomly selected from first-year college students in the Department of English at Al-Turath University College in Baghdad, Iraq. The students participated in a two-month pronunciation instruction course using ASR Eyespeak software. At the end of the course, the students completed a questionnaire on the usefulness of ASR Eyespeak in improving their pronunciation. The findings revealed that the students found ASR Eyespeak software very useful in improving their pronunciation and in helping them recognise their pronunciation mistakes. They also reported that learning pronunciation with ASR Eyespeak was enjoyable.

  18. Automatic speech recognizer based on the Spanish spoken in Valdivia, Chile

    Science.gov (United States)

    Sanchez, Maria L.; Poblete, Victor H.; Sommerhoff, Jorge

    2004-05-01

    The performance of an automatic speech recognizer is affected by the training process (speaker dependent or speaker independent) and by the size of the vocabulary. The language used in this study was the Spanish spoken in the city of Valdivia, Chile. A representative sample of 14 students and six professionals, all natives of Valdivia (ten women and ten men), participated in the study; their ages ranged between 20 and 30 years. Two systems were programmed based on the classical principles: digitization, endpoint detection, linear predictive coding, cepstral coefficients, dynamic time warping, and a final decision stage preceded by training: (i) speaker dependent (15 words: five colors and ten numbers); (ii) speaker independent (30 words: ten verbs, ten nouns, and ten adjectives). A simple didactic application, with options to choose colors, numbers, and drawings of the verbs, nouns, and adjectives, was designed to be used with a personal computer. In both programs, the tests carried out showed a tendency towards errors in short monosyllabic words like "flor" and "sol". The best results were obtained for words with three syllables, like "disparar" and "mojado". [Work supported by Proyecto DID UACh N S-200278.]
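
    The decision stage of such a classical recognizer compares the incoming feature sequence against stored word templates with dynamic time warping. A minimal sketch of that matching step, assuming feature extraction is already done and using plain Euclidean distance as the per-frame cost:

      import numpy as np

      def dtw_distance(a, b):
          # a: (n, d) and b: (m, d) frame sequences, e.g. cepstral vectors.
          n, m = len(a), len(b)
          D = np.full((n + 1, m + 1), np.inf)
          D[0, 0] = 0.0
          for i in range(1, n + 1):
              for j in range(1, m + 1):
                  cost = np.linalg.norm(a[i - 1] - b[j - 1])
                  D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
          return D[n, m]

      def recognize(utterance, templates):
          # templates: dict mapping each vocabulary word to a training sequence.
          return min(templates, key=lambda w: dtw_distance(utterance, templates[w]))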

  19. What’s Wrong With Automatic Speech Recognition (ASR) and How Can We Fix It?

    Science.gov (United States)

    2013-03-01

    [Only citation fragments of this report's abstract survive, among them: work with Alex Acero (Proc. ICASSP 2005) in which the authors use an underlying hidden generative model and demonstrate improved performance on phonetic tasks; Gunawardana and Alex Acero (International Conference on Speech Communication and Technology, International Speech Communication Association); and "The Subspace Gaussian Mixture Model – a Structured Model for Speech" with Geoffrey Zweig and Alex Acero (ASRU 2011).]

  20. Automatic Speech Recognition and Training for Severely Dysarthric Users of Assistive Technology: The STARDUST Project

    Science.gov (United States)

    Parker, Mark; Cunningham, Stuart; Enderby, Pam; Hawley, Mark; Green, Phil

    2006-01-01

    The STARDUST project developed robust computer speech recognizers for use by eight people with severe dysarthria and concomitant physical disability to access assistive technologies. Independent computer speech recognizers trained with normal speech are of limited functional use by those with severe dysarthria due to limited and inconsistent…

  1. Automatic evaluation of voice and speech intelligibility after treatment of head and neck cancer

    NARCIS (Netherlands)

    Clapham, R.P.

    2017-01-01

    Cancer of the head and neck, and its medical treatment and management, can have a negative impact on how a person sounds and talks. For the speech pathologist, rating a person's speech intelligibility and voice quality is an important part of patient management. Rating someone's speech or voice,

  2. Highly accurate SVM model with automatic feature selection for word sense disambiguation

    Institute of Scientific and Technical Information of China (English)

    王浩; 陈贵林; 吴连献

    2004-01-01

    A novel algorithm for word sense disambiguation (WSD) based on an SVM model improved with automatic feature selection is introduced. This learning method employs rich contextual features to predict the proper senses for specific words. Experimental results show that this algorithm can achieve excellent performance on the set of data released during the SENSEVAL-2 competition. We present the results obtained and discuss the transplantation of this algorithm to other languages such as Chinese. Experimental results on a Chinese corpus show that our algorithm achieves an accuracy of 70.0% even with little training data.

  3. Interfacing sound stream segregation to automatic speech recognition-Preliminary results on listening to several sounds simultaneously

    Energy Technology Data Exchange (ETDEWEB)

    Okuno, Hiroshi G.; Nakatani, Tomohiro; Kawabata, Takeshi [NTT Basic Research Laboratories, Kanagawa (Japan)

    1996-12-31

    This paper reports the preliminary results of experiments on listening to several sounds at once. Two issues are addressed: segregating speech streams from a mixture of sounds, and interfacing speech stream segregation with automatic speech recognition (ASR). Speech stream segregation (SSS) is modeled as a process of extracting harmonic fragments, grouping these extracted harmonic fragments, and substituting some sounds for non-harmonic parts of groups. This system is implemented by extending the harmonic-based stream segregation system reported at AAAI-94 and IJCAI-95. The main problem in interfacing SSS with HMM-based ASR is how to improve the recognition performance, which is degraded by spectral distortion of segregated sounds caused mainly by the binaural input, grouping, and residue substitution. Our solution is to re-train the parameters of the HMM with training data binauralized for four directions, to group harmonic fragments according to their directions, and to substitute the residue of harmonic fragments for non-harmonic parts of each group. Experiments with 500 mixtures of two women's utterances of a word showed that the cumulative accuracy of word recognition up to the 10th candidate of each woman's utterance is, on average, 75%.

  4. Automatic Rotation Recovery Algorithm for Accurate Digital Image and Video Watermarks Extraction

    Directory of Open Access Journals (Sweden)

    Nasr addin Ahmed Salem Al-maweri

    2016-11-01

    Full Text Available Research in digital watermarking has evolved rapidly in the current decade. This evolution has brought various methods and algorithms for watermarking digital images and videos. Methods introduced in the field vary from weak to robust according to how well they preserve the watermark in the presence of attacks. Rotation attacks applied to the watermarked media are among the serious attacks that many, if not most, algorithms cannot survive. In this paper, a new automatic rotation recovery algorithm is proposed. This algorithm can be plugged into the extraction component of any image or video watermarking algorithm. Its main job is to detect the geometric distortion applied to the watermarked image or image sequence, recover the distorted scene to its original state in a blind and automatic way, and then pass it on to the extraction procedure. For now, the work is limited to recovering zero-padded rotations; images cropped after rotation are left as future work. The proposed algorithm was tested on top of an extraction component; both the recovery accuracy and the accuracy of the extracted watermarks showed a high performance level.

  5. Novel Techniques for Dialectal Arabic Speech Recognition

    CERN Document Server

    Elmahdy, Mohamed; Minker, Wolfgang

    2012-01-01

    Novel Techniques for Dialectal Arabic Speech describes approaches to improve automatic speech recognition for dialectal Arabic. Since speech resources for dialectal Arabic speech recognition are very sparse, the authors describe how existing Modern Standard Arabic (MSA) speech data can be applied to dialectal Arabic speech recognition, while assuming that MSA is always a second language for all Arabic speakers. In this book, Egyptian Colloquial Arabic (ECA) has been chosen as a typical Arabic dialect. ECA is the first ranked Arabic dialect in terms of number of speakers, and a high quality ECA speech corpus with accurate phonetic transcription has been collected. MSA acoustic models were trained using news broadcast speech. In order to cross-lingually use MSA in dialectal Arabic speech recognition, the authors have normalized the phoneme sets for MSA and ECA. After this normalization, they have applied state-of-the-art acoustic model adaptation techniques like Maximum Likelihood Linear Regression (MLLR) and M...

  6. Automatic recognition of spontaneous emotions in speech using acoustic and lexical features

    NARCIS (Netherlands)

    Raaijmakers, S.; Truong, K.P.

    2008-01-01

    We developed acoustic and lexical classifiers, based on a boosting algorithm, to assess the separability on arousal and valence dimensions in spontaneous emotional speech. The spontaneous emotional speech data was acquired by inviting subjects to play a first-person shooter video game. Our acoustic

  7. Automatic generation of reaction energy databases from highly accurate atomization energy benchmark sets.

    Science.gov (United States)

    Margraf, Johannes T; Ranasinghe, Duminda S; Bartlett, Rodney J

    2017-03-31

    In this contribution, we discuss how reaction energy benchmark sets can automatically be created from arbitrary atomization energy databases. As an example, over 11 000 reaction energies derived from the W4-11 database, as well as some relevant subsets, are reported. Importantly, there is only very modest computational overhead involved in computing >11 000 reaction energies compared to 140 atomization energies, since the rate-determining step for either benchmark is performing the same 140 quantum chemical calculations. The performance of commonly used electronic structure methods on the new database is analyzed. This allows the relationship between performance on atomization and reaction energy benchmarks to be investigated on an identical set of molecules. The atomization energy is found to be a weak predictor of the overall usefulness of a method. The performance of density functional approximations in light of the number of empirically optimized parameters used in their design is also discussed.
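
    The bookkeeping behind the construction is worth making explicit: for a balanced reaction the free-atom reference energies cancel, so the reaction energy is simply the summed atomization energies of the reactants minus those of the products. A minimal sketch; the molecules and values below are illustrative placeholders, not W4-11 entries.

      def reaction_energy(atomization, reactants, products):
          # atomization: dict molecule -> atomization energy (consistent units).
          # reactants/products: dict molecule -> stoichiometric coefficient.
          side = lambda s: sum(n * atomization[mol] for mol, n in s.items())
          return side(reactants) - side(products)

      # Hypothetical example: 2 H2 + O2 -> 2 H2O (illustrative values, kcal/mol)
      AE = {"H2": 104.2, "O2": 118.0, "H2O": 219.3}
      dE = reaction_energy(AE, {"H2": 2, "O2": 1}, {"H2O": 2})  # < 0: exothermic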

  8. Accurate and robust fully-automatic QCA: method and numerical validation.

    Science.gov (United States)

    Hernández-Vela, Antonio; Gatta, Carlo; Escalera, Sergio; Igual, Laura; Martin-Yuste, Victoria; Radeva, Petia

    2011-01-01

    Quantitative Coronary Angiography (QCA) is a methodology used to evaluate arterial diseases and, in particular, the degree of stenosis. In this paper we propose AQCA, a fully automatic method for vessel segmentation based on graph cut theory. Vesselness, geodesic paths and a new multi-scale edgeness map are used to compute a globally optimal artery segmentation. We evaluate the method's performance in a rigorous numerical way on two datasets. The method can detect an artery with a precision of 92.9 ± 5% and a sensitivity of 94.2 ± 6%. The average absolute distance error between the detected and ground-truth centerlines is 1.13 ± 0.11 pixels (about 0.27 ± 0.025 mm), and the absolute relative error in vessel caliber estimation is 2.93% with almost no bias. Moreover, the method can discriminate between arteries and catheter with an accuracy of 96.4%.

  9. Subjective Quality Measurement of Speech: Its Evaluation, Estimation and Applications

    CERN Document Server

    Kondo, Kazuhiro

    2012-01-01

    It is becoming crucial to accurately estimate and monitor speech quality in various ambient environments to guarantee high quality speech communication. This practical hands-on book shows speech intelligibility measurement methods so that the readers can start measuring or estimating speech intelligibility of their own system. The book also introduces subjective and objective speech quality measures, and describes in detail speech intelligibility measurement methods. It introduces a diagnostic rhyme test which uses rhyming word-pairs, and includes: An investigation into the effect of word familiarity on speech intelligibility. Speech intelligibility measurement of localized speech in virtual 3-D acoustic space using the rhyme test. Estimation of speech intelligibility using objective measures, including the ITU standard PESQ measures, and automatic speech recognizers.

  10. Using competing speech to estimate articulatory automatization in children: the possible effect of masking level and subject grade.

    Science.gov (United States)

    Manning, W H; Scheer, B R

    1978-09-01

    In order to study the possible influence of masking level and subject grade on a procedure for determining a child's articulatory automatization (Manning et al., 1976), 47 first- and second-grade and 49 third- and fourth-grade children were administered the McDonald Deep Test of Articulation under one of five conditions of auditory masking (earphones only, or presentation of competing speech at 50, 60, 70, or 80 dB SPL). Results indicated no significant difference in subject performance across the factors of masking level and subject grade. The findings suggest that these factors do not appear to be critical in the clinical application of the suggested procedure for estimating children's automatization of newly acquired phonemes.

  11. Automatically fed firewood machine, efficient, safe and accurate; Automaattisyoettoeinen klapikone. Tehokas, turvallinen ja varma

    Energy Technology Data Exchange (ETDEWEB)

    Riikkilae, M. [Metsaelehti, Helsinki (Finland)

    1994-12-31

    A new firewood chopping machine, developed by the engineering workshop Kalevi Peurala, has an automated stem feed, which speeds up the chopping: the feeding roll draws in the stem, so the operator can fetch a new stem in the meantime. The automatic feed also lightens the chopping work; there is no need to hold the stem as the cutting and chopping blades split the wood into firewood. The automation likewise minimizes the dangers of the work because the machine is totally encapsulated; the operator need not come closer than about one meter to the machine. Power is supplied purely mechanically via a Cardan shaft, and the power demand is low. The operator pushes the stem into the feeding channel, where a hook rising from the bottom of the channel draws the block between the draw-in roller and the spring-loaded guidance channel. The roller simultaneously feeds the stem towards the cutting blades, which roll in opposite directions. The 25 cm broad blades are pressed into the wood in a slightly inclined position, simultaneously from the top and the bottom of the cutter; this carving cut reduces the power demand. Splitting occurs at the same time as cutting: in the middle of both cutting blades there are splitting dogs which penetrate the block from below and above, so the block splits as it is cut. The length of the chopped wood is controlled by changing the chain wheels; the machine includes chain wheels for 30 cm and 45 cm lengths as standard equipment, and changing them takes about five minutes with no special tools needed. The maximum diameter of processable wood is 20 cm. The machine is patented in Finland, and its price is 30 000 FIM.

  12. Fast, automatic, and accurate catheter reconstruction in HDR brachytherapy using an electromagnetic 3D tracking system

    Energy Technology Data Exchange (ETDEWEB)

    Poulin, Eric; Racine, Emmanuel; Beaulieu, Luc, E-mail: Luc.Beaulieu@phy.ulaval.ca [Département de physique, de génie physique et d’optique et Centre de recherche sur le cancer de l’Université Laval, Université Laval, Québec, Québec G1V 0A6, Canada and Département de radio-oncologie et Axe Oncologie du Centre de recherche du CHU de Québec, CHU de Québec, 11 Côte du Palais, Québec, Québec G1R 2J6 (Canada); Binnekamp, Dirk [Integrated Clinical Solutions and Marketing, Philips Healthcare, Veenpluis 4-6, Best 5680 DA (Netherlands)

    2015-03-15

    Purpose: In high dose rate brachytherapy (HDR-B), current catheter reconstruction protocols are relatively slow and error prone. The purpose of this technical note is to evaluate the accuracy and robustness of an electromagnetic (EM) tracking system for automated and real-time catheter reconstruction. Methods: For this preclinical study, a total of ten catheters were inserted in gelatin phantoms with different trajectories. Catheters were reconstructed using an 18G biopsy needle, used as an EM stylet and equipped with a miniaturized sensor, and the second-generation Aurora® Planar Field Generator from Northern Digital Inc. The Aurora EM system provides position and orientation values with precisions of 0.7 mm and 0.2°, respectively. Phantoms were also scanned using a μCT (GE Healthcare) and a Philips Big Bore clinical computed tomography (CT) system with spatial resolutions of 89 μm and 2 mm, respectively. Reconstructions using the EM stylet were compared to μCT and CT. To assess the robustness of the EM reconstruction, five catheters were reconstructed twice and compared. Results: Reconstruction time for one catheter was 10 s, leading to a total reconstruction time of less than 3 min for a typical 17-catheter implant. When compared to the μCT, the mean EM tip identification error was 0.69 ± 0.29 mm while the CT error was 1.08 ± 0.67 mm. The mean 3D distance error was found to be 0.66 ± 0.33 mm and 1.08 ± 0.72 mm for the EM and CT, respectively. EM 3D catheter trajectories were found to be more accurate. A maximum difference of less than 0.6 mm was found between successive EM reconstructions. Conclusions: The EM reconstruction was found to be more accurate and precise than the conventional methods used for catheter reconstruction in HDR-B. This approach can be applied to any type of catheter and applicator.

  13. Estimation of Phoneme-Specific HMM Topologies for the Automatic Recognition of Dysarthric Speech

    Directory of Open Access Journals (Sweden)

    Santiago-Omar Caballero-Morales

    2013-01-01

    Full Text Available Dysarthria is a frequently occurring motor speech disorder which can be caused by neurological trauma, cerebral palsy, or degenerative neurological diseases. Because dysarthria affects phonation, articulation, and prosody, the spoken communication of dysarthric speakers is seriously restricted, affecting their quality of life and confidence. Assistive technology has led to the development of speech applications to improve the spoken communication of dysarthric speakers. In this field, this paper presents an approach to improve the accuracy of HMM-based speech recognition systems. Because phonatory dysfunction is a main characteristic of dysarthric speech, the phonemes of a dysarthric speaker are affected at different levels. Thus, the approach consists in finding the most suitable type of HMM topology (Bakis or Ergodic) for each phoneme in the speaker's phonetic repertoire. The topology is further refined with a suitable number of states and Gaussian mixture components for acoustic modelling. This differs from studies where a single topology is assumed for all phonemes. Finding the suitable parameters (topology and mixture components) is performed with a Genetic Algorithm (GA), as sketched below. Experiments with a well-known dysarthric speech database showed statistically significant improvements of the proposed approach when compared with the single-topology approach, even for speakers with severe dysarthria.
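
    The search the GA performs can be pictured as one gene per phoneme, each gene a (topology, states, mixtures) triple. A hedged sketch of such a GA follows; fitness is a stub standing in for training and scoring a recognizer with the candidate parameters, and the mutation scheme and parameter ranges are illustrative rather than the authors' exact settings.

      import random

      TOPOLOGIES = ("bakis", "ergodic")

      def random_gene():
          return (random.choice(TOPOLOGIES), random.randint(3, 7), random.randint(1, 8))

      def mutate(gene, p=0.3):
          topo, states, mixes = gene
          if random.random() < p: topo = random.choice(TOPOLOGIES)
          if random.random() < p: states = min(7, max(3, states + random.choice((-1, 1))))
          if random.random() < p: mixes = min(8, max(1, mixes + random.choice((-1, 1))))
          return (topo, states, mixes)

      def evolve(phonemes, fitness, generations=20, pop_size=10):
          # fitness(genome) -> recognition accuracy for a genome mapping
          # each phoneme to its (topology, n_states, n_mixtures) gene.
          pop = [{p: random_gene() for p in phonemes} for _ in range(pop_size)]
          for _ in range(generations):
              pop.sort(key=fitness, reverse=True)
              elite = pop[: pop_size // 2]
              pop = elite + [{p: mutate(g[p]) for p in phonemes} for g in elite]
          return max(pop, key=fitness)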

  14. An automatic method for fast and accurate liver segmentation in CT images using a shape detection level set method

    Science.gov (United States)

    Lee, Jeongjin; Kim, Namkug; Lee, Ho; Seo, Joon Beom; Won, Hyung Jin; Shin, Yong Moon; Shin, Yeong Gil

    2007-03-01

    Automatic liver segmentation is still a challenging task due to the ambiguity of the liver boundary and the complex context of nearby organs. In this paper, we propose a faster and more accurate way of segmenting the liver in CT images with an enhanced level set method. The speed image for level-set propagation is smoothly generated by increasing the number of iterations of anisotropic diffusion filtering. This prevents the level-set propagation from stopping in front of local minima, which prevail in liver CT images due to irregular intensity distributions of the interior liver region. The curvature term of the shape-modeling level-set method captures the shape variations of the liver well across slices. Finally, a rolling-ball algorithm is applied to include enhanced vessels near the liver boundary. Our approach was tested against manual segmentation results on eight CT scans with 5 mm slice spacing, using average distance and volume errors. The average distance error between corresponding liver boundaries is 1.58 mm and the average volume error is 2.2%. The average processing time for the segmentation of each slice is 5.2 seconds, which is much faster than conventional methods. The accurate and fast results of our method will expedite the next stage of liver volume quantification for liver transplantation.
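
    The smoothing step relied on here is standard anisotropic (Perona-Malik) diffusion, which flattens homogeneous regions while preserving edges; running more iterations yields a smoother speed image. A minimal 2-D numpy sketch, with illustrative kappa, step size, and iteration count (np.roll wraps at the image borders, which is acceptable for a sketch):

      import numpy as np

      def anisotropic_diffusion(img, n_iter=20, kappa=30.0, step=0.2):
          # Perona-Malik: diffuse strongly where gradients are small and
          # weakly across edges, so boundaries survive the smoothing.
          out = img.astype(float).copy()
          c = lambda d: np.exp(-(d / kappa) ** 2)  # edge-stopping conductance
          for _ in range(n_iter):
              dN = np.roll(out, 1, axis=0) - out
              dS = np.roll(out, -1, axis=0) - out
              dE = np.roll(out, -1, axis=1) - out
              dW = np.roll(out, 1, axis=1) - out
              out += step * (c(dN) * dN + c(dS) * dS + c(dE) * dE + c(dW) * dW)
          return out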

  15. AUTOMATIC CLASSIFICATION OF QUESTION TURNS IN SPONTANEOUS SPEECH USING LEXICAL AND PROSODIC EVIDENCE.

    Science.gov (United States)

    Ananthakrishnan, Sankaranarayanan; Ghosh, Prasanta; Narayanan, Shrikanth

    2008-01-01

    The ability to identify speech acts reliably is desirable in any spoken language system that interacts with humans. Minimally, such a system should be capable of distinguishing between question-bearing turns and other types of utterances. However, this is a non-trivial task, since spontaneous speech tends to have incomplete, and even ungrammatical, syntactic structure and is characterized by disfluencies, repairs, and other non-linguistic vocalizations that make simple rule-based pattern learning difficult. In this paper, we present a system for identifying question-bearing turns in spontaneous multi-party speech (ICSI Meeting Corpus) using lexical and prosodic evidence. On a balanced test set, our system achieves an accuracy of 71.9% for the binary question vs. non-question classification task. Further, we investigate the robustness of the proposed technique to uncertainty in the lexical feature stream (e.g., caused by speech recognition errors). Our experiments indicate that the classification accuracy of the proposed method is robust to errors in the text stream, dropping only about 0.8% for every 10% increase in word error rate (WER).

  16. User Experience of a Mobile Speaking Application with Automatic Speech Recognition for EFL Learning

    Science.gov (United States)

    Ahn, Tae youn; Lee, Sangmin-Michelle

    2016-01-01

    With the spread of mobile devices, mobile phones have enormous potential regarding their pedagogical use in language education. The goal of this study is to analyse user experience of a mobile-based learning system that is enhanced by speech recognition technology for the improvement of EFL (English as a foreign language) learners' speaking…

  18. The Use of an Autonomous Pedagogical Agent and Automatic Speech Recognition for Teaching Sight Words to Students with Autism Spectrum Disorder

    Science.gov (United States)

    Saadatzi, Mohammad Nasser; Pennington, Robert C.; Welch, Karla C.; Graham, James H.; Scott, Renee E.

    2017-01-01

    In the current study, we examined the effects of an instructional package comprised of an autonomous pedagogical agent, automatic speech recognition, and constant time delay during the instruction of reading sight words aloud to young adults with autism spectrum disorder. We used a concurrent multiple baseline across participants design to…

  19. On the Selection of Non-Invasive Methods Based on Speech Analysis Oriented to Automatic Alzheimer Disease Diagnosis

    Directory of Open Access Journals (Sweden)

    Unai Martinez de Lizardui

    2013-05-01

    Full Text Available The work presented here is part of a larger study to identify novel technologies and biomarkers for early Alzheimer disease (AD) detection, and it focuses on evaluating the suitability of a new approach for early AD diagnosis by non-invasive methods. The purpose is to examine, in a pilot study, the potential of applying intelligent algorithms to speech features obtained from suspected patients in order to contribute to improved diagnosis of AD and its degree of severity. In this sense, Artificial Neural Networks (ANN) have been used for the automatic classification of the two classes (AD and control subjects). Two human issues have been analyzed for feature selection: Spontaneous Speech and Emotional Response. Not only linear features but also non-linear ones, such as Fractal Dimension, have been explored. The approach is non-invasive, low cost, and without any side effects. The experimental results obtained were very satisfactory and promising for the early diagnosis and classification of AD patients.

  20. A Global Approach to Accurate and Automatic Quantitative Analysis of NMR Spectra by Complex Least-Squares Curve Fitting

    Science.gov (United States)

    Martin, Y. L.

    The performance of quantitative analysis of 1D NMR spectra depends greatly on the choice of the NMR signal model. Complex least-squares analysis is well suited for optimizing the quantitative determination of spectra containing a limited number of signals (20). From a general point of view it is concluded, on the basis of mathematical considerations and numerical simulations, that, in the absence of truncation of the free-induction decay, complex least-squares curve fitting either in the time or in the frequency domain and linear-prediction methods are in fact nearly equivalent and give identical results. However, in the situation considered, complex least-squares analysis in the frequency domain is more flexible since it enables the quality of convergence to be appraised at every resonance position. An efficient data-processing strategy has been developed which makes use of an approximate conjugate-gradient algorithm. All spectral parameters (frequency, damping factors, amplitudes, phases, initial delay associated with intensity, and phase parameters of a baseline correction) are simultaneously managed in an integrated approach which is fully automatable. The behavior of the error as a function of the signal-to-noise ratio is theoretically estimated, and the influence of apodization is discussed. The least-squares curve fitting is theoretically proved to be the most accurate approach for quantitative analysis of 1D NMR data acquired with reasonable signal-to-noise ratio. The method enables complex spectral residuals to be sorted out. These residuals, which can be cumulated thanks to the possibility of correcting for frequency shifts and phase errors, extract systematic components, such as isotopic satellite lines, and characterize the shape and the intensity of the spectral distortion with respect to the Lorentzian model. This distortion is shown to be nearly independent of the chemical species, of the nature of the molecular site, and of the type of nucleus, but
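
    For concreteness, the frequency-domain model being fitted is the Fourier transform of a sum of damped complex exponentials, i.e. a sum of complex Lorentzians: with amplitudes a_k, phases phi_k, damping factors lambda_k and frequencies f_k,

      x(t) = \sum_{k=1}^{K} a_k e^{i\varphi_k} e^{(-\lambda_k + i 2\pi f_k) t}
      \quad\Longrightarrow\quad
      X(f) = \sum_{k=1}^{K} \frac{a_k e^{i\varphi_k}}{\lambda_k + i 2\pi (f - f_k)}

    and the complex least-squares fit minimizes \sum_j |S(f_j) - X(f_j)|^2 over all parameters (plus any baseline-correction terms), where S is the measured complex spectrum.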

  2. An automatic speech recognition system with speaker-independent identification support

    Science.gov (United States)

    Caranica, Alexandru; Burileanu, Corneliu

    2015-02-01

    The novelty of this work lies in the application of an open-source research software toolkit (CMU Sphinx) to train, build and evaluate a speech recognition system, with speaker-independent support, for voice-controlled hardware applications. Moreover, we propose to use the trained acoustic model to successfully decode offline voice commands on embedded hardware, such as an ARMv6 low-cost SoC, the Raspberry Pi. This type of single-board computer, mainly used for educational and research activities, can serve as a proof-of-concept software and hardware stack for low-cost voice automation systems.

  3. Automatic pronunciation error detection in non-native speech: the case of vowel errors in Dutch.

    Science.gov (United States)

    van Doremalen, Joost; Cucchiarini, Catia; Strik, Helmer

    2013-08-01

    This research is aimed at analyzing and improving automatic pronunciation error detection in a second language. Dutch vowels spoken by adult non-native learners of Dutch are used as a test case. A first study on Dutch pronunciation by L2 learners with different L1s revealed that vowel pronunciation errors are relatively frequent and often concern subtle acoustic differences between the realization and the target sound. In a second study automatic pronunciation error detection experiments were conducted to compare existing measures to a metric that takes account of the error patterns observed to capture relevant acoustic differences. The results of the two studies do indeed show that error patterns bear information that can be usefully employed in weighted automatic measures of pronunciation quality. In addition, it appears that combining such a weighted metric with existing measures improves the equal error rate by 6.1 percentage points from 0.297, for the Goodness of Pronunciation (GOP) algorithm, to 0.236.
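
    The GOP baseline referred to above scores a phone from its aligned acoustic segment; in Witt and Young's standard formulation (reproduced here for reference), with O^{(p)} the frames aligned to phone p, N(p) their count, and Q the phone set,

      \mathrm{GOP}(p) = \frac{1}{N(p)} \left| \log \frac{p(O^{(p)} \mid p)}{\max_{q \in Q} p(O^{(p)} \mid q)} \right|

    The weighted metric proposed here can be read as biasing such a score toward the acoustic differences that the observed L2 error patterns show to be discriminative.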

  4. Security and Hyper-accurate Positioning Monitoring with Automatic Dependent Surveillance-Broadcast (ADS-B) Project

    Data.gov (United States)

    National Aeronautics and Space Administration — Lightning Ridge Technologies, working in collaboration with The Innovation Laboratory, Inc., extend Automatic Dependent Surveillance Broadcast (ADS-B) into a safe,...

  5. Security and Hyper-accurate Positioning Monitoring with Automatic Dependent Surveillance-Broadcast (ADS-B) Project

    Data.gov (United States)

    National Aeronautics and Space Administration — Lightning Ridge Technologies, LLC, working in collaboration with The Innovation Laboratory, Inc., extend Automatic Dependent Surveillance – Broadcast (ADS-B) into a...

  6. A Robust Front-End Processor combining Mel Frequency Cepstral Coefficient and Sub-band Spectral Centroid Histogram methods for Automatic Speech Recognition

    Directory of Open Access Journals (Sweden)

    R. Thangarajan

    2009-06-01

    Full Text Available Environmental robustness is an important area of research in speech recognition. Mismatch between trained speech models and the actual speech to be recognized, due to factors like background noise, can cause severe degradation in the accuracy of recognizers based on commonly used features like mel-frequency cepstral coefficients (MFCC) and linear predictive coding (LPC). It is well understood that auditory-based feature extraction methods perform extremely well in terms of robustness due to the dominant-frequency information present in them, but these methods suffer from high computational cost. Another method, called sub-band spectral centroid histograms (SSCH), integrates dominant-frequency information with sub-band power information. This method is based on sub-band spectral centroids (SSC), which are closely related to spectral peaks for both clean and noisy speech. Since SSC can be computed efficiently from a short-term speech power spectrum estimate, the SSCH method is quite robust to additive background noise at a lower computational cost. It has been noted that the MFCC method outperforms the SSCH method on clean speech, but degrades substantially on speech with additive noise. In this paper, both MFCC and SSCH feature extraction have been implemented in Carnegie Mellon University (CMU) Sphinx 4.0 and trained and tested on the AN4 database for clean and noisy speech. Finally, a robust speech recognizer is suggested which automatically employs either the MFCC or the SSCH feature extraction method based on the variance of the short-term power of the input utterance.
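
    A sub-band spectral centroid is simply the power-weighted mean frequency within a band, which is why it tracks spectral peaks cheaply. A minimal numpy sketch for one frame (rectangular band windows are used here for simplicity, where the literature typically uses filter-shaped weights; the power-law exponent gamma is optional):

      import numpy as np

      def subband_spectral_centroids(power_spec, freqs, band_edges, gamma=1.0):
          # power_spec: |FFT|^2 of one frame; freqs: bin frequencies (Hz);
          # band_edges: list of (low, high) pairs defining the sub-bands.
          centroids = []
          for lo, hi in band_edges:
              w = ((freqs >= lo) & (freqs < hi)).astype(float)
              p = w * power_spec ** gamma
              centroids.append(np.sum(freqs * p) / (np.sum(p) + 1e-12))
          return np.array(centroids)

    Histogramming these centroids over time yields the SSCH features; the suggested recognizer would then switch between SSCH and MFCC depending on the short-term power variance of the utterance.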

  7. Automatic pose initialization for accurate 2D/3D registration applied to abdominal aortic aneurysm endovascular repair

    Science.gov (United States)

    Miao, Shun; Lucas, Joseph; Liao, Rui

    2012-02-01

    Minimally invasive abdominal aortic aneurysm (AAA) stenting can be greatly facilitated by overlaying the preoperative 3-D model of the abdominal aorta onto the intra-operative 2-D X-ray images. Accurate 2-D/3-D registration in 3-D space makes the 2-D/3-D overlay robust to changes in C-Arm angulation. So far, 2-D/3-D registration methods based on simulated X-ray projection images using multiple image planes have been shown to provide satisfactory 3-D registration accuracy. However, one drawback of intensity-based 2-D/3-D registration methods is that the similarity measure is usually highly non-convex, and hence the optimizer can easily be trapped in local minima. User interaction is therefore often needed to initialize the position of the 3-D model in order to obtain a successful 2-D/3-D registration. In this paper, a novel 3-D pose initialization technique is proposed as an extension of our previously proposed bi-plane 2-D/3-D registration method for AAA intervention [4]. The proposed method detects vessel bifurcation points and the spine centerline in both 2-D and 3-D images, and utilizes this landmark information to bring the 3-D volume into a 15 mm capture range. The proposed landmark detection method was validated on real datasets and is shown to provide a good initialization for the 2-D/3-D registration in [4], thus making the workflow fully automatic.

  8. Integrating Automatic Speech Recognition and Machine Translation for Better Translation Outputs

    DEFF Research Database (Denmark)

    Liyanapathirana, Jeevanthi

    than typing, making the translation process faster. The spoken translation is analyzed and combined with the machine translation output of the same sentence using different methods. We study a number of different translation models in the context of n-best list rescoring methods. As an alternative to the n-best list rescoring, we also use word graphs with the expectation of arriving at a tighter integration of ASR and MT models. Integration methods include constraining ASR models using language and translation models of MT, and vice versa. We are currently developing and testing different methods on the Danish–English language pair, using a speech corpus and parallel text. The methods are investigated to find ways in which the accuracy of the translator's spoken translation can be increased with the use of machine translation outputs, which would be useful for potential computer...
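
    In its simplest form, the n-best rescoring mentioned above is a log-linear combination of the two models' scores over the recognizer's hypothesis list. A minimal sketch, where the interpolation weight lam is a tunable assumption and mt_score stands in for whatever MT model scores a hypothesis:

      def rescore_nbest(nbest, mt_score, lam=0.5):
          # nbest: list of (hypothesis, asr_log_score) pairs from the recognizer.
          # mt_score: function hypothesis -> MT log-score for the same sentence.
          best = max(nbest, key=lambda h: lam * h[1] + (1.0 - lam) * mt_score(h[0]))
          return best[0]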

  9. Optimizing Automatic Speech Recognition for Low-Proficient Non-Native Speakers

    Directory of Open Access Journals (Sweden)

    Catia Cucchiarini

    2010-01-01

    Full Text Available Computer-Assisted Language Learning (CALL) applications for improving the oral skills of low-proficiency learners have to cope with non-native speech that is particularly challenging. Since unconstrained non-native ASR is still problematic, a possible solution is to elicit constrained responses from the learners. In this paper, we describe experiments aimed at selecting utterances from lists of responses. The first experiment, on utterance selection, indicates that the decoding process can be improved by optimizing the language model and the acoustic models, thus reducing the utterance error rate from 29–26% to 10–8%. Since giving feedback on incorrectly recognized utterances is confusing, we verify the correctness of the utterance before providing feedback. The results of the second experiment, on utterance verification, indicate that combining duration-related features with a likelihood ratio (LR) yields an equal error rate (EER) of 10.3%, which is significantly better than the EER for the other measures in isolation.
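
    The LR score used in the verification step can be stated compactly: with O the observed utterance and w the prompted target, a standard formulation accepts when

      \mathrm{LR}(O, w) = \log p(O \mid w) - \log p(O \mid \bar{w}) \geq \Theta

    where \bar{w} is an anti- or background model and \Theta a decision threshold; the equal error rate is the operating point at which false acceptances and false rejections are equally frequent. Combining this score with duration-related features, as done here, amounts to thresholding in a small joint feature space rather than on the scalar LR alone.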

  10. Second-language learning effects on automaticity of speech processing of Japanese phonetic contrasts: An MEG study.

    Science.gov (United States)

    Hisagi, Miwako; Shafer, Valerie L; Miyagawa, Shigeru; Kotek, Hadas; Sugawara, Ayaka; Pantazis, Dimitrios

    2016-12-01

    We examined discrimination of a second-language (L2) vowel duration contrast in English learners of Japanese (JP) with different amounts of experience using the magnetoencephalography mismatch field (MMF) component. Twelve L2 learners were tested before and after a second semester of college-level JP; half attended a regular rate course and half an accelerated course with more hours per week. Results showed no significant change in MMF for either the regular or accelerated learning group from beginning to end of the course. We also compared these groups against nine L2 learners who had completed four semesters of college-level JP. These 4-semester learners did not significantly differ from 2-semester learners, in that only a difference in hemisphere activation (interacting with time) between the two groups approached significance. These findings suggest that targeted training of L2 phonology may be necessary to allow for changes in processing of L2 speech contrasts at an early, automatic level. Copyright © 2016 Elsevier B.V. All rights reserved.

  11. Automatic construction of a large-scale and accurate drug-side-effect association knowledge base from biomedical literature.

    Science.gov (United States)

    Xu, Rong; Wang, QuanQiu

    2014-10-01

    Systems approaches to studying drug-side-effect (drug-SE) associations are emerging as an active research area for drug target discovery, drug repositioning, and drug toxicity prediction. However, currently available drug-SE association databases are far from being complete. Herein, in an effort to increase the data completeness of current drug-SE relationship resources, we present an automatic learning approach to accurately extract drug-SE pairs from the vast amount of published biomedical literature, a rich knowledge source of side effect information for commercial, experimental, and even failed drugs. For the text corpus, we used 119,085,682 MEDLINE sentences and their parse trees. We used known drug-SE associations derived from US Food and Drug Administration (FDA) drug labels as prior knowledge to find relevant sentences and parse trees. We extracted syntactic patterns associated with drug-SE pairs from the resulting set of parse trees. We developed pattern-ranking algorithms to prioritize drug-SE-specific patterns. We then selected a set of patterns with both high precision and high recall in order to extract drug-SE pairs from the entire MEDLINE. In total, we extracted 38,871 drug-SE pairs from MEDLINE using the learned patterns, the majority of which have not been captured in FDA drug labels to date. On average, our knowledge-driven pattern-learning approach to extracting drug-SE pairs from MEDLINE achieved a precision of 0.833, a recall of 0.407, and an F1 of 0.545. We compared our approach to a support vector machine (SVM)-based machine learning approach and a co-occurrence statistics-based approach. We show that the pattern-learning approach is largely complementary to the SVM- and co-occurrence-based approaches, with significantly higher precision and F1 but lower recall. We demonstrated by correlation analysis that the extracted drug side effects correlate positively with drug targets, metabolism, and indications. Copyright © 2014 Elsevier Inc. All
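
    The reported F1 follows directly from the stated precision and recall via the harmonic mean; as a quick sanity check:

      def f1(precision, recall):
          # F1 is the harmonic mean of precision and recall.
          return 2 * precision * recall / (precision + recall)

      print(round(f1(0.833, 0.407), 3))  # 0.547, matching the reported 0.545 up to rounding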

  12. Speech Indexing

    NARCIS (Netherlands)

    Ordelman, R.J.F.; Jong, de F.M.G.; Leeuwen, van D.A.; Blanken, H.M.; de Vries, A.P.; Blok, H.E.; Feng, L.

    2007-01-01

    This chapter will focus on the automatic extraction of information from the speech in multimedia documents. This approach is often referred to as speech indexing and it can be regarded as a subfield of audio indexing that also incorporates for example the analysis of music and sounds. If the objecti...

  13. Amharic Speech Recognition for Speech Translation

    OpenAIRE

    Melese, Michael; Besacier, Laurent; Meshesha, Million

    2016-01-01

    State-of-the-art speech translation can be seen as a cascade of Automatic Speech Recognition, Statistical Machine Translation and Text-To-Speech synthesis. In this study, an attempt is made to experiment with Amharic speech recognition for Amharic-English speech translation in the tourism domain. Since there was no Amharic speech corpus, we developed a read-speech corpus of 7.43 hr in the tourism domain. The Amharic speech corpus has been recorded after translating standard Bas...

  14. A fully automatic tool to perform accurate flood mapping by merging remote sensing imagery and ancillary data

    Science.gov (United States)

    D'Addabbo, Annarita; Refice, Alberto; Lovergine, Francesco; Pasquariello, Guido

    2016-04-01

    Flooding is one of the most frequent and expansive natural hazards. High-resolution flood mapping is an essential step in the monitoring and prevention of inundation hazard, both to gain insight into the processes involved in the generation of flooding events, and from the practical point of view of the precise assessment of inundated areas. Remote sensing data are recognized to be useful in this respect, thanks to the high resolution and regular revisit schedules of state-of-the-art satellites, which moreover offer a synoptic overview of the extent of flooding. In particular, Synthetic Aperture Radar (SAR) data present several favorable characteristics for flood mapping, such as their relative insensitivity to the meteorological conditions during acquisitions, as well as the possibility of acquiring independently of solar illumination, thanks to the active nature of the radar sensors [1]. However, flood scenarios are typical examples of complex situations in which different factors have to be considered to provide accurate and robust interpretation of the situation on the ground: the presence of many land cover types, each with a particular signature in the presence of flooding, requires modelling the behavior of different objects in the scene in order to associate them with flood or no-flood conditions [2]. Generally, the fusion of multi-temporal, multi-sensor, multi-resolution and/or multi-platform Earth observation image data, together with other ancillary information, seems to have a key role in the pursuit of a consistent interpretation of complex scenes. In the case of flooding, distance from the river, terrain elevation, hydrologic information or some combination thereof can add useful information to remote sensing data. Suitable methods, able to manage and merge different kinds of data, are therefore particularly needed. In this work, a fully automatic tool, based on Bayesian Networks (BNs) [3] and able to perform data fusion, is presented. It supplies flood maps...
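
    To illustrate the kind of per-pixel evidence fusion a BN enables, the following minimal sketch combines SAR backscatter with ancillary terrain and distance data under a naive-Bayes factorization (a special case of a Bayesian network). All class-conditional distributions and numeric parameters are invented for illustration; the paper's actual network structure and training data are not reproduced here.

        import numpy as np

        # Hypothetical per-pixel evidence: SAR backscatter (dB), terrain elevation (m),
        # and distance from the river (m). Gaussian class-conditionals are assumed
        # purely for illustration; a real BN would be learned from training data.
        def gaussian(x, mu, sigma):
            return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

        def flood_posterior(backscatter, elevation, distance, prior_flood=0.3):
            # Under "flood": smooth open water gives low backscatter, and flooded
            # pixels tend to lie low and close to the river (illustrative values).
            l_flood = (gaussian(backscatter, -18.0, 3.0) *
                       gaussian(elevation, 2.0, 2.0) *
                       gaussian(distance, 100.0, 150.0))
            l_dry = (gaussian(backscatter, -8.0, 4.0) *
                     gaussian(elevation, 15.0, 10.0) *
                     gaussian(distance, 1500.0, 1000.0))
            num = l_flood * prior_flood
            den = num + l_dry * (1.0 - prior_flood)
            return num / den                      # P(flood | evidence)

        print(flood_posterior(backscatter=-19.0, elevation=1.5, distance=80.0))    # high
        print(flood_posterior(backscatter=-7.0, elevation=20.0, distance=2000.0))  # low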

  15. Multi-parametric (ADC/PWI/T2-w) image fusion approach for accurate semi-automatic segmentation of tumorous regions in glioblastoma multiforme.

    Science.gov (United States)

    Fathi Kazerooni, Anahita; Mohseni, Meysam; Rezaei, Sahar; Bakhshandehpour, Gholamreza; Saligheh Rad, Hamidreza

    2015-02-01

    Glioblastoma multiforme (GBM) brain tumor is heterogeneous in nature, so its quantification depends on how accurately the different parts of the tumor, i.e. viable tumor, edema and necrosis, are segmented. This procedure becomes more effective when metabolic and functional information, provided by physiological magnetic resonance (MR) imaging modalities, like diffusion-weighted imaging (DWI) and perfusion-weighted imaging (PWI), is incorporated with the anatomical magnetic resonance imaging (MRI). In this preliminary tumor quantification work, the idea is to characterize different regions of GBM tumors in an MRI-based semi-automatic multi-parametric approach to achieve more accurate characterization of pathogenic regions. For this purpose, three MR sequences, namely T2-weighted imaging (anatomical MR imaging), PWI and DWI of thirteen GBM patients, were acquired. To enhance the delineation of the boundaries of each pathogenic region (peri-tumoral edema, viable tumor and necrosis), the spatial fuzzy C-means algorithm is combined with the region growing method. The results show that exploiting the multi-parametric approach along with the proposed semi-automatic segmentation method can differentiate various tumorous regions with over 80% sensitivity, specificity and Dice score. The proposed MRI-based multi-parametric segmentation approach has the potential to accurately segment tumorous regions, leading to an efficient design of the pre-surgical treatment planning.
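
    For readers unfamiliar with the core algorithm, the sketch below implements plain (non-spatial) fuzzy C-means on multi-parametric feature vectors; the spatial regularization and region-growing stages described in the abstract are omitted, and the data are synthetic.

        import numpy as np

        def fuzzy_c_means(X, n_clusters=3, m=2.0, n_iter=100, tol=1e-5, seed=0):
            """Plain fuzzy C-means on feature vectors X (n_samples, n_features).
            The paper's spatial variant additionally regularizes memberships with
            neighborhood information; this sketch shows only the core updates."""
            rng = np.random.default_rng(seed)
            U = rng.random((len(X), n_clusters))
            U /= U.sum(axis=1, keepdims=True)          # random initial memberships
            for _ in range(n_iter):
                Um = U ** m
                centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
                d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
                U_new = 1.0 / (d ** (2 / (m - 1)))     # inverse-distance memberships
                U_new /= U_new.sum(axis=1, keepdims=True)
                if np.abs(U_new - U).max() < tol:
                    U = U_new
                    break
                U = U_new
            return centers, U

        # Example: cluster synthetic multi-parametric voxel features
        # (stand-ins for ADC, PWI and T2-w intensities).
        rng = np.random.default_rng(1)
        X = np.vstack([rng.normal(loc, 0.1, (50, 3))
                       for loc in ([0.2, 0.8, 0.5], [0.7, 0.3, 0.9], [0.5, 0.5, 0.1])])
        centers, U = fuzzy_c_means(X, n_clusters=3)
        labels = U.argmax(axis=1)                      # hard labels for visualization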

  16. Assessing the Performance of Automatic Speech Recognition Systems When Used by Native and Non-Native Speakers of Three Major Languages in Dictation Workflows

    DEFF Research Database (Denmark)

    Zapata, Julián; Kirkedal, Andreas Søeborg

    2015-01-01

    In this paper, we report on a two-part experiment aiming to assess and compare the performance of two types of automatic speech recognition (ASR) systems on two different computational platforms when used to augment dictation workflows. The experiment was performed with a sample of speakers of three major languages and with different linguistic profiles: non-native English speakers; non-native French speakers; and native Spanish speakers. The main objective of this experiment is to examine ASR performance in translation dictation (TD) and medical dictation (MD) workflows without manual...

  17. Full automatic fiducial marker detection on coil arrays for accurate instrumentation placement during MRI guided breast interventions

    Science.gov (United States)

    Filippatos, Konstantinos; Boehler, Tobias; Geisler, Benjamin; Zachmann, Harald; Twellmann, Thorsten

    2010-02-01

    With its high sensitivity, dynamic contrast-enhanced MR imaging (DCE-MRI) of the breast is today one of the first-line tools for early detection and diagnosis of breast cancer, particularly in the dense breast of young women. However, many relevant findings are very small or occult on targeted ultrasound images or mammography, so that MRI guided biopsy is the only option for a precise histological work-up [1]. State-of-the-art software tools for computer-aided diagnosis of breast cancer in DCE-MRI data also offer means for image-based planning of biopsy interventions. One step in the MRI guided biopsy workflow is the alignment of the patient position with the preoperative MR images. In these images, the location and orientation of the coil localization unit can be inferred from a number of fiducial markers, which for this purpose have to be manually or semi-automatically detected by the user. In this study, we propose a method for precise, fully automatic localization of fiducial markers, on the basis of which a virtual localization unit can subsequently be placed in the image volume for the purpose of determining the parameters for needle navigation. The method is based on adaptive thresholding for separating breast tissue from background, followed by rigid registration of marker templates. In an evaluation of 25 clinical cases comprising 4 different commercial coil array models and 3 different MR imaging protocols, the method yielded a sensitivity of 0.96 at a false positive rate of 0.44 markers per case. The mean distance deviation between detected fiducial centers and ground-truth information annotated by a radiologist was 0.94 mm.

  18. The software for automatic creation of the formal grammars used by speech recognition, computer vision, editable text conversion systems, and some new functions

    Science.gov (United States)

    Kardava, Irakli; Tadyszak, Krzysztof; Gulua, Nana; Jurga, Stefan

    2017-02-01

    To make environmental perception by artificial intelligence more flexible, supporting software modules are needed that can automate the creation of language-specific syntax and perform further analysis for relevant decisions based on semantic functions. With the proposed approach, pairs of formal rules can be created for given sentences (in the case of natural languages) or statements (in the case of special languages) with the help of computer vision, speech recognition or editable text conversion systems, for further automatic improvement. In other words, we have developed an approach that can significantly improve the automation of the training process of artificial intelligence, resulting in a higher level of self-development independent of the user. Based on this approach, we have developed a demo version of the software, which includes the algorithm and code implementing all of the above-mentioned components (computer vision, speech recognition and editable text conversion). The program can work in multi-stream mode and simultaneously create a syntax based on information received from several sources.

  19. Nonlinear spectro-temporal features based on a cochlear model for automatic speech recognition in a noisy situation.

    Science.gov (United States)

    Choi, Yong-Sun; Lee, Soo-Young

    2013-09-01

    A nonlinear speech feature extraction algorithm was developed by modeling human cochlear functions, and demonstrated as a noise-robust front-end for speech recognition systems. The algorithm was based on a model of the Organ of Corti in the human cochlea with such features as the basilar membrane (BM), outer hair cells (OHCs), and inner hair cells (IHCs). Frequency-dependent nonlinear compression and amplification of OHCs were modeled by lateral inhibition to enhance spectral contrasts. In particular, the compression coefficients had frequency dependency based on the psychoacoustic evidence. Spectral subtraction and temporal adaptation were applied in the time-frame domain. With long-term and short-term adaptation characteristics, these factors remove stationary or slowly varying components and amplify the temporal changes such as onset or offset. The proposed features were evaluated with a noisy speech database and showed better performance than the baseline methods such as mel-frequency cepstral coefficients (MFCCs) and RASTA-PLP in unknown noisy conditions.
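
    The spectral contrast enhancement attributed here to OHC lateral inhibition can be illustrated with a simple center-surround operation on a log spectrum. The kernel shape and parameters below are illustrative choices, not the paper's frequency-dependent coefficients.

        import numpy as np

        def lateral_inhibition(log_spectrum, surround=2, strength=0.5):
            """Center-surround sharpening: each channel is boosted relative to the
            average of its neighbors, mimicking inhibition-based contrast
            enhancement. Parameters are illustrative only."""
            n = len(log_spectrum)
            out = np.empty(n)
            for i in range(n):
                lo, hi = max(0, i - surround), min(n, i + surround + 1)
                neighbors = np.concatenate([log_spectrum[lo:i], log_spectrum[i + 1:hi]])
                out[i] = log_spectrum[i] - strength * neighbors.mean()
            return out

        # A spectral peak (e.g., a formant) becomes more prominent after inhibition:
        spec = np.log(np.array([1.0, 1.2, 5.0, 1.3, 1.1, 1.0]))
        print(np.round(lateral_inhibition(spec), 2))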

  20. Machines a Comprendre la Parole: Methodologie et Bilan de Recherche (Automatic Speech Recognition: Methodology and the State of the Research)

    Science.gov (United States)

    Haton, Jean-Pierre

    1974-01-01

    Still no decisive result has been achieved in the automatic machine recognition of sentences of a natural language. Current research concentrates on developing algorithms for syntactic and semantic analysis. It is obvious that clues from all levels of perception have to be taken into account if a long term solution is ever to be found. (Author/MSE)

  1. Automatic acquisition of speech sound-target cells based on DIVA model

    Institute of Scientific and Technical Information of China (English)

    张少白; 刘欣

    2013-01-01

    Addressing the imbalance in the Directions Into Velocities of Articulators (DIVA) model whereby infants' perceptual abilities develop faster than their speech production skills, this paper presents a method for the automatic acquisition of speech sound-target cells. The method models the human ear as a bank of parallel band-pass filters with different bandwidths, each associated with the 21-dimensional auditory storage space of the DIVA model, so that for different auditory responses the masking effect of each frequency band and the relation between loudness and frequency can be considered separately. In the process of reading the speech input signal, the model acquires a good initial auditory representation, in a way that is essentially consistent with infant babbling. Simulation results show that, through boundary definition, similarity comparison, and search-and-update steps, the method performs well at self-organized matching of initial input patterns, ultimately giving the DIVA model more natural speech acquisition characteristics.

  2. Multilevel Analysis in Analyzing Speech Data

    Science.gov (United States)

    Guddattu, Vasudeva; Krishna, Y.

    2011-01-01

    The speech produced by human vocal tract is a complex acoustic signal, with diverse applications in phonetics, speech synthesis, automatic speech recognition, speaker identification, communication aids, speech pathology, speech perception, machine translation, hearing research, rehabilitation and assessment of communication disorders and many…

  3. Cross-modal retrieval of scripted speech audio

    Science.gov (United States)

    Owen, Charles B.; Makedon, Fillia

    1997-12-01

    This paper describes an approach to the problem of searching speech-based digital audio using cross-modal information retrieval. Audio containing speech (speech-based audio) is difficult to search. Open vocabulary speech recognition is advancing rapidly, but cannot yield high accuracy in either search or transcription modalities. However, text can be searched quickly and efficiently with high accuracy. Script-light digital audio is audio that has an available transcription. This is a surprisingly large class of content including legal testimony, broadcasting, dramatic productions and political meetings and speeches. An automatic mechanism for deriving the synchronization between the transcription and the audio allows for very accurate retrieval of segments of that audio. The mechanism described in this paper is based on building a transcription graph from the text and computing biphone probabilities for the audio. A modified beam search algorithm is presented to compute the alignment.
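
    A drastically simplified stand-in for the described alignment is a monotonic dynamic-programming match between the expected phone sequence (from the transcription) and per-frame phone probabilities (from the audio). The sketch below omits the transcription graph, biphone modelling and beam pruning, and uses random data in place of a real acoustic model.

        import numpy as np

        def align(frame_logprob, phone_ids):
            """Monotonic DP alignment of an expected phone sequence to per-frame
            phone log-probabilities (T frames x P phones). Returns, for each frame,
            an index into phone_ids."""
            T, S = len(frame_logprob), len(phone_ids)
            score = np.full((T, S), -np.inf)
            back = np.zeros((T, S), dtype=int)
            score[0, 0] = frame_logprob[0, phone_ids[0]]
            for t in range(1, T):
                for s in range(S):
                    stay = score[t - 1, s]                            # remain in phone
                    move = score[t - 1, s - 1] if s > 0 else -np.inf  # advance to next
                    back[t, s] = s if stay >= move else s - 1
                    score[t, s] = max(stay, move) + frame_logprob[t, phone_ids[s]]
            # Trace back from the final phone at the final frame.
            path, s = [], S - 1
            for t in range(T - 1, -1, -1):
                path.append(s)
                s = back[t, s]
            return path[::-1]

        # Toy example: 6 frames, 3 phones (ids 0, 1, 2), expected sequence [0, 2, 1].
        rng = np.random.default_rng(0)
        logp = np.log(rng.dirichlet(np.ones(3), size=6))
        print(align(logp, [0, 2, 1]))   # frame-to-phone assignment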

  4. Study on automatic prediction of sentential stress for Chinese Putonghua Text-to-Speech system with natural style

    Institute of Scientific and Technical Information of China (English)

    SHAO Yanqiu; HAN Jiqing; ZHAO Yongzhen; LIU Ting

    2007-01-01

    Stress is an important parameter for prosody processing in speech synthesis. In this paper, we compare the acoustic features of neutral tone syllables and strong stress syllables with moderate stress syllables, including pitch, syllable duration, intensity and pause length after the syllable. The relations between duration and pitch, and between the Third Tone (T3) and pitch, are also studied. Three stress prediction models based on ANNs, i.e. an acoustic model, a linguistic model and a mixed model, are presented for predicting Chinese sentential stress. The results show that the mixed model performs better than the other two. To address the diversity of manual labeling, an evaluation index called the support ratio is proposed.

  5. Comparison of Forced-Alignment Speech Recognition and Humans for Generating Reference VAD

    DEFF Research Database (Denmark)

    Kraljevski, Ivan; Tan, Zheng-Hua; Paola Bissiri, Maria

    2015-01-01

    This paper aims to answer the question whether forced-alignment speech recognition can be used as an alternative to humans in generating reference Voice Activity Detection (VAD) transcriptions. An investigation of the level of agreement between automatic/manual VAD transcriptions and the reference ones produced by a human expert was carried out. Thereafter, statistical analysis was employed on the automatically produced and the collected manual transcriptions. Experimental results confirmed that forced-alignment speech recognition can provide accurate and consistent VAD labels.
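
    Agreement between automatic and manual VAD labels is commonly quantified with chance-corrected statistics. The sketch below computes Cohen's kappa over frame-level binary labels; the statistic and the toy data are illustrative, not necessarily those used in the paper.

        import numpy as np

        def cohens_kappa(a, b):
            """Agreement between two binary label sequences, corrected for the
            agreement expected by chance."""
            a, b = np.asarray(a), np.asarray(b)
            po = np.mean(a == b)                              # observed agreement
            pe = (np.mean(a == 1) * np.mean(b == 1) +
                  np.mean(a == 0) * np.mean(b == 0))          # chance agreement
            return (po - pe) / (1.0 - pe)

        auto   = [1, 1, 1, 0, 0, 1, 1, 0, 0, 0]   # forced-alignment VAD, per frame
        manual = [1, 1, 0, 0, 0, 1, 1, 0, 0, 1]   # human reference, per frame
        print(round(cohens_kappa(auto, manual), 3))   # 0.6 for this toy data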

  7. Speech processing in mobile environments

    CERN Document Server

    Rao, K Sreenivasa

    2014-01-01

    This book focuses on speech processing in the presence of low-bit rate coding and varying background environments. The methods presented in the book exploit the speech events which are robust in noisy environments. Accurate estimation of these crucial events will be useful for carrying out various speech tasks such as speech recognition, speaker recognition and speech rate modification in mobile environments. The authors provide insights into designing and developing robust methods to process the speech in mobile environments. The book covers temporal and spectral enhancement methods to minimize the effect of noise, and examines methods and models for speech and speaker recognition applications in mobile environments.

  8. Recognizing GSM Digital Speech

    OpenAIRE

    Gallardo-Antolín, Ascensión; Peláez-Moreno, Carmen; Díaz-de-María, Fernando

    2005-01-01

    The Global System for Mobile (GSM) environment encompasses three main problems for automatic speech recognition (ASR) systems: noisy scenarios, source coding distortion, and transmission errors. The first one has already received much attention; however, source coding distortion and transmission errors must be explicitly addressed. In this paper, we propose an alternative front-end for speech recognition over GSM networks. This front-end is specially conceived to be effective against source c...

  10. A Blueprint for a Comprehensive Australian English Auditory-Visual Speech Corpus

    NARCIS (Netherlands)

    Burnham, D.; Ambikairajah, E.; Arciuli, J.; Bennamoun, M.; Best, C.T.; Bird, S.; Butcher, A.R.; Cassidy, S.; Chetty, G.; Cox, F.M.; Cutler, A.; Dale, R.; Epps, J.R.; Fletcher, J.M.; Goecke, R.; Grayden, D.B.; Hajek, J.T.; Ingram, J.C.; Ishihara, S.; Kemp, N.; Kinoshita, Y.; Kuratate, T.; Lewis, T.W.; Loakes, D.E.; Onslow, M.; Powers, D.M.; Rose, P.; Togneri, R.; Tran, D.; Wagner, M.

    2009-01-01

    Large auditory-visual (AV) speech corpora are the grist of modern research in speech science, but no such corpus exists for Australian English. This is unfortunate, for speech science is the brains behind speech technology and applications such as text-to-speech (TTS) synthesis, automatic speech rec...

  11. Pattern recognition in speech and language processing

    CERN Document Server

    Chou, Wu

    2003-01-01

    Minimum Classification Error (MCE) Approach in Pattern Recognition, Wu Chou; Minimum Bayes-Risk Methods in Automatic Speech Recognition, Vaibhava Goel and William Byrne; A Decision Theoretic Formulation for Adaptive and Robust Automatic Speech Recognition, Qiang Huo; Speech Pattern Recognition Using Neural Networks, Shigeru Katagiri; Large Vocabulary Speech Recognition Based on Statistical Methods, Jean-Luc Gauvain; Toward Spontaneous Speech Recognition and Understanding, Sadaoki Furui; Speaker Authentication, Qi Li and Biing-Hwang Juang; HMMs for Language Processing Problems, Ri...

  12. Automatic Reading

    Institute of Scientific and Technical Information of China (English)

    胡迪

    2007-01-01

    Reading is the key to school success and, like any skill, it takes practice. A child learns to walk by practising until he no longer has to think about how to put one foot in front of the other. The great athlete practises until he can play quickly, accurately and without thinking. Educators call it automaticity.

  13. Reflectance Intensity Assisted Automatic and Accurate Extrinsic Calibration of 3D LiDAR and Panoramic Camera Using a Printed Chessboard

    Science.gov (United States)

    Wang, Weimin; Sakurada, Ken; Kawaguchi, Nobuo

    2017-08-01

    This paper presents a novel method for fully automatic and convenient extrinsic calibration of a 3D LiDAR and a panoramic camera with a normally printed chessboard. The proposed method is based on the 3D corner estimation of the chessboard from the sparse point cloud generated by one frame scan of the LiDAR. To estimate the corners, we formulate a full-scale model of the chessboard and fit it to the segmented 3D points of the chessboard. The model is fitted by optimizing the cost function under constraints of correlation between the reflectance intensity of laser and the color of the chessboard's patterns. Powell's method is introduced for resolving the discontinuity problem in optimization. The corners of the fitted model are considered as the 3D corners of the chessboard. Once the corners of the chessboard in the 3D point cloud are estimated, the extrinsic calibration of the two sensors is converted to a 3D-2D matching problem. The corresponding 3D-2D points are used to calculate the absolute pose of the two sensors with Unified Perspective-n-Point (UPnP). Further, the calculated parameters are regarded as initial values and are refined using the Levenberg-Marquardt method. The performance of the proposed corner detection method from the 3D point cloud is evaluated using simulations. The results of experiments, conducted on a Velodyne HDL-32e LiDAR and a Ladybug3 camera under the proposed re-projection error metric, qualitatively and quantitatively demonstrate the accuracy and stability of the final extrinsic calibration parameters.
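
    The initialize-then-refine pose estimation step can be sketched with OpenCV's PnP solver followed by Levenberg-Marquardt refinement. Note that the correspondences, intrinsics and pinhole model below are invented stand-ins: a panoramic camera needs its own projection model, and the paper uses UPnP rather than the default iterative solver.

        import numpy as np
        import cv2

        # Hypothetical 3D chessboard corners (estimated from the LiDAR point cloud,
        # in the LiDAR frame) and their corresponding 2D pixels in the camera image.
        pts_3d = np.array([[0.0, 0.0, 2.0], [0.1, 0.0, 2.0], [0.0, 0.1, 2.0],
                           [0.1, 0.1, 2.0], [0.2, 0.0, 2.1], [0.0, 0.2, 2.1]],
                          dtype=np.float64)
        pts_2d = np.array([[320.0, 240.0], [352.0, 240.0], [320.0, 272.0],
                           [352.0, 272.0], [384.0, 236.0], [318.0, 300.0]],
                          dtype=np.float64)
        K = np.array([[640.0, 0.0, 320.0],      # hypothetical pinhole intrinsics
                      [0.0, 640.0, 240.0],
                      [0.0, 0.0, 1.0]])
        dist = np.zeros(5)                      # assume no lens distortion here

        # Initial pose from PnP, then Levenberg-Marquardt refinement, mirroring the
        # initialize-then-refine strategy described in the abstract.
        ok, rvec, tvec = cv2.solvePnP(pts_3d, pts_2d, K, dist)
        rvec, tvec = cv2.solvePnPRefineLM(pts_3d, pts_2d, K, dist, rvec, tvec)
        R, _ = cv2.Rodrigues(rvec)              # rotation matrix of the extrinsics
        print(R, tvec.ravel())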

  14. Collecting and evaluating speech recognition corpora for 11 South African languages

    CSIR Research Space (South Africa)

    Badenhorst, J

    2011-08-01

    The authors describe the Lwazi corpus for automatic speech recognition (ASR), a new telephone speech corpus which contains data from the eleven official languages of South Africa. Because of practical constraints, the amount of speech per language...

  15. Collecting and evaluating speech recognition corpora for nine Southern Bantu languages

    CSIR Research Space (South Africa)

    Badenhorst, JAC

    2009-03-01

    The authors describe the Lwazi corpus for automatic speech recognition (ASR), a new telephone speech corpus which includes data from nine Southern Bantu languages. Because of practical constraints, the amount of speech per language is relatively...

  16. WE-A-17A-10: Fast, Automatic and Accurate Catheter Reconstruction in HDR Brachytherapy Using An Electromagnetic 3D Tracking System

    Energy Technology Data Exchange (ETDEWEB)

    Poulin, E; Racine, E; Beaulieu, L [CHU de Quebec - Universite Laval, Quebec, Quebec (Canada); Binnekamp, D [Integrated Clinical Solutions and Marketing, Philips Healthcare, Best, DA (Netherlands)

    2014-06-15

    Purpose: In high dose rate brachytherapy (HDR-B), current catheter reconstruction protocols are slow and error-prone. The purpose of this study was to evaluate the accuracy and robustness of an electromagnetic (EM) tracking system for improved catheter reconstruction in HDR-B protocols. Methods: For this proof-of-principle, a total of 10 catheters were inserted in gelatin phantoms with different trajectories. Catheters were reconstructed using a Philips-design 18G biopsy needle (used as an EM stylet) and the second generation Aurora Planar Field Generator from Northern Digital Inc. The Aurora EM system exploits alternating current technology and generates 3D points at 40 Hz. Phantoms were also scanned using a μCT (GE Healthcare) and Philips Big Bore clinical CT system with a resolution of 0.089 mm and 2 mm, respectively. Reconstructions using the EM stylet were compared to μCT and CT. To assess the robustness of the EM reconstruction, 5 catheters were reconstructed twice and compared. Results: Reconstruction time for one catheter was 10 seconds or less. This would imply that for a typical clinical implant of 17 catheters, the total reconstruction time would be less than 3 minutes. When compared to the μCT, the mean EM tip identification error was 0.69 ± 0.29 mm while the CT error was 1.08 ± 0.67 mm. The mean 3D distance error was found to be 0.92 ± 0.37 mm and 1.74 ± 1.39 mm for the EM and CT, respectively. EM 3D catheter trajectories were found to be significantly more accurate (unpaired t-test, p < 0.05). A mean difference of less than 0.5 mm was found between successive EM reconstructions. Conclusion: The EM reconstruction was found to be faster, more accurate and more robust than the conventional methods used for catheter reconstruction in HDR-B. This approach can be applied to any type of catheter and applicator. We would like to disclose that the equipment used in this study comes from a collaboration with Philips Medical.

  17. Speech transmission index from running speech: A neural network approach

    Science.gov (United States)

    Li, F. F.; Cox, T. J.

    2003-04-01

    Speech transmission index (STI) is an important objective parameter concerning speech intelligibility for sound transmission channels. It is normally measured with specific test signals to ensure high accuracy and good repeatability. Measurement with running speech was previously proposed, but accuracy is compromised and hence applications limited. A new approach that uses artificial neural networks to accurately extract the STI from received running speech is developed in this paper. Neural networks are trained on a large set of transmitted speech examples with prior knowledge of the transmission channels' STIs. The networks perform complicated nonlinear function mappings and spectral feature memorization to enable accurate objective parameter extraction from transmitted speech. Validations via simulations demonstrate the feasibility of this new method on a one-net-one-speech extract basis. In this case, accuracy is comparable with normal measurement methods. This provides an alternative to standard measurement techniques, and it is intended that the neural network method can facilitate occupied room acoustic measurements.
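
    A minimal modern stand-in for such a network is a small multi-layer perceptron regressor mapping features of received running speech to the STI. The features and targets below are synthetic placeholders, not the spectral features or trained networks of the paper.

        import numpy as np
        from sklearn.neural_network import MLPRegressor

        # Hypothetical training data: each row summarizes a received running-speech
        # example (e.g., band-wise modulation energies), and the target is the known
        # STI of the transmission channel it passed through.
        rng = np.random.default_rng(0)
        X_train = rng.random((200, 14))                    # stand-in speech features
        y_train = np.clip(X_train.mean(axis=1) + rng.normal(0, 0.05, 200), 0, 1)

        net = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0)
        net.fit(X_train, y_train)                          # learn features -> STI map

        X_new = rng.random((3, 14))                        # new received speech
        print(np.clip(net.predict(X_new), 0.0, 1.0))       # STI estimates in [0, 1]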

  18. Tone realisation in a Yoruba speech recognition corpus

    CSIR Research Space (South Africa)

    Van Niekerk, D

    2012-05-01

    The authors investigate the acoustic realisation of tone in short continuous utterances in Yoruba. Fundamental frequency contours are extracted for automatically aligned syllables from a speech corpus of 33 speakers collected for speech recognition...

  19. Public Speech.

    Science.gov (United States)

    Green, Thomas F.

    1994-01-01

    Discusses the importance of public speech in society, noting the power of public speech to create a world and a public. The paper offers a theory of public speech and identifies types of public speech and of public speech fallacies. Two ways of speaking of the public and of public life are distinguished. (SM)

  20. Speech Recognition on Mobile Devices

    DEFF Research Database (Denmark)

    Tan, Zheng-Hua; Lindberg, Børge

    2010-01-01

    The enthusiasm of deploying automatic speech recognition (ASR) on mobile devices is driven both by remarkable advances in ASR technology and by the demand for efficient user interfaces on such devices as mobile phones and personal digital assistants (PDAs). This chapter presents an overview of ASR in the mobile context, covering motivations, challenges, fundamental techniques and applications. Three ASR architectures are introduced: embedded speech recognition, distributed speech recognition and network speech recognition. Their pros and cons and implementation issues are discussed. Applications within command and control, text entry and search are presented with an emphasis on mobile text entry.

  1. Automatic discovery of part-of-speech and word segmentation errors for Chinese based on auxiliary word usage

    Institute of Scientific and Technical Information of China (English)

    韩英杰; 张坤丽; 昝红英; 柴玉梅

    2011-01-01

    During the construction of an auxiliary-word knowledge base and the annotation of a large-scale corpus, rule-based automatic annotation of auxiliary word usage was used. After automatic annotation, part-of-speech and word segmentation errors were found in the annotated corpus. Discovering these errors helps to build a high-quality Chinese corpus and to advance the depth of corpus processing.

  2. Speech Problems

    Science.gov (United States)

    ... of your treatment plan may include seeing a speech therapist, a person who is trained to treat speech disorders. How often you have to see the speech therapist will vary — you'll probably start out seeing ...

  3. INTEGRATING MACHINE TRANSLATION AND SPEECH SYNTHESIS COMPONENT FOR ENGLISH TO DRAVIDIAN LANGUAGE SPEECH TO SPEECH TRANSLATION SYSTEM

    Directory of Open Access Journals (Sweden)

    J. SANGEETHA

    2015-02-01

    This paper provides an interface between the machine translation and speech synthesis components for converting English speech to Tamil in an English-to-Tamil speech-to-speech translation system. The speech translation system consists of three modules: automatic speech recognition, machine translation and text-to-speech synthesis. Many procedures for the incorporation of speech recognition and machine translation have been proposed, but the speech synthesis component has not yet been evaluated. In this paper, we focus on the integration of machine translation and speech synthesis, and report a subjective evaluation to investigate the impact of the speech synthesis, the machine translation, and the integration of the two components. Here we implement a hybrid machine translation (a combination of rule-based and statistical machine translation) and a concatenative syllable-based speech synthesis technique. In order to retain the naturalness and intelligibility of the synthesized speech, Auto Associative Neural Network (AANN) prosody prediction is used in this work. The results of this system investigation demonstrate that the naturalness and intelligibility of the synthesized speech are strongly influenced by the fluency and correctness of the translated text.

  4. Activation and application of an obligatory phonotactic constraint in German during automatic speech processing is revealed by human event-related potentials.

    Science.gov (United States)

    Steinberg, Johanna; Truckenbrodt, Hubert; Jacobsen, Thomas

    2010-07-01

    In auditory speech processing, implicit linguistic knowledge is activated and applied at the phonetic and segment-related phonological processing level even if the perceived sound sequence is outside the focus of attention. In this study, the effects of language-specific phonotactic restrictions on pre-attentive auditory speech processing were investigated, using the Mismatch Negativity component of the human event-related brain potential. In German grammar, the distribution of the velar and the palatal dorsal fricative is limited by an obligatory phonotactic constraint, Dorsal Fricative Assimilation, which demands that a vowel and a following dorsal fricative must have the same specifications for articulatory backness. For passive oddball stimulation, we used three phonotactically correct VC syllables and one incorrect VC syllable, composed of the vowels [ɛ] and [ɔ] and the fricatives [ç] and [x]. Stimuli were contrasted pairwise in experimental oddball blocks in a way that they differed in regard to their respective vowel but shared the fricative. In addition to the usual Mismatch Negativity, which is attributable to the change of the initial vowel and which was elicited by all deviants, we observed a second negative deflection in the deviant ERP elicited by the phonotactically ill-formed syllable only. This negativity cannot be attributed to any acoustical or phonemic difference between standard and deviant; it rather reflects the effect of a phonotactic evaluation process after both sounds of the syllable were identified. Our finding suggests that implicit phonotactic knowledge is activated and applied even outside the focus of the participants' attention.

  5. A real-time phoneme counting algorithm and application for speech rate monitoring.

    Science.gov (United States)

    Aharonson, Vered; Aharonson, Eran; Raichlin-Levi, Katia; Sotzianu, Aviv; Amir, Ofer; Ovadia-Blechman, Zehava

    2017-03-01

    Adults who stutter can learn to control and improve their speech fluency by modifying their speaking rate. Existing speech therapy technologies can assist this practice by monitoring speaking rate and providing feedback to the patient, but cannot provide an accurate, quantitative measurement of speaking rate. Moreover, most technologies are too complex and costly to be used for home practice. We developed an algorithm and a smartphone application that monitor a patient's speaking rate in real time and provide user-friendly feedback to both patient and therapist. Our speaking rate computation is performed by a phoneme counting algorithm which implements spectral transition measure extraction to estimate phoneme boundaries. The algorithm is implemented in real time in a mobile application that presents its results in a user-friendly interface. The application incorporates two modes: one provides the patient with visual feedback of his/her speech rate for self-practice and another provides the speech therapist with recordings, speech rate analysis and tools to manage the patient's practice. The algorithm's phoneme counting accuracy was validated on ten healthy subjects who read a paragraph at slow, normal and fast paces, and was compared to manual counting of speech experts. Test-retest and intra-counter reliability were assessed. Preliminary results indicate differences of -4% to 11% between automatic and human phoneme counting. Differences were largest for slow speech. The application can thus provide reliable, user-friendly, real-time feedback for speaking rate control practice.
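
    One way to realize spectral-transition-measure boundary estimation is to fit, at each frame, a least-squares line to each feature dimension over a short window and take the mean squared slope; peaks then suggest phoneme boundaries. The sketch below follows that recipe with illustrative parameters, not the paper's exact configuration.

        import numpy as np

        def spectral_transition_measure(features, half_win=2):
            """features: (T, D) frame-level spectral/cepstral vectors. For each
            frame, fit a least-squares line per dimension over a +/-half_win
            window and average the squared slopes."""
            T, D = features.shape
            taps = np.arange(-half_win, half_win + 1, dtype=float)
            denom = np.sum(taps ** 2)
            stm = np.zeros(T)
            for t in range(half_win, T - half_win):
                window = features[t - half_win:t + half_win + 1]   # (2w+1, D)
                slopes = taps @ window / denom                     # LS slope per dim
                stm[t] = np.mean(slopes ** 2)
            return stm

        def count_phonemes(features, threshold=None):
            stm = spectral_transition_measure(features)
            if threshold is None:
                threshold = stm.mean() + stm.std()                 # crude default
            peaks = [t for t in range(1, len(stm) - 1)
                     if stm[t] > threshold
                     and stm[t] >= stm[t - 1] and stm[t] > stm[t + 1]]
            return len(peaks) + 1              # boundaries delimit one more segment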

  6. Uyghur pronunciation variations in automatic speech recognition systems

    Institute of Scientific and Technical Information of China (English)

    杨雅婷; 马博; 王磊; 吐尔洪·吾司曼; 李晓

    2011-01-01

    The recognition rate of Uyghur standard-speech recognition systems is relatively low when recognizing colloquial speech, owing to pronunciation variations relative to the standard language. The assimilation, weakening, elision, and vowel harmony in Uyghur speech were analyzed to provide a better understanding of its phonetic and prosodic characteristics. Uyghur accent pronunciation variation rules are summarized by combining knowledge-based and data-driven methods, mapping pairs of vowel and consonant pronunciation variants, and building a phoneme confusion matrix, laying a foundation for research on Uyghur dialect speech recognition.

  7. Automatic differentiation bibliography

    Energy Technology Data Exchange (ETDEWEB)

    Corliss, G.F. (comp.)

    1992-07-01

    This is a bibliography of work related to automatic differentiation. Automatic differentiation is a technique for the fast, accurate propagation of derivative values using the chain rule. It is neither symbolic nor numeric. Automatic differentiation is a fundamental tool for scientific computation, with applications in optimization, nonlinear equations, nonlinear least squares approximation, stiff ordinary differential equations, partial differential equations, continuation methods, and sensitivity analysis. This report is an updated version of the bibliography which originally appeared in Automatic Differentiation of Algorithms: Theory, Implementation, and Application.
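
    For readers new to the topic, forward-mode automatic differentiation can be demonstrated in a few lines with dual numbers, which propagate exact derivative values alongside function values via the chain rule:

        from dataclasses import dataclass
        import math

        @dataclass
        class Dual:
            """Dual number (value, derivative): the chain rule propagates the
            derivative alongside the value; neither symbolic nor numeric."""
            val: float
            dot: float = 0.0

            def __add__(self, o):
                o = o if isinstance(o, Dual) else Dual(o)
                return Dual(self.val + o.val, self.dot + o.dot)
            __radd__ = __add__

            def __mul__(self, o):
                o = o if isinstance(o, Dual) else Dual(o)
                return Dual(self.val * o.val,
                            self.val * o.dot + self.dot * o.val)   # product rule
            __rmul__ = __mul__

        def sin(x: Dual) -> Dual:
            return Dual(math.sin(x.val), math.cos(x.val) * x.dot)  # chain rule

        # d/dx [x*sin(x) + 3x] at x = 2.0, exact to machine precision:
        x = Dual(2.0, 1.0)                  # seed derivative dx/dx = 1
        y = x * sin(x) + 3 * x
        print(y.val, y.dot)                 # f(2), f'(2) = sin(2) + 2*cos(2) + 3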

  8. Key algorithms for accurate image registration in automatic PCB drill photoelectric inspection

    Institute of Scientific and Technical Information of China (English)

    黄永林; 叶玉堂; 陈镇龙; 刘霖

    2011-01-01

    In order to realize fast automatic printed circuit board (PCB) drill inspection, a fast algorithm for image registration between a standard PCB and the PCB under test is proposed. The basic image-processing flow and the structure of the automatic optical inspection system are introduced. Registration proceeds in two steps: approximate (coarse) registration and accurate (fine) registration. Edge extraction and the Hough transform are first used to obtain the coordinates of the four corners of the board under test; the relation between these corners and those of the standard board is used for coarse registration and to locate the board's positioning holes. The circle-center coordinates of the positioning holes on the test and standard boards are then used for fine registration. Experiments show that the method greatly improves the accuracy of image registration, and the speed of the algorithm also meets real-time processing requirements.

  9. Automatic translation among spoken languages

    Science.gov (United States)

    Walter, Sharon M.; Costigan, Kelly

    1994-01-01

    The Machine Aided Voice Translation (MAVT) system was developed in response to the shortage of experienced military field interrogators with both foreign language proficiency and interrogation skills. Combining speech recognition, machine translation, and speech generation technologies, the MAVT accepts an interrogator's spoken English question and translates it into spoken Spanish. The spoken Spanish response of the potential informant can then be translated into spoken English. Potential military and civilian applications for automatic spoken language translation technology are discussed in this paper.

  10. Speech Recognition System Architecture for Gujarati Language

    National Research Council Canada - National Science Library

    Jinal H Tailor; Dipti B Shah

    2016-01-01

    .... Achieving good accuracy and efficiency in an Automatic Speech Recognition (ASR) system for the Indian Gujarati language is a challenging task due to its morphology, language barriers, different dialects, and the unavailability of resources...

  11. Automatic Sarcasm Detection: A Survey

    OpenAIRE

    Joshi, Aditya; Bhattacharyya, Pushpak; Carman, Mark James

    2016-01-01

    Automatic sarcasm detection is the task of predicting sarcasm in text. This is a crucial step to sentiment analysis, considering prevalence and challenges of sarcasm in sentiment-bearing text. Beginning with an approach that used speech-based features, sarcasm detection has witnessed great interest from the sentiment analysis community. This paper is the first known compilation of past work in automatic sarcasm detection. We observe three milestones in the research so far: semi-supervised pat...

  12. How should a speech recognizer work?

    NARCIS (Netherlands)

    Scharenborg, O.E.; Norris, D.G.; Bosch, L.F.M. ten; McQueen, J.M.

    2005-01-01

    Although researchers studying human speech recognition (HSR) and automatic speech recognition (ASR) share a common interest in how information processing systems (human or machine) recognize spoken language, there is little communication between the two disciplines. We suggest that this lack of comm...

  13. Second Language Learners and Speech Act Comprehension

    Science.gov (United States)

    Holtgraves, Thomas

    2007-01-01

    Recognizing the specific speech act ( Searle, 1969) that a speaker performs with an utterance is a fundamental feature of pragmatic competence. Past research has demonstrated that native speakers of English automatically recognize speech acts when they comprehend utterances (Holtgraves & Ashley, 2001). The present research examined whether this…

  14. Pronunciation Modeling for Large Vocabulary Speech Recognition

    Science.gov (United States)

    Kantor, Arthur

    2010-01-01

    The large pronunciation variability of words in conversational speech is one of the major causes of low accuracy in automatic speech recognition (ASR). Many pronunciation modeling approaches have been developed to address this problem. Some explicitly manipulate the pronunciation dictionary as well as the set of the units used to define the…

  16. Robust automatic intelligibility assessment techniques evaluated on speakers treated for head and neck cancer

    NARCIS (Netherlands)

    Middag, C.; Clapham, R.; van Son, R.; Martens, J-P.

    2014-01-01

    It is generally acknowledged that an unbiased and objective assessment of the communication deficiency caused by a speech disorder calls for automatic speech processing tools. In this paper, a new automatic intelligibility assessment method is presented. The method can predict running speech intelli...

  17. Speech Matters

    DEFF Research Database (Denmark)

    Hasse Jørgensen, Stina

    2011-01-01

    About Speech Matters - Katarina Gregos, the Greek curator's exhibition at the Danish Pavilion, the Venice Biennale 2011.

  18. Transcribing Disordered Speech: By Target or by Production?

    Science.gov (United States)

    Ball, Martin J.

    2008-01-01

    The ability to transcribe disordered speech is a vital tool for speech-language pathologists, as accurate description of a client's speech output is needed for both diagnosis and effective intervention. Clients in the speech clinic often use sounds that are not part of the target sound system and which may, in some cases, be sounds not found in…

  19. Speech Development

    Science.gov (United States)

    ... The speech-language pathologist should consistently assess your child’s speech and language development, as well as screen for hearing problems (with ... and caregivers play a vital role in a child’s speech and language development. It is important that you talk to your ...

  20. Annotating Speech Corpus for Prosody Modeling in Indian Language Text to Speech Systems

    Directory of Open Access Journals (Sweden)

    Kiruthiga S

    2012-01-01

    A spoken language system, whether a speech synthesis or a speech recognition system, starts with building a speech corpus. We give a detailed survey of issues and a methodology for selecting the appropriate speech unit when building a speech corpus for Indian language Text to Speech systems. The paper ultimately aims to improve the intelligibility of the synthesized speech in Text to Speech synthesis systems. To begin with, an appropriate text file should be selected for building the speech corpus. Then a corresponding speech file is generated and stored. This speech file is the phonetic representation of the selected text file. The speech file is processed at different levels, viz., paragraphs, sentences, phrases, words, syllables and phones. These are called the speech units of the file. Research has been done taking these units as the basic unit for processing. This paper analyses the research done using phones, diphones, triphones, syllables and polysyllables as the basic unit for speech synthesis. The paper also provides a recommended set of combinations for polysyllables. Concatenative speech synthesis involves the concatenation of these basic units to synthesize intelligible, natural-sounding speech. The speech units are annotated with relevant prosodic information about each unit, manually or automatically, based on an algorithm. The database consisting of the units along with their annotated information is called the annotated speech corpus. A clustering technique is used on the annotated speech corpus to select the appropriate unit for concatenation, based on the lowest total join cost of the speech unit.
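
    The join-cost idea can be made concrete with a small sketch: score the spectral mismatch at each candidate concatenation point and pick the cheapest sequence. A greedy search is shown for brevity; real systems use a Viterbi search over the candidate lattice, and the features here are synthetic stand-ins.

        import numpy as np

        def join_cost(unit_a, unit_b):
            """Spectral mismatch between the last frame of one candidate unit and
            the first frame of the next (feature choice is illustrative)."""
            return np.linalg.norm(unit_a[-1] - unit_b[0])

        def select_units(candidates):
            """candidates: list (one per target syllable/phone) of candidate units,
            each an (n_frames, n_features) array. Greedily minimize total join cost."""
            chosen = [candidates[0][0]]
            total = 0.0
            for options in candidates[1:]:
                costs = [join_cost(chosen[-1], u) for u in options]
                best = int(np.argmin(costs))
                total += costs[best]
                chosen.append(options[best])
            return chosen, total

        # Toy lattice: 4 target positions, 3 candidate units each, 13-dim frames.
        rng = np.random.default_rng(0)
        candidates = [[rng.random((5, 13)) for _ in range(3)] for _ in range(4)]
        units, cost = select_units(candidates)
        print(cost)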

  1. Text as a Supplement to Speech in Young and Older Adults

    Science.gov (United States)

    Krull, Vidya; Humes, Larry E.

    2015-01-01

    Objective The purpose of this experiment was to quantify the contribution of visual text to auditory speech recognition in background noise. Specifically, we tested the hypothesis that partially accurate visual text from an automatic speech recognizer could be used successfully to supplement speech understanding in difficult listening conditions in older adults, with normal or impaired hearing. Our working hypotheses were based on what is known regarding audiovisual speech perception in the elderly from speechreading literature. We hypothesized that: 1) combining auditory and visual text information will result in improved recognition accuracy compared to auditory or visual text information alone; 2) benefit from supplementing speech with visual text (auditory and visual enhancement) in young adults will be greater than that in older adults; and 3) individual differences in performance on perceptual measures would be associated with cognitive abilities. Design Fifteen young adults with normal hearing, fifteen older adults with normal hearing, and fifteen older adults with hearing loss participated in this study. All participants completed sentence recognition tasks in auditory-only, text-only, and combined auditory-text conditions. The auditory sentence stimuli were spectrally shaped to restore audibility for the older participants with impaired hearing. All participants also completed various cognitive measures, including measures of working memory, processing speed, verbal comprehension, perceptual and cognitive speed, processing efficiency, inhibition, and the ability to form wholes from parts. Group effects were examined for each of the perceptual and cognitive measures. Audiovisual benefit was calculated relative to performance on auditory-only and visual-text only conditions. Finally, the relationship between perceptual measures and other independent measures were examined using principal-component factor analyses, followed by regression analyses. Results

  2. Alternative Speech Communication System for Persons with Severe Speech Disorders

    Science.gov (United States)

    Selouani, Sid-Ahmed; Sidi Yakoub, Mohammed; O'Shaughnessy, Douglas

    2009-12-01

    Assistive speech-enabled systems are proposed to help both French and English speaking persons with various speech disorders. The proposed assistive systems use automatic speech recognition (ASR) and speech synthesis in order to enhance the quality of communication. These systems aim at improving the intelligibility of pathologic speech, making it as natural as possible and close to the original voice of the speaker. The resynthesized utterances use new basic units, a new concatenating algorithm and a grafting technique to correct the poorly pronounced phonemes. The ASR responses are uttered by the new speech synthesis system in order to convey an intelligible message to listeners. Experiments involving four American speakers with severe dysarthria and two Acadian French speakers with sound substitution disorders (SSDs) are carried out to demonstrate the efficiency of the proposed methods. An improvement of the Perceptual Evaluation of Speech Quality (PESQ) value of 5% and more than 20% is achieved by the speech synthesis systems that deal with SSD and dysarthria, respectively.

  3. Current trends in multilingual speech processing

    Indian Academy of Sciences (India)

    Hervé Bourlard; John Dines; Mathew Magimai-Doss; Philip N Garner; David Imseng; Petr Motlicek; Hui Liang; Lakshmi Saheer; Fabio Valente

    2011-10-01

    In this paper, we describe recent work at Idiap Research Institute in the domain of multilingual speech processing and provide some insights into emerging challenges for the research community. Multilingual speech processing has been a topic of ongoing interest to the research community for many years and the field is now receiving renewed interest owing to two strong driving forces. Firstly, technical advances in speech recognition and synthesis are posing new challenges and opportunities to researchers. For example, discriminative features are seeing wide application by the speech recognition community, but additional issues arise when using such features in a multilingual setting. Another example is the apparent convergence of speech recognition and speech synthesis technologies in the form of statistical parametric methodologies. This convergence enables the investigation of new approaches to unified modelling for automatic speech recognition and text-to-speech synthesis (TTS) as well as cross-lingual speaker adaptation for TTS. The second driving force is the impetus being provided by both government and industry for technologies to help break down domestic and international language barriers, these also being barriers to the expansion of policy and commerce. Speech-to-speech and speech-to-text translation are thus emerging as key technologies at the heart of which lies multilingual speech processing.

  4. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    Energy Technology Data Exchange (ETDEWEB)

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2006-02-14

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  5. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    Energy Technology Data Exchange (ETDEWEB)

    Burnett, Greg C. (Livermore, CA); Holzrichter, John F. (Berkeley, CA); Ng, Lawrence C. (Danville, CA)

    2006-08-08

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  6. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    Energy Technology Data Exchange (ETDEWEB)

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2004-03-23

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  7. The Phase Spectra Based Feature for Robust Speech Recognition

    Directory of Open Access Journals (Sweden)

    Abbasian ALI

    2009-07-01

    Speech recognition in adverse environments is one of the major issues in automatic speech recognition today. Most current speech recognition systems are highly efficient in ideal environments, but their performance degrades severely when they are applied in real environments because of noise-affected speech. In this paper, a new feature representation based on phase spectra and Perceptual Linear Prediction (PLP) is suggested for robust speech recognition. It is shown that these new features can improve speech recognition performance not only in clean conditions but also at various noise levels, when compared to PLP features.

  8. Epoch-based analysis of speech signals

    Indian Academy of Sciences (India)

    B Yegnanarayana; Suryakanth V Gangashetty

    2011-10-01

    Speech analysis is traditionally performed using short-time analysis to extract features in time and frequency domains. The window size for the analysis is fixed somewhat arbitrarily, mainly to account for the time varying vocal tract system during production. However, speech in its primary mode of excitation is produced due to impulse-like excitation in each glottal cycle. Anchoring the speech analysis around the glottal closure instants (epochs) yields significant benefits for speech analysis. Epoch-based analysis of speech helps not only to segment the speech signals based on speech production characteristics, but also helps in accurate analysis of speech. It enables extraction of important acoustic-phonetic features such as glottal vibrations, formants, instantaneous fundamental frequency, etc. Epoch sequence is useful to manipulate prosody in speech synthesis applications. Accurate estimation of epochs helps in characterizing voice quality features. Epoch extraction also helps in speech enhancement and multispeaker separation. In this tutorial article, the importance of epochs for speech analysis is discussed, and methods to extract the epoch information are reviewed. Applications of epoch extraction for some speech applications are demonstrated.
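
    One widely used epoch-extraction method from this line of work is zero-frequency filtering; the sketch below is a simplified rendering, with the trend-removal window (here 10 ms, applied twice) as an assumed parameter.

        import numpy as np

        def epochs_zff(x, fs, win_ms=10):
            # Zero-frequency filtering: difference the signal, pass it through
            # two ideal integrators, then remove the slow trend; epochs appear
            # as positive-going zero crossings. The window length is an
            # assumption (roughly one to two pitch periods is typical).
            x = np.diff(x, prepend=x[0])
            y = np.cumsum(np.cumsum(x))
            w = max(int(fs * win_ms / 1000), 1)
            for _ in range(2):
                y = y - np.convolve(y, np.ones(w) / w, mode='same')
            return np.nonzero((y[:-1] < 0) & (y[1:] >= 0))[0]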

  9. Perception of Speech Sounds in School-Aged Children with Speech Sound Disorders.

    Science.gov (United States)

    Preston, Jonathan L; Irwin, Julia R; Turcios, Jacqueline

    2015-11-01

    Children with speech sound disorders may perceive speech differently than children with typical speech development. The nature of these speech differences is reviewed with an emphasis on assessing phoneme-specific perception for speech sounds that are produced in error. Category goodness judgment, or the ability to judge accurate and inaccurate tokens of speech sounds, plays an important role in phonological development. The software Speech Assessment and Interactive Learning System, which has been effectively used to assess preschoolers' ability to perform goodness judgments, is explored for school-aged children with residual speech errors (RSEs). However, data suggest that this particular task may not be sensitive to perceptual differences in school-aged children. The need for the development of clinical tools for assessment of speech perception in school-aged children with RSE is highlighted, and clinical suggestions are provided.

  10. CONCURRENT SPEECHES SEPARATION USING WRAPPED DISCRETE FOURIER TRANSFORM

    Institute of Scientific and Technical Information of China (English)

    Zhang Xichun; Li Yunjie; Zhang Jun; Wei Gang

    2005-01-01

    This letter proposes a new method for separating concurrent voiced speech. First, the Wrapped Discrete Fourier Transform (WDFT) is used to decompose the harmonic spectra of the mixed speech signals. Each individual signal is then reconstructed using the sinusoidal speech model. By taking advantage of the non-uniform frequency resolution of the WDFT, harmonic spectral parameters can be estimated and separated accurately. Experimental results on mixed-vowel separation show that the proposed method can recover the original speech signals effectively.
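
    Once per-talker harmonic amplitudes and phases have been estimated (the WDFT estimation step is omitted here), each voice can be resynthesized with the sinusoidal model. A minimal sketch, with all parameter values assumed given:

        import numpy as np

        def sinusoidal_voice(f0, amps, phases, fs, n):
            # Sum of harmonics k*f0 with the estimated amplitudes and phases.
            t = np.arange(n) / fs
            parts = [a * np.cos(2 * np.pi * f0 * (k + 1) * t + p)
                     for k, (a, p) in enumerate(zip(amps, phases))]
            return np.sum(parts, axis=0)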

  11. Speech recognition from spectral dynamics

    Indian Academy of Sciences (India)

    Hynek Hermansky

    2011-10-01

    Information is carried in changes of a signal. The paper starts by revisiting Dudley’s concept of the carrier nature of speech. It points to its close connection to modulation spectra of speech and argues against short-term spectral envelopes as dominant carriers of the linguistic information in speech. The history of spectral representations of speech is briefly discussed. Some of the history of the gradual infusion of the modulation spectrum concept into automatic speech recognition (ASR) comes next, pointing to the relationship of modulation spectrum processing to well-accepted ASR techniques such as dynamic speech features or RelAtive SpecTrAl (RASTA) filtering. Next, the frequency-domain perceptual linear prediction technique for deriving autoregressive models of temporal trajectories of spectral power in individual frequency bands is reviewed. Finally, posterior-based features, which allow for straightforward application of modulation frequency domain information, are described. The paper is tutorial in nature, aims at a historical global overview of attempts to use spectral dynamics in machine recognition of speech, and does not always provide enough detail of the described techniques. However, extensive references to earlier work are provided to compensate for the lack of detail in the paper.
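
    For illustration, the commonly cited RASTA band-pass filter can be applied along the time axis of log-spectral (or log-PLP) trajectories; the coefficients below follow the usual formulation, with 0.98 as one conventional choice of pole.

        import numpy as np
        from scipy.signal import lfilter

        def rasta_filter(log_spectrum):
            # log_spectrum: array of shape (frames, bands); filter each band's
            # temporal trajectory with the RASTA band-pass filter.
            b = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])
            a = np.array([1.0, -0.98])   # pole value is a conventional choice
            return lfilter(b, a, log_spectrum, axis=0)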

  12. Speech is Golden

    DEFF Research Database (Denmark)

    Juel Henrichsen, Peter

    2014-01-01

    Most Danish municipalities are ready to begin adopting automatic speech recognition, but at the same time remain nervous following a long series of bad business cases in the recent past. Complaints are voiced over costly licences and low service levels, typical effects of a de facto monopoly...... on the supply side. The present article reports on a new public action strategy which has taken shape in the course of 2013-14. While Denmark is a small language area, our public sector is well organised and has considerable purchasing power. Across this past year, Danish local authorities have organised around...... of the present article, in the role of economically neutral advisers. The aim of the initiative is to pave the way for the first profitable contract in the field - which we hope to see in 2014 - an event which would precisely break the present deadlock and open up a billion-EUR market for speech technology...

  13. A Survey on Statistical Based Single Channel Speech Enhancement Techniques

    Directory of Open Access Journals (Sweden)

    Sunnydayal. V

    2014-11-01

    Speech enhancement is a long-standing problem with various applications such as hearing aids and the automatic recognition and coding of speech signals. Single-channel speech enhancement techniques are used to enhance speech degraded by additive background noise. Background noise can have an adverse impact on our ability to converse smoothly in very noisy environments, such as busy streets, in a car, or in the cockpit of an airplane, and it can affect both the quality and the intelligibility of speech. This survey provides an overview of algorithms, mainly based on statistical approaches, that enhance noisy speech corrupted by additive noise. Different estimators are compared, and challenges and opportunities in speech enhancement are discussed. The paper helps in choosing the best statistically based technique for speech enhancement.
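
    A minimal sketch of one classic statistical single-channel method of the kind surveyed, magnitude spectral subtraction; frame size, hop, and the spectral floor are assumed values.

        import numpy as np

        def spectral_subtraction(noisy, noise_est, frame=512, hop=256, beta=0.02):
            # Subtract an average noise magnitude spectrum from each frame and
            # resynthesize with the noisy phase (overlap-add); beta is a floor.
            win = np.hanning(frame)
            noise_mag = np.abs(np.fft.rfft(noise_est[:frame] * win))
            out = np.zeros(len(noisy))
            for i in range(0, len(noisy) - frame, hop):
                spec = np.fft.rfft(noisy[i:i + frame] * win)
                mag = np.maximum(np.abs(spec) - noise_mag, beta * noise_mag)
                out[i:i + frame] += np.fft.irfft(mag * np.exp(1j * np.angle(spec)), frame)
            return out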

  14. Plowing Speech

    OpenAIRE

    Zla ba sgrol ma

    2009-01-01

    This file contains a plowing speech and a discussion about the speech. The collection presents forty-nine audio files including several folk song genres, folktales, and local history from the Sman shad Valley of Sde dge county. (World Oral Literature Project)

  15. Speech coding

    Energy Technology Data Exchange (ETDEWEB)

    Ravishankar, C., Hughes Network Systems, Germantown, MD

    1998-05-08

    Speech is the predominant means of communication between human beings, and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage of the speech signal getting corrupted by noise, cross-talk and distortion. Long-haul transmissions, which use repeaters to compensate for the loss in signal strength on transmission links, also increase the associated noise and distortion. On the other hand, digital transmission is relatively immune to noise, cross-talk and distortion, primarily because of the capability to faithfully regenerate the digital signal at each repeater purely based on a binary decision. Hence the end-to-end performance of the digital link essentially becomes independent of the length and operating frequency bands of the link, and from a transmission point of view digital transmission has been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service provision point of view as well. Modern requirements have introduced the need for robust, flexible and secure services that can carry a multitude of signal types (such as voice, data and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term speech coding often refers to techniques that represent or code speech signals either directly as a waveform or as a set of parameters obtained by analyzing the speech signal. In either case, the codes are transmitted to the distant end where speech is reconstructed or synthesized using the received set of codes. A more generic term that is often used interchangeably with speech coding is voice coding. This term is more generic in the sense that the
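
    As a worked example of the simplest waveform-coding approach mentioned above, G.711-style mu-law companding compresses amplitudes on a logarithmic scale before quantization (the 8-bit quantization step itself is omitted); the sketch assumes samples normalized to [-1, 1].

        import numpy as np

        def mulaw_encode(x, mu=255):
            # Compress amplitudes logarithmically (x in [-1, 1]).
            return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

        def mulaw_decode(y, mu=255):
            # Invert the companding curve.
            return np.sign(y) * ((1.0 + mu) ** np.abs(y) - 1.0) / mu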

  16. Automatic sequences

    CERN Document Server

    Haeseler, Friedrich

    2003-01-01

    Automatic sequences are sequences which are produced by a finite automaton. Although they are not random, they may look random. They are complicated in the sense of not being ultimately periodic, and it may not be easy to name the rule by which a sequence is generated; nevertheless such a rule always exists. The concept of automatic sequences has applications in algebra, number theory, finite automata and formal languages, and combinatorics on words. The text deals with different aspects of automatic sequences, in particular: a general introduction to automatic sequences; the basic (combinatorial) properties of automatic sequences; the algebraic approach to automatic sequences; and geometric objects related to automatic sequences.
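
    A standard example (not taken from the book's text): the Thue-Morse sequence is automatic because a two-state automaton reading the binary digits of n outputs its n-th term, the parity of the number of 1-bits.

        def thue_morse(n):
            # Two-state automaton: each 1-digit of n toggles the state.
            state = 0
            while n:
                state ^= n & 1
                n >>= 1
            return state

        print([thue_morse(i) for i in range(8)])   # [0, 1, 1, 0, 1, 0, 0, 1]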

  17. The impact of speech material on speech judgement in children with and without cleft palate.

    Science.gov (United States)

    Klintö, Kristina; Salameh, Eva-Kristina; Svensson, Henry; Lohmander, Anette

    2011-01-01

    The chosen method of speech assessment, including type of speech material, may affect speech judgement in children with cleft palate. To assess the effect of different speech materials on speech judgement in 5-year-old children born with or without cleft palate, as well as the reliability of materials by means of intra- and inter-transcriber agreement of consonant transcriptions. Altogether 40 children were studied, 20 born with cleft palate, 20 without. The children were audio recorded at 5 years of age. Speech materials used were: single-word naming, sentence repetition (both developed for cleft palate speech assessment), retelling of a narrative and conversational speech. The samples were phonetically transcribed and inter- and intra-transcriber agreement was calculated. Percentage correct consonants (PCC), percentage correct places (PCP), percentage correct manners (PCM), and percentage active cleft speech characteristics (CSC) were assessed. In addition, an analysis of phonological simplification processes (PSP) was performed. The PCC and CSC results were significantly more accurate in word naming than in all other speech materials in the children with cleft palate, who also achieved more accurate PCP results in word naming than in sentence repetition and conversational speech. Regarding PCM and PSP, performance was significantly more accurate in word naming than in conversational speech. Children without cleft palate did better, irrespective of the speech material. The medians of intra- and inter-transcriber agreement were good in both groups and all speech materials. The closest agreement in the cleft palate group was seen in word naming and the weakest in the retelling task. The results indicate that word naming is the most reliable speech material when the purpose is to assess the best speech performance of a child with cleft palate. If the purpose is to assess connected speech, sentence repetition is a reliable and also valid speech material, with good

  18. Predicting speech intelligibility in adverse conditions: evaluation of the speech-based envelope power spectrum model

    DEFF Research Database (Denmark)

    2011-01-01

    The speech-based envelope power spectrum model (sEPSM) [Jørgensen and Dau (2011). J. Acoust. Soc. Am., 130 (3), 1475–1487] estimates the envelope signal-to-noise ratio (SNRenv) of distorted speech and accurately describes the speech recognition thresholds (SRT) for normal-hearing listeners...... conditions by comparing predictions to measured data from [Kjems et al. (2009). J. Acoust. Soc. Am. 126 (3), 1415-1426] where speech is mixed with four different interferers, including speech-shaped noise, bottle noise, car noise, and cafe noise. The model accounts well for the differences in intelligibility...

  19. Unusual ictal foreign language automatisms in temporal lobe epilepsy.

    Science.gov (United States)

    Soe, Naing Ko; Lee, Sang Kun

    2014-12-01

    Distinct brain regions may be specifically involved in different languages, and brain activation differs depending on language proficiency and the age of language acquisition. Speech disturbances are observed in the majority of temporal lobe complex motor seizures. Ictal verbalization has significant lateralization value: 90% of patients with this manifestation had a seizure focus in the non-dominant temporal lobe. Although ictal speech automatisms are usually uttered in the patient's native language, ictal foreign language automatisms are an unusual presentation of non-dominant temporal lobe epilepsy. The release of an isolated foreign language area could be possible, depending on the pattern of ictal spreading in the non-dominant hemisphere. Most case reports of ictal foreign language automatisms have involved men. In this case report, we describe ictal foreign language automatisms in a middle-aged Korean woman.

  20. Objective voice and speech analysis of persons with chronic hoarseness by prosodic analysis of speech samples.

    Science.gov (United States)

    Haderlein, Tino; Döllinger, Michael; Matoušek, Václav; Nöth, Elmar

    2016-10-01

    Automatic voice assessment is often performed using sustained vowels. In contrast, speech analysis of read-out texts can be applied to voice and speech assessment. Automatic speech recognition and prosodic analysis were used to find regression formulae between automatic and perceptual assessment of four voice and four speech criteria. The regression was trained with 21 men and 62 women (average age 49.2 years) and tested with another set of 24 men and 49 women (48.3 years), all suffering from chronic hoarseness. They read the text 'Der Nordwind und die Sonne' ('The North Wind and the Sun'). Five voice and speech therapists evaluated the data on 5-point Likert scales. Ten prosodic and recognition accuracy measures (features) were identified which describe all the examined criteria. Inter-rater correlation within the expert group was between r = 0.63 for the criterion 'match of breath and sense units' and r = 0.87 for the overall voice quality. Human-machine correlation was between r = 0.40 for the match of breath and sense units and r = 0.82 for intelligibility. The perceptual ratings of different criteria were highly correlated with each other. Likewise, the feature sets modeling the criteria were very similar. The automatic method is suitable for assessing chronic hoarseness in general and for subgroups of functional and organic dysphonia. In its current version, it is almost as reliable as a randomly picked rater from a group of voice and speech therapists.

  1. An articulatorily constrained, maximum entropy approach to speech recognition and speech coding

    Energy Technology Data Exchange (ETDEWEB)

    Hogden, J.

    1996-12-31

    Hidden Markov models (HMMs) are among the most popular tools for performing computer speech recognition. One of the primary reasons that HMMs typically outperform other speech recognition techniques is that the parameters used for recognition are determined by the data, not by preconceived notions of what the parameters should be. This makes HMMs better able to deal with intra- and inter-speaker variability despite the limited knowledge of how speech signals vary and despite the often limited ability to correctly formulate rules describing variability and invariance in speech. In fact, it is often the case that when HMM parameter values are constrained using the limited knowledge of speech, recognition performance decreases. However, the structure of an HMM has little in common with the mechanisms underlying speech production. Here, the author argues that by using probabilistic models that more accurately embody the process of speech production, he can create models that have all the advantages of HMMs, but that should more accurately capture the statistical properties of real speech samples--presumably leading to more accurate speech recognition. The model he will discuss uses the fact that speech articulators move smoothly and continuously. Before discussing how to use articulatory constraints, he will give a brief description of HMMs. This will allow him to highlight the similarities and differences between HMMs and the proposed technique.
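
    For reference, the basic HMM computation mentioned above can be sketched with the textbook forward algorithm, which scores an observation sequence under initial, transition, and emission probabilities; all numbers below are illustrative.

        import numpy as np

        def forward(obs, pi, A, B):
            # alpha[i] = P(observations so far, state i); B is states x symbols.
            alpha = pi * B[:, obs[0]]
            for o in obs[1:]:
                alpha = (alpha @ A) * B[:, o]
            return alpha.sum()

        pi = np.array([0.6, 0.4])                  # illustrative numbers
        A = np.array([[0.7, 0.3], [0.4, 0.6]])
        B = np.array([[0.9, 0.1], [0.2, 0.8]])
        print(forward([0, 1, 0], pi, A, B))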

  2. ARMA Modelling for Whispered Speech

    Institute of Scientific and Technical Information of China (English)

    Xue-li LI; Wei-dong ZHOU

    2010-01-01

    The Autoregressive Moving Average (ARMA) model for whispered speech is proposed. Compared with normal speech, whispered speech has no fundamental frequency because the glottis is semi-opened and turbulent flow is created, and formant shifting exists in the lower frequency region due to the narrowing of the tract in the false vocal fold regions and weak acoustic coupling with the subglottal system. Analysis shows that the effect of the subglottal system is to introduce additional pole-zero pairs into the vocal tract transfer function. Theoretically, a method based on an ARMA process is superior to one based on an AR process in the spectral analysis of whispered speech. Two methods, the least squares modified Yule-Walker likelihood estimate (LSMY) algorithm and the Frequency-Domain Steiglitz-McBride (FDSM) algorithm, are applied to the ARMA model for whispered speech. The performance evaluation shows that the ARMA model is much more appropriate for representing whispered speech than the AR model, and the FDSM algorithm provides a more accurate estimation of the whispered speech spectral envelope than the LSMY algorithm, at a higher computational complexity.
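
    For contrast with the ARMA estimators named above (which additionally fit zeros for the subglottal pole-zero pairs), a plain AR baseline can be fit from the Yule-Walker equations; this generic sketch is not the paper's LSMY or FDSM algorithm.

        import numpy as np

        def ar_yule_walker(x, order):
            # Biased autocorrelation estimates feed the Yule-Walker equations;
            # the solution gives predictor coefficients for an all-pole model.
            r = np.correlate(x, x, mode='full')[len(x) - 1:] / len(x)
            R = np.array([[r[abs(i - j)] for j in range(order)]
                          for i in range(order)])
            return np.linalg.solve(R, r[1:order + 1])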

  3. Presentation video retrieval using automatically recovered slide and spoken text

    Science.gov (United States)

    Cooper, Matthew

    2013-03-01

    Video is becoming a prevalent medium for e-learning. Lecture videos contain text information in both the presentation slides and lecturer's speech. This paper examines the relative utility of automatically recovered text from these sources for lecture video retrieval. To extract the visual information, we automatically detect slides within the videos and apply optical character recognition to obtain their text. Automatic speech recognition is used similarly to extract spoken text from the recorded audio. We perform controlled experiments with manually created ground truth for both the slide and spoken text from more than 60 hours of lecture video. We compare the automatically extracted slide and spoken text in terms of accuracy relative to ground truth, overlap with one another, and utility for video retrieval. Results reveal that automatically recovered slide text and spoken text contain different content with varying error profiles. Experiments demonstrate that automatically extracted slide text enables higher precision video retrieval than automatically recovered spoken text.

  4. Use of Speech Analyses within a mobile application for the Assessment of cognitive impairment in elderly people.

    Science.gov (United States)

    König, A; Satt, A; Sorin, A; Hoory, R; David, A; Robert, P H

    2017-08-28

    Various types of dementia and Mild Cognitive Impairment (MCI) are manifested as irregularities in human speech and language, which have proven to be strong predictors of disease presence and progression. Therefore, automatic speech analytics provided by a mobile application may be a useful tool in providing additional indicators for the assessment and detection of early-stage dementia and MCI. 165 participants (subjects with subjective cognitive impairment (SCI), MCI patients, and Alzheimer's disease (AD) and mixed dementia (MD) patients) were recorded with a mobile application while performing several short vocal cognitive tasks during a regular consultation. These tasks included verbal fluency, picture description, counting down and a free speech task. The voice recordings were processed in two steps: in the first step, vocal markers were extracted using speech signal processing techniques; in the second, the vocal markers were tested to assess their 'power' to distinguish between SCI, MCI, AD and MD. The second step included training automatic classifiers for detecting MCI and AD, based on machine learning methods, and testing the detection accuracy. The fluency and free speech tasks obtained the highest accuracy rates for classifying AD vs. MD vs. MCI vs. SCI. Using the data, we demonstrated the following classification accuracies: SCI vs. AD = 92%; SCI vs. MD = 92%; SCI vs. MCI = 86%; and MCI vs. AD = 86%. Our results indicate the potential value of vocal analytics and the use of a mobile application for accurate automatic differentiation between SCI, MCI and AD. This tool can provide the clinician with meaningful information for the assessment and monitoring of people with MCI and AD based on a non-invasive, simple and low-cost method.

  5. User evaluation of a communication system that automatically generates captions to improve telephone communication

    NARCIS (Netherlands)

    Zekveld, A.A.; Kramer, S.E.; Kessens, J.M.; Vlaming, M.S.M.G.; Houtgast, T.

    2009-01-01

    This study examined the subjective benefit obtained from automatically generated captions during telephone-speech comprehension in the presence of babble noise. Short stories were presented by telephone either with or without captions that were generated offline by an automatic speech recognition system.

  6. Speech-enabled Computer-aided Translation

    DEFF Research Database (Denmark)

    Mesa-Lao, Bartolomé

    2014-01-01

    The present study has surveyed post-editor trainees’ views and attitudes before and after the introduction of speech technology as a front end to a computer-aided translation workbench. The aim of the survey was (i) to identify attitudes and perceptions among post-editor trainees before performing...... a post-editing task using automatic speech recognition (ASR); and (ii) to assess the degree to which post-editors’ attitudes and expectations to the use of speech technology changed after actually using it. The survey was based on two questionnaires: the first one administered before the participants...

  7. "Rate My Therapist": Automated Detection of Empathy in Drug and Alcohol Counseling via Speech and Language Processing.

    Science.gov (United States)

    Xiao, Bo; Imel, Zac E; Georgiou, Panayiotis G; Atkins, David C; Narayanan, Shrikanth S

    2015-01-01

    The technology for evaluating patient-provider interactions in psychotherapy--observational coding--has not changed in 70 years. It is labor-intensive, error-prone, and expensive, limiting its use in evaluating psychotherapy in the real world. Engineering solutions from speech and language processing provide new methods for the automatic evaluation of provider ratings from session recordings. The primary data are 200 Motivational Interviewing (MI) sessions from a study on MI training methods, with observer ratings of counselor empathy. Automatic Speech Recognition (ASR) was used to transcribe sessions, and the resulting words were used in a text-based predictive model of empathy. Two supporting datasets trained the speech processing tasks, including ASR (1200 transcripts from heterogeneous psychotherapy sessions, and 153 transcripts and session recordings from 5 MI clinical trials). The accuracy of computationally derived empathy ratings was evaluated against human ratings for each provider. Computationally derived empathy scores and classifications (high vs. low) were highly accurate against human-based codes and classifications, with a correlation of 0.65 and an F-score (a weighted average of sensitivity and specificity) of 0.86, respectively. Empathy prediction using human transcription as input (as opposed to ASR) resulted in a slight increase in prediction accuracy, suggesting that the fully automatic system with ASR is relatively robust. Using speech and language processing methods, it is possible to generate accurate predictions of provider performance in psychotherapy from audio recordings alone. This technology can support large-scale evaluation of psychotherapy for dissemination and process studies.
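
    A minimal stand-in for a text-based empathy model of this kind: TF-IDF features from (possibly ASR-produced) transcripts feeding a linear classifier for high vs. low empathy. The study's actual features and model are not reproduced; the two training utterances and labels below are invented for illustration.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        # Invented toy transcripts; 1 = high empathy, 0 = low empathy.
        transcripts = ["you felt ignored and that hurt", "just stop drinking"]
        labels = [1, 0]

        model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                              LogisticRegression())
        model.fit(transcripts, labels)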

  8. In Vitro Evaluation of the iValve : A Novel Hands-Free Speech Valve

    NARCIS (Netherlands)

    van der Houwen, E.B.; van Kalkeren, T.A.; Burgerhof, J.G.; van der Laan, B.F.; Verkerke, G.J.

    2011-01-01

    Objectives: We performed in vitro evaluation of a novel, disposable, automatic hands-free tracheostoma speech valve for laryngectomy patients based upon the principle of inhalation. The commercially available automatic speech valves close upon strong exhalation and open again when the pressure

  9. In Vitro Evaluation of the iValve : A Novel Hands-Free Speech Valve

    NARCIS (Netherlands)

    van der Houwen, E.B.; van Kalkeren, T.A.; Burgerhof, J.G.; van der Laan, B.F.; Verkerke, G.J.

    2011-01-01

    Objectives: We performed in vitro evaluation of a novel, disposable, automatic hands-free tracheostoma speech valve for laryngectomy patients based upon the principle of inhalation. The commercially available automatic speech valves close upon strong exhalation and open again when the pressure drops

  10. Segmentation, diarization and speech transcription : surprise data unraveled

    NARCIS (Netherlands)

    Huijbregts, Marijn Anthonius Henricus

    2008-01-01

    In this thesis, research on large vocabulary continuous speech recognition for unknown audio conditions is presented. For automatic speech recognition systems based on statistical methods, it is important that the conditions of the audio used for training the statistical models match the conditions

  11. Segmentation, Diarization and Speech Transcription: Surprise Data Unraveled

    NARCIS (Netherlands)

    Huijbregts, M.A.H.

    2008-01-01

    In this thesis, research on large vocabulary continuous speech recognition for unknown audio conditions is presented. For automatic speech recognition systems based on statistical methods, it is important that the conditions of the audio used for training the statistical models match the conditions

  12. Sequential Organization and Room Reverberation for Speech Segregation

    Science.gov (United States)

    2012-02-28

    ... voiced portions account for about 75-80% of spoken English. Voiced speech is characterized by periodicity (or harmonicity), which has been used as a ... onset and offset cues to extract unvoiced speech segments. Acoustic-phonetic features are then used to separate unvoiced speech from nonspeech ... estimate is relatively accurate due to weak voiced speech at these frequencies. Based on this analysis and acoustic-phonetic characteristics of ...

  13. Hate speech

    Directory of Open Access Journals (Sweden)

    Anne Birgitta Nilsen

    2014-12-01

    The manifesto of the Norwegian terrorist Anders Behring Breivik is based on the “Eurabia” conspiracy theory. This theory is a key starting point for hate speech amongst many right-wing extremists in Europe, but also has ramifications beyond these environments. In brief, proponents of the Eurabia theory claim that Muslims are occupying Europe and destroying Western culture, with the assistance of the EU and European governments. By contrast, members of Al-Qaeda and other extreme Islamists promote the conspiracy theory “the Crusade” in their hate speech directed against the West. Proponents of the latter theory argue that the West is leading a crusade to eradicate Islam and Muslims, a crusade that is similarly facilitated by their governments. This article presents analyses of texts written by right-wing extremists and Muslim extremists in an effort to shed light on how hate speech promulgates conspiracy theories in order to spread hatred and intolerance. The aim of the article is to contribute to a more thorough understanding of hate speech’s nature by applying rhetorical analysis. Rhetorical analysis is chosen because it offers a means of understanding the persuasive power of speech. It is thus a suitable tool to describe how hate speech works to convince and persuade. The concepts from rhetorical theory used in this article are ethos, logos and pathos. The concept of ethos is used to pinpoint factors that contributed to Osama bin Laden's impact, namely factors that lent credibility to his promotion of the conspiracy theory of the Crusade. In particular, Bin Laden projected common sense, good morals and good will towards his audience. He seemed to have coherent and relevant arguments; he appeared to possess moral credibility; and his use of language demonstrated that he wanted the best for his audience. The concept of pathos is used to define hate speech, since hate speech targets its audience's emotions. In hate speech it is the

  14. Language-Independent Automatic Evaluation of Intelligibility of Chronically Hoarse Persons

    OpenAIRE

    Haderlein, Tino; Middag, Catherine; Martens, Jean-Pierre; Döllinger, Michael; Nöth, Elmar

    2014-01-01

    Objective: Automatic intelligibility assessment using automatic speech recognition is usually language specific. In this study, a language-independent approach is proposed. It uses models that are trained with Flemish speech, and it is applied to assess chronically hoarse German speakers. The research questions are here: is it possible to construct suitable acoustic features that generalize to other languages and a speech disorder, and is the generated model for intelligibility also suitable ...

  15. ASLP-MULAN: Audio speech and language processing for multimedia analytics

    OpenAIRE

    2016-01-01

    Our intention is to generate the right mixture of audio, speech and language technologies with big data technologies. Some automatic audio, speech and language technologies are available, or are reaching a sufficient degree of maturity to be able to help with this objective: automatic speech transcription, query by spoken example, spoken information retrieval, natural language processing, unstructured multimedia content transcription and description, multimedia file summarization, spoken emotion detection and ...

  16. Speech enhancement

    CERN Document Server

    Benesty, Jacob; Chen, Jingdong

    2006-01-01

    We live in a noisy world! In all applications (telecommunications, hands-free communications, recording, human-machine interfaces, etc.) that require at least one microphone, the signal of interest is usually contaminated by noise and reverberation. As a result, the microphone signal has to be "cleaned" with digital signal processing tools before it is played out, transmitted, or stored. This book is about speech enhancement. Different well-known and state-of-the-art methods for noise reduction, with one or multiple microphones, are discussed. By speech enhancement, we mean not only noise red

  17. Methods and apparatus for non-acoustic speech characterization and recognition

    Energy Technology Data Exchange (ETDEWEB)

    Holzrichter, J.F.

    1999-12-21

    By simultaneously recording EM wave reflections and acoustic speech information, the positions and velocities of the speech organs as speech is articulated can be defined for each acoustic speech unit. Well defined time frames and feature vectors describing the speech, to the degree required, can be formed. Such feature vectors can uniquely characterize the speech unit being articulated each time frame. The onset of speech, rejection of external noise, vocalized pitch periods, articulator conditions, accurate timing, the identification of the speaker, acoustic speech unit recognition, and organ mechanical parameters can be determined.

  18. Methods and apparatus for non-acoustic speech characterization and recognition

    Energy Technology Data Exchange (ETDEWEB)

    Holzrichter, John F. (Berkeley, CA)

    1999-01-01

    By simultaneously recording EM wave reflections and acoustic speech information, the positions and velocities of the speech organs as speech is articulated can be defined for each acoustic speech unit. Well defined time frames and feature vectors describing the speech, to the degree required, can be formed. Such feature vectors can uniquely characterize the speech unit being articulated each time frame. The onset of speech, rejection of external noise, vocalized pitch periods, articulator conditions, accurate timing, the identification of the speaker, acoustic speech unit recognition, and organ mechanical parameters can be determined.

  19. Social cognition of indirect speech: Evidence from Parkinson's Disease.

    Science.gov (United States)

    McNamara, Patrick; Holtgraves, Thomas; Durso, Raymon; Harris, Erica

    2010-03-01

    We examined potential neurocognitive mechanisms of indirect speech in support of face management in 28 patients with Parkinson's disease (PD) and 32 elderly controls with chronic disease. In experiment 1, we demonstrated automatic activation of indirect meanings of particularized implicatures in controls but not in PD patients. Failure to automatically engage comprehension of indirect meanings of indirect speech acts in PD patients was correlated with a measure of prefrontal dysfunction. In experiment 2, we showed that while PD patients and controls offered similar interpretations of indirect speech acts, PD participants were overly confident in their interpretations and unaware of errors of interpretation. Efficient reputational adjustment mechanisms apparently require intact striatal-prefrontal networks.

  20. Speech Intelligibility

    Science.gov (United States)

    Brand, Thomas

    Speech intelligibility (SI) is important for different fields of research, engineering and diagnostics, in order to quantify very different phenomena such as the quality of recordings, communication and playback devices, the reverberation of auditoria, characteristics of hearing impairment, the benefit of using hearing aids, or combinations of these.

  1. Speech dynamics

    NARCIS (Netherlands)

    Pols, L.C.W.

    2011-01-01

    In order for speech to be informative and communicative, segmental and suprasegmental variation is mandatory. Only this leads to meaningful words and sentences. The building blocks are not stable entities put next to each other (like beads on a string or like printed text); rather, there are gradual tran

  2. Spotting Prosodic Boundaries in Continuous Speech in French

    CERN Document Server

    Pagel, V; Laprie, Y; Vaissiere, J C

    1998-01-01

    A 9-minute radio speech corpus has been prosodically marked by an expert phonetician and by non-expert listeners. This corpus is large enough to train and test an automatic boundary-spotting system, namely a time-delay neural network fed with F0 values and vowel and pseudo-syllable durations. Results validate both the prosodic marking and the automatic spotting of prosodic events.

  3. OLIVE: Speech-Based Video Retrieval

    NARCIS (Netherlands)

    Jong, de Franciska; Gauvain, Jean-Luc; Hartog, den Jurgen; Netter, Klaus

    1999-01-01

    This paper describes the Olive project which aims to support automated indexing of video material by use of human language technologies. Olive is making use of speech recognition to automatically derive transcriptions of the sound tracks, generating time-coded linguistic elements which serve as the

  4. Speech recognition employing biologically plausible receptive fields

    DEFF Research Database (Denmark)

    Fereczkowski, Michal; Bothe, Hans-Heinrich

    2011-01-01

    The main idea of the project is to build a widely speaker-independent, biologically motivated automatic speech recognition (ASR) system. The two main differences between our approach and current state-of-the-art ASRs are that i) the features used here are based on the responses of neuronlike spec...

  5. A distributed approach to speech resource collection

    CSIR Research Space (South Africa)

    Molapo, R

    2013-12-01

    The authors describe the integration of several tools to enable the end-to-end development of an automatic speech recognition system in a typical under-resourced language. The authors analyse the data acquired by each of the tools and develop an ASR...

  6. Fishing for meaningful units in connected speech

    DEFF Research Database (Denmark)

    Henrichsen, Peter Juel; Christiansen, Thomas Ulrich

    2009-01-01

    was far lower than for phonemic recognition. Our findings show that it is possible to automatically characterize a linguistic message, without detailed spectral information or presumptions about the target units. Further, fishing for simple meaningful cues and enhancing these selectively would potentially...... be a more effective way of achieving intelligibility transfer, which is the end goal for speech transducing technologies....

  7. OLIVE: Speech-Based Video Retrieval

    NARCIS (Netherlands)

    de Jong, Franciska M.G.; Gauvain, Jean-Luc; den Hartog, Jurgen; den Hartog, Jeremy; Netter, Klaus

    1999-01-01

    This paper describes the Olive project which aims to support automated indexing of video material by use of human language technologies. Olive is making use of speech recognition to automatically derive transcriptions of the sound tracks, generating time-coded linguistic elements which serve as the

  8. Speech Clarity Index (Ψ): A Distance-Based Speech Quality Indicator and Recognition Rate Prediction for Dysarthric Speakers with Cerebral Palsy

    Science.gov (United States)

    Kayasith, Prakasith; Theeramunkong, Thanaruk

    It is a tedious and subjective task to measure the severity of dysarthria by manually evaluating a speaker's speech using available standard assessment methods based on human perception. This paper presents an automated approach to assessing the speech quality of a dysarthric speaker with cerebral palsy. With the consideration of two complementary factors, speech consistency and speech distinction, a speech quality indicator called the speech clarity index (Ψ) is proposed as a measure of the speaker's ability to produce a consistent speech signal for a certain word and distinguishable speech signals for different words. As an application, it can be used to assess speech quality and forecast the speech recognition rate of an individual dysarthric speaker's speech before the exhaustive implementation of an automatic speech recognition system for that speaker. The effectiveness of Ψ as a speech recognition rate predictor is evaluated by rank-order inconsistency, correlation coefficient, and root-mean-square difference. The evaluations were done by comparing its predicted recognition rates with those predicted by the standard methods, the articulatory and intelligibility tests, based on two recognition systems (HMM and ANN). The results show that Ψ is a promising indicator for predicting the recognition rate of dysarthric speech. All experiments were done on a speech corpus composed of speech data from eight normal speakers and eight dysarthric speakers.

  9. Experimental comparison between speech transmission index, rapid speech transmission index, and speech intelligibility index.

    Science.gov (United States)

    Larm, Petra; Hongisto, Valtteri

    2006-02-01

    During the acoustical design of, e.g., auditoria or open-plan offices, it is important to know how speech can be perceived in various parts of the room. Different objective methods have been developed to measure and predict speech intelligibility, and these have been extensively used in various spaces. In this study, two such methods were compared, the speech transmission index (STI) and the speech intelligibility index (SII). Also the simplification of the STI, the room acoustics speech transmission index (RASTI), was considered. These quantities are all based on determining an apparent speech-to-noise ratio on selected frequency bands and summing them using a specific weighting. For comparison, some data were needed on the possible differences of these methods resulting from the calculation scheme and also measuring equipment. Their prediction accuracy was also of interest. Measurements were made in a laboratory having adjustable noise level and absorption, and in a real auditorium. It was found that the measurement equipment, especially the selection of the loudspeaker, can greatly affect the accuracy of the results. The prediction accuracy of the RASTI was found acceptable, if the input values for the prediction are accurately known, even though the studied space was not ideally diffuse.
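
    All three indices share the scheme described above; a generic sketch (not the exact STI, RASTI, or SII computation) clips per-band apparent SNRs to +/-15 dB, maps them to [0, 1], and sums with band-importance weights. Bands and weights below are assumptions.

        import numpy as np

        def band_weighted_index(snr_db, weights):
            # Clip apparent per-band SNRs to +/-15 dB, map to [0, 1], and
            # combine with band-importance weights (assumed to sum to 1).
            snr = np.clip(np.asarray(snr_db, dtype=float), -15.0, 15.0)
            return float(np.sum(np.asarray(weights) * (snr + 15.0) / 30.0))

        print(band_weighted_index([9.0, 3.0, -3.0], [0.3, 0.4, 0.3]))  # 0.6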

  10. CONVERGING TOWARDS A COMMON SPEECH CODE: IMITATIVE AND PERCEPTUO-MOTOR RECALIBRATION PROCESSES IN SPEECH PRODUCTION

    Directory of Open Access Journals (Sweden)

    Marc eSato

    2013-07-01

    Auditory and somatosensory systems play a key role in speech motor control. In the act of speaking, segmental speech movements are programmed to reach phonemic sensory goals, which in turn are used to estimate actual sensory feedback in order to further control production. The adult’s tendency to automatically imitate a number of acoustic-phonetic characteristics in another speaker's speech, however, suggests that speech production not only relies on the intended phonemic sensory goals and actual sensory feedback but also on the processing of external speech inputs. These online adaptive changes in speech production, or phonetic convergence effects, are thought to facilitate conversational exchange by contributing to setting a common perceptuo-motor ground between the speaker and the listener. In line with previous studies on phonetic convergence, we here demonstrate, in a non-interactive situation of communication, online unintentional and voluntary imitative changes in relevant acoustic features of acoustic vowel targets (fundamental and first formant frequencies) during speech production and imitation. In addition, perceptuo-motor recalibration processes, or after-effects, occurred not only after vowel production and imitation but also after auditory categorization of the acoustic vowel targets. Altogether, these findings demonstrate adaptive plasticity of phonemic sensory-motor goals and suggest that, apart from sensory-motor knowledge, speech production continuously draws on perceptual learning from the external speech environment.

  11. A Research of Speech Emotion Recognition Based on Deep Belief Network and SVM

    Directory of Open Access Journals (Sweden)

    Chenchen Huang

    2014-01-01

    Feature extraction is a very important part of speech emotion recognition. Addressing the feature extraction problem in speech emotion recognition, this paper proposes a new method that uses the DBNs of a DNN to extract emotional features from the speech signal automatically. By training a 5-layer-deep DBN, speech emotion features are extracted, and multiple consecutive frames are incorporated to form a high-dimensional feature. The features obtained from the trained DBN were the input of a nonlinear SVM classifier, and finally a multiple-classifier speech emotion recognition system was achieved. The speech emotion recognition rate of the system reached 86.5%, which was 7% higher than the original method.

  12. Speech communications in noise

    Science.gov (United States)

    1984-07-01

    The physical characteristics of speech, the methods of speech masking measurement, and the effects of noise on speech communication are investigated. Topics include the speech signal and intelligibility, the effects of noise on intelligibility, the articulation index, and various devices for evaluating speech systems.

  13. Ensemble Feature Extraction Modules for Improved Hindi Speech Recognition System

    Directory of Open Access Journals (Sweden)

    Malay Kumar

    2012-05-01

    Speech is the most natural way of communication between human beings. The field of speech recognition raises the prospect of man-machine conversation, and due to its versatile applications, automatic speech recognition systems have been designed. In this paper we present a novel approach to Hindi speech recognition that ensembles the feature extraction modules of ASR systems and combines their outputs using the voting technique ROVER. Experimental results show that the proposed system produces better results than traditional ASR systems.
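
    The core of ROVER is word-level voting over aligned hypotheses. A minimal sketch, assuming the hypotheses have already been aligned word-for-word (real ROVER builds the alignment with dynamic programming); the Hindi words are invented examples.

        def rover_vote(hypotheses):
            # Pick the most frequent word at each aligned position
            # (ties are resolved arbitrarily).
            return [max(set(words), key=words.count)
                    for words in zip(*hypotheses)]

        print(rover_vote([["mera", "naam", "ram", "hai"],
                          ["mera", "kaam", "ram", "hai"],
                          ["mera", "naam", "raam", "hai"]]))
        # -> ['mera', 'naam', 'ram', 'hai']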

  14. The importance of accurate glacier albedo for estimates of surface mass balance on Vatnajökull: evaluating the surface energy budget in a regional climate model with automatic weather station observations

    Science.gov (United States)

    Steffensen Schmidt, Louise; Aðalgeirsdóttir, Guðfinna; Guðmundsson, Sverrir; Langen, Peter L.; Pálsson, Finnur; Mottram, Ruth; Gascoin, Simon; Björnsson, Helgi

    2017-07-01

    A simulation of the surface climate of Vatnajökull ice cap, Iceland, carried out with the regional climate model HIRHAM5 for the period 1980-2014, is used to estimate the evolution of the glacier surface mass balance (SMB). This simulation uses a new snow albedo parameterization that allows albedo to decay exponentially with time and is surface temperature dependent. The albedo scheme utilizes a new background map of the ice albedo created from observed MODIS data. The simulation is evaluated against observed daily values of weather parameters from five automatic weather stations (AWSs) from the period 2001-2014, as well as in situ SMB measurements from the period 1995-2014. The model agrees well with observations at the AWS sites, albeit with a general underestimation of the net radiation. This is due to an underestimation of the incoming radiation and a general overestimation of the albedo. The average modelled albedo is overestimated in the ablation zone, which we attribute to an overestimation of the thickness of the snow layer and to not taking into account the surface darkening from dirt and volcanic ash deposition during dust storms and volcanic eruptions. A comparison with the specific summer, winter, and net mass balance for the whole of Vatnajökull (1995-2014) shows a good overall fit during the summer, with a small mass balance underestimation of 0.04 m w.e. on average, whereas the winter mass balance is overestimated by 0.5 m w.e. on average due to excessive precipitation over the highest areas of the ice cap. A simple correction of the accumulation at the highest points of the glacier reduces this to 0.15 m w.e. Here, we use HIRHAM5 to simulate the evolution of the SMB of Vatnajökull for the period 1981-2014 and show that the model provides a reasonable representation of the SMB for this period. However, a major source of uncertainty in the representation of the SMB is the representation of the albedo, and processes currently not accounted for in RCMs

  15. The importance of accurate glacier albedo for estimates of surface mass balance on Vatnajökull: evaluating the surface energy budget in a regional climate model with automatic weather station observations

    Directory of Open Access Journals (Sweden)

    L. S. Schmidt

    2017-07-01

    A simulation of the surface climate of Vatnajökull ice cap, Iceland, carried out with the regional climate model HIRHAM5 for the period 1980–2014, is used to estimate the evolution of the glacier surface mass balance (SMB). This simulation uses a new snow albedo parameterization that allows albedo to decay exponentially with time and is surface temperature dependent. The albedo scheme utilizes a new background map of the ice albedo created from observed MODIS data. The simulation is evaluated against observed daily values of weather parameters from five automatic weather stations (AWSs) from the period 2001–2014, as well as in situ SMB measurements from the period 1995–2014. The model agrees well with observations at the AWS sites, albeit with a general underestimation of the net radiation. This is due to an underestimation of the incoming radiation and a general overestimation of the albedo. The average modelled albedo is overestimated in the ablation zone, which we attribute to an overestimation of the thickness of the snow layer and to not taking into account the surface darkening from dirt and volcanic ash deposition during dust storms and volcanic eruptions. A comparison with the specific summer, winter, and net mass balance for the whole of Vatnajökull (1995–2014) shows a good overall fit during the summer, with a small mass balance underestimation of 0.04 m w.e. on average, whereas the winter mass balance is overestimated by 0.5 m w.e. on average due to excessive precipitation over the highest areas of the ice cap. A simple correction of the accumulation at the highest points of the glacier reduces this to 0.15 m w.e. Here, we use HIRHAM5 to simulate the evolution of the SMB of Vatnajökull for the period 1981–2014 and show that the model provides a reasonable representation of the SMB for this period. However, a major source of uncertainty in the representation of the SMB is the representation of the albedo, and processes

  16. Effective Prediction of Errors by Non-native Speakers Using Decision Tree for Speech Recognition-Based CALL System

    Science.gov (United States)

    Wang, Hongcui; Kawahara, Tatsuya

    CALL (Computer Assisted Language Learning) systems using ASR (Automatic Speech Recognition) for second language learning have received increasing interest recently. However, it remains a challenge to achieve high speech recognition performance, including accurate detection of erroneous utterances by non-native speakers. Conventionally, possible error patterns, based on linguistic knowledge, are added to the lexicon and language model, or to the ASR grammar network. However, this approach easily runs into a trade-off between error coverage and increased perplexity. To solve the problem, we propose a method based on a decision tree to learn effective prediction of errors made by non-native speakers. An experimental evaluation with a number of foreign students learning Japanese shows that the proposed method can effectively generate an ASR grammar network, given a target sentence, achieving both better coverage of errors and smaller perplexity, resulting in a significant improvement in ASR accuracy.
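
    A hypothetical sketch of the idea: learn which error patterns a non-native speaker is likely to produce from simple context features, then add only the predicted patterns to the grammar network. The features and labels below are invented for illustration and do not reproduce the paper's feature set.

        from sklearn.tree import DecisionTreeClassifier

        # Invented context features, e.g. [particle present, verb final],
        # and labels: 1 = include this error pattern in the grammar network.
        X = [[0, 1], [1, 1], [0, 0], [1, 0]]
        y = [1, 1, 0, 0]

        tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
        print(tree.predict([[1, 1]]))   # -> [1]: add the error arc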

  17. Speech Enhancement based on Compressive Sensing Algorithm

    Science.gov (United States)

    Sulong, Amart; Gunawan, Teddy S.; Khalifa, Othman O.; Chebil, Jalel

    2013-12-01

    Various methods for speech enhancement have been proposed over the years, with design focused mainly on quality and intelligibility. A novel speech enhancement approach uses compressive sensing (CS), a new paradigm for acquiring signals that is fundamentally different from uniform-rate digitization followed by compression, often used for transmission or storage. CS reduces the number of degrees of freedom of a sparse or compressible signal by permitting only certain configurations of the large and zero/small coefficients, and structured sparsity models. CS therefore provides a way of reconstructing a compressed version of the speech in the original signal by taking only a small number of linear, non-adaptive measurements. The performance of the overall algorithm is evaluated in terms of speech quality using informal listening tests and the Perceptual Evaluation of Speech Quality (PESQ). Experimental results show that the CS algorithm performs very well over a wide range of speech tests and gives good performance for speech enhancement, with better noise suppression than conventional approaches and without obvious degradation of speech quality.
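
    The paper's exact reconstruction algorithm is not specified here; as a generic sketch of how a sparse signal is recovered from CS measurements y = A x, iterative shrinkage-thresholding (ISTA) solves the usual l1-regularized least-squares problem. Parameter values are assumptions.

        import numpy as np

        def ista(y, A, lam=0.05, steps=200):
            # Minimize ||y - Ax||^2 + lam*||x||_1: a gradient step on the
            # quadratic term followed by soft thresholding.
            L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant
            x = np.zeros(A.shape[1])
            for _ in range(steps):
                g = x + A.T @ (y - A @ x) / L
                x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)
            return x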

  18. System And Method For Characterizing Voiced Excitations Of Speech And Acoustic Signals, Removing Acoustic Noise From Speech, And Synthesizing Speech

    Energy Technology Data Exchange (ETDEWEB)

    Burnett, Greg C. (Livermore, CA); Holzrichter, John F. (Berkeley, CA); Ng, Lawrence C. (Danville, CA)

    2006-04-25

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  19. Going to a Speech Therapist

    Science.gov (United States)

    Speech therapists (also called speech-language pathologists) help people of all ...

  20. Automatic analysis of multiparty meetings

    Indian Academy of Sciences (India)

    Steve Renals

    2011-10-01

    This paper is about the recognition and interpretation of multiparty meetings captured as audio, video and other signals. This is a challenging task since the meetings consist of spontaneous and conversational interactions between a number of participants: it is a multimodal, multiparty, multistream problem. We discuss the capture and annotation of the Augmented Multiparty Interaction (AMI) meeting corpus, the development of a meeting speech recognition system, and systems for the automatic segmentation, summarization and social processing of meetings, together with some example applications based on these systems.

  1. Multi-thread Parallel Speech Recognition for Mobile Applications

    Directory of Open Access Journals (Sweden)

    LOJKA Martin

    2014-05-01

    In this paper, a server-based solution for a multi-thread large-vocabulary automatic speech recognition engine is described, along with practical application examples for Android OS and HTML5. The basic idea was to make speech recognition available for a full variety of applications on computers and especially on mobile devices. The speech recognition engine should be independent of commercial products and services (where the dictionary cannot be modified). Using third-party services could also pose security and privacy problems in specific applications, when unsecured audio data must not be sent to uncontrolled environments (voice data transferred to servers around the globe). Using our experience with speech recognition applications, we have constructed a multi-thread server-based speech recognition solution designed with a simple application interface (API) to a speech recognition engine that can be modified to the specific needs of a particular application.

  2. Speech research

    Science.gov (United States)

    1992-06-01

    Phonology is traditionally seen as the discipline that concerns itself with the building blocks of linguistic messages. It is the study of the structure of sound inventories of languages and of the participation of sounds in rules or processes. Phonetics, in contrast, concerns speech sounds as produced and perceived. Two extreme positions on the relationship between phonological messages and phonetic realizations are represented in the literature. One holds that the primary home for linguistic symbols, including phonological ones, is the human mind, itself housed in the human brain. The second holds that their primary home is the human vocal tract.

  3. Speech Segregation based on Binary Classification

    Science.gov (United States)

    2016-07-15

    ...network (RNN) that is trained on sequential frame-level features and is capable of learning temporal dynamics. Both DNNs and RNNs produce accurate ... time frames (e.g., 100 ms to 120 ms), the RNN yields better probabilistic outputs than the DNN, because the RNN can better capture the temporal context ... "based pitch tracking in very noisy speech," in IEEE/ACM Transactions on Audio, Speech and Language Processing. More details about this work can be ...

  4. Toddlers' recognition of noise-vocoded speech.

    Science.gov (United States)

    Newman, Rochelle; Chatterjee, Monita

    2013-01-01

    Despite the remarkable clinical success of cochlear implants, implant listeners today still receive spectrally degraded information. Much research has examined how normally hearing adult listeners interpret spectrally degraded signals, primarily using noise-vocoded speech to simulate cochlear implant processing. Far less research has explored infants' and toddlers' ability to interpret spectrally degraded signals, despite the fact that children in this age range are frequently implanted. This study examines 27-month-old typically developing toddlers' recognition of noise-vocoded speech in a language-guided looking study. Children saw two images on each trial and heard a voice instructing them to look at one item ("Find the cat!"). Full-spectrum sentences or their noise-vocoded versions were presented with varying numbers of spectral channels. Toddlers showed equivalent proportions of looking to the target object with full speech and 24- or 8-channel noise-vocoded speech; they failed to look appropriately with 2-channel noise-vocoded speech and showed variable performance with 4-channel noise-vocoded speech. Despite accurate looking performance for speech with at least eight channels, children were slower to respond appropriately as the number of channels decreased. These results indicate that 2-yr-olds have developed the ability to interpret vocoded speech, even without practice, but that doing so requires additional processing. These findings have important implications for pediatric cochlear implantation.
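
    The noise vocoding used in such simulations can be illustrated with a minimal channel vocoder. The filter order, log-spaced band edges, and frequency range below are assumptions for the sketch, not the study's exact processing:

        import numpy as np
        from scipy.signal import butter, sosfilt, hilbert

        def noise_vocode(x, fs, n_channels=8, f_lo=100.0, f_hi=6000.0):
            """Replace the fine structure in each band with envelope-modulated noise."""
            edges = np.logspace(np.log10(f_lo), np.log10(f_hi), n_channels + 1)
            noise = np.random.randn(len(x))
            out = np.zeros(len(x))
            for lo, hi in zip(edges[:-1], edges[1:]):
                sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
                band = sosfilt(sos, x)
                env = np.abs(hilbert(band))       # band envelope
                out += env * sosfilt(sos, noise)  # band-limited noise carrier
            return out / (np.max(np.abs(out)) + 1e-12)

    Varying n_channels (e.g., 24, 8, 4, 2) produces channel conditions of the kind compared in the study.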

  5. Accurate guitar tuning by cochlear implant musicians.

    Science.gov (United States)

    Lu, Thomas; Huang, Juan; Zeng, Fan-Gang

    2014-01-01

    Modern cochlear implant (CI) users understand speech but find difficulty in music appreciation due to poor pitch perception. Still, some deaf musicians continue to perform with their CI. Here we show unexpected results that CI musicians can reliably tune a guitar by CI alone and, under controlled conditions, match simultaneously presented tones to electric ... analysis showed that accurate tuning was achieved by listening to beats rather than discriminating pitch, effectively turning a spectral task into a temporal discrimination task.

  6. Phonetic Alphabet for Speech Recognition of Czech

    Directory of Open Access Journals (Sweden)

    J. Uhlir

    1997-12-01

    Full Text Available In the paper we introduce and discuss an alphabet that has been proposed for phonemically oriented automatic speech recognition. The alphabet, denoted PAC (Phonetic Alphabet for Czech), consists of 48 basic symbols that allow for distinguishing all major events occurring in spoken Czech. The symbols can be used both for the phonetic transcription of Czech texts and for labeling recorded speech signals. For practical reasons, the alphabet comes in two versions; one utilizes Czech native characters and the other employs symbols similar to those used for English in the DARPA and NIST alphabets.

  7. Sentence Clustering Using Parts-of-Speech

    Directory of Open Access Journals (Sweden)

    Richard Khoury

    2012-02-01

    Full Text Available Clustering algorithms are used in many Natural Language Processing (NLP) tasks. They have proven to be popular and effective tools for discovering groups of similar linguistic items. In this exploratory paper, we propose a new clustering algorithm to automatically cluster together similar sentences based on the sentences' part-of-speech syntax. The algorithm generates and merges clusters using a syntactic similarity metric based on a hierarchical organization of the parts-of-speech. We demonstrate the features of this algorithm by implementing it in a question type classification system, in order to determine the positive or negative impact of different changes to the algorithm.

  8. Semi-automatic segmentation for 3D motion analysis of the tongue with dynamic MRI.

    Science.gov (United States)

    Lee, Junghoon; Woo, Jonghye; Xing, Fangxu; Murano, Emi Z; Stone, Maureen; Prince, Jerry L

    2014-12-01

    Dynamic MRI has been widely used to track the motion of the tongue and measure its internal deformation during speech and swallowing. Accurate segmentation of the tongue is a prerequisite step to define the target boundary and constrain the tracking to tissue points within the tongue. Segmentation of 2D slices or 3D volumes is challenging because of the large number of slices and time frames involved in the segmentation, as well as the incorporation of numerous local deformations that occur throughout the tongue during motion. In this paper, we propose a semi-automatic approach to segment 3D dynamic MRI of the tongue. The algorithm steps include seeding a few slices at one time frame, propagating seeds to the same slices at different time frames using deformable registration, and random walker segmentation based on these seed positions. This method was validated on the tongue of five normal subjects carrying out the same speech task with multi-slice 2D dynamic cine-MR images obtained at three orthogonal orientations and 26 time frames. The resulting semi-automatic segmentations of a total of 130 volumes showed an average dice similarity coefficient (DSC) score of 0.92 with less segmented volume variability between time frames than in manual segmentations.
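
    The Dice similarity coefficient used for validation compares two binary segmentations; a minimal computation of it (illustrative only, not the authors' pipeline) is:

        import numpy as np

        def dice(a, b):
            """Dice similarity coefficient, 2|A and B| / (|A| + |B|), for binary volumes."""
            a, b = a.astype(bool), b.astype(bool)
            return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

    A score of 1.0 indicates perfect overlap between the automatic and manual segmentations; the 0.92 reported above is in this range.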

  9. Children's perception of their synthetically corrected speech production.

    Science.gov (United States)

    Strömbergsson, Sofia; Wengelin, Asa; House, David

    2014-06-01

    We explore children's perception of their own speech - in its online form, in its recorded form, and in synthetically modified forms. Children with phonological disorder (PD) and children with typical speech and language development (TD) performed tasks of evaluating accuracy of the different types of speech stimuli, either immediately after having produced the utterance or after a delay. In addition, they performed a task designed to assess their ability to detect synthetic modification. Both groups showed high performance in tasks involving evaluation of other children's speech, whereas in tasks of evaluating one's own speech, the children with PD were less accurate than their TD peers. The children with PD were less sensitive to misproductions in immediate conjunction with their production of an utterance, and more accurate after a delay. Within-category modification often passed undetected, indicating a satisfactory quality of the generated speech. Potential clinical benefits of using corrective re-synthesis are discussed.

  10. Speech production, Psychology of

    NARCIS (Netherlands)

    Schriefers, H.J.; Vigliocco, G.

    2015-01-01

    Research on speech production investigates the cognitive processes involved in transforming thoughts into speech. This article starts with a discussion of the methodological issues inherent to research in speech production that illustrates how empirical approaches to speech production must differ from ...

  11. A Statistical Quality Model for Data-Driven Speech Animation.

    Science.gov (United States)

    Ma, Xiaohan; Deng, Zhigang

    2012-11-01

    In recent years, data-driven speech animation approaches have achieved significant successes in terms of animation quality. However, how to automatically evaluate the realism of novel synthesized speech animations has been an important yet unsolved research problem. In this paper, we propose a novel statistical model (called SAQP) to automatically predict the quality of on-the-fly synthesized speech animations by various data-driven techniques. Its essential idea is to construct a phoneme-based, Speech Animation Trajectory Fitting (SATF) metric to describe speech animation synthesis errors and then build a statistical regression model to learn the association between the obtained SATF metric and the objective speech animation synthesis quality. Through delicately designed user studies, we evaluate the effectiveness and robustness of the proposed SAQP model. To the best of our knowledge, this work is the first-of-its-kind, quantitative quality model for data-driven speech animation. We believe it is the important first step to remove a critical technical barrier for applying data-driven speech animation techniques to numerous online or interactive talking avatar applications.

  12. Processing of Speech Signals for Physical and Sensory Disabilities

    Science.gov (United States)

    Levitt, Harry

    1995-10-01

    Assistive technology involving voice communication is used primarily by people who are deaf, hard of hearing, or who have speech and/or language disabilities. It is also used to a lesser extent by people with visual or motor disabilities. A very wide range of devices has been developed for people with hearing loss. These devices can be categorized not only by the modality of stimulation [i.e., auditory, visual, tactile, or direct electrical stimulation of the auditory nerve (auditory-neural)] but also in terms of the degree of speech processing that is used. At least four such categories can be distinguished: assistive devices (a) that are not designed specifically for speech, (b) that take the average characteristics of speech into account, (c) that process articulatory or phonetic characteristics of speech, and (d) that embody some degree of automatic speech recognition. Assistive devices for people with speech and/or language disabilities typically involve some form of speech synthesis or symbol generation for severe forms of language disability. Speech synthesis is also used in text-to-speech systems for sightless persons. Other applications of assistive technology involving voice communication include voice control of wheelchairs and other devices for people with mobility disabilities.

  13. 78 FR 49717 - Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services...

    Science.gov (United States)

    2013-08-15

    FEDERAL COMMUNICATIONS COMMISSION, 47 CFR Part 64: Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services; Telecommunications Relay Services and Speech-to-Speech Services for Individuals With ... The notice seeks comment on the reasons that STS has not been more widely utilized, e.g., are people with speech disabilities not connected to ...

  14. Speech Enhancement

    DEFF Research Database (Denmark)

    Benesty, Jacob; Jensen, Jesper Rindom; Christensen, Mads Græsbøll;

    Speech enhancement is a classical problem in signal processing, yet still largely unsolved. Two of the conventional approaches for solving this problem are linear filtering, like the classical Wiener filter, and subspace methods. These approaches have traditionally been treated as different classes of methods and have been introduced in somewhat different contexts. Linear filtering methods originate in stochastic processes, while subspace methods have largely been based on developments in numerical linear algebra and matrix approximation theory. This book bridges the gap between these two classes of methods by showing how the ideas behind subspace methods can be incorporated into traditional linear filtering. In the context of subspace methods, the enhancement problem can then be seen as a classical linear filter design problem. This means that various solutions can more easily be compared...
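
    As a point of reference for the linear-filtering class of methods, a per-bin spectral Wiener filter can be sketched as follows. This is a generic textbook formulation under an assumed known noise power spectrum, not an excerpt from the book:

        import numpy as np
        from scipy.signal import stft, istft

        def wiener_enhance(noisy, fs, noise_psd, nperseg=512):
            """Apply the per-bin Wiener gain H = SNR / (1 + SNR).
            noise_psd: per-bin noise power estimate, length nperseg // 2 + 1."""
            f, t, Y = stft(noisy, fs=fs, nperseg=nperseg)
            # Maximum-likelihood SNR estimate per time-frequency bin.
            snr = np.maximum(np.abs(Y) ** 2 / noise_psd[:, None] - 1.0, 0.0)
            gain = snr / (1.0 + snr)
            _, enhanced = istft(gain * Y, fs=fs, nperseg=nperseg)
            return enhanced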

  15. Filled pause refinement based on the pronunciation probability for lecture speech.

    Science.gov (United States)

    Long, Yan-Hua; Ye, Hong

    2015-01-01

    Nowadays, although automatic speech recognition has become quite proficient at recognizing or transcribing well-prepared fluent speech, the transcription of speech that contains many disfluencies remains problematic, as is the case for spontaneous conversational and lecture speech. Filled pauses (FPs) are the most frequently occurring disfluencies in this type of speech. Recent studies have shown that FPs increase the error rates of state-of-the-art speech transcription, primarily because most FPs are not well annotated or provided in training data transcriptions and because of the similarities in acoustic characteristics between FPs and some common non-content words. To enhance the speech transcription system, we propose a new automatic refinement approach to detect FPs in British English lecture speech transcription. This approach combines the pronunciation probabilities for each word in the dictionary and acoustic language model scores for FP refinement through a modified speech recognition forced-alignment framework. We evaluate the proposed approach on the Reith Lectures speech transcription task, in which only imperfect training transcriptions are available. Successful results are achieved for both the development and evaluation datasets. Acoustic models trained on different styles of speech genres have been investigated with respect to FP refinement. To further validate the effectiveness of the proposed approach, speech transcription performance has also been examined using systems built on training data transcriptions with and without FP refinement.
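
    The combination of pronunciation probabilities with acoustic scores can be illustrated generically as a log-linear rescoring of the candidate labels for a token. This is a simplified sketch with toy numbers; the paper's forced-alignment framework is more involved:

        import math

        def rescore(acoustic_ll, pron_prob, lam=1.0):
            """Combine an acoustic log-likelihood with a weighted log pronunciation probability."""
            return {v: acoustic_ll[v] + lam * math.log(pron_prob[v]) for v in acoustic_ll}

        # Toy example: decide between a filled pause and a content word.
        scores = rescore({"uh": -42.0, "a": -41.5}, {"uh": 0.30, "a": 0.05})
        best = max(scores, key=scores.get)  # "uh" wins via its pronunciation probability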

  16. Speech therapy with obturator.

    Science.gov (United States)

    Shyammohan, A; Sreenivasulu, D

    2010-12-01

    In cases of velopharyngeal insufficiency, rehabilitation of speech is as important as closure of the defect. Often the importance of speech therapy is sidelined during the fabrication of obturators. Usually the speech part is taken up only at a later stage and is relegated entirely to a speech therapist, without the active involvement of the prosthodontist. The article suggests a protocol for speech therapy in such cases, to be carried out in unison with the prosthodontist.

  17. Automatic Recognition of Element Classes and Boundaries in the Birdsong with Variable Sequences.

    Directory of Open Access Journals (Sweden)

    Takuya Koumura

    Full Text Available Research on sequential vocalization often requires analysis of vocalizations in long continuous sounds. In studies such as developmental ones, or studies across generations in which days or months of vocalizations must be analyzed, methods for automatic recognition are strongly desired. Although methods for automatic speech recognition have been intensively studied for application purposes, blindly applying them for biological purposes may not be an optimal solution, because, unlike human speech recognition, analysis of sequential vocalizations often requires accurate extraction of timing information. In the present study we propose automated systems suitable for recognizing birdsong, one of the most intensively investigated sequential vocalizations, focusing on three properties of birdsong. First, a song is a sequence of vocal elements, called notes, which can be grouped into categories. Second, the temporal structure of birdsong is precisely controlled, meaning that temporal information is important in song analysis. Finally, notes are produced according to certain probabilistic rules, which may facilitate accurate song recognition. We divided the procedure of song recognition into three sub-steps: local classification, boundary detection, and global sequencing, each of which corresponds to one of the three properties of birdsong. We compared the performance of several different arrangements of these three steps. As a result, we demonstrated that a hybrid model of a deep convolutional neural network and a hidden Markov model was effective. We propose suitable arrangements of methods according to whether accurate boundary detection is needed. We also designed a new measure to jointly evaluate the accuracy of note classification and boundary detection. Our methods should be applicable, with small modification and tuning, to the songs of other species that share the three properties of sequential vocalization.
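
    The global-sequencing step, in which probabilistic note-transition rules constrain the frame-wise classifier outputs, can be illustrated with a standard Viterbi decoder over per-frame log scores. This is a generic sketch of the dynamic-programming idea; the paper's actual HMM topology and parameters are not reproduced here:

        import numpy as np

        def viterbi(log_obs, log_trans, log_init):
            """Most likely state path given (T, S) per-frame log scores,
            an (S, S) log transition matrix, and an (S,) log initial distribution."""
            T, S = log_obs.shape
            delta = log_init + log_obs[0]
            back = np.zeros((T, S), dtype=int)
            for t in range(1, T):
                scores = delta[:, None] + log_trans   # previous state x next state
                back[t] = np.argmax(scores, axis=0)
                delta = scores[back[t], np.arange(S)] + log_obs[t]
            path = [int(np.argmax(delta))]
            for t in range(T - 1, 0, -1):
                path.append(int(back[t, path[-1]]))
            return path[::-1]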

  18. Automatic Recognition of Element Classes and Boundaries in the Birdsong with Variable Sequences

    Science.gov (United States)

    Okanoya, Kazuo

    2016-01-01

    Research on sequential vocalization often requires analysis of vocalizations in long continuous sounds. In studies such as developmental ones, or studies across generations in which days or months of vocalizations must be analyzed, methods for automatic recognition are strongly desired. Although methods for automatic speech recognition have been intensively studied for application purposes, blindly applying them for biological purposes may not be an optimal solution, because, unlike human speech recognition, analysis of sequential vocalizations often requires accurate extraction of timing information. In the present study we propose automated systems suitable for recognizing birdsong, one of the most intensively investigated sequential vocalizations, focusing on three properties of birdsong. First, a song is a sequence of vocal elements, called notes, which can be grouped into categories. Second, the temporal structure of birdsong is precisely controlled, meaning that temporal information is important in song analysis. Finally, notes are produced according to certain probabilistic rules, which may facilitate accurate song recognition. We divided the procedure of song recognition into three sub-steps: local classification, boundary detection, and global sequencing, each of which corresponds to one of the three properties of birdsong. We compared the performance of several different arrangements of these three steps. As a result, we demonstrated that a hybrid model of a deep convolutional neural network and a hidden Markov model was effective. We propose suitable arrangements of methods according to whether accurate boundary detection is needed. We also designed a new measure to jointly evaluate the accuracy of note classification and boundary detection. Our methods should be applicable, with small modification and tuning, to the songs of other species that share the three properties of sequential vocalization. PMID:27442240

  19. Detection and Separation of Speech Events in Meeting Recordings Using a Microphone Array

    Directory of Open Access Journals (Sweden)

    Yamada Miichi

    2007-01-01

    Full Text Available When applying automatic speech recognition (ASR) to meeting recordings including spontaneous speech, the performance of ASR is greatly reduced by the overlap of speech events. In this paper, a method of separating overlapping speech events using an adaptive beamforming (ABF) framework is proposed. The main feature of this method is that all the information necessary for the adaptation of ABF, including microphone calibration, is obtained from the meeting recordings themselves, based on the results of speech-event detection. The performance of the separation is evaluated via ASR using real meeting recordings.

  20. Detection and Separation of Speech Events in Meeting Recordings Using a Microphone Array

    Directory of Open Access Journals (Sweden)

    Futoshi Asano

    2007-07-01

    Full Text Available When applying automatic speech recognition (ASR) to meeting recordings including spontaneous speech, the performance of ASR is greatly reduced by the overlap of speech events. In this paper, a method of separating overlapping speech events using an adaptive beamforming (ABF) framework is proposed. The main feature of this method is that all the information necessary for the adaptation of ABF, including microphone calibration, is obtained from the meeting recordings themselves, based on the results of speech-event detection. The performance of the separation is evaluated via ASR using real meeting recordings.

  1. Comparing Speech Recognition Systems (Microsoft API, Google API And CMU Sphinx)

    Directory of Open Access Journals (Sweden)

    Veton Këpuska

    2017-03-01

    Full Text Available The idea of this paper is to design a tool that will be used to test and compare commercial speech recognition systems, such as the Microsoft Speech API and the Google Speech API, with open-source speech recognition systems such as Sphinx-4. The best way to compare automatic speech recognition systems in different environments is by using audio recordings selected from different sources and calculating the word error rate (WER). Although the WERs of the three aforementioned systems were acceptable, it was observed that the Google API is superior.
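
    The WER used for the comparison is the word-level Levenshtein distance between hypothesis and reference, normalized by the reference length. A standard computation (not the paper's tool itself) is:

        def wer(ref, hyp):
            """Word error rate: (substitutions + deletions + insertions) / len(ref)."""
            r, h = ref.split(), hyp.split()
            d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
            for i in range(len(r) + 1):
                d[i][0] = i
            for j in range(len(h) + 1):
                d[0][j] = j
            for i in range(1, len(r) + 1):
                for j in range(1, len(h) + 1):
                    cost = 0 if r[i - 1] == h[j - 1] else 1
                    d[i][j] = min(d[i - 1][j] + 1,         # deletion
                                  d[i][j - 1] + 1,         # insertion
                                  d[i - 1][j - 1] + cost)  # substitution or match
            return d[len(r)][len(h)] / max(len(r), 1)

        # Example: one insertion against a three-word reference gives WER = 1/3.
        assert round(wer("the cat sat", "the cat sat down"), 2) == 0.33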

  2. Towards Understanding Spontaneous Speech Word Accuracy vs. Concept Accuracy

    CERN Document Server

    Boros, M; Gallwitz, F; Goerz, G; Hanrieder, G; Niemann, H

    1996-01-01

    In this paper we describe an approach to automatic evaluation of both the speech recognition and understanding capabilities of a spoken dialogue system for train time table information. We use word accuracy for recognition and concept accuracy for understanding performance judgement. Both measures are calculated by comparing these modules' output with a correct reference answer. We report evaluation results for a spontaneous speech corpus with about 10000 utterances. We observed a nearly linear relationship between word accuracy and concept accuracy.

  3. Estimation of Speech Lip Features from Discrete Cosinus Transform

    OpenAIRE

    Ming, Zuheng; Beautemps, Denis; Feng, Gang; Schmerber, Sébastien,

    2010-01-01

    This study is a contribution to the field of visual speech processing. It focuses on the automatic extraction of speech lip features from natural lips. The method is based on the direct prediction of these features from predictors derived from an adequate transformation of the pixels of the lip region of interest. The transformation is made of a 2-D Discrete Cosine Transform combined with a Principal Component Analysis applied to a subset of the DCT coefficients corres...

  4. Fast and accurate automatic structure prediction with HHpred.

    Science.gov (United States)

    Hildebrand, Andrea; Remmert, Michael; Biegert, Andreas; Söding, Johannes

    2009-01-01

    Automated protein structure prediction is becoming a mainstream tool for biological research. This has been fueled by steady improvements of publicly available automated servers over the last decade, in particular their ability to build good homology models for an increasing number of targets by reliably detecting and aligning more and more remotely homologous templates. Here, we describe the three fully automated versions of the HHpred server that participated in the community-wide blind protein structure prediction competition CASP8. What makes HHpred unique is the combination of usability, short response times (typically under 15 min) and a model accuracy that is competitive with those of the best servers in CASP8.

  5. Automatic Construction of Accurate Models of Physical Systems

    Science.gov (United States)

    1998-07-10

    ...in order to develop an AI-adapted global optimization tool that combines qualitative reasoning and local numerical methods; see "Global Solutions for Nonlinear Systems using Qualitative Reasoning," Annals of Mathematics and Artificial Intelligence (1998), E. Bradley and ...

  6. Robust Speech Recognition Using a Harmonic Model

    Institute of Scientific and Technical Information of China (English)

    许超; 曹志刚

    2004-01-01

    Automatic speech recognition under conditions of a noisy environment remains a challenging problem. Traditionally, methods focused on noise structure, such as spectral subtraction, have been employed to address this problem, and thus the performance of such methods depends on the accuracy of noise estimation. In this paper, an alternative method, using a harmonic-based spectral reconstruction algorithm, is proposed for the enhancement of robust automatic speech recognition. Neither noise estimation nor noise-model training is required in the proposed approach. A spectral subtraction integrated autocorrelation function is proposed to determine the pitch for the harmonic model. Recognition results show that the harmonic-based spectral reconstruction approach outperforms spectral subtraction in the middle and low signal-to-noise ratio (SNR) ranges. The advantage of the proposed method is more manifest for non-stationary noise, as the algorithm does not require an assumption of stationary noise.
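
    The autocorrelation-based pitch determination at the heart of such a harmonic model can be sketched per frame as follows. This is a plain autocorrelation estimator; the paper's spectral-subtraction integration is omitted, and the search range is an assumption:

        import numpy as np

        def pitch_autocorr(frame, fs, f_min=60.0, f_max=400.0):
            """Estimate F0 of a frame from the peak of its autocorrelation.
            The frame should span at least two periods of the lowest pitch."""
            frame = frame - np.mean(frame)
            ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
            lag_min = int(fs / f_max)
            lag_max = min(int(fs / f_min), len(ac) - 1)
            lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))
            return fs / lag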

  7. Finding Difficult Speakers in Automatic Speaker Recognition

    OpenAIRE

    Stoll, Lara Lynn

    2011-01-01

    The task of automatic speaker recognition, wherein a system verifies or determines a speaker's identity using a sample of speech, has been studied for a few decades. In that time, a great deal of progress has been made in improving the accuracy of the system's decisions, through the use of more successful machine learning algorithms, and the application of channel compensation techniques and other methodologies aimed at addressing sources of errors such as noise or data mismatch. In general, ...

  8. Emotion Recognition from Persian Speech with Neural Network

    Directory of Open Access Journals (Sweden)

    Mina Hamidi

    2012-10-01

    Full Text Available In this paper, we report an effort towards automatic recognition of emotional states from continuous Persian speech. Due to the unavailability of an appropriate database in the Persian language for emotion recognition, at first, we built a database of emotional speech in Persian. This database consists of 2400 wave clips modulated with anger, disgust, fear, sadness, happiness and normal emotions. Then we extract prosodic features, including features related to the pitch, intensity and global characteristics of the speech signal. Finally, we applied neural networks for automatic recognition of emotion. The resulting average accuracy was about 78%.

  9. Post-editing through Speech Recognition

    DEFF Research Database (Denmark)

    Mesa-Lao, Bartolomé

    In the past couple of years automatic speech recognition (ASR) software has quietly created a niche for itself in many situations of our lives. Nowadays it can be found at the other end of customer-support hotlines, it is built into operating systems, and it is offered as an alternative text-input method for smartphones. On another front, given the significant improvements in Machine Translation (MT) quality and the increasing demand for translations, post-editing of MT is becoming a popular practice in the translation industry, since it has been shown to allow for larger volumes of translations to be produced, saving time and costs. The translation industry is at a deeply transformative point in its evolution, and the coming years herald an era of convergence where speech technology could make a difference. As post-editing services are becoming a common practice among language service providers and speech ...

  10. Speech production in amplitude-modulated noise

    DEFF Research Database (Denmark)

    Macdonald, Ewen N; Raufer, Stefan

    2013-01-01

    The Lombard effect refers to the phenomenon where talkers automatically increase their level of speech in a noisy environment. While many studies have characterized how the Lombard effect influences different measures of speech production (e.g., F0, spectral tilt, etc.), few have investigated the consequences of temporally fluctuating noise. In the present study, 20 talkers produced speech in a variety of noise conditions, including both steady-state and amplitude-modulated white noise. While listening to noise over headphones, talkers produced randomly generated five-word sentences. Similar to previous studies, talkers raised the level of their voice in steady-state noise. While talkers also increased the level of their voice in amplitude-modulated noise, the increase was not as large as that observed in steady-state noise. Importantly, for the 2 and 4 Hz amplitude-modulated noise conditions ...

  11. Automated Gesturing for Virtual Characters: Speech-driven and Text-driven Approaches

    Directory of Open Access Journals (Sweden)

    Goranka Zoric

    2006-04-01

    Full Text Available We present two methods for automatic facial gesturing of graphically embodied animated agents. In one case, a conversational agent is driven by speech in an automatic lip-sync process: by analyzing the speech input, lip movements are determined from the speech signal. The other method provides a virtual speaker capable of reading plain English text and rendering it as speech accompanied by appropriate facial gestures. The proposed statistical model for generating the virtual speaker's facial gestures can also be applied as an addition to the lip synchronization process in order to obtain speech-driven facial gesturing. In this case the statistical model is triggered by the prosody of the input speech instead of a lexical analysis of the input text.

  12. Phonetic alignment for speech synthesis in under-resourced languages

    CSIR Research Space (South Africa)

    Van Niekerk, DR

    2009-09-01

    Full Text Available The rapid development of concatenative speech synthesis systems in resource scarce languages requires an efficient and accurate solution with regard to automated phonetic alignment. However, in this context corpora are often minimally designed due...

  13. Adaptive Recognition of Phonemes from Speaker - Connected-Speech Using Alisa.

    Science.gov (United States)

    Osella, Stephen Albert

    The purpose of this dissertation research is to investigate a novel approach to automatic speech recognition (ASR). The successes that have been achieved in ASR have relied heavily on the use of a language grammar, which significantly constrains the ASR process. By using grammar to provide most of the recognition ability, the ASR system does not have to be as accurate at the low-level recognition stage. The ALISA Phonetic Transcriber (APT) algorithm is proposed as a way to improve ASR by enhancing the lowest-level recognition stage. The objective of the APT algorithm is to classify speech frames (short sequences of speech signal samples) into a small set of phoneme classes. The APT algorithm constructs the mapping from speech frames to phoneme labels through a multi-layer feedforward process. A design principle of APT is that final decisions are delayed as long as possible. Instead of attempting to optimize the decision making at each processing level individually, each level generates a list of candidate solutions that are passed on to the next level of processing. The later processing levels use these candidate solutions to resolve ambiguities. The scope of this dissertation is the design of the APT algorithm up to the speech-frame classification stage. In future research, the APT algorithm will be extended to the word recognition stage. In particular, the APT algorithm could serve as the front-end stage to a Hidden Markov Model (HMM) based word recognition system. In such a configuration, the APT algorithm would provide the HMM with the requisite phoneme state-probability estimates. To date, the APT algorithm has been tested with the TIMIT and NTIMIT speech databases. The APT algorithm has been trained and tested on the SX and SI sentence texts using both male and female speakers. Results indicate better performance than those obtained using a neural network based speech-frame classifier. The performance of the APT algorithm has been evaluated for ...

  14. Heart Rate Extraction from Vowel Speech Signals

    Institute of Scientific and Technical Information of China (English)

    Abdelwadood Mesleh; Dmitriy Skopin; Sergey Baglikov; Anas Quteishat

    2012-01-01

    This paper presents a novel non-contact heart rate extraction method from vowel speech signals. The proposed method is based on modeling the relationship between speech production of vowel speech signals and heart activity in humans, where it is observed that the moment of a heart beat causes a short increment (evolution) of the vowel speech formants. The short-time Fourier transform (STFT) is used to detect the formant maximum peaks so as to accurately estimate the heart rate. Compared with a traditional contact pulse oximeter, the average accuracy of the proposed non-contact heart rate extraction method exceeds 95%. The proposed non-contact heart rate extraction method is expected to play an important role in modern medical applications.
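
    Detecting formant maximum peaks with the STFT, as the method requires, can be sketched like this. The window length and the sub-1 kHz search band are assumptions for the illustration, not the authors' settings:

        import numpy as np
        from scipy.signal import stft, find_peaks

        def formant_peak_track(x, fs, nperseg=1024):
            """Track the strongest spectral peak below 1 kHz across STFT frames."""
            f, t, Z = stft(x, fs=fs, nperseg=nperseg)
            mag = np.abs(Z)
            lo_band = f < 1000.0
            track = []
            for frame in mag[lo_band].T:
                peaks, _ = find_peaks(frame)
                if len(peaks):
                    track.append(f[lo_band][peaks[np.argmax(frame[peaks])]])
                else:
                    track.append(np.nan)
            return np.array(t), np.array(track)

    Short-term fluctuations of such a track are the kind of formant evolution from which the paper derives the heart rate.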

  15. Modeling Speech Intelligibility in Hearing Impaired Listeners

    DEFF Research Database (Denmark)

    Scheidiger, Christoph; Jørgensen, Søren; Dau, Torsten

    2014-01-01

    Models of speech intelligibility (SI) have a long history, starting with the articulation index (AI, [17]), followed by the SI index (SII, [18]) and the speech transmission index (STI, [7]), to name only a few. However, these models fail to accurately predict SI with nonlinearly processed noisy speech, e.g. phase jitter or spectral subtraction. Recent studies predict SI for normal-hearing (NH) listeners based on a signal-to-noise ratio measure in the envelope domain (SNRenv), in the framework of the speech-based envelope power spectrum model (sEPSM, [20, 21]). These models have shown good agreement with measured data under a broad range of conditions, including stationary and modulated interferers, reverberation, and spectral subtraction. Despite the advances in modeling intelligibility in NH listeners, a broadly applicable model that can predict SI in hearing-impaired (HI) listeners ...

  16. Delayed Speech or Language Development

    Science.gov (United States)

    This KidsHealth article for parents discusses delayed speech or language development: how to tell whether a child is right on schedule, and why it is important to discuss early speech and language development with a doctor.

  17. The role of visual spatial attention in audiovisual speech perception

    DEFF Research Database (Denmark)

    Andersen, Tobias; Tiippana, K.; Laarni, J.

    2009-01-01

    Auditory and visual information is integrated when perceiving speech, as evidenced by the McGurk effect, in which viewing an incongruent talking face categorically alters auditory speech perception. Audiovisual integration in speech perception has long been considered automatic and pre-attentive, but recent reports have challenged this view. Here we study the effect of visual spatial attention on the McGurk effect. By presenting a movie of two faces symmetrically displaced to each side of a central fixation point and dubbed with a single auditory speech track, we were able to discern the influences ... integration did not change. Visual spatial attention was also able to select between the faces when lip reading. This suggests that visual spatial attention acts at the level of visual speech perception prior to audiovisual integration and that the effect propagates through audiovisual integration ...

  18. Auto Spell Suggestion for High Quality Speech Synthesis in Hindi

    Science.gov (United States)

    Kabra, Shikha; Agarwal, Ritika

    2014-02-01

    The goal of Text-to-Speech (TTS) synthesis in a particular language is to convert arbitrary input text into intelligible and natural-sounding speech. However, for a language like Hindi, in which many words have very close spellings, it is not easy to identify errors or mistakes in the input text, and incorrect text degrades the quality of the output speech. This paper therefore contributes to the development of high-quality speech synthesis by involving a spellchecker that automatically generates spelling suggestions for misspelled words. Involving a spellchecker increases the efficiency of speech synthesis by providing spelling suggestions for incorrect input text. Furthermore, we provide a comparative study evaluating the resultant effect on the phonetic text of adding the spellchecker to the input text.
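
    Generating spelling suggestions from a lexicon can be illustrated with a simple similarity search. This is a toy sketch with a hypothetical English lexicon, not the paper's Hindi spellchecker:

        import difflib

        def suggest(word, lexicon, n=3):
            """Return up to n closest lexicon entries for a possibly misspelled word."""
            return difflib.get_close_matches(word, lexicon, n=n, cutoff=0.7)

        # Example with a toy lexicon:
        print(suggest("speeche", ["speech", "speaker", "spectral"]))  # ['speech']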

  19. Fifty years of progress in speech understanding systems

    Science.gov (United States)

    Zue, Victor

    2004-10-01

    Researchers working on human-machine interfaces realized nearly 50 years ago that automatic speech recognition (ASR) alone is not sufficient; one needs to impart linguistic knowledge to the system such that the signal could ultimately be understood. A speech understanding system combines speech recognition (i.e., the speech to symbols conversion) with natural language processing (i.e., the symbol to meaning transformation) to achieve understanding. Speech understanding research dates back to the DARPA Speech Understanding Project in the early 1970s. However, large-scale efforts only began in earnest in the late 1980s, with government research programs in the U.S. and Europe providing the impetus. This has resulted in many innovations including novel approaches to natural language understanding (NLU) for speech input, and integration techniques for ASR and NLU. In the past decade, speech understanding systems have become major building blocks of conversational interfaces that enable users to access and manage information using spoken dialogue, incorporating language generation, discourse modeling, dialogue management, and speech synthesis. Today, we are at the threshold of developing multimodal interfaces, augmenting sound with sight and touch. This talk will highlight past work and speculate on the future. [Work supported by an industrial consortium of the MIT Oxygen Project.]

  20. Automatic detection of articulation disorders in children with cleft lip and palate.

    Science.gov (United States)

    Maier, Andreas; Hönig, Florian; Bocklet, Tobias; Nöth, Elmar; Stelzle, Florian; Nkenke, Emeka; Schuster, Maria

    2009-11-01

    Speech of children with cleft lip and palate (CLP) is sometimes still disordered even after adequate surgical and nonsurgical therapies. Such speech shows complex articulation disorders, which are usually assessed perceptually, consuming time and manpower. Hence, there is a need for an easy-to-apply and reliable automatic method. To create a reference for an automatic system, speech data of 58 children with CLP were assessed perceptually by experienced speech therapists for characteristic phonetic disorders at the phoneme level. The first part of the article aims to detect such characteristics by a semiautomatic procedure and the second to evaluate a fully automatic, thus simpler, procedure. The methods are based on a combination of speech processing algorithms. The semiautomatic method achieves moderate to good agreement (kappa approximately 0.6) for the detection of all phonetic disorders. On the speaker level, significant correlations of 0.89 between the perceptual evaluation and the automatic system are obtained. The fully automatic system yields a correlation on the speaker level of 0.81 to the perceptual evaluation. This correlation is in the range of the inter-rater correlation of the listeners. The automatic speech evaluation is able to detect phonetic disorders at an expert's level without any additional human postprocessing.

  1. Unvoiced Speech Recognition Using Tissue-Conductive Acoustic Sensor

    Directory of Open Access Journals (Sweden)

    Heracleous Panikos

    2007-01-01

    Full Text Available We present the use of stethoscope and silicon NAM (nonaudible murmur) microphones in automatic speech recognition. NAM microphones are special acoustic sensors, which are attached behind the talker's ear and can capture not only normal (audible) speech, but also very quietly uttered speech (nonaudible murmur). As a result, NAM microphones can be applied in automatic speech recognition systems when privacy is desired in human-machine communication. Moreover, NAM microphones show robustness against noise, and they might be used in special systems (speech recognition, speech transformation, etc.) for sound-impaired people. Using adaptation techniques and a small amount of training data, we achieved a word accuracy of ... for a 20 k dictation task for nonaudible murmur recognition in a clean environment. In this paper, we also investigate nonaudible murmur recognition in noisy environments and the effect of the Lombard reflex on nonaudible murmur recognition. We also propose three methods to integrate audible speech and nonaudible murmur recognition using a stethoscope NAM microphone, with very promising results.

  2. Unobtrusive multimodal emotion detection in adaptive interfaces: speech and facial expressions

    NARCIS (Netherlands)

    Truong, K.P.; Leeuwen, D.A. van; Neerincx, M.A.

    2007-01-01

    Two unobtrusive modalities for automatic emotion recognition are discussed: speech and facial expressions. First, an overview is given of emotion recognition studies based on a combination of speech and facial expressions. We will identify difficulties concerning data collection, data fusion, system ...

  3. A Spoken Access Approach for Chinese Text and Speech Information Retrieval.

    Science.gov (United States)

    Chien, Lee-Feng; Wang, Hsin-Min; Bai, Bo-Ren; Lin, Sun-Chein

    2000-01-01

    Presents an efficient spoken-access approach for both Chinese text and Mandarin speech information retrieval. Highlights include human-computer interaction via voice input, speech query recognition at the syllable level, automatic term suggestion, relevance feedback techniques, and experiments that show an improvement in the effectiveness of…

  4. Speech Recognition Software for Language Learning: Toward an Evaluation of Validity and Student Perceptions

    Science.gov (United States)

    Cordier, Deborah

    2009-01-01

    A renewed focus on foreign language (FL) learning and speech for communication has resulted in computer-assisted language learning (CALL) software developed with Automatic Speech Recognition (ASR). ASR features for FL pronunciation (Lafford, 2004) are functional components of CALL designs used for FL teaching and learning. The ASR features…

  5. Automatic Teller Machine for disable users and its security issues

    OpenAIRE

    Ali, Liaqat; Jahankhani, Hamid; Jahankhani, Hossein

    2007-01-01

    Automatic Teller Machines are highly beneficial in the banking industry, and the industry forcefully promotes the use of ATM cards. In spite of the success and extensive use of Automatic Teller Machines, a large percentage of bank clients cannot use them, and it has been noted that no importance has been given to the feature of accessibility at all. The significant number of disabled users in the banking industry have experienced many difficulties in their interaction with these ATMs. Speech technolo...

  6. An effective cluster-based model for robust speech detection and speech recognition in noisy environments.

    Science.gov (United States)

    Górriz, J M; Ramírez, J; Segura, J C; Puntonet, C G

    2006-07-01

    This paper shows an accurate speech detection algorithm for improving the performance of speech recognition systems working in noisy environments. The proposed method is based on a hard decision clustering approach where a set of prototypes is used to characterize the noisy channel. Detecting the presence of speech is enabled by a decision rule formulated in terms of an averaged distance between the observation vector and a cluster-based noise model. The algorithm benefits from using contextual information, a strategy that considers not only a single speech frame but also a neighborhood of data in order to smooth the decision function and improve speech detection robustness. The proposed scheme exhibits reduced computational cost making it adequate for real time applications, i.e., automated speech recognition systems. An exhaustive analysis is conducted on the AURORA 2 and AURORA 3 databases in order to assess the performance of the algorithm and to compare it to existing standard voice activity detection (VAD) methods. The results show significant improvements in detection accuracy and speech recognition rate over standard VADs such as ITU-T G.729, ETSI GSM AMR, and ETSI AFE for distributed speech recognition and a representative set of recently reported VAD algorithms.
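
    The decision rule described, an averaged distance between the observation vector and a cluster-based noise model smoothed over neighboring frames, can be sketched as follows. This is a simplified illustration, not the published algorithm's exact formulation:

        import numpy as np

        def vad_decision(feats, noise_protos, threshold, context=2):
            """feats: (T, D) feature vectors; noise_protos: (K, D) cluster centroids.
            A frame is labeled speech if its context-averaged distance to the
            noise model exceeds a threshold."""
            # Distance of each frame to the nearest noise prototype.
            dists = np.min(
                np.linalg.norm(feats[:, None, :] - noise_protos[None, :, :], axis=2),
                axis=1)
            # Smooth over a neighborhood of frames to stabilize the decision.
            kernel = np.ones(2 * context + 1) / (2 * context + 1)
            smoothed = np.convolve(dists, kernel, mode="same")
            return smoothed > threshold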

  7. 78 FR 49693 - Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services...

    Science.gov (United States)

    2013-08-15

    FEDERAL COMMUNICATIONS COMMISSION, 47 CFR Part 64: Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services; Telecommunications Relay Services and Speech-to-Speech Services for Individuals With ... In this document, the Commission amends telecommunications relay services (TRS) mandatory ...

  8. THE BASIS FOR SPEECH PREVENTION

    Directory of Open Access Journals (Sweden)

    Jordan JORDANOVSKI

    1997-06-01

    Full Text Available Speech is a tool for accurate communication of ideas. When we talk about speech prevention as a practical realization of language, we are referring to the fact that it should comprise the elements of a criterion viewed from the perspective of standards. This criterion, in the broad sense of the word, presupposes an exact realization of the thought expressed between the speaker and the recipient. The absence of this criterion is evident in the practical realization of language and brings forth consequences, often hidden very deeply in the human psyche, whose outer manifestation already represents a delayed reaction of the social environment. The foundation for overcoming and standardizing this phenomenon must be the anatomical-physiological patterns of the body, applied through methods in concordance with the nature of the body.

  9. Speech and Language Impairments

    Science.gov (United States)

    Many children are identified as having a speech or language impairment after they enter the public school system. A teacher may notice difficulties in a child's speech or communication skills and refer the child for ...

  10. Speech 7 through 12.

    Science.gov (United States)

    Nederland Independent School District, TX.

    GRADES OR AGES: Grades 7 through 12. SUBJECT MATTER: Speech. ORGANIZATION AND PHYSICAL APPEARANCE: Following the foreword, philosophy and objectives, this guide presents a speech curriculum. The curriculum covers junior high and Speech I, II, III (senior high). Thirteen units of study are presented for junior high; each unit is divided into…

  11. Automatic lexical classification: bridging research and practice.

    Science.gov (United States)

    Korhonen, Anna

    2010-08-13

    Natural language processing (NLP)--the automatic analysis, understanding and generation of human language by computers--is vitally dependent on accurate knowledge about words. Because words change their behaviour between text types, domains and sub-languages, a fully accurate static lexical resource (e.g. a dictionary, word classification) is unattainable. Researchers are now developing techniques that could be used to automatically acquire or update lexical resources from textual data. If successful, the automatic approach could considerably enhance the accuracy and portability of language technologies, such as machine translation, text mining and summarization. This paper reviews the recent and on-going research in automatic lexical acquisition. Focusing on lexical classification, it discusses the many challenges that still need to be met before the approach can benefit NLP on a large scale.

  12. UMLS-based automatic image indexing.

    Science.gov (United States)

    Sneiderman, Charles Alan; Demner-Fushman, Dina; Fung, Kin Wah; Bray, Bruce

    2008-01-01

    To date, most accurate image retrieval techniques rely on textual descriptions of images. Our goal is to automatically generate indexing terms for an image extracted from a biomedical article by identifying Unified Medical Language System (UMLS) concepts in image caption and its discussion in the text. In a pilot evaluation of the suggested image indexing method by five physicians, a third of the automatically identified index terms were found suitable for indexing.

  13. Commercial applications of speech interface technology: an industry at the threshold.

    Science.gov (United States)

    Oberteuffer, J A

    1995-10-24

    Speech interface technology, which includes automatic speech recognition, synthetic speech, and natural language processing, is beginning to have a significant impact on business and personal computer use. Today, powerful and inexpensive microprocessors and improved algorithms are driving commercial applications in computer command, consumer, data entry, speech-to-text, telephone, and voice verification. Robust speaker-independent recognition systems for command and navigation in personal computers are now available; telephone-based transaction and database inquiry systems using both speech synthesis and recognition are coming into use. Large-vocabulary speech interface systems for document creation and read-aloud proofing are expanding beyond niche markets. Today's applications represent a small preview of a rich future for speech interface technology that will eventually replace keyboards with microphones and loudspeakers to give easy accessibility to increasingly intelligent machines.

  14. Speech in spinocerebellar ataxia.

    Science.gov (United States)

    Schalling, Ellika; Hartelius, Lena

    2013-12-01

    Spinocerebellar ataxias (SCAs) are a heterogeneous group of autosomal dominant cerebellar ataxias clinically characterized by progressive ataxia, dysarthria and a range of other concomitant neurological symptoms. Only a few studies include detailed characterization of speech symptoms in SCA. Speech symptoms in SCA resemble ataxic dysarthria but symptoms related to phonation may be more prominent. One study to date has shown an association between differences in speech and voice symptoms related to genotype. More studies of speech and voice phenotypes are motivated, to possibly aid in clinical diagnosis. In addition, instrumental speech analysis has been demonstrated to be a reliable measure that may be used to monitor disease progression or therapy outcomes in possible future pharmacological treatments. Intervention by speech and language pathologists should go beyond assessment. Clinical guidelines for management of speech, communication and swallowing need to be developed for individuals with progressive cerebellar ataxia.

  15. Speech rate in Parkinson's disease: A controlled study.

    Science.gov (United States)

    Martínez-Sánchez, F; Meilán, J J G; Carro, J; Gómez Íñiguez, C; Millian-Morell, L; Pujante Valverde, I M; López-Alburquerque, T; López, D E

    2016-09-01

    Speech disturbances will affect most patients with Parkinson's disease (PD) over the course of the disease. The origin and severity of these symptoms are of clinical and diagnostic interest. The aim was to evaluate the clinical pattern of speech impairment in PD patients and identify significant differences in speech rate and articulation compared to control subjects. Speech rate and articulation in a reading task were measured using an automatic analytical method. A total of 39 PD patients in the 'on' state and 45 age- and sex-matched asymptomatic controls participated in the study. None of the patients experienced dyskinesias or motor fluctuations during the test. The patients with PD displayed a significant reduction in speech and articulation rates; there were no significant correlations between the studied speech parameters and patient characteristics such as L-dopa dose, duration of the disorder, age, and UPDRS III scores and Hoehn & Yahr scales. Patients with PD show a characteristic pattern of declining speech rate. These results suggest that in PD, disfluencies are the result of the movement disorder affecting the physiology of speech production systems.

  16. Automated Speech and Audio Analysis for Semantic Access to Multimedia

    NARCIS (Netherlands)

    Jong, F.M.G. de; Ordelman, R.; Huijbregts, M.

    2006-01-01

    The deployment and integration of audio processing tools can enhance the semantic annotation of multimedia content, and as a consequence, improve the effectiveness of conceptual access tools. This paper overviews the various ways in which automatic speech and audio analysis can contribute to ...

  17. Automated speech and audio analysis for semantic access to multimedia

    NARCIS (Netherlands)

    de Jong, Franciska M.G.; Ordelman, Roeland J.F.; Huijbregts, M.A.H.; Avrithis, Y.; Kompatsiaris, Y.; Staab, S.; O' Connor, N.E.

    2006-01-01

    The deployment and integration of audio processing tools can enhance the semantic annotation of multimedia content, and as a consequence, improve the effectiveness of conceptual access tools. This paper overviews the various ways in which automatic speech and audio analysis can contribute to ...

  18. Story Segmentation for Speech Transcripts in Sparse Data Conditions

    NARCIS (Netherlands)

    Werff, van der Laurens

    2010-01-01

    Information Retrieval systems determine relevance by comparing information needs with the content of potential retrieval units. Unlike most textual data, automatically generated speech transcripts cannot by default be easily divided into obvious retrieval units due to a lack of explicit structural m...

  19. Term clouds as surrogates for user generated speech

    NARCIS (Netherlands)

    M. Tsagkias; M. Larson; M. de Rijke

    2008-01-01

    User generated spoken audio remains a challenge for Automatic Speech Recognition (ASR) technology, and content-based audio surrogates derived from ASR transcripts must be error robust. An investigation of the use of term clouds as surrogates for podcasts demonstrates that ASR term clouds closely appr...

  1. Intonation and Dialog Context as Constraints for Speech Recognition.

    Science.gov (United States)

    Taylor, Paul; King, Simon; Isard, Stephen; Wright, Helen

    1998-01-01

    Describes how to use intonation and dialog context to improve the performance of an automatic speech-recognition system. Experiments utilized the DCIEM Maptask corpus, using a separate bigram language model for each type of move and showing that, with the correct move-specific language model for each utterance in the test set, the recognizer's…

  2. Digital speech processing using Matlab

    CERN Document Server

    Gopi, E S

    2014-01-01

    Digital Speech Processing Using Matlab deals with digital speech pattern recognition, speech production model, speech feature extraction, and speech compression. The book is written in a manner that is suitable for beginners pursuing basic research in digital speech processing. Matlab illustrations are provided for most topics to enable better understanding of concepts. This book also deals with the basic pattern recognition techniques (illustrated with speech signals using Matlab) such as PCA, LDA, ICA, SVM, HMM, GMM, BPN, and KSOM.

  3. A Self-Instructional Device for Conditioning Accurate Prosody.

    Science.gov (United States)

    Buiten, Roger; Lane, Harlan

    1965-01-01

    A self-instructional device for conditioning accurate prosody in second-language learning is described in this article. The Speech Auto-Instructional Device (SAID) is electro-mechanical and performs three functions: SAID (1) presents to the student tape-recorded pattern sentences that are considered standards in prosodic performance; (2) processes…

  4. Frequent word section extraction in a presentation speech by an effective dynamic programming algorithm.

    Science.gov (United States)

    Itoh, Yoshiaki; Tanaka, Kazuyo

    2004-08-01

    Word frequency in a document has often been utilized in text searching and summarization. Similarly, identifying frequent words or phrases in a speech data set for searching and summarization would also be meaningful. However, obtaining word frequency in a speech data set is difficult, because frequent words are often special terms in the speech and cannot be recognized by a general speech recognizer. This paper proposes another approach that is effective for automatic extraction of such frequent word sections in a speech data set. The proposed method is applicable to any domain of monologue speech, because no language models or specific terms are required in advance. The extracted sections can be regarded as speech labels of some kind or a digest of the speech presentation. The frequent word sections are determined by detecting similar sections, which are sections of audio data that represent the same word or phrase. The similar sections are detected by an efficient algorithm, called Shift Continuous Dynamic Programming (Shift CDP), which realizes fast matching between arbitrary sections in the reference speech pattern and those in the input speech, and enables frame-synchronous extraction of similar sections. In experiments, the algorithm is applied to extract the repeated sections in oral presentation speeches recorded in academic conferences in Japan. The results show that Shift CDP successfully detects similar sections and identifies the frequent word sections in individual presentation speeches, without prior domain knowledge, such as language models and terms.
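
    Repeated-section detection of this kind can be illustrated with a frame-by-frame self-similarity computation: sections of audio that recur show up as long, high-similarity diagonal runs when every acoustic feature frame is compared against every other. The sketch below is a brute-force stand-in for Shift CDP, which performs this matching frame-synchronously and far more efficiently; the cosine similarity measure, the threshold, and the minimum run length are illustrative assumptions, not values from the paper.

        import numpy as np

        def find_repeated_sections(feats, min_len=20, threshold=0.9):
            """Locate repeated sections as long diagonal runs of high cosine
            similarity in the frame-by-frame self-similarity matrix.

            feats: (n_frames, n_dims) array of acoustic features (e.g. MFCCs).
            Returns a list of ((start1, end1), (start2, end2)) frame-index pairs.
            """
            # Cosine self-similarity matrix.
            unit = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-12)
            sim = unit @ unit.T
            n = len(feats)

            matches = []
            # Scan each off-diagonal (lag > 0 avoids the trivial main diagonal).
            for lag in range(min_len, n):
                run = sim[np.arange(n - lag), np.arange(lag, n)] > threshold
                start = None
                for i, hit in enumerate(np.append(run, False)):
                    if hit and start is None:
                        start = i
                    elif not hit and start is not None:
                        if i - start >= min_len:
                            matches.append(((start, i), (start + lag, i + lag)))
                        start = None
            return matches

    Unlike Shift CDP, this exhaustive scan costs O(n^2) time and memory in the number of frames, which is exactly the cost the frame-synchronous algorithm in the paper is designed to avoid.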

  5. Automatic assessment of vowel space area.

    Science.gov (United States)

    Sandoval, Steven; Berisha, Visar; Utianski, Rene L; Liss, Julie M; Spanias, Andreas

    2013-11-01

    Vowel space area (VSA) is an attractive metric for the study of speech production deficits and reductions in intelligibility, in addition to the traditional study of vowel distinctiveness. Traditional VSA estimates are not currently sufficiently sensitive to map to production deficits. The present report describes an automated algorithm using healthy, connected speech rather than single syllables and estimates the entire vowel working space rather than corner vowels. Analyses reveal a strong correlation between the traditional VSA and automated estimates. When the two methods diverge, the automated method seems to provide a more accurate area since it accounts for all vowels.
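
    The automated estimate described here is commonly formulated as the area of the convex hull around a talker's (F1, F2) formant observations. A minimal sketch of that formulation follows, assuming formant tracks have already been extracted; the sample points are fabricated purely for illustration.

        import numpy as np
        from scipy.spatial import ConvexHull

        def vowel_space_area(f1, f2):
            """Convex-hull vowel space area from per-frame formant estimates (Hz)."""
            points = np.column_stack([f1, f2])
            # For 2D input, ConvexHull.volume is the enclosed area
            # (ConvexHull.area would be the hull perimeter).
            return ConvexHull(points).volume

        # Fabricated formant samples roughly spanning /i/, /a/, /u/ regions.
        f1 = np.array([310.0, 850.0, 320.0, 660.0, 500.0, 420.0])
        f2 = np.array([2300.0, 1220.0, 800.0, 1700.0, 1500.0, 1100.0])
        print(f"VSA ~ {vowel_space_area(f1, f2):.0f} Hz^2")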

  6. Speech intelligibility metrics in small unoccupied classrooms

    Science.gov (United States)

    Cruikshank, Matthew E.; Carney, Melinda J.; Cheenne, Dominique J.

    2005-04-01

    Nine small volume classrooms in schools located in the Chicago suburbs were tested to quantify speech intelligibility at various seat locations. Several popular intelligibility metrics were investigated, including Speech Transmission Index (STI), %Alcons, Signal to Noise Ratios (SNR), and 80 ms Useful/Detrimental Ratios (U80). Incorrect STI values were experienced in high noise environments, while the U80s and the SNRs were found to be the most accurate methodologies. Test results are evaluated against the guidelines of ANSI S12.60-2002, and match the data from previous research.
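
    The 80 ms useful/detrimental ratio that performed well in this study is conventionally computed from a measured room impulse response: energy arriving within the first 80 ms counts as useful, while later reflections plus ambient noise count as detrimental. A minimal sketch under that textbook definition, where the impulse response and the speech-to-noise ratio at the listener are assumed to be given:

        import numpy as np

        def u80(impulse_response, fs, snr_db):
            """80 ms useful/detrimental ratio (dB) from a room impulse response.

            snr_db: speech-to-noise level difference at the listening position.
            """
            split = int(0.080 * fs)                  # 80 ms boundary in samples
            h2 = np.square(impulse_response)
            early = h2[:split].sum()                 # useful early energy
            late = h2[split:].sum()                  # detrimental late reflections
            # Express the noise energy relative to the total speech energy.
            noise = (early + late) / (10.0 ** (snr_db / 10.0))
            return 10.0 * np.log10(early / (late + noise))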

  7. Detection and identification of speech sounds using cortical activity patterns.

    Science.gov (United States)

    Centanni, T M; Sloan, A M; Reed, A C; Engineer, C T; Rennaker, R L; Kilgard, M P

    2014-01-31

    We have developed a classifier capable of locating and identifying speech sounds using activity from rat auditory cortex with an accuracy equivalent to behavioral performance and without the need to specify the onset time of the speech sounds. This classifier can identify speech sounds from a large speech set within 40 ms of stimulus presentation. To compare the temporal limits of the classifier to behavior, we developed a novel task that requires rats to identify individual consonant sounds from a stream of distracter consonants. The classifier successfully predicted the ability of rats to accurately identify speech sounds for syllable presentation rates up to 10 syllables per second (up to 17.9 ± 1.5 bits/s), which is comparable to human performance. Our results demonstrate that the spatiotemporal patterns generated in primary auditory cortex can be used to quickly and accurately identify consonant sounds from a continuous speech stream without prior knowledge of the stimulus onset times. Improved understanding of the neural mechanisms that support robust speech processing in difficult listening conditions could improve the identification and treatment of a variety of speech-processing disorders. Copyright © 2013 IBRO. Published by Elsevier Ltd. All rights reserved.

  8. Speech-to-Speech Relay Service

    Science.gov (United States)

    ... to make an STS call. You are then connected to an STS CA who will repeat your spoken words, making the spoken words clear to the other party. Persons with speech disabilities may also receive STS calls. The calling ...

  9. A Generative Model of Speech Production in Broca's and Wernicke's Areas.

    Science.gov (United States)

    Price, Cathy J; Crinion, Jenny T; Macsweeney, Mairéad

    2011-01-01

    Speech production involves the generation of an auditory signal from the articulators and vocal tract. When the intended auditory signal does not match the produced sounds, subsequent articulatory commands can be adjusted to reduce the difference between the intended and produced sounds. This requires an internal model of the intended speech output that can be compared to the produced speech. The aim of this functional imaging study was to identify brain activation related to the internal model of speech production after activation related to vocalization, auditory feedback, and movement in the articulators had been controlled. There were four conditions: silent articulation of speech, non-speech mouth movements, finger tapping, and visual fixation. In the speech conditions, participants produced the mouth movements associated with the words "one" and "three." We eliminated auditory feedback from the spoken output by instructing participants to articulate these words without producing any sound. The non-speech mouth movement conditions involved lip pursing and tongue protrusions to control for movement in the articulators. The main difference between our speech and non-speech mouth movement conditions is that prior experience producing speech sounds leads to the automatic and covert generation of auditory and phonological associations that may play a role in predicting auditory feedback. We found that, relative to non-speech mouth movements, silent speech activated Broca's area in the left dorsal pars opercularis and Wernicke's area in the left posterior superior temporal sulcus. We discuss these results in the context of a generative model of speech production and propose that Broca's and Wernicke's areas may be involved in predicting the speech output that follows articulation. These predictions could provide a mechanism by which rapid movement of the articulators is precisely matched to the intended speech outputs during future articulations.

  10. A generative model of speech production in Broca’s and Wernicke’s areas

    Directory of Open Access Journals (Sweden)

    Cathy J Price

    2011-09-01

    Full Text Available Speech production involves the generation of an auditory signal from the articulators and vocal tract. When the intended auditory signal does not match the produced sounds, subsequent articulatory commands can be adjusted to reduce the difference between the intended and produced sounds. This requires an internal model of the intended speech output that can be compared to the produced speech. The aim of this functional imaging study was to identify brain activation related to the internal model of speech production after activation related to vocalisation, auditory feedback and movement in the articulators had been controlled. There were four conditions: silent articulation of speech, non-speech mouth movements, finger tapping and visual fixation. In the speech conditions, participants produced the mouth movements associated with the words one and three. We eliminated auditory feedback from the spoken output by instructing participants to articulate these words without producing any sound. The non-speech mouth movement conditions involved lip pursing and tongue protrusions to control for movement in the articulators. The main difference between our speech and non-speech mouth movement conditions is that prior experience producing speech sounds leads to the automatic and covert generation of auditory and phonological associations that may play a role in predicting auditory feedback. We found that, relative to non-speech mouth movements, silent speech activated Broca's area in the left dorsal pars opercularis and Wernicke's area in the left posterior superior temporal sulcus. We discuss these results in the context of a generative model of speech production and propose that Broca’s and Wernicke’s areas may be involved in predicting the speech output that follows articulation. These predictions could provide a mechanism by which rapid movement of the articulators is precisely matched to the intended speech outputs during future articulations.

  11. Exploration of Speech Planning and Producing by Speech Error Analysis

    Institute of Scientific and Technical Information of China (English)

    冷卉

    2012-01-01

    Speech error analysis is an indirect way to discover speech planning and producing processes. From some speech errors made by people in their daily life, linguists and learners can reveal the planning and producing processes more easily and clearly.

  12. Accurate guitar tuning by cochlear implant musicians.

    Directory of Open Access Journals (Sweden)

    Thomas Lu

    Full Text Available Modern cochlear implant (CI users understand speech but find difficulty in music appreciation due to poor pitch perception. Still, some deaf musicians continue to perform with their CI. Here we show unexpected results that CI musicians can reliably tune a guitar by CI alone and, under controlled conditions, match simultaneously presented tones to <0.5 Hz. One subject had normal contralateral hearing and produced more accurate tuning with CI than his normal ear. To understand these counterintuitive findings, we presented tones sequentially and found that tuning error was larger at ∼ 30 Hz for both subjects. A third subject, a non-musician CI user with normal contralateral hearing, showed similar trends in performance between CI and normal hearing ears but with less precision. This difference, along with electric analysis, showed that accurate tuning was achieved by listening to beats rather than discriminating pitch, effectively turning a spectral task into a temporal discrimination task.

  13. SPEECH DELAY IN THE PRACTICE OF A PAEDIATRICIAN AND CHILD’S NEUROLOGIST

    Directory of Open Access Journals (Sweden)

    N. N. Zavadenko

    2015-01-01

    Full Text Available The article describes the main clinical forms and causes of speech delay in children. It presents modern data on the role of neurobiological factors in the pathogenesis of speech delay, including early organic damage to the central nervous system due to pregnancy and childbirth pathology, as well as genetic mechanisms. Early and accurate diagnosis of speech disorders in children requires consideration of the normal patterns of speech development. The article presents indicators of pre-speech and speech development in children and describes a screening method for detecting speech delay. The main areas of complex correction are speech therapy, psycho-pedagogical and psychotherapeutic assistance, as well as pharmaceutical treatment. The capabilities of drug therapy for dysphasia (alalia) are shown.

  14. Recognition memory in noise for speech of varying intelligibility.

    Science.gov (United States)

    Gilbert, Rachael C; Chandrasekaran, Bharath; Smiljanic, Rajka

    2014-01-01

    This study investigated the extent to which noise impacts normal-hearing young adults' speech processing of sentences that vary in intelligibility. Intelligibility and recognition memory in noise were examined for conversational and clear speech sentences recorded in quiet (quiet speech, QS) and in response to the environmental noise (noise-adapted speech, NAS). Results showed that (1) increased intelligibility through conversational-to-clear speech modifications led to improved recognition memory and (2) NAS presented a more naturalistic speech adaptation to noise compared to QS, leading to more accurate word recognition and enhanced sentence recognition memory. These results demonstrate that acoustic-phonetic modifications implemented in listener-oriented speech enhance speech-in-noise processing beyond word recognition. Effortful speech processing in challenging listening environments can thus be improved by speaking style adaptations on the part of the talker. In addition to enhanced intelligibility, a substantial improvement in recognition memory can be achieved through speaker adaptations to the environment and to the listener when in adverse conditions.

  15. Indirect Speech Acts

    Institute of Scientific and Technical Information of China (English)

    李威

    2001-01-01

    Indirect speech acts are frequently used in verbal communication, and their interpretation is of great importance for developing students' communicative competence. This paper therefore presents Searle's indirect speech acts and explores how indirect speech acts are interpreted in accordance with two influential theories. It consists of four parts. Part one gives a general introduction to the notion of speech act theory. Part two elaborates on the conception of indirect speech act theory proposed by Searle and his supplement and development of illocutionary acts. Part three deals with the interpretation of indirect speech acts. Part four draws implications from the previous study and serves as the conclusion of the dissertation.

  16. Emotional State Categorization from Speech: Machine vs. Human

    CERN Document Server

    Shaukat, Arslan

    2010-01-01

    This paper presents our investigations on emotional state categorization from speech signals with a psychologically inspired computational model against human performance under the same experimental setup. Based on psychological studies, we propose a multistage categorization strategy which allows establishing an automatic categorization model flexibly for a given emotional speech categorization task. We apply the strategy to the Serbian Emotional Speech Corpus (GEES) and the Danish Emotional Speech Corpus (DES), where human performance was reported in previous psychological studies. Our work is the first attempt to apply machine learning to the GEES corpus where the human recognition rates were only available prior to our study. Unlike the previous work on the DES corpus, our work focuses on a comparison to human performance under the same experimental settings. Our studies suggest that psychology-inspired systems yield behaviours that, to a great extent, resemble what humans perceived and their performance ...

  17. Esophageal speeches modified by the Speech Enhancer Program®

    OpenAIRE

    Manochiopinig, Sriwimon; Boonpramuk, Panuthat

    2014-01-01

    Esophageal speech appears to be the first choice of speech treatment for a laryngectomy. However, many laryngectomy people are unable to speak well. The aim of this study was to evaluate post-modified speech quality of Thai esophageal speakers using the Speech Enhancer Program®. The method adopted was to approach five speech–language pathologists to assess the speech accuracy and intelligibility of the words and continuing speech of the seven laryngectomy people. A comparison study was conduc...

  18. Principles of speech coding

    CERN Document Server

    Ogunfunmi, Tokunbo

    2010-01-01

    It is becoming increasingly apparent that all forms of communication, including voice, will be transmitted through packet-switched networks based on the Internet Protocol (IP). Therefore, the design of modern devices that rely on speech interfaces, such as cell phones and PDAs, requires a complete and up-to-date understanding of the basics of speech coding. Outlining key signal processing algorithms used to mitigate impairments to speech quality in VoIP networks, and offering a detailed yet easily accessible introduction to the field, Principles of Speech Coding provides an in-depth examination of the…

  19. Ear, Hearing and Speech

    DEFF Research Database (Denmark)

    Poulsen, Torben

    2000-01-01

    An introduction is given to the anatomy and function of the ear, basic psychoacoustic matters (hearing threshold, loudness, masking), the speech signal, and speech intelligibility. The lecture note is written for the course: Fundamentals of Acoustics and Noise Control (51001).

  20. Advances in Speech Recognition

    CERN Document Server

    Neustein, Amy

    2010-01-01

    This volume is comprised of contributions from eminent leaders in the speech industry, and presents a comprehensive and in-depth analysis of the progress of speech technology in the topical areas of mobile settings, healthcare and call centers. The material addresses the technical aspects of voice technology within the framework of societal needs, such as the use of speech recognition software to produce up-to-date electronic health records, notwithstanding patients making changes to health plans and physicians. Included will be discussion of speech engineering, linguistics, human factors ana…

  1. Evaluation of an Expert System for the Generation of Speech and Language Therapy Plans.

    Science.gov (United States)

    Robles-Bykbaev, Vladimir; López-Nores, Martín; García-Duque, Jorge; Pazos-Arias, José J; Arévalo-Lucero, Daysi

    2016-07-01

    Speech and language pathologists (SLPs) deal with a wide spectrum of disorders, arising from many different conditions, that affect voice, speech, language, and swallowing capabilities in different ways. Therefore, the outcomes of Speech and Language Therapy (SLT) are highly dependent on the accurate, consistent, and complete design of personalized therapy plans. However, SLPs often have very limited time to work with their patients and to browse the large (and growing) catalogue of activities and specific exercises that can be put into therapy plans. As a consequence, many plans are suboptimal and fail to address the specific needs of each patient. We aimed to evaluate an expert system that automatically generates plans for speech and language therapy, containing semiannual activities in the five areas of hearing, oral structure and function, linguistic formulation, expressive language and articulation, and receptive language. The goal was to assess whether the expert system speeds up the SLPs' work and leads to more accurate, consistent, and complete therapy plans for their patients. We examined the evaluation results of the SPELTA expert system in supporting the decision making of 4 SLPs treating children in three special education institutions in Ecuador. The expert system was first trained with data from 117 cases, including medical data; diagnosis for voice, speech, language and swallowing capabilities; and therapy plans created manually by the SLPs. It was then used to automatically generate new therapy plans for 13 new patients. The SLPs were finally asked to evaluate the accuracy, consistency, and completeness of those plans. A four-fold cross-validation experiment was also run on the original corpus of 117 cases in order to assess the significance of the results. The evaluation showed that 87% of the outputs provided by the SPELTA expert system were considered valid therapy plans for the different areas. The SLPs rated the overall accuracy, consistency…

  2. Recognition of Emotions in Mexican Spanish Speech: An Approach Based on Acoustic Modelling of Emotion-Specific Vowels

    OpenAIRE

    Santiago-Omar Caballero-Morales

    2013-01-01

    An approach for the recognition of emotions in speech is presented. The target language is Mexican Spanish, and for this purpose a speech database was created. The approach consists in the phoneme acoustic modelling of emotion-specific vowels. For this, a standard phoneme-based Automatic Speech Recognition (ASR) system was built with Hidden Markov Models (HMMs), where different phoneme HMMs were built for the consonants and emotion-specific vowels associated with four emotional states (anger,...

  3. MMSE based Noise Tracking and Echo Cancellation of Speech Signals

    Directory of Open Access Journals (Sweden)

    Praveen. N

    2014-03-01

    Full Text Available During recent years, there have been many studies on automatic audio classification and segmentation using several features and techniques. The signal recorded at a microphone in a room contains the direct arrival of the sound from the source as well as multiple weaker copies of the same signal created by sound reflections off the room walls; this effect is known as acoustic reverberation. Because exactly tracking a target in a three-dimensional environment is complicated, in many applications such as source localization and speech recognition the reverberation patterns imposed by the environment are undesirable. In an auditorium, the common disturbances in public speech are noise and acoustic echo. The MMSE estimator is one of the algorithms proposed for the removal of additive background noise: a single-channel speech enhancement technique for enhancing speech degraded by additive background noise. Background noise can affect conversation in noisy environments such as streets or cars, or when sending speech from the cockpit of an airplane to the ground or to the cabin, and can degrade both the quality and the intelligibility of speech. With the passage of time, spectral subtraction has undergone many modifications. This is a review paper, and its objective is to provide an overview of the MMSE estimators that have been proposed for the enhancement of speech degraded by additive background noise during past decades.
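
    The estimators surveyed here share one skeleton: estimate the noise spectrum, form an a priori SNR per frequency bin, and apply a spectral gain to the noisy spectrum. The sketch below pairs the decision-directed a priori SNR estimate with a plain Wiener gain instead of the full MMSE-STSA gain (which involves Bessel-function terms); the noise is assumed stationary and is estimated from the first few frames, and all constants are illustrative.

        import numpy as np
        from scipy.signal import stft, istft

        def enhance(noisy, fs, n_noise_frames=6, alpha=0.98, nperseg=512):
            """Suppress additive background noise with a decision-directed Wiener gain."""
            _, _, X = stft(noisy, fs, nperseg=nperseg)
            # Noise PSD from the leading (assumed speech-free) frames.
            noise_psd = np.mean(np.abs(X[:, :n_noise_frames]) ** 2, axis=1)

            gains = np.empty(X.shape, dtype=float)
            xi_prev = np.ones_like(noise_psd)            # previous a priori SNR
            for t in range(X.shape[1]):
                gamma = np.abs(X[:, t]) ** 2 / (noise_psd + 1e-12)  # a posteriori SNR
                xi = alpha * xi_prev + (1.0 - alpha) * np.maximum(gamma - 1.0, 0.0)
                g = xi / (1.0 + xi)                      # Wiener gain
                gains[:, t] = g
                xi_prev = g ** 2 * gamma                 # decision-directed update
            _, clean = istft(gains * X, fs, nperseg=nperseg)
            return clean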

  4. Human phoneme recognition depending on speech-intrinsic variability.

    Science.gov (United States)

    Meyer, Bernd T; Jürgens, Tim; Wesker, Thorsten; Brand, Thomas; Kollmeier, Birger

    2010-11-01

    The influence of different sources of speech-intrinsic variation (speaking rate, effort, style and dialect or accent) on human speech perception was investigated. In listening experiments with 16 listeners, confusions of consonant-vowel-consonant (CVC) and vowel-consonant-vowel (VCV) sounds in speech-weighted noise were analyzed. Experiments were based on the OLLO logatome speech database, which was designed for a man-machine comparison. It contains utterances spoken by 50 speakers from five dialect/accent regions and covers several intrinsic variations. By comparing results depending on intrinsic and extrinsic variations (i.e., different levels of masking noise), the degradation induced by variabilities can be expressed in terms of the SNR. The spectral level distance between the respective speech segment and the long-term spectrum of the masking noise was found to be a good predictor for recognition rates, while phoneme confusions were influenced by the distance to spectrally close phonemes. An analysis based on transmitted information of articulatory features showed that voicing and manner of articulation are comparatively robust cues in the presence of intrinsic variations, whereas the coding of place is more degraded. The database and detailed results have been made available for comparisons between human speech recognition (HSR) and automatic speech recognizers (ASR).

  5. Speech recognition systems on the Cell Broadband Engine

    Energy Technology Data Exchange (ETDEWEB)

    Liu, Y; Jones, H; Vaidya, S; Perrone, M; Tydlitat, B; Nanda, A

    2007-04-20

    In this paper we describe our design, implementation, and first results of a prototype connected-phoneme-based speech recognition system on the Cell Broadband Engine™ (Cell/B.E.). Automatic speech recognition decodes speech samples into plain text (other representations are possible) and must process samples at real-time rates. Fortunately, the computational tasks involved in this pipeline are highly data-parallel and can receive significant hardware acceleration from vector-streaming architectures such as the Cell/B.E. Identifying and exploiting these parallelism opportunities is challenging, but also critical to improving system performance. We observed, from our initial performance timings, that a single Cell/B.E. processor can recognize speech from thousands of simultaneous voice channels in real time, a channel density that is orders of magnitude greater than the capacity of existing software speech recognizers based on CPUs (central processing units). This result emphasizes the potential for Cell/B.E.-based speech recognition and will likely lead to the future development of production speech systems using Cell/B.E. clusters.

  6. Dual Key Speech Encryption Algorithm Based Underdetermined BSS

    Directory of Open Access Journals (Sweden)

    Huan Zhao

    2014-01-01

    Full Text Available When the number of mixed signals is less than the number of source signals, underdetermined blind source separation (BSS) is a significantly difficult problem. Because speech communication involves large amounts of data and requires real-time processing, we utilize the intractability of the underdetermined BSS problem to present a dual key speech encryption method. The original speech is mixed with dual key signals, which consist of random key signals (a one-time pad) generated from a secret seed and chaotic signals generated by a chaotic system. In the decryption process, approximate calculation is used to recover the original speech signals. The proposed algorithm for speech signal encryption can resist traditional attacks against the encryption system, and owing to the approximate calculation, decryption becomes faster and more accurate. It is demonstrated that the proposed method has a high level of security and can recover the original signals quickly and efficiently while maintaining excellent audio quality.

  7. Dual key speech encryption algorithm based underdetermined BSS.

    Science.gov (United States)

    Zhao, Huan; He, Shaofang; Chen, Zuo; Zhang, Xixiang

    2014-01-01

    When the number of mixed signals is less than the number of source signals, underdetermined blind source separation (BSS) is a significantly difficult problem. Because speech communication involves large amounts of data and requires real-time processing, we utilize the intractability of the underdetermined BSS problem to present a dual key speech encryption method. The original speech is mixed with dual key signals, which consist of random key signals (a one-time pad) generated from a secret seed and chaotic signals generated by a chaotic system. In the decryption process, approximate calculation is used to recover the original speech signals. The proposed algorithm for speech signal encryption can resist traditional attacks against the encryption system, and owing to the approximate calculation, decryption becomes faster and more accurate. It is demonstrated that the proposed method has a high level of security and can recover the original signals quickly and efficiently while maintaining excellent audio quality.
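
    What makes the recovery in both versions of this scheme possible is that the receiver can regenerate both key signals exactly: the one-time pad from the shared secret seed, and the chaotic signal from the shared initial condition. The toy round trip below illustrates only that regeneration idea, with a logistic-map chaotic key and simple additive mixing; it deliberately omits the underdetermined BSS mixing that gives the actual scheme its security.

        import numpy as np

        def logistic_key(x0, n, r=3.99):
            """Chaotic key signal from the logistic map, scaled to [-1, 1]."""
            x, out = x0, np.empty(n)
            for i in range(n):
                x = r * x * (1.0 - x)
                out[i] = 2.0 * x - 1.0
            return out

        def make_keys(seed, x0, n):
            pad = np.random.default_rng(seed).uniform(-1, 1, n)  # one-time pad
            return pad, logistic_key(x0, n)

        def encrypt(speech, seed, x0):
            pad, chaos = make_keys(seed, x0, len(speech))
            return speech + pad + chaos      # additive stand-in for the BSS mixing

        def decrypt(cipher, seed, x0):
            pad, chaos = make_keys(seed, x0, len(cipher))
            return cipher - pad - chaos      # exact keys => exact recovery

        speech = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 8000))
        restored = decrypt(encrypt(speech, 42, 0.613), 42, 0.613)
        assert np.allclose(speech, restored)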

  8. Speech-Language Therapy (For Parents)

    Science.gov (United States)

    … with speech and/or language disorders. Speech Disorders, Language Disorders, and Feeding Disorders: A speech disorder refers …

  9. Speech Compression for Noise-Corrupted Thai Expressive Speech

    Directory of Open Access Journals (Sweden)

    Suphattharachai Chomphan

    2011-01-01

    Full Text Available Problem statement: In speech communication, speech coding aims at preserving the speech quality at a lower coding bitrate. In real communication environments, various types of noise deteriorate the speech quality, and expressive speech in different speaking styles may yield different speech quality under the same coding method. Approach: This research studied speech compression for noise-corrupted Thai expressive speech using two coding methods, CS-ACELP and MP-CELP. The speech material included a hundred male and a hundred female speech utterances. Four speaking styles were included: enjoyable, sad, angry and reading. Five sentences of Thai speech were chosen. Three types of noise were included (train, car and air conditioner), and five levels of each type of noise were varied from 0-20 dB. The subjective test of mean opinion score was exploited in the evaluation process. Results: The experimental results showed that CS-ACELP gave better speech quality than MP-CELP at all three bitrates of 6000, 8600 and 12600 bps. When considering the levels of noise, the 20-dB noise gave the best speech quality, while the 0-dB noise gave the worst. When considering speech gender, female speech gave better results than male speech. When considering the types of noise, the air-conditioner noise gave the best speech quality, while the train noise gave the worst. Conclusion: From the study, it can be seen that the coding method, type of noise, level of noise and speech gender all influence the coded speech quality.

  10. Tracking Speech Sound Acquisition

    Science.gov (United States)

    Powell, Thomas W.

    2011-01-01

    This article describes a procedure to aid in the clinical appraisal of child speech. The approach, based on the work by Dinnsen, Chin, Elbert, and Powell (1990; Some constraints on functionally disordered phonologies: Phonetic inventories and phonotactics. "Journal of Speech and Hearing Research", 33, 28-37), uses a railway idiom to track gains in…

  11. Preschool Connected Speech Inventory.

    Science.gov (United States)

    DiJohnson, Albert; And Others

    This speech inventory developed for a study of aurally handicapped preschool children (see TM 001 129) provides information on intonation patterns in connected speech. The inventory consists of a list of phrases and simple sentences accompanied by pictorial clues. The test is individually administered by a teacher-examiner who presents the spoken…

  12. Illustrated Speech Anatomy.

    Science.gov (United States)

    Shearer, William M.

    Written for students in the fields of speech correction and audiology, the text deals with the following: structures involved in respiration; the skeleton and the processes of inhalation and exhalation; phonation and pitch, the larynx, and esophageal speech; muscles involved in articulation; muscles involved in resonance; and the anatomy of the…

  13. Private Speech in Ballet

    Science.gov (United States)

    Johnston, Dale

    2006-01-01

    Authoritarian teaching practices in ballet inhibit the use of private speech. This paper highlights the critical importance of private speech in the cognitive development of young ballet students, within what is largely a non-verbal art form. It draws upon research by Russian psychologist Lev Vygotsky and contemporary socioculturalists, to…

  14. Free Speech Yearbook 1976.

    Science.gov (United States)

    Phifer, Gregg, Ed.

    The articles collected in this annual address several aspects of First Amendment Law. The following titles are included: "Freedom of Speech As an Academic Discipline" (Franklyn S. Haiman), "Free Speech and Foreign-Policy Decision Making" (Douglas N. Freeman), "The Supreme Court and the First Amendment: 1975-1976"…

  15. Advertising and Free Speech.

    Science.gov (United States)

    Hyman, Allen, Ed.; Johnson, M. Bruce, Ed.

    The articles collected in this book originated at a conference at which legal and economic scholars discussed the issue of First Amendment protection for commercial speech. The first article, in arguing for freedom for commercial speech, finds inconsistent and untenable the arguments of those who advocate freedom from regulation for political…

  16. Free Speech. No. 38.

    Science.gov (United States)

    Kane, Peter E., Ed.

    This issue of "Free Speech" contains the following articles: "Daniel Schoor Relieved of Reporting Duties" by Laurence Stern, "The Sellout at CBS" by Michael Harrington, "Defending Dan Schorr" by Tome Wicker, "Speech to the Washington Press Club, February 25, 1976" by Daniel Schorr, "Funds Voted For Schorr Inquiry" by Richard Lyons, "Erosion of the…

  17. Syllable Segmentation of Farsi Continuous Speech using Wavelet Coefficients Thresholding and Fuzzy Smoothing of Energy Contour

    Directory of Open Access Journals (Sweden)

    Ghazaal Sheikhi

    2013-10-01

    Full Text Available The syllable, as a sub-word unit, now plays an active role in speech processing and recognition research owing to its robust relation to human speech production and cognition. Automatic detection of syllable boundaries is an important step forward in the areas of speech prosody, natural speech synthesis and speech recognition. In this paper, a novel method for automatic syllabification of Farsi continuous speech based on acoustic structure is proposed. Our previous studies showed the proficiency of the energy contour fuzzy smoothing method compared with other prominent works in this area. This paper suggests that the methodology conventionally used in speech enhancement, based on wavelet coefficient thresholding, can improve syllable segmentation by decreasing insertion error: the process attenuates the energy of high-energy consonants, which are responsible for extra peaks in the short-term energy contour. Experimental results showed that utilizing the proposed method along with fuzzy smoothing diminishes insertion error by about 8%, with no appreciable effect on the other efficiency criteria. More than 94% of syllables are automatically segmented using the presented technique with less than 50 ms error.
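
    The energy-contour half of this pipeline can be sketched compactly: frame the signal, compute short-term energy, smooth the contour, and take the energy valleys between prominent peaks (syllable nuclei) as boundaries. The sketch below substitutes a moving average for the paper's fuzzy smoothing and omits the wavelet-thresholding preprocessing; the window sizes and prominence threshold are illustrative choices only.

        import numpy as np
        from scipy.signal import find_peaks

        def syllable_boundaries(x, fs, frame_ms=10):
            """Estimate syllable boundaries (seconds) from a smoothed energy contour."""
            hop = int(fs * frame_ms / 1000)
            n_frames = len(x) // hop
            energy = np.array([np.sum(x[i * hop:(i + 1) * hop] ** 2)
                               for i in range(n_frames)])

            # Moving-average smoothing (stand-in for the paper's fuzzy smoothing).
            smooth = np.convolve(energy, np.ones(7) / 7, mode="same")

            # Syllable nuclei = prominent energy peaks; boundaries = minima between.
            peaks, _ = find_peaks(smooth, prominence=0.1 * smooth.max())
            bounds = [a + np.argmin(smooth[a:b]) for a, b in zip(peaks[:-1], peaks[1:])]
            return [b * hop / fs for b in bounds]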

  1. Charisma in business speeches

    DEFF Research Database (Denmark)

    Niebuhr, Oliver; Brem, Alexander; Novák-Tót, Eszter

    2016-01-01

    Charisma is a key component of spoken language interaction; and it is probably for this reason that charismatic speech has been the subject of intensive research for centuries. However, what is still largely missing is a quantitative and objective line of research that, firstly, involves analyses of the acoustic-prosodic signal, secondly, focuses on business speeches like product presentations, and, thirdly, in doing so, advances the still fairly fragmentary evidence on the prosodic correlates of charismatic speech. We show that the prosodic features of charisma in political speeches also apply to business speeches. Consistent with the public opinion, our findings are indicative of Steve Jobs being a more charismatic speaker than Mark Zuckerberg. Beyond previous studies, our data suggest that rhythm and emphatic accentuation are also involved in conveying charisma. Furthermore, the differences…

  2. One-against-all weighted dynamic time warping for language-independent and speaker-dependent speech recognition in adverse conditions.

    Directory of Open Access Journals (Sweden)

    Xianglilan Zhang

    Full Text Available Considering personal privacy and the difficulty of obtaining training material for many seldom used English words and (often non-English) names, language-independent (LI) with lightweight speaker-dependent (SD) automatic speech recognition (ASR) is a promising option to solve the problem. The dynamic time warping (DTW) algorithm is the state-of-the-art algorithm for small-footprint SD ASR applications with limited storage space and small vocabulary, such as voice dialing on mobile devices, menu-driven recognition, and voice control on vehicles and robotics. Even though we have successfully developed two fast and accurate DTW variations for clean speech data, speech recognition for adverse conditions is still a big challenge. In order to improve recognition accuracy in noisy environments and bad recording conditions such as too high or low volume, we introduce a novel one-against-all weighted DTW (OAWDTW). This method defines a one-against-all index (OAI) for each time frame of training data and applies the OAIs to the core DTW process. Given two speech signals, OAWDTW tunes their final alignment score by using OAI in the DTW process. Our method achieves better accuracies than DTW and merge-weighted DTW (MWDTW), as 6.97% relative reduction of error rate (RRER) compared with DTW and 15.91% RRER compared with MWDTW are observed in our extensive experiments on one representative SD dataset of four speakers' recordings. To the best of our knowledge, the OAWDTW approach is the first weighted DTW specially designed for speech data in adverse conditions.
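
    The mechanism underneath OAWDTW, weighting individual time frames inside the DTW recursion, can be sketched as follows. The per-frame weights here are a generic placeholder for the paper's one-against-all index, which is derived from how well each training frame discriminates against all competing templates; everything else is textbook DTW.

        import numpy as np

        def weighted_dtw(query, template, weights):
            """DTW alignment cost in which each template frame carries its own weight.

            query, template: (n, d) and (m, d) feature matrices.
            weights: (m,) per-frame weights (placeholder for the paper's OAI values).
            """
            n, m = len(query), len(template)
            D = np.full((n + 1, m + 1), np.inf)
            D[0, 0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    cost = weights[j - 1] * np.linalg.norm(query[i - 1] - template[j - 1])
                    D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
            return D[n, m] / (n + m)          # length-normalized alignment score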

  3. Digital speech tamper detection based on speaking conditions

    Institute of Scientific and Technical Information of China (English)

    丁琦; 平西建

    2011-01-01

    An automatic detection method for digital speech tampering by means of stitching was proposed. The method is based on speaking-condition analyses, which comprise background noise analysis and speaker state analysis. Speech signals are divided into speech and silence segments. For silence segments containing noise, time-domain and frequency-domain features are extracted for each frame; for each speech segment, rhythm and timbre features are extracted. Change points in the features of the silence segments and speech segments are detected separately based on the Bayesian information criterion, and the tamper detection result is obtained by an integrative decision. The experimental results show that the proposed method can detect and locate the stitching points accurately.

  4. Speaking Fluently And Accurately

    Institute of Scientific and Technical Information of China (English)

    Joseph DeVeto

    2004-01-01

    Even after many years of study,students make frequent mistakes in English. In addition, many students still need a long time to think of what they want to say. For some reason, in spite of all the studying, students are still not quite fluent.When I teach, I use one technique that helps students not only speak more accurately, but also more fluently. That technique is dictations.

  5. Predicting speech intelligibility in conditions with nonlinearly processed noisy speech

    DEFF Research Database (Denmark)

    Jørgensen, Søren; Dau, Torsten

    2013-01-01

    The speech-based envelope power spectrum model (sEPSM; [1]) was proposed in order to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII). The sEPSM applies the signal-to-noise ratio in the envelope domain (SNRenv), which was demonstrated to successfully predict speech intelligibility in conditions with nonlinearly processed noisy speech, such as processing with spectral subtraction. Moreover, a multiresolution version (mr-sEPSM) was demonstrated to account for speech intelligibility in various conditions with stationary and fluctuating … from computational auditory scene analysis and further support the hypothesis that the SNRenv is a powerful metric for speech intelligibility prediction.

  6. SEMI-AUTOMATIC SPEAKER VERIFICATION SYSTEM

    Directory of Open Access Journals (Sweden)

    E. V. Bulgakova

    2016-03-01

    Full Text Available Subject of Research. The paper presents a semi-automatic speaker verification system based on comparing of formant values, statistics of phone lengths and melodic characteristics as well. Due to the development of speech technology, there is an increased interest now in searching for expert speaker verification systems, which have high reliability and low labour intensiveness because of the automation of data processing for the expert analysis. System Description. We present a description of a novel system analyzing similarity or distinction of speaker voices based on comparing statistics of phone lengths, formant features and melodic characteristics. The characteristic feature of the proposed system based on fusion of methods is a weak correlation between the analyzed features that leads to a decrease in the error rate of speaker recognition. The system advantage is the possibility to carry out rapid analysis of recordings since the processes of data preprocessing and making decision are automated. We describe the functioning methods as well as fusion of methods to combine their decisions. Main Results. We have tested the system on the speech database of 1190 target trials and 10450 non-target trials, including the Russian speech of the male and female speakers. The recognition accuracy of the system is 98.59% on the database containing records of the male speech, and 96.17% on the database containing records of the female speech. It was also experimentally established that the formant method is the most reliable of all used methods. Practical Significance. Experimental results have shown that proposed system is applicable for the speaker recognition task in the course of phonoscopic examination.

  7. Automatic measurement and representation of prosodic features

    Science.gov (United States)

    Ying, Goangshiuan Shawn

    Effective measurement and representation of prosodic features of the acoustic signal for use in automatic speech recognition and understanding systems is the goal of this work. Prosodic features (stress, duration, and intonation) are variations of the acoustic signal whose domains are beyond the boundaries of each individual phonetic segment. Listeners perceive prosodic features through a complex combination of acoustic correlates such as intensity, duration, and fundamental frequency (F0). We have developed new tools to measure F0 and intensity features. We apply a probabilistic global error correction routine to an Average Magnitude Difference Function (AMDF) pitch detector. A new short-term frequency-domain Teager energy algorithm is used to measure the energy of a speech signal. We have conducted a series of experiments performing lexical stress detection on words in continuous English speech from two speech corpora. We have experimented with two different approaches, a segment-based approach and a rhythm unit-based approach, in lexical stress detection. The first approach uses pattern recognition with energy- and duration-based measurements as features to build Bayesian classifiers to detect the stress level of a vowel segment. In the second approach we define a rhythm unit and use only the F0-based measurement and a scoring system to determine the stressed segment in the rhythm unit. A duration-based segmentation routine was developed to break polysyllabic words into rhythm units. The long-term goal of this work is to develop a system that can effectively detect the stress pattern for each word in continuous speech utterances. Stress information will be integrated as a constraint for pruning the word hypotheses in a word recognition system based on hidden Markov models.
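
    Two of the measurement tools named here are compact enough to sketch directly: the AMDF, whose deepest valley marks the pitch period, and the discrete Teager energy operator. The stress classifiers and rhythm-unit segmentation of the thesis are not reproduced, and the pitch search range below is an assumption covering typical voices.

        import numpy as np

        def amdf_pitch(frame, fs, f_lo=60.0, f_hi=400.0):
            """F0 estimate from the Average Magnitude Difference Function valley."""
            lags = np.arange(int(fs / f_hi), int(fs / f_lo))
            amdf = np.array([np.mean(np.abs(frame[k:] - frame[:-k])) for k in lags])
            return fs / lags[np.argmin(amdf)]

        def teager_energy(x):
            """Discrete Teager energy: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
            return x[1:-1] ** 2 - x[:-2] * x[2:]

        fs = 8000
        t = np.arange(int(0.04 * fs)) / fs        # one 40 ms analysis frame
        frame = np.sin(2 * np.pi * 120 * t)       # synthetic 120 Hz voiced frame
        print(amdf_pitch(frame, fs))              # close to 120 Hz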

  8. Evaluating airborne sound insulation in terms of speech intelligibility.

    Science.gov (United States)

    Park, H K; Bradley, J S; Gover, B N

    2008-03-01

    This paper reports on an evaluation of ratings of the sound insulation of simulated walls in terms of the intelligibility of speech transmitted through the walls. Subjects listened to speech modified to simulate transmission through 20 different walls with a wide range of sound insulation ratings, with constant ambient noise. The subjects' mean speech intelligibility scores were compared with various physical measures to test the success of the measures as sound insulation ratings. The standard Sound Transmission Class (STC) and Weighted Sound Reduction Index ratings were only moderately successful predictors of intelligibility scores, and eliminating the 8 dB rule from STC led to very modest improvements. Various previously established speech intelligibility measures (e.g., Articulation Index or Speech Intelligibility Index) and measures derived from them, such as the Articulation Class, were all relatively strongly related to speech intelligibility scores. In general, measures that involved arithmetic averages or summations of decibel values over frequency bands important for speech were most strongly related to intelligibility scores. The two most accurate predictors of the intelligibility of transmitted speech were an arithmetic average transmission loss over the frequencies from 200 Hz to 2.5 kHz and the addition of a new spectrum weighting term to Rw that included frequencies from 400 Hz to 2.5 kHz.

  9. Design of an expert system for phonetic speech recognition

    Energy Technology Data Exchange (ETDEWEB)

    Carbonell, N.; Haton, J.P.; Pierrel, J.M.; Lonchamp, F.

    1983-07-01

    Expert systems have been extensively used as a means for integrating the expertise of a human being into an artificial intelligence system. The authors are presently designing an expert system which will integrate the strategy and the knowledge of a phonetician reading a speech spectrogram. Their goal is twofold, firstly to obtain a better insight into the acoustic-decoding of speech, and, secondly, to improve the efficiency of present automatic phonetic recognition systems. This paper presents a preliminary description of the project, especially the overall strategy of the expert and the role of duration parameters in the segmentation and identification processes.

  10. Speech endpoint detection in real noise environments

    Institute of Scientific and Technical Information of China (English)

    GUO Yanmeng; FU Qiang; YAN Yonghong

    2007-01-01

    A method of speech endpoint detection in environments of complicated additive noise is presented. Based on the analysis of noise, an adaptive model of stationary noise is proposed to detect the section where the signal is nonstationary. Then the voice is detected in this section by its harmonic structure, and the accurate endpoint is searched using energy. Compared with the typical algorithms, this algorithm operates reliably in most real noise environments.
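
    A stripped-down version of the energy stage of such a detector is sketched below: the stationary-noise floor is estimated adaptively from the quietest frames, and the endpoints are taken where the frame energy exceeds a margin above that floor. The harmonic-structure voicing check described in the paper is omitted, and the 10 dB margin is an illustrative setting.

        import numpy as np

        def detect_endpoints(x, fs, frame_ms=20, margin_db=10.0):
            """Return (start_s, end_s) of speech via an adaptive energy threshold."""
            hop = int(fs * frame_ms / 1000)
            n_frames = len(x) // hop
            e_db = 10 * np.log10([np.mean(x[i * hop:(i + 1) * hop] ** 2) + 1e-12
                                  for i in range(n_frames)])

            # Adaptive stationary-noise model: mean of the quietest 20% of frames.
            floor = np.mean(np.sort(e_db)[: max(1, n_frames // 5)])
            active = np.flatnonzero(e_db > floor + margin_db)
            if active.size == 0:
                return None
            return active[0] * hop / fs, (active[-1] + 1) * hop / fs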

  11. Stochastic Modeling as a Means of Automatic Speech Recognition

    Science.gov (United States)

    1975-04-01

    The system computes the posterior probability Pr(X[1:T] = x[1:T] | Y[1:T] = y[1:T], A, L, P, S), where A, L, P, S represent the acoustic-phonetic, lexical, phonological, and … phonetic sequence by using multiple dictionary entries, phonological rules embedded in the dictionary, and a "degarbling" procedure. The search is … 1) a statistical model of the language; 2) a phonemic dictionary and statistical phonological rules; 3) a phonetic matching algorithm; 4) word-level search.

  12. Analysis of Phonetic Transcriptions for Danish Automatic Speech Recognition

    DEFF Research Database (Denmark)

    Kirkedal, Andreas Søeborg

    2013-01-01

    The performance of an automatic speech recognition system depends heavily on the dictionary and the transcriptions therein. This paper presents an analysis of phonetic/phonemic features that are salient for current Danish ASR systems. This preliminary study consists of a series of experiments using an ASR system trained on the DK-PAROLE corpus. The analysis indicates that transcribing e.g. stress or vowel duration has a negative impact on performance. The best performance is obtained with coarse phonetic annotation, improving performance by 1% word error rate and 3.8% sentence error rate.

  13. An algorithm for automatic speech understanding over word graphs

    OpenAIRE

    Sanchís Arnal, Emilio; Calvo Lance, Marcos; Gómez Adrian, Jon Ander; Hurtado Oliver, Lluis Felip

    2012-01-01

    This work proposes an algorithm for automatic speech understanding that takes a word graph as input. The graph is first processed by a dynamic programming algorithm, producing a second graph enriched with semantic information. Computing the best path through this second graph yields the most likely sequence of concepts given the acoustic evidence reflected in the word graph. …

  14. Automatic error detection in alignments for speech synthesis

    CSIR Research Space (South Africa)

    Barnard, E

    2006-11-01

    Full Text Available … of the errors that existed in a manual segmentation were detected by this process, while flagging less than a quarter of all segments. Different phoneme classes are handled with differing amounts of success, with vowels being the most troublesome…

  15. Effect of talker and speaking style on the Speech Transmission Index (L)

    Science.gov (United States)

    van Wijngaarden, Sander J.; Houtgast, Tammo

    2004-01-01

    The Speech Transmission Index (STI) is routinely applied for predicting the intelligibility of messages (sentences) in noise and reverberation. Despite clear evidence that the STI is capable of doing so accurately, recent results indicate that the STI sometimes underestimates the effect of reverberation on sentence intelligibility. To investigate the influence of talker and speaking style, the Speech Reception Threshold in noise and reverberation was measured for three talkers, differing in clarity of articulation and speaking style. For very clear speech, the standard STI yields accurate results. For more conversational speech by an untrained talker, the effect of reverberation is underestimated. Measurements of the envelope spectrum reveal that conversational speech has relatively stronger contributions by higher (> 12.5 Hz) modulation frequencies. By modifying the STI calculation procedure to include modulations in the range 12.5-31.5 Hz, better results are obtained for conversational speech. A speaking-style-dependent choice for the STI modulation frequency range is proposed.
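
    The diagnostic quantity in this study, the envelope spectrum, can be reproduced in a few lines: take the magnitude envelope of the signal, remove its mean, and inspect the spectrum of the result up to roughly 31.5 Hz. The sketch below uses the Hilbert envelope of the broadband signal for brevity, whereas the STI proper evaluates modulation spectra per octave band.

        import numpy as np
        from scipy.signal import hilbert

        def envelope_spectrum(x, fs):
            """Modulation (envelope) spectrum of a speech signal."""
            env = np.abs(hilbert(x))                 # instantaneous envelope
            env = env - env.mean()                   # drop DC so modulations dominate
            spec = np.abs(np.fft.rfft(env)) / len(env)
            freqs = np.fft.rfftfreq(len(env), 1.0 / fs)
            return freqs, spec

        # Fraction of modulation energy in the 12.5-31.5 Hz band that the
        # proposed STI modification adds (x, fs: a speech signal and its rate):
        # freqs, spec = envelope_spectrum(x, fs)
        # band = (freqs >= 12.5) & (freqs <= 31.5)
        # print(spec[band].sum() / spec.sum())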

  16. Sperry Univac speech communications technology

    Science.gov (United States)

    Medress, Mark F.

    1977-01-01

    Technology and systems for effective verbal communication with computers were developed. A continuous speech recognition system for verbal input, a word spotting system to locate key words in conversational speech, prosodic tools to aid speech analysis, and a prerecorded voice response system for speech output are described.

  17. Voice and Speech after Laryngectomy

    Science.gov (United States)

    Stajner-Katusic, Smiljka; Horga, Damir; Musura, Maja; Globlek, Dubravka

    2006-01-01

    The aim of the investigation is to compare voice and speech quality in alaryngeal patients using esophageal speech (ESOP, eight subjects), electroacoustical speech aid (EACA, six subjects) and tracheoesophageal voice prosthesis (TEVP, three subjects). The subjects reading a short story were recorded in the sound-proof booth and the speech samples…

  18. Speech Correction in the Schools.

    Science.gov (United States)

    Eisenson, Jon; Ogilvie, Mardel

    An introduction to the problems and therapeutic needs of school age children whose speech requires remedial attention, the text is intended for both the classroom teacher and the speech correctionist. General considerations include classification and incidence of speech defects, speech correction services, the teacher as a speaker, the mechanism…

  19. Environmental Contamination of Normal Speech.

    Science.gov (United States)

    Harley, Trevor A.

    1990-01-01

    Environmentally contaminated speech errors (irrelevant words or phrases derived from the speaker's environment and erroneously incorporated into speech) are hypothesized to occur at a high level of speech processing, but with a relatively late insertion point. The data indicate that speech production processes are not independent of other…

  20. Automatic tracking sensor camera system

    Science.gov (United States)

    Tsuda, Takao; Kato, Daiichiro; Ishikawa, Akio; Inoue, Seiki

    2001-04-01

    We are developing a sensor camera system for automatically tracking and determining the positions of subjects moving in three dimensions. The system is intended to operate even within areas as large as soccer fields. The system measures the 3D coordinates of the object while driving the pan and tilt movements of camera heads and the degree of zoom of the lenses. Its principal feature is that it automatically zooms in as the object moves farther away and out as the object moves closer, keeping the area of the object a fixed proportion of the image. This feature makes stable detection by the image processing possible. We are planning to use the system to detect the position of a soccer ball during a soccer game. In this paper, we describe the configuration of the automatic tracking sensor camera system under development. We then give an analysis of the movements of the ball within images of games, the results of experiments on the method of image processing used to detect the ball, and the results of other experiments to verify the accuracy of an experimental system. These results show that the system is sufficiently accurate in terms of obtaining positions in three dimensions.
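
    The constant-image-size behaviour follows from simple lens geometry: in a pinhole model the image size of an object scales with focal length divided by distance, so holding the image size fixed means commanding a focal length proportional to the measured distance. A toy sketch under that assumption (the reference values are hypothetical):

        def zoom_command(distance_m, ref_distance_m=20.0, ref_focal_mm=50.0):
            """Focal length that keeps the subject the same size on the sensor.

            Pinhole model: image_size ~ focal_length / distance, so keeping
            image size constant requires f = f_ref * d / d_ref.
            """
            return ref_focal_mm * distance_m / ref_distance_m

        # Subject drifts from 20 m to 40 m: the lens must zoom from 50 mm to 100 mm.
        print(zoom_command(40.0))  # 100.0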

  1. Noise estimation Algorithms for Speech Enhancement in highly non-stationary Environments

    Directory of Open Access Journals (Sweden)

    Anuradha R Fukane

    2011-03-01

    Full Text Available A noise estimation algorithm plays an important role in speech enhancement. Speech enhancement is relevant to many real-world systems, such as automatic speaker recognition, man-machine communication, voice recognition, speech coders, hearing aids and video conferencing; the only input available to these systems is the noisy speech signal, so the noise component must be removed before the enhanced speech signal can be applied to them. In most speech enhancement algorithms, it is assumed that an estimate of the noise spectrum is available. This noise estimate is a critical component of speech enhancement algorithms: if the estimate is too low, annoying residual noise remains audible, and if it is too high, speech is distorted and loses intelligibility. This paper focuses on the different approaches to noise estimation. Section I is the introduction; Section II explains the simple voice activity detector (VAD) approach to noise estimation; Section III explains different classes of noise estimation algorithms; Section IV explains the performance evaluation of noise estimation algorithms; Section V concludes.
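
    The VAD-based approach of Section II reduces to a one-line recursion per frequency bin: whenever the detector declares a frame speech-free, the noise power spectrum is updated by recursive averaging, and it is frozen otherwise. A minimal sketch, with the smoothing constant as an illustrative choice:

        import numpy as np

        def update_noise_psd(noise_psd, frame_spectrum, is_speech, alpha=0.95):
            """VAD-gated recursive averaging of the noise power spectrum.

            noise_psd:      current noise PSD estimate, shape (n_bins,)
            frame_spectrum: complex STFT of the current frame, shape (n_bins,)
            is_speech:      VAD decision for this frame
            """
            if is_speech:
                return noise_psd                    # freeze the estimate during speech
            periodogram = np.abs(frame_spectrum) ** 2
            return alpha * noise_psd + (1.0 - alpha) * periodogram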

  2. Fully Automatic Expression-Invariant Face Correspondence

    CERN Document Server

    Salazar, Augusto; Shu, Chang; Prieto, Flavio

    2012-01-01

    We consider the problem of computing accurate point-to-point correspondences among a set of human face scans with varying expressions. Our fully automatic approach does not require any manually placed markers on the scan. Instead, the approach learns the locations of a set of landmarks present in a database and uses this knowledge to automatically predict the locations of these landmarks on a newly available scan. The predicted landmarks are then used to compute point-to-point correspondences between a template model and the newly available scan. To accurately fit the expression of the template to the expression of the scan, we use a blendshape model as the template. Our algorithm was tested on a database of human faces of different ethnic groups with strongly varying expressions. Experimental results show that the obtained point-to-point correspondence is both highly accurate and consistent for most of the tested 3D face models.

  3. Global Freedom of Speech

    DEFF Research Database (Denmark)

    Binderup, Lars Grassme

    2007-01-01

    …, as opposed to a legal norm, that curbs exercises of the right to free speech that offend the feelings or beliefs of members from other cultural groups. The paper rejects the suggestion that acceptance of such a norm is in line with liberal egalitarian thinking. Following a review of the classical liberal egalitarian reasons for free speech - reasons from overall welfare, from autonomy and from respect for the equality of citizens - it is argued that these reasons outweigh the proposed reasons for curbing culturally offensive speech. Currently controversial cases such as that of the Danish Cartoon Controversy…

  4. The Rhetoric in English Speech

    Institute of Scientific and Technical Information of China (English)

    马鑫

    2014-01-01

    English speech-making has a very long history and has always been highly valued. People give speeches at economic events, political forums and academic conferences to express their opinions and to inform or persuade others. Speech also plays an important role in English literature, and the distinct theme of a speech owes much to its rhetoric. This paper discusses parallelism, repetition and the rhetorical question in English speeches, aiming to help people better appreciate their charm.

  5. Dynamic weighing for accurate fertilizer application and monitoring

    NARCIS (Netherlands)

    Bergeijk, van J.; Goense, D.; Willigenburg, van L.G.; Speelman, L.

    2001-01-01

    The mass flow of fertilizer spreaders must be calibrated for the different types of fertilizers used. To obtain accurate fertilizer application, manual calibration of the actual mass flow must be repeated frequently. Automatic calibration is possible by measurement of the actual mass flow, based on…

  6. Cued Speech Gesture Recognition: A First Prototype Based on Early Reduction

    Directory of Open Access Journals (Sweden)

    Caplier Alice

    2007-01-01

    Full Text Available Cued Speech is a specific linguistic code for hearing-impaired people. It is based on both lip reading and manual gestures. In the context of THIMP (Telephony for the Hearing-IMpaired Project), we work on automatic cued speech translation. In this paper, we address only the problem of automatic cued speech manual gesture recognition. Such a gesture recognition problem is quite common from a theoretical point of view, but we approach it with respect to its particularities in order to derive an original method. This method is essentially built around a bio-inspired method called early reduction. Prior to a complete analysis of each image of a sequence, the early reduction process automatically extracts a restricted number of key images which summarize the whole sequence. Only the key images are studied from a temporal point of view, with lighter computation than for the complete sequence.

  7. Cued Speech Gesture Recognition: A First Prototype Based on Early Reduction

    Directory of Open Access Journals (Sweden)

    Pascal Perret

    2008-01-01

    Full Text Available Cued Speech is a specific linguistic code for hearing-impaired people. It is based on both lip reading and manual gestures. In the context of THIMP (Telephony for the Hearing-IMpaired Project), we work on automatic cued speech translation. In this paper, we address only the problem of automatic cued speech manual gesture recognition. Such a gesture recognition problem is quite common from a theoretical point of view, but we approach it with respect to its particularities in order to derive an original method. This method is essentially built around a bio-inspired method called early reduction. Prior to a complete analysis of each image of a sequence, the early reduction process automatically extracts a restricted number of key images which summarize the whole sequence. Only the key images are studied from a temporal point of view, with lighter computation than for the complete sequence.

  8. Automatic Fiscal Stabilizers

    Directory of Open Access Journals (Sweden)

    Narcis Eduard Mitu

    2013-11-01

    Full Text Available Policies or institutions (built into an economic system) that automatically tend to dampen economic cycle fluctuations in income, employment, etc., without direct government intervention. For example, in boom times, progressive income tax automatically reduces the money supply as incomes and spending rise. Similarly, in recessionary times, payment of unemployment benefits injects more money into the system and stimulates demand. Also called automatic stabilizers or built-in stabilizers.

  9. Cross-Language Activation Begins During Speech Planning and Extends Into Second Language Speech

    Science.gov (United States)

    Jacobs, April; Fricke, Melinda; Kroll, Judith F.

    2016-01-01

    Three groups of native English speakers named words aloud in Spanish, their second language (L2). Intermediate proficiency learners in a classroom setting (Experiment 1) and in a domestic immersion program (Experiment 2) were compared to a group of highly proficient English–Spanish speakers. All three groups named cognate words more quickly and accurately than matched noncognates, indicating that all speakers experienced cross-language activation during speech planning. However, only the classroom learners exhibited effects of cross-language activation in their articulation: Cognate words were named with shorter overall durations, but longer (more English-like) voice onset times. Inhibition of the first language during L2 speech planning appears to impact the stages of speech production at which cross-language activation patterns can be observed. PMID:27773945

  10. Study of acoustic correlates associate with emotional speech

    Science.gov (United States)

    Yildirim, Serdar; Lee, Sungbok; Lee, Chul Min; Bulut, Murtaza; Busso, Carlos; Kazemzadeh, Ebrahim; Narayanan, Shrikanth

    2004-10-01

    This study investigates the acoustic characteristics of four different emotions expressed in speech. The aim is to obtain detailed acoustic knowledge on how a speech signal is modulated by changes from neutral to a certain emotional state. Such knowledge is necessary for automatic emotion recognition and classification and for emotional speech synthesis. Speech data obtained from two semi-professional actresses are analyzed and compared. Each subject produces 211 sentences with four different emotions: neutral, sad, angry, happy. We analyze changes in temporal and acoustic parameters such as the magnitude and variability of segmental duration, fundamental frequency and the first three formant frequencies as a function of emotion. Acoustic differences among the emotions are also explored with mutual information computation, multidimensional scaling and acoustic likelihood comparison with normal speech. Results indicate that speech associated with anger and happiness is characterized by longer duration, shorter interword silence, and higher pitch and rms energy with wider ranges. Sadness is distinguished from other emotions by lower rms energy and longer interword silence. Interestingly, the difference in formant pattern between [happiness/anger] and [neutral/sadness] is better reflected in back vowels such as /a/ (father) than in front vowels. Detailed results on intra- and interspeaker variability will be reported.

  11. Speech and language therapy for aphasia following subacute stroke

    Directory of Open Access Journals (Sweden)

    Engin Koyuncu

    2016-01-01

    Full Text Available The aim of this study was to investigate the time window, duration and intensity of optimal speech and language therapy applied to aphasic patients with subacute stroke in our hospital. The study consisted of 33 patients hospitalized for stroke rehabilitation in our hospital with a first stroke but without a previous history of speech and language therapy. Sixteen sessions of impairment-based speech and language therapy were applied to the patients, 30-60 minutes per day, 2 days a week, for 8 successive weeks. Aphasia assessment in stroke patients was performed with the Gülhane Aphasia Test-2 before and after treatment. Compared with before treatment, fluency of speech, listening comprehension, reading comprehension, oral motor evaluation, automatic speech, repetition and naming were improved after treatment. This suggests that 16 sessions of speech and language therapy, 30-60 minutes per day, 2 days a week, for 8 successive weeks, are effective in the treatment of aphasic patients with subacute stroke.

  12. Speech and language therapy for aphasia following subacute stroke

    Institute of Scientific and Technical Information of China (English)

    Engin Koyuncu; Pınar Çam; Nermin Altınok; Duygu Ekinci Çallı; Tuba Yarbay Duman; Neşe Özgirgin

    2016-01-01

    The aim of this study was to investigate the time window, duration and intensity of optimal speech and language therapy applied to aphasic patients with subacute stroke in our hospital. The study consisted of 33 patients hospitalized for stroke rehabilitation in our hospital with a first stroke but without a previous history of speech and language therapy. Sixteen sessions of impairment-based speech and language therapy were applied to the patients, 30–60 minutes per day, 2 days a week, for 8 successive weeks. Aphasia assessment in stroke patients was performed with the Gülhane Aphasia Test-2 before and after treatment. Compared with before treatment, fluency of speech, listening comprehension, reading comprehension, oral motor evaluation, automatic speech, repetition and naming were improved after treatment. This suggests that 16 sessions of speech and language therapy, 30–60 minutes per day, 2 days a week, for 8 successive weeks, are effective in the treatment of aphasic patients with subacute stroke.

  13. On pre-image iterations for speech enhancement.

    Science.gov (United States)

    Leitner, Christina; Pernkopf, Franz

    2015-01-01

    In this paper, we apply kernel PCA to speech enhancement and derive pre-image iterations for speech enhancement. Both methods make use of a Gaussian kernel. The kernel variance serves as a tuning parameter that has to be adapted according to the SNR and the desired degree of de-noising. We develop a method to derive a suitable value for the kernel variance from a noise estimate, to adapt pre-image iterations to arbitrary SNRs. In experiments, we compare the performance of kernel PCA and pre-image iterations in terms of objective speech quality measures and automatic speech recognition. The speech data are corrupted by white and colored noise at 0, 5, 10, and 15 dB SNR. As a benchmark, we provide results of the generalized subspace method, of spectral subtraction, and of the minimum mean-square error log-spectral amplitude estimator. In terms of the scores of the PEASS (Perceptual Evaluation Methods for Audio Source Separation) toolbox, the proposed methods achieve a performance similar to the reference methods. The speech recognition experiments show that utterances processed by pre-image iterations achieve a consistently better word recognition accuracy than the unprocessed noisy utterances and than the utterances processed by the generalized subspace method.
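
    For readers unfamiliar with pre-image iterations, the classical fixed-point update for a Gaussian kernel (due to Mika et al., on which this line of work builds) can be sketched as follows. Here gamma holds the expansion coefficients of the kernel-PCA projection and sigma2 is the kernel variance that the paper tunes to the SNR; this is an illustrative sketch, not the authors' implementation.

```python
import numpy as np

def gaussian_preimage(x0, X, gamma, sigma2, n_iter=50):
    """Fixed-point pre-image: z converges to a kernel-weighted mean of the data X."""
    z = x0.copy()
    for _ in range(n_iter):
        w = gamma * np.exp(-np.sum((X - z) ** 2, axis=1) / (2.0 * sigma2))
        z = (w @ X) / np.sum(w)          # re-weighted mean pulls z toward the clean data
    return z
```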

  14. Visual speech form influences the speed of auditory speech processing.

    Science.gov (United States)

    Paris, Tim; Kim, Jeesun; Davis, Chris

    2013-09-01

    An important property of visual speech (movements of the lips and mouth) is that it generally begins before auditory speech. Research using brain-based paradigms has demonstrated that seeing visual speech speeds up the activation of the listener's auditory cortex but it is not clear whether these observed neural processes link to behaviour. It was hypothesized that the very early portion of visual speech (occurring before auditory speech) will allow listeners to predict the following auditory event and so facilitate the speed of speech perception. This was tested in the current behavioural experiments. Further, we tested whether the salience of the visual speech played a role in this speech facilitation effect (Experiment 1). We also determined the relative contributions that visual form (what) and temporal (when) cues made (Experiment 2). The results showed that visual speech cues facilitated response times and that this was based on form rather than temporal cues. Copyright © 2013 Elsevier Inc. All rights reserved.

  15. Anxiety and ritualized speech

    Science.gov (United States)

    Lalljee, Mansur; Cook, Mark

    1975-01-01

    The experiment examines the effects of anxiety on a number of words that seem irrelevant to semantic communication. The Units of Ritualized Speech (URSs) considered are: 'I mean', 'in fact', 'really', 'sort of', 'well' and 'you know'. (Editor)

  17. HATE SPEECH AS COMMUNICATION

    National Research Council Canada - National Science Library

    Gladilin Aleksey Vladimirovich

    2012-01-01

    The purpose of the paper is a theoretical comprehension of hate speech from a communication point of view, on the one hand, and from the point of view of prejudice, stereotypes and discrimination on the other…

  18. Speech intelligibility in hospitals.

    Science.gov (United States)

    Ryherd, Erica E; Moeller, Michael; Hsu, Timothy

    2013-07-01

    Effective communication between staff members is key to patient safety in hospitals. A variety of patient care activities including admittance, evaluation, and treatment rely on oral communication. Surprisingly, published information on speech intelligibility in hospitals is extremely limited. In this study, speech intelligibility measurements and occupant evaluations were conducted in 20 units of five different U.S. hospitals. A variety of unit types and locations were studied. Results show that overall, no unit had "good" intelligibility based on the speech intelligibility index (SII > 0.75), and several locations were found to have "poor" intelligibility (SII < 0.45). The study documents speech intelligibility across a variety of hospitals and unit types, offers some evidence of the positive impact of absorption on intelligibility, and identifies areas for future research.

  19. Speech disorders - children

    Science.gov (United States)

    MedlinePlus encyclopedia entry: //medlineplus.gov/ency/article/001430.htm

  20. Speech impairment (adult)

    Science.gov (United States)

    MedlinePlus encyclopedia entry: //medlineplus.gov/ency/article/003204.htm

  1. Speech Compression and Synthesis

    Science.gov (United States)

    1980-10-01

    Phonological rules combined with diphone synthesis improved the algorithms used by the phonetic synthesis program for gain normalization and time… phonetic vocoder, spectral template. This report describes our work for the past two years on speech compression and synthesis. Since there was an… From Block 19: speech recognition, phoneme recognition… initial design for a phonetic recognition program. We also recorded and partially labeled a…

  2. Comparative wavelet, PLP, and LPC speech recognition techniques on the Hindi speech digits database

    Science.gov (United States)

    Mishra, A. N.; Shrotriya, M. C.; Sharan, S. N.

    2010-02-01

    In view of the growing use of automatic speech recognition in the modern society, we study various alternative representations of the speech signal that have the potential to contribute to the improvement of the recognition performance. In this paper, wavelet-based features using different wavelets are used for Hindi digit recognition. The recognition performance of these features has been compared with Linear Prediction Coefficient (LPC) and Perceptual Linear Prediction (PLP) features. All features have been tested using a Hidden Markov Model (HMM) based classifier for speaker-independent Hindi digit recognition. The recognition performance of PLP features is 11.3% better than that of LPC features. The recognition performance with db10 features shows a further improvement of 12.55% over PLP features, and db10 performs best among all wavelet-based features.
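
    The abstract does not spell out the feature recipe, but a common construction consistent with it is to take log sub-band energies of a db10 decomposition per frame; the hypothetical sketch below uses the PyWavelets package.

```python
import numpy as np
import pywt  # PyWavelets

def db10_frame_features(frame, level=5):
    """Log energy of each db10 wavelet sub-band of one speech frame."""
    coeffs = pywt.wavedec(frame, 'db10', level=level)
    return np.array([np.log(np.sum(c ** 2) + 1e-12) for c in coeffs])
```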

  3. 'What is it?' A functional MRI and SPECT study of ictal speech in a second language

    Energy Technology Data Exchange (ETDEWEB)

    Navarro, V.; Chauvire, V.; Baulac, M.; Cohen, L. [Department of Neurology, AP-HP, Hopital de la Pitie-Salpetriere, IFR 70, Paris (France); Delmaire, Ch.; Lehericy, St. [Department of Neuroradiology, AP-HP, Hopital de la Pitie-Salpetriere, IFR 70, Paris (France); Habert, M.O. [Department of Nuclear Medicine, AP-HP, Hopital de la Pitie-Salpetriere, IFR 70, Paris (France); Footnick, R.; Pallier, Ch. [INSERM, U562, CEA/DSV, IFR 49, Orsay (France); Baulac, M.; Cohen, L. [Universite Paris VI, Faculte Pitie-Salpetriere, Paris (France)

    2009-07-01

    Neuronal networks involved in second language (L2) processing vary between normal subjects. Patients with epilepsy may have ictal speech automatisms in their second language. To delineate the brain systems involved in L2 ictal speech, we combined functional MRI during bilingual tasks with ictal and inter-ictal single-photon emission computed tomography in a patient who presented ictal speech productions in L2. These analyses showed that the networks activated by the seizure and those activated by L2 processing intersected in the right hippocampus. These results may provide some insights both into the pathophysiology of ictal speech and into the brain organization for L2. (authors)

  4. The Effect of Feedback Schedule Manipulation on Speech Priming Patterns and Reaction Time

    Science.gov (United States)

    Slocomb, Dana; Spencer, Kristie A.

    2009-01-01

    Speech priming tasks are frequently used to delineate stages in the speech process such as lexical retrieval and motor programming. These tasks, often measured in reaction time (RT), require fast and accurate responses, reflecting maximized participant performance, to result in robust priming effects. Encouraging speed and accuracy in responding…

  5. Ascriptive speech act and legal language

    Directory of Open Access Journals (Sweden)

    Ogleznev Vitaly

    2016-01-01

    Full Text Available In this article the author explicates Herbert Hart's theory of an ascriptive language as it has been developed in his influential early paper "The Ascription of Responsibility and Rights". In the section 'Discussion' the author argues that the theory of ascriptive legal utterances, which is grounded on Austin's and Searle's theory of a speech act, provides the methodological basis for his analytical approach to philosophical and legal issues. In the section 'Results' the author justifies the claim that an ascriptive is a specific speech (illocutionary) act. In the section 'Conclusion' the matter concerns the original linguistic formula of an ascriptive that accurately reflects its nature. This article elaborates on the interpretation of ascriptive speech acts in legal language by evaluating the influence of the philosophy of language on the formation of modern legal philosophy, along with evaluating the contribution of the conceptual development of legal philosophy to speech act theory.

  6. Mediation and Automatization.

    Science.gov (United States)

    Hutchins, Edwin

    This paper discusses the relationship between the mediation of task performance by some structure that is not inherent in the task domain itself and the phenomenon of automatization, in which skilled performance becomes effortless or phenomenologically "automatic" after extensive practice. The use of a common simple explicit mediating…

  7. Automatic Differentiation Package

    Energy Technology Data Exchange (ETDEWEB)

    2007-03-01

    Sacado is an automatic differentiation package for C++ codes using operator overloading and C++ templating. Sacado provides forward, reverse, and Taylor polynomial automatic differentiation classes and utilities for incorporating these classes into C++ codes. Users can compute derivatives of computations arising in engineering and scientific applications, including nonlinear equation solving, time integration, sensitivity analysis, stability analysis, optimization and uncertainty quantification.

  8. Digital automatic gain control

    Science.gov (United States)

    Uzdy, Z.

    1980-01-01

    Performance analysis, used to evaluate the fitness of several circuits for digital automatic gain control (AGC), indicates that a digital integrator employing a coherent amplitude detector (CAD) is the device best suited for the application. The circuit reduces gain error to half that of conventional analog AGC while making it possible to automatically modify the response of the receiver to match incoming signal conditions.
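
    A toy digital AGC loop in the spirit of this abstract is sketched below; the coherent amplitude detector is replaced by a plain magnitude detector and the loop constant is illustrative, so this is a conceptual sketch rather than the reported circuit.

```python
import numpy as np

def digital_agc(x, setpoint=1.0, k_i=0.01):
    """Integrate the level error each sample to steer the gain toward the set-point."""
    gain = 1.0
    y = np.empty_like(x, dtype=float)
    for n, sample in enumerate(x):
        y[n] = gain * sample
        level = abs(y[n])                  # stand-in for the coherent amplitude detector
        gain += k_i * (setpoint - level)   # digital integrator acting on the gain error
    return y
```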

  9. Focusing Automatic Code Inspections

    NARCIS (Netherlands)

    Boogerd, C.J.

    2010-01-01

    Automatic Code Inspection tools help developers in early detection of defects in software. A well-known drawback of many automatic inspection approaches is that they yield too many warnings and require a clearer focus. In this thesis, we provide such focus by proposing two methods to prioritize

  10. Toward Speech and Nonverbal Behaviors Integration for Humanoid Robot

    Directory of Open Access Journals (Sweden)

    Wei Wang

    2012-09-01

    Full Text Available It is essential to integrate speech and nonverbal behaviors for a humanoid robot in human-robot interaction. This paper presents an approach using a multi-objective genetic algorithm to match speech and behaviors automatically. Firstly, based on the humanoid robot's emotional state, we construct a hierarchical structure to link voice characteristics and nonverbal behaviors. Secondly, the behaviors corresponding to the speech are matched and integrated into an action sequence by the genetic algorithm, so the robot can speak and perform emotional behaviors consistently. Our approach takes advantage of relevant knowledge described by psychologists and nonverbal communication research. Experimental results suggest that our ultimate goal, implementing an affective robot that acts and speaks with partners vividly and fluently, can be achieved.

  11. Exploring expressivity and emotion with artificial voice and speech technologies.

    Science.gov (United States)

    Pauletto, Sandra; Balentine, Bruce; Pidcock, Chris; Jones, Kevin; Bottaci, Leonardo; Aretoulaki, Maria; Wells, Jez; Mundy, Darren P; Balentine, James

    2013-10-01

    Emotion in audio-voice signals, as synthesized by text-to-speech (TTS) technologies, was investigated to formulate a theory of expression for user interface design. Emotional parameters were specified with markup tags, and the resulting audio was further modulated with post-processing techniques. Software was then developed to link a selected TTS synthesizer with an automatic speech recognition (ASR) engine, producing a chatbot that could speak and listen. Using these two artificial voice subsystems, investigators explored both artistic and psychological implications of artificial speech emotion. Goals of the investigation were interdisciplinary, with interest in musical composition, augmentative and alternative communication (AAC), commercial voice announcement applications, human-computer interaction (HCI), and artificial intelligence (AI). The work-in-progress points towards an emerging interdisciplinary ontology for artificial voices. As one study output, HCI tools are proposed for future collaboration.

  12. Improving the speech intelligibility in classrooms

    Science.gov (United States)

    Lam, Choi Ling Coriolanus

    One of the major acoustical concerns in classrooms is the establishment of effective verbal communication between teachers and students. Non-optimal acoustical conditions, resulting in reduced verbal communication, can cause two main problems. First, they can reduce learning efficiency. Second, they can cause fatigue, stress, vocal strain and health problems, such as headaches and sore throats, among teachers who are forced to compensate for poor acoustical conditions by raising their voices. In addition, inadequate acoustical conditions can encourage the use of public address systems, and improper use of such amplifiers or loudspeakers can impair students' hearing. The social costs of poor classroom acoustics are large because they impair children's learning. This invisible problem has far-reaching implications for learning, but is easily solved. Much research has been carried out, and its findings on classroom acoustics have been accurately and concisely summarized. Still, a number of challenging questions remain unanswered. Most objective indices for speech intelligibility are essentially based on studies of western languages. Although several studies of tonal languages such as Mandarin have been conducted, there is much less work on Cantonese. In this research, measurements were made in unoccupied rooms to investigate the acoustical parameters and characteristics of the classrooms. Speech intelligibility tests based on English, Mandarin and Cantonese, together with a survey, were carried out on students aged from 5 to 22 years. The research aims to investigate the differences in intelligibility between English, Mandarin and Cantonese in Hong Kong classrooms. The relationship between the speech transmission index (STI) and Phonetically Balanced (PB) word scores is further developed, together with an empirical relationship between the speech intelligibility in classrooms and the variations…

  13. Computer-based speech therapy for childhood speech sound disorders.

    Science.gov (United States)

    Furlong, Lisa; Erickson, Shane; Morris, Meg E

    2017-07-01

    With the current worldwide workforce shortage of speech-language pathologists, new and innovative ways of delivering therapy to children with speech sound disorders are needed. Computer-based speech therapy may be an effective and viable means of addressing service access issues for children with speech sound disorders. To evaluate the efficacy of computer-based speech therapy programs for children with speech sound disorders, studies reporting their efficacy were identified via a systematic, computerised database search. Key study characteristics, results, main findings and details of the computer-based speech therapy programs were extracted. Methodological quality was evaluated using a structured critical appraisal tool. Fourteen studies were identified and a total of 11 computer-based speech therapy programs were evaluated. The results showed that computer-based speech therapy is associated with positive clinical changes for some children with speech sound disorders. There is a need for collaborative research between computer engineers and clinicians, particularly during the design and development of computer-based speech therapy programs. Evaluation using rigorous experimental designs is required to understand the benefits of computer-based speech therapy. The reader will be able to 1) discuss how computer-based speech therapy has the potential to improve service access for children with speech sound disorders, 2) explain the ways in which computer-based speech therapy programs may enhance traditional tabletop therapy and 3) compare the features of computer-based speech therapy programs designed for different client populations. Copyright © 2017 Elsevier Inc. All rights reserved.

  14. SPEECH DISORDERS ENCOUNTERED DURING SPEECH THERAPY AND THERAPY TECHNIQUES

    Directory of Open Access Journals (Sweden)

    İlhan ERDEM

    2013-06-01

    Full Text Available Speech, which is both a physical and a mental process, turns agreed-upon signs and sounds into a message that conveys meaning. To identify the sounds of speech, it is essential to know the structure and function of the various organs that make conversation possible. Because speech is a physical and mental process, many factors can lead to speech disorders. A speech disorder can stem from language acquisition as well as from many medical and psychological factors. Speaking is the collective work of many organs, like an orchestra; since speech is a very complex skill with a mental dimension, it must be determined which of these obstacles inhibit conversation. A speech disorder is a defect in speech flow, rhythm, pitch, stress, composition and vocalization. In this study, speech disorders such as articulation disorders, stuttering, aphasia, dysarthria, local dialect speech, tongue and lip laziness, and rapid speech are examined in terms of language skills. The causes of these speech disorders are investigated, and suggestions for remediation are presented and discussed.

  15. Degraded neural and behavioral processing of speech sounds in a rat model of Rett syndrome.

    Science.gov (United States)

    Engineer, Crystal T; Rahebi, Kimiya C; Borland, Michael S; Buell, Elizabeth P; Centanni, Tracy M; Fink, Melyssa K; Im, Kwok W; Wilson, Linda G; Kilgard, Michael P

    2015-11-01

    Individuals with Rett syndrome have greatly impaired speech and language abilities. Auditory brainstem responses to sounds are normal, but cortical responses are highly abnormal. In this study, we used the novel rat Mecp2 knockout model of Rett syndrome to document the neural and behavioral processing of speech sounds. We hypothesized that both speech discrimination ability and the neural response to speech sounds would be impaired in Mecp2 rats. We expected that extensive speech training would improve speech discrimination ability and the cortical response to speech sounds. Our results reveal that speech responses across all four auditory cortex fields of Mecp2 rats were hyperexcitable, responded slower, and were less able to follow rapidly presented sounds. While Mecp2 rats could accurately perform consonant and vowel discrimination tasks in quiet, they were significantly impaired at speech sound discrimination in background noise. Extensive speech training improved discrimination ability. Training shifted cortical responses in both Mecp2 and control rats to favor the onset of speech sounds. While training increased the response to low frequency sounds in control rats, the opposite occurred in Mecp2 rats. Although neural coding and plasticity are abnormal in the rat model of Rett syndrome, extensive therapy appears to be effective. These findings may help to explain some aspects of communication deficits in Rett syndrome and suggest that extensive rehabilitation therapy might prove beneficial.

  16. Phone Duration Modeling of Affective Speech Using Support Vector Regression

    Directory of Open Access Journals (Sweden)

    Alexandros Lazaridis

    2012-07-01

    Full Text Available In speech synthesis, accurate modeling of prosody is important for producing high-quality synthetic speech. One of the main aspects of prosody is phone duration, and robust phone duration modeling is a prerequisite for synthesizing natural-sounding emotional speech. In this work ten phone duration models are evaluated. These models belong to well-known and widely used categories of algorithms, such as decision trees, linear regression, lazy-learning algorithms and meta-learning algorithms. Furthermore, we investigate the effectiveness of Support Vector Regression (SVR) in phone duration modeling in the context of emotional speech. The evaluation of the eleven models is performed on a Modern Greek emotional speech database which consists of four categories of emotional speech (anger, fear, joy, sadness) plus neutral speech. The experimental results demonstrate that the SVR-based modeling outperforms the other ten models across all four emotion categories. Specifically, the SVR model achieved an average relative reduction of 8% in terms of root mean square error (RMSE) across all emotional categories.
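
    A minimal sketch of the SVR setup evaluated here, using scikit-learn with synthetic stand-in data (the paper's feature set and hyperparameters are not reproduced):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))        # stand-in linguistic/contextual phone features
y = 80 + 15 * X[:, 0] - 10 * X[:, 1] + rng.normal(scale=5, size=500)  # durations, ms

model = make_pipeline(StandardScaler(), SVR(kernel='rbf', C=10.0, epsilon=1.0))
model.fit(X[:400], y[:400])
rmse = mean_squared_error(y[400:], model.predict(X[400:])) ** 0.5
print(f"held-out RMSE: {rmse:.1f} ms")
```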

  17. Practical speech user interface design

    CERN Document Server

    Lewis, James R

    2010-01-01

    Although speech is the most natural form of communication between humans, most people find using speech to communicate with machines anything but natural. Drawing from psychology, human-computer interaction, linguistics, and communication theory, Practical Speech User Interface Design provides a comprehensive yet concise survey of practical speech user interface (SUI) design. It offers practice-based and research-based guidance on how to design effective, efficient, and pleasant speech applications that people can really use. Focusing on the design of speech user interfaces for IVR application

  18. BIOACCESSIBILITY TESTS ACCURATELY ESTIMATE ...

    Science.gov (United States)

    Hazards of soil-borne Pb to wild birds may be more accurately quantified if the bioavailability of that Pb is known. To better understand the bioavailability of Pb to birds, we measured blood Pb concentrations in Japanese quail (Coturnix japonica) fed diets containing Pb-contaminated soils. Relative bioavailabilities were expressed by comparison with blood Pb concentrations in quail fed a Pb acetate reference diet. Diets containing soil from five Pb-contaminated Superfund sites had relative bioavailabilities from 33%-63%, with a mean of about 50%. Treatment of two of the soils with P significantly reduced the bioavailability of Pb. The bioaccessibility of the Pb in the test soils was then measured in six in vitro tests and regressed on bioavailability. They were: the “Relative Bioavailability Leaching Procedure” (RBALP) at pH 1.5, the same test conducted at pH 2.5, the “Ohio State University In vitro Gastrointestinal” method (OSU IVG), the “Urban Soil Bioaccessible Lead Test”, the modified “Physiologically Based Extraction Test” and the “Waterfowl Physiologically Based Extraction Test.” All regressions had positive slopes. Based on criteria of slope and coefficient of determination, the RBALP pH 2.5 and OSU IVG tests performed very well. Speciation by X-ray absorption spectroscopy demonstrated that, on average, most of the Pb in the sampled soils was sorbed to minerals (30%), bound to organic matter (24%), or present as Pb sulfate (18%). Ad…

  19. Mandarin Digits Speech Recognition Using Support Vector Machines

    Institute of Scientific and Technical Information of China (English)

    XIE Xiang; KUANG Jing-ming

    2005-01-01

    A method of applying support vector machines (SVMs) in speech recognition was proposed, and a speech recognition system for Mandarin digits was built with SVMs. In the system, vectors were linearly extracted from the speech feature sequence to make up time-aligned input patterns for the SVM, and the decisions of several 2-class SVM classifiers were combined to construct an N-class classifier. Four kinds of SVM kernel functions were compared in experiments on speaker-independent recognition of Mandarin digits. The radial basis function kernel achieved the highest accuracy, 99.33%, which is better than that of the baseline system based on hidden Markov models (HMMs) (97.08%). The experiments also show that SVMs can outperform HMMs, especially when the samples available for learning are very limited.
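
    The pairwise construction described above is what scikit-learn's SVC applies internally; a minimal sketch on synthetic ten-class data (not the Mandarin corpus) could look like this:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
y = np.repeat(np.arange(10), 20)                   # ten "digit" classes, 20 samples each
X = rng.normal(size=(200, 64)) + 0.5 * y[:, None]  # toy, roughly separable features

clf = SVC(kernel='rbf', gamma='scale', decision_function_shape='ovo')  # pairwise 2-class SVMs
clf.fit(X, y)
print("training accuracy:", (clf.predict(X) == y).mean())
```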

  20. Speech-Language Therapy (For Parents)

    Science.gov (United States)

    Speech-language pathologists (SLPs), often informally known as speech therapists, are professionals educated in the study of human …

  1. People Should Be Masters in Both Political and Cultural Areas: Toward a New “Free Speech Clause” in China

    OpenAIRE

    Zuo, Yilu

    2016-01-01

    This article tries to challenge—more accurately, to supplement—the “politico-centered” view in understanding China’s free speech. Unlike the conventional view that only treats Article 35 as China’s free speech clause and mainly focuses on political speech, this article argues that China’s “free speech clause” includes not one, but three articles: 35, 41 and 47. While Articles 35 and 41 guarantee the right to political speech, Article 47 explicitly safeguards citizens’ right to cultural constr...

  2. Speech processing using maximum likelihood continuity mapping

    Energy Technology Data Exchange (ETDEWEB)

    Hogden, John E. (Santa Fe, NM)

    2000-01-01

    Speech processing is obtained that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator position is described. The method for learning the mapping between static speech sounds and pseudo-articulator position uses a set of training data composed only of speech sounds. The said speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.

  3. Speech processing using maximum likelihood continuity mapping

    Energy Technology Data Exchange (ETDEWEB)

    Hogden, J.E.

    2000-04-18

    Speech processing is obtained that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator position is described. The method for learning the mapping between static speech sounds and pseudo-articulator position uses a set of training data composed only of speech sounds. The said speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.

  4. Managing the reaction effects of speech disorders on speech ...

    African Journals Online (AJOL)

    Speech disorders are responsible for defective speaking. It is usually … They occur as a result of persistent frustrations which speech defectives usually encounter because they speak defectively. This paper …

  5. Under-resourced speech recognition based on the speech manifold

    CSIR Research Space (South Africa)

    Sahraeian, R

    2015-09-01

    Full Text Available Conventional acoustic modeling involves estimating many parameters to effectively model feature distributions. The sparseness of speech and text data, however, degrades the reliability of the estimation process and makes speech recognition a...

  6. Speech training alters tone frequency tuning in rat primary auditory cortex

    Science.gov (United States)

    Engineer, Crystal T.; Perez, Claudia A.; Carraway, Ryan S.; Chang, Kevin Q.; Roland, Jarod L.; Kilgard, Michael P.

    2013-01-01

    Previous studies in both humans and animals have documented improved performance following discrimination training. This enhanced performance is often associated with cortical response changes. In this study, we tested the hypothesis that long-term speech training on multiple tasks can improve primary auditory cortex (A1) responses compared to rats trained on a single speech discrimination task or experimentally naïve rats. Specifically, we compared the percent of A1 responding to trained sounds, the responses to both trained and untrained sounds, receptive field properties of A1 neurons, and the neural discrimination of pairs of speech sounds in speech trained and naïve rats. Speech training led to accurate discrimination of consonant and vowel sounds, but did not enhance A1 response strength or the neural discrimination of these sounds. Speech training altered tone responses in rats trained on six speech discrimination tasks but not in rats trained on a single speech discrimination task. Extensive speech training resulted in broader frequency tuning, shorter onset latencies, a decreased driven response to tones, and caused a shift in the frequency map to favor tones in the range where speech sounds are the loudest. Both the number of trained tasks and the number of days of training strongly predict the percent of A1 responding to a low frequency tone. Rats trained on a single speech discrimination task performed less accurately than rats trained on multiple tasks and did not exhibit A1 response changes. Our results indicate that extensive speech training can reorganize the A1 frequency map, which may have downstream consequences on speech sound processing. PMID:24344364

  7. Analysis and prediction of acoustic speech features from mel-frequency cepstral coefficients in distributed speech recognition architectures.

    Science.gov (United States)

    Darch, Jonathan; Milner, Ben; Vaseghi, Saeed

    2008-12-01

    The aim of this work is to develop methods that enable acoustic speech features to be predicted from mel-frequency cepstral coefficient (MFCC) vectors as may be encountered in distributed speech recognition architectures. The work begins with a detailed analysis of the multiple correlation between acoustic speech features and MFCC vectors. This confirms the existence of correlation, which is found to be higher when measured within specific phonemes rather than globally across all speech sounds. The correlation analysis leads to the development of a statistical method of predicting acoustic speech features from MFCC vectors that utilizes a network of hidden Markov models (HMMs) to localize prediction to specific phonemes. Within each HMM, the joint density of acoustic features and MFCC vectors is modeled and used to make a maximum a posteriori prediction. Experimental results are presented across a range of conditions, such as with speaker-dependent, gender-dependent, and gender-independent constraints, and these show that acoustic speech features can be predicted from MFCC vectors with good accuracy. A comparison is also made against an alternative scheme that substitutes the higher-order MFCCs with acoustic features for transmission. This delivers accurate acoustic features but at the expense of a significant reduction in speech recognition accuracy.
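
    Within one HMM state, the MAP prediction described here reduces to the conditional mean of a joint Gaussian; the sketch below uses my notation (a for the acoustic features, m for the MFCC vector), not the paper's.

```python
import numpy as np

def map_predict_acoustic(m, mu_a, mu_m, S_am, S_mm):
    """MAP estimate of acoustic features a given MFCCs m under a joint Gaussian:
    a_hat = mu_a + S_am @ inv(S_mm) @ (m - mu_m)."""
    return mu_a + S_am @ np.linalg.solve(S_mm, m - mu_m)
```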

  8. Identifying Speech Acts in E-Mails: Toward Automated Scoring of the "TOEIC"® E-Mail Task. Research Report. ETS RR-12-16

    Science.gov (United States)

    De Felice, Rachele; Deane, Paul

    2012-01-01

    This study proposes an approach to automatically score the "TOEIC"® Writing e-mail task. We focus on one component of the scoring rubric, which notes whether the test-takers have used particular speech acts such as requests, orders, or commitments. We developed a computational model for automated speech act identification and tested it…

  9. Exploring the role of low level visual processing in letter-speech sound integration: a visual MMN study

    Directory of Open Access Journals (Sweden)

    Dries Froyen

    2010-04-01

    Full Text Available In contrast with, for example, audiovisual speech, the relation between visual and auditory properties of letters and speech sounds is artificial and learned only by explicit instruction. The arbitrariness of the audiovisual link together with the widespread usage of letter–speech sound pairs in alphabetic languages makes those audiovisual objects a unique subject for cross-modal research. Brain imaging evidence has indicated that heteromodal areas in superior temporal, as well as modality-specific auditory cortex are involved in letter–speech sound processing. The role of low level visual areas, however, remains unclear. In this study the visual counterpart of the auditory mismatch negativity (MMN) is used to investigate the influences of speech sounds on letter processing. Letter and non-letter deviants were infrequently presented in a train of standard letters, either in isolation or simultaneously with speech sounds. Although previous findings showed that letters systematically modulate speech sound processing (reflected by auditory MMN amplitude modulation), the reverse does not seem to hold: our results did not show evidence for an automatic influence of speech sounds on letter processing (no visual MMN amplitude modulation). This apparent asymmetric recruitment of low level sensory cortices during letter–speech sound processing contrasts with the symmetric involvement of these cortices in audiovisual speech processing, and is possibly due to the arbitrary nature of the link between letters and speech sounds.

  10. Exploring the Role of Low Level Visual Processing in Letter–Speech Sound Integration: A Visual MMN Study

    Science.gov (United States)

    Froyen, Dries; van Atteveldt, Nienke; Blomert, Leo

    2009-01-01

    In contrast with for example audiovisual speech, the relation between visual and auditory properties of letters and speech sounds is artificial and learned only by explicit instruction. The arbitrariness of the audiovisual link together with the widespread usage of letter–speech sound pairs in alphabetic languages makes those audiovisual objects a unique subject for crossmodal research. Brain imaging evidence has indicated that heteromodal areas in superior temporal, as well as modality-specific auditory cortex are involved in letter–speech sound processing. The role of low level visual areas, however, remains unclear. In this study the visual counterpart of the auditory mismatch negativity (MMN) is used to investigate the influences of speech sounds on letter processing. Letter and non-letter deviants were infrequently presented in a train of standard letters, either in isolation or simultaneously with speech sounds. Although previous findings showed that letters systematically modulate speech sound processing (reflected by auditory MMN amplitude modulation), the reverse does not seem to hold: our results did not show evidence for an automatic influence of speech sounds on letter processing (no visual MMN amplitude modulation). This apparent asymmetric recruitment of low level sensory cortices during letter–speech sound processing, contrasts with the symmetric involvement of these cortices in audiovisual speech processing, and is possibly due to the arbitrary nature of the link between letters and speech sounds. PMID:20428501

  11. Exploring the Role of Low Level Visual Processing in Letter-Speech Sound Integration: A Visual MMN Study.

    Science.gov (United States)

    Froyen, Dries; van Atteveldt, Nienke; Blomert, Leo

    2010-01-01

    In contrast with for example audiovisual speech, the relation between visual and auditory properties of letters and speech sounds is artificial and learned only by explicit instruction. The arbitrariness of the audiovisual link together with the widespread usage of letter-speech sound pairs in alphabetic languages makes those audiovisual objects a unique subject for crossmodal research. Brain imaging evidence has indicated that heteromodal areas in superior temporal, as well as modality-specific auditory cortex are involved in letter-speech sound processing. The role of low level visual areas, however, remains unclear. In this study the visual counterpart of the auditory mismatch negativity (MMN) is used to investigate the influences of speech sounds on letter processing. Letter and non-letter deviants were infrequently presented in a train of standard letters, either in isolation or simultaneously with speech sounds. Although previous findings showed that letters systematically modulate speech sound processing (reflected by auditory MMN amplitude modulation), the reverse does not seem to hold: our results did not show evidence for an automatic influence of speech sounds on letter processing (no visual MMN amplitude modulation). This apparent asymmetric recruitment of low level sensory cortices during letter-speech sound processing, contrasts with the symmetric involvement of these cortices in audiovisual speech processing, and is possibly due to the arbitrary nature of the link between letters and speech sounds.

  12. Automatic extraction of syntactic patterns for dependency parsing in noun phrase chunks

    Directory of Open Access Journals (Sweden)

    Mihaela Colhon

    2014-05-01

    Full Text Available In this article we present a method for automatic extraction of syntactic patterns that are used to develop a dependency parsing method. The patterns have been extracted from a corpus automatically annotated for tokens, sentences’ borders, parts of speech and noun phrases, and manually annotated for dependency relations between words. The evaluation shows promising results in the case of an order-free language.
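
    One way such patterns could be harvested from the annotated corpus is sketched below, with a made-up token format (part-of-speech tag plus head index inside the chunk); this illustrates the idea only and is not the authors' algorithm.

```python
from collections import Counter

def extract_patterns(chunks):
    """chunks: lists of (pos_tag, head_index_or_None) tuples, one list per NP chunk.
    Counts the POS-tag span between each dependent and its head as a pattern."""
    patterns = Counter()
    for chunk in chunks:
        for i, (_, head) in enumerate(chunk):
            if head is not None:
                lo, hi = sorted((i, head))
                patterns[tuple(pos for pos, _ in chunk[lo:hi + 1])] += 1
    return patterns
```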

  13. On Optimal Linear Filtering of Speech for Near-End Listening Enhancement

    DEFF Research Database (Denmark)

    Taal, Cees H.; Jensen, Jesper; Leijon, Arne

    2013-01-01

    In this letter the focus is on linear filtering of speech before degradation due to additive background noise. The goal is to design the filter such that the speech intelligibility index (SII) is maximized when the speech is played back in a known noisy environment. Moreover, a power constraint… suboptimal. In this work we propose a nonlinear approximation of the SII which is accurate for all SNRs. Experiments show large intelligibility improvements with the proposed method over the unprocessed noisy speech and better performance than one state-of-the-art method…
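
    The flavor of the index being optimized can be conveyed by a simplified band-weighted computation; the weights below are placeholders, not the ANSI S3.5 band-importance table, and the letter's actual SII approximation is more involved.

```python
import numpy as np

def simple_sii(speech_db, noise_db, weights):
    """Clip per-band SNRs to [-15, 15] dB, map them to [0, 1], and weight-average."""
    snr = np.clip(np.asarray(speech_db) - np.asarray(noise_db), -15.0, 15.0)
    audibility = (snr + 15.0) / 30.0
    w = np.asarray(weights, dtype=float)
    return float(np.sum(w * audibility) / np.sum(w))
```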

  14. Intelligibility of speech of children with speech and sound disorders

    OpenAIRE

    Ivetac, Tina

    2014-01-01

    The purpose of this study is to examine speech intelligibility of children with primary speech and sound disorders aged 3 to 6 years in everyday life. The research problem is based on the degree to which parents or guardians, immediate family members (sister, brother, grandparents), extended family members (aunt, uncle, cousin), child's friends, other acquaintances, child's teachers and strangers understand the speech of children with speech sound disorders. We examined whether the level ...

  15. Variable-mass Thermodynamics Calculation Model for Gas-operated Automatic Weapon

    Institute of Scientific and Technical Information of China (English)

    陈建彬; 吕小强

    2011-01-01

    Because energy and mass exchange occurs between the barrel and the gas-operated device of an automatic weapon, a new variable-mass thermodynamics model is built to describe the weapon's interior ballistics and the dynamic characteristics of the gas-operated device accurately. The model is used to calculate the automatic mechanism velocity of a certain automatic weapon; the calculated results agree well with experimental results, which validates the model. The influences of structural parameters on the gas-operated device's dynamic characteristics are discussed. The results show that the model is valuable for the design and accurate performance prediction of gas-operated automatic weapons.

  16. Superior Speech Acquisition and Robust Automatic Speech Recognition for Integrated Spacesuit Audio Systems Project

    Data.gov (United States)

    National Aeronautics and Space Administration — Astronauts suffer from poor dexterity of their hands due to the clumsy spacesuit gloves during Extravehicular Activity (EVA) operations and NASA has had a widely...

  18. Differential Diagnosis of Severe Speech Disorders Using Speech Gestures

    Science.gov (United States)

    Bahr, Ruth Huntley

    2005-01-01

    The differentiation of childhood apraxia of speech from severe phonological disorder is a common clinical problem. This article reports on an attempt to describe speech errors in children with childhood apraxia of speech on the basis of gesture use and acoustic analyses of articulatory gestures. The focus was on the movement of articulators and…

  19. What does motor efference copy represent? Evidence from speech production.

    Science.gov (United States)

    Niziolek, Caroline A; Nagarajan, Srikantan S; Houde, John F

    2013-10-09

    How precisely does the brain predict the sensory consequences of our actions? Efference copy is thought to reflect the predicted sensation of self-produced motor acts, such as the auditory feedback heard while speaking. Here, we use magnetoencephalographic imaging (MEG-I) in human speakers to demonstrate that efference copy prediction does not track movement variability across repetitions of the same motor task. Specifically, spoken vowels were less accurately predicted when they were less similar to a speaker's median production, even though the prediction is thought to be based on the very motor commands that generate each vowel. Auditory cortical responses to less prototypical speech productions were less suppressed, resembling responses to speech errors, and were correlated with later corrective movement, suggesting that the suppression may be functionally significant for error correction. The failure of the motor system to accurately predict less prototypical speech productions suggests that the efferent-driven suppression does not reflect a sensory prediction, but a sensory goal.

  20. Heartbeat perception in social anxiety before and during speech anticipation.

    Science.gov (United States)

    Stevens, Stephan; Gerlach, Alexander L; Cludius, Barbara; Silkens, Anna; Craske, Michelle G; Hermann, Christiane

    2011-02-01

    According to current cognitive models of social phobia, individuals with social anxiety create a distorted image of themselves in social situations, relying, at least partially, on interoceptive cues. We investigated differences in heartbeat perception, as a proxy of interoception, in 48 individuals high and low in social anxiety at baseline and while anticipating a public speech. Results revealed lower error scores for highly fearful participants both at baseline and during speech anticipation. Speech anticipation improved heartbeat perception only marginally in both groups. Eight of the nine accurate perceivers, as determined using a criterion of maximum difference between actual and counted beats, were highly socially anxious. Higher interoceptive accuracy might increase the risk of misinterpreting physical symptoms as visible signs of anxiety, which then trigger negative evaluation by others. Treatment should take into account that in socially anxious individuals, perceived physical arousal is likely to be accurate rather than a false alarm. Copyright © 2010 Elsevier Ltd. All rights reserved.
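
    Assuming the error score follows the usual mental-tracking (Schandry-style) convention, which the abstract does not state explicitly, heartbeat perception accuracy over n counting intervals would be computed as

    $$\mathrm{accuracy} \;=\; \frac{1}{n}\sum_{i=1}^{n}\left(1 - \frac{\lvert \mathrm{recorded}_i - \mathrm{counted}_i \rvert}{\mathrm{recorded}_i}\right),$$

    with higher values indicating more accurate perceivers.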

  1. Tackling the complexity in speech

    DEFF Research Database (Denmark)

    section includes four carefully selected chapters. They deal with facets of speech production, speech acoustics, and/or speech perception or recognition, place them in an integrated phonetic-phonological perspective, and relate them in more or less explicit ways to aspects of speech technology. Therefore, we hope that this volume can help speech scientists with traditional training in phonetics and phonology to keep up with the latest developments in speech technology. In the opposite direction, speech researchers starting from a technological perspective will hopefully get inspired by reading about the questions, phenomena, and communicative functions that are currently addressed in phonetics and phonology. Either way, the future of speech research lies in international, interdisciplinary collaborations, and our volume is meant to reflect and facilitate such collaborations.

  2. Denial Denied: Freedom of Speech

    Directory of Open Access Journals (Sweden)

    Glen Newey

    2009-12-01

    Free speech is a widely held principle. This is in some ways surprising, since formal and informal censorship of speech is widespread, and rather different issues seem to arise depending on whether the censorship concerns who speaks, what content is spoken or how it is spoken. I argue that despite these facts, free speech can indeed be seen as a unitary principle. On my analysis, the core of the free speech principle is the denial of the denial of speech, whether to a speaker, to a proposition, or to a mode of expression. Underlying free speech is the principle of freedom of association, according to which speech is both a precondition of future association (e.g. as a medium for negotiation) and a mode of association in its own right. I conclude by applying this account briefly to two contentious issues: hate speech and pornography.

  4. Speech spectrogram expert

    Energy Technology Data Exchange (ETDEWEB)

    Johannsen, J.; Macallister, J.; Michalek, T.; Ross, S.

    1983-01-01

    Various authors have pointed out that humans can become quite adept at deriving phonetic transcriptions from speech spectrograms (as good as 90% accuracy at the phoneme level). The authors describe an expert system which attempts to simulate this performance. The speech spectrogram expert (spex) is actually a society made up of three experts: a 2-dimensional vision expert, an acoustic-phonetic expert, and a phonetics expert. The visual reasoning expert finds important visual features of the spectrogram. The acoustic-phonetic expert reasons about how visual features relate to phonemes, and about how phonemes change visually in different contexts. The phonetics expert reasons about allowable phoneme sequences and transformations, and deduces an English spelling for phoneme strings. The speech spectrogram expert is highly interactive, allowing users to investigate hypotheses and edit rules. 10 references.

  5. RECOGNISING SPEECH ACTS

    Directory of Open Access Journals (Sweden)

    Phyllis Kaburise

    2012-09-01

    Speech Act Theory (SAT), a theory in pragmatics, is an attempt to describe what happens during linguistic interactions. Inherent within SAT is the idea that language forms and intentions are relatively formulaic and that there is a direct correspondence between sentence forms (for example, in terms of structure and lexicon) and the function or meaning of an utterance. The contention offered in this paper is that when such a correspondence does not exist, as in indirect speech utterances, this creates challenges for English second language speakers and may result in miscommunication. This arises because indirect speech acts allow speakers to employ various pragmatic devices such as inference, implicature, presuppositions and context clues to transmit their messages. Such devices, operating within the non-literal level of language competence, may pose challenges for ESL learners.

  6. Protection limits on free speech

    Institute of Scientific and Technical Information of China (English)

    李敏

    2014-01-01

    Freedom of speech is one of the basic rights of citizens and should receive broad protection. In the real context of China, however, it is worth considering which kinds of speech can be protected, which can be restricted, and how to draw the limit between state power and free speech. People tend to ignore freedom of speech and its function, so that some arguments cannot be aired in open debate.

  7. Child vocalization composition as discriminant information for automatic autism detection.

    Science.gov (United States)

    Xu, Dongxin; Gilkerson, Jill; Richards, Jeffrey; Yapanel, Umit; Gray, Sharmi

    2009-01-01

    Early identification is crucial for young children with autism to access early intervention. The existing screens require either a parent-report questionnaire and/or direct observation by a trained practitioner. Although an automatic tool would benefit parents, clinicians and children, there is no automatic screening tool in clinical use. This study reports a fully automatic mechanism for autism detection/screening for young children. This is a direct extension of the LENA (Language ENvironment Analysis) system, which utilizes speech signal processing technology to analyze and monitor a child's natural language environment and the vocalizations/speech of the child. It is discovered that child vocalization composition contains rich discriminant information for autism detection. By applying pattern recognition and machine learning approaches to child vocalization composition data, accuracy rates of 85% to 90% in cross-validation tests for autism detection have been achieved at the equal-error-rate (EER) point on a data set with 34 children with autism, 30 language delayed children and 76 typically developing children. Due to its easy and automatic procedure, it is believed that this new tool can serve a significant role in childhood autism screening, especially in regards to population-based or universal screening.
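
    The abstract reports accuracy at the equal-error-rate (EER) point of a cross-validated classifier but does not name the classifier. As a minimal illustrative sketch (not the paper's method), the EER operating point can be read off a cross-validated ROC curve; the feature matrix X, labels y, and the logistic-regression model below are hypothetical placeholders:

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_curve
        from sklearn.model_selection import cross_val_predict

        def cross_validated_eer(X, y):
            """Equal error rate of a 10-fold cross-validated classifier."""
            scores = cross_val_predict(LogisticRegression(max_iter=1000),
                                       X, y, cv=10,
                                       method="predict_proba")[:, 1]
            fpr, tpr, _ = roc_curve(y, scores)
            fnr = 1.0 - tpr
            i = np.nanargmin(np.abs(fnr - fpr))   # point where FPR ~= FNR
            return (fpr[i] + fnr[i]) / 2.0        # accuracy at EER = 1 - EER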

  8. A Speech Intelligibility Index-based approach to predict the speech reception threshold for sentences in fluctuating noise for normal-hearing listeners

    Science.gov (United States)

    Rhebergen, Koenraad S.; Versfeld, Niek J.

    2005-04-01

    The SII model in its present form (ANSI S3.5-1997, American National Standards Institute, New York) can accurately describe intelligibility for speech in stationary noise but fails to do so for nonstationary noise maskers. Here, an extension to the SII model is proposed with the aim to predict the speech intelligibility in both stationary and fluctuating noise. The basic principle of the present approach is that both speech and noise signal are partitioned into small time frames. Within each time frame the conventional SII is determined, yielding the speech information available to the listener at that time frame. Next, the SII values of these time frames are averaged, resulting in the SII for that particular condition. Using speech reception threshold (SRT) data from the literature, the extension to the present SII model can give a good account of SRTs in stationary noise, fluctuating speech noise, interrupted noise, and multiple-talker noise. The predictions for sinusoidally intensity modulated (SIM) noise and real speech or speech-like maskers are better than with the original SII model, but are still not accurate. For the latter type of maskers, informational masking may play a role.
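
    To make the frame-partitioning idea concrete, a minimal sketch follows; the frame length is an assumption, and conventional_sii is a placeholder for a full ANSI S3.5-1997 implementation, which is not reproduced here:

        import numpy as np

        def extended_sii(speech, noise, fs, conventional_sii, frame_ms=12.0):
            """Average frame-wise SII over time (sketch of the extension).

            conventional_sii(speech_frame, noise_frame, fs) -> SII in [0, 1]
            is assumed to implement the ANSI S3.5-1997 calculation.
            """
            n = int(fs * frame_ms / 1000.0)          # samples per time frame
            values = []
            for start in range(0, min(len(speech), len(noise)) - n + 1, n):
                values.append(conventional_sii(speech[start:start + n],
                                               noise[start:start + n], fs))
            return float(np.mean(values))            # SII for the condition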

  9. Groundwater recharge: Accurately representing evapotranspiration

    CSIR Research Space (South Africa)

    Bugan, Richard DH

    2011-09-01

    Groundwater recharge is the basis for accurate estimation of groundwater resources, for determining the modes of water allocation and groundwater resource susceptibility to climate change. Accurate estimations of groundwater recharge with models...

  10. The University and Free Speech

    OpenAIRE

    Grcic, Joseph

    2014-01-01

    Free speech is a necessary condition for the growth of knowledge and the implementation of real and rational democracy. Educational institutions play a central role in socializing individuals to function within their society. Academic freedom is the right to free speech in the context of the university and tenure, properly interpreted, is a necessary component of protecting academic freedom and free speech.

  11. Designing speech for a recipient

    DEFF Research Database (Denmark)

    Fischer, Kerstin

    is investigated on three candidates for so-called ‘simplified registers’: speech to children (also called motherese or baby talk), speech to foreigners (also called foreigner talk) and speech to robots. The volume integrates research from various disciplines, such as psychology, sociolinguistics...

  12. ADMINISTRATIVE GUIDE IN SPEECH CORRECTION.

    Science.gov (United States)

    HEALEY, WILLIAM C.

    WRITTEN PRIMARILY FOR SCHOOL SUPERINTENDENTS, PRINCIPALS, SPEECH CLINICIANS, AND SUPERVISORS, THIS GUIDE OUTLINES THE MECHANICS OF ORGANIZING AND CONDUCTING SPEECH CORRECTION ACTIVITIES IN THE PUBLIC SCHOOLS. IT INCLUDES THE REQUIREMENTS FOR CERTIFICATION OF A SPEECH CLINICIAN IN MISSOURI AND DESCRIBES ESSENTIAL STEPS FOR THE DEVELOPMENT OF A…

  13. Speech Sound Processing Deficits and Training-Induced Neural Plasticity in Rats with Dyslexia Gene Knockdown

    Science.gov (United States)

    Centanni, Tracy M.; Chen, Fuyi; Booker, Anne M.; Engineer, Crystal T.; Sloan, Andrew M.; Rennaker, Robert L.; LoTurco, Joseph J.; Kilgard, Michael P.

    2014-01-01

    In utero RNAi of the dyslexia-associated gene Kiaa0319 in rats (KIA-) degrades cortical responses to speech sounds and increases trial-by-trial variability in onset latency. We tested the hypothesis that KIA- rats would be impaired at speech sound discrimination. KIA- rats needed twice as much training in quiet conditions to perform at control levels and remained impaired at several speech tasks. Focused training using truncated speech sounds was able to normalize speech discrimination in quiet and background noise conditions. Training also normalized trial-by-trial neural variability and temporal phase locking. Cortical activity from speech trained KIA- rats was sufficient to accurately discriminate between similar consonant sounds. These results provide the first direct evidence that assumed reduced expression of the dyslexia-associated gene KIAA0319 can cause phoneme processing impairments similar to those seen in dyslexia and that intensive behavioral therapy can eliminate these impairments. PMID:24871331

  14. Using synchronous speech to facilitate acquisition of English rhythm

    Directory of Open Access Journals (Sweden)

    Elina Banzina

    2014-11-01

    While appropriate stress and rhythm are important for any speaker’s intelligibility, such properties are critical for international teaching assistants (ITAs), who deliver new and complex information to native speaker audiences. Given the limited time available for ITA instruction and the need for a time-efficient rhythm teaching method, this article reports findings of a small-scale feasibility study that tested the effectiveness of a synchronous speech component introduced into conventional rhythm instruction. Synchronous speech involves teacher and learner speaking in unison continuously, which allows the L2 learner to learn rhythm implicitly and uninterruptedly, and provides rich auditory-visual input, ample motor speech practice and real-time feedback, thereby automatizing rhythm patterns. In a 6-week-long pre-post experimental feasibility study, blind listeners evaluated pre-training and post-training recordings of ITA-produced speech. The data revealed a trend towards more improvement in L2 rhythm when working with the synchronous speech technique. The results establish feasibility in both instruction and research.

  15. Speaker Independent Speech Recognition of Isolated Words in Room Environment

    Directory of Open Access Journals (Sweden)

    M. Tabassum

    2017-04-01

    In this paper, the process of recognizing some important words from a large vocabulary is demonstrated based on the combination of dynamic and instantaneous features of the speech spectrum. There are many procedures for recognizing a word by its vowel, but this paper presents a highly effective speaker-independent speech recognition approach for typical room-environment noise conditions. To distinguish several isolated words containing different vowels, two important features, pitch and formants, are extracted from speech signals collected from a number of random male and female speakers. The extracted features are then analysed for the particular utterances to train the system. The specific objectives of this work are to implement an isolated and automatic word speech recognizer, which is capable of recognizing as well as responding to speech, and an audio interfacing system between human and machine for effective human-machine interaction. The whole system has been tested using computer code and the result was satisfactory in almost 90% of cases. However, the system can sometimes be confused by similar vowel sounds.
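
    The abstract names pitch and formants as the two extracted features but does not give the estimation procedure. A minimal sketch of one common approach, autocorrelation for pitch and LPC root-solving for formants, follows; librosa and all parameter values are assumptions, not the paper's settings:

        import numpy as np
        import librosa

        def pitch_autocorr(frame, fs, fmin=60.0, fmax=400.0):
            """Fundamental frequency from the autocorrelation peak."""
            frame = frame - np.mean(frame)
            ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
            lo, hi = int(fs / fmax), int(fs / fmin)   # plausible lag range
            return fs / (lo + np.argmax(ac[lo:hi]))

        def formants_lpc(frame, fs, order=12):
            """Rough formant estimates from LPC polynomial roots."""
            w = frame * np.hamming(len(frame))
            a = librosa.lpc(w.astype(float), order=order)
            roots = [r for r in np.roots(a) if np.imag(r) >= 0]
            freqs = sorted(np.angle(roots) * fs / (2.0 * np.pi))
            return [f for f in freqs if f > 90.0][:3]  # ~F1..F3, roughly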

  16. Speech intelligibility measure for vocal control of an automaton

    Science.gov (United States)

    Naranjo, Michel; Tsirigotis, Georgios

    1998-07-01

    Rapid progress in speech recognition suggests that, in the near future, vocal control systems will be widely established in production units. Communication between a human and a machine requires technical devices that emit, or are subjected to, significant noise perturbations. The vocal interface introduces a new control problem: steering a deterministic automaton using uncertain information. The purpose is to place the automaton exactly in a final state, ordered by voice, from an unknown initial state. The whole speech processing procedure presented in this paper takes as input the temporal speech signal of a word and produces as output a recognised word labelled with an intelligibility index given by the recognition quality. In the first part, we present the essential psychoacoustic concepts for the automatic calculation of the loudness of a speech signal. The architecture of a Time Delay Neural Network is presented in the second part, where we also give the recognition results. In the third part, the theory of fuzzy subsets allows us to extract at the same time a recognised word and its intelligibility index. In the fourth part, an anticipatory system models the control of a sequential machine, with a prediction phase and an updating phase that involve data coming from the information system. A Bayesian decision strategy is used, and the criterion is a weighted sum of criteria defined from information, minimum path functions and the speech intelligibility measure.

  17. SPEECH DISORDERS ENCOUNTERED DURING SPEECH THERAPY AND THERAPY TECHNIQUES

    OpenAIRE

    2013-01-01

    Speech, a physical and mental process, turns agreed signs and sounds into a message that creates meaning in the mind. To identify the sounds of speech, it is essential to know the structure and function of the various organs that make conversation possible. Because speech is a physical and mental process, many factors can lead to speech disorders. A speech disorder can concern language acquisition, and it can also be caused by many medical and psychological factors. Disordered sp...

  18. Freedom of Speech and Hate Speech: an analysis of possible limits for freedom of speech

    National Research Council Canada - National Science Library

    Riva Sobrado de Freitas; Matheus Felipe de Castro

    2013-01-01

    With a view to determining the outlines of the Freedom of Speech and to specifying its contents, we examine hate speech as an offensive and repulsive manifestation, particularly directed at minority groups...

  19. The self-advantage in visual speech processing enhances audiovisual speech recognition in noise.

    Science.gov (United States)

    Tye-Murray, Nancy; Spehar, Brent P; Myerson, Joel; Hale, Sandra; Sommers, Mitchell S

    2015-08-01

    Individuals lip read themselves more accurately than they lip read others when only the visual speech signal is available (Tye-Murray et al., Psychonomic Bulletin & Review, 20, 115-119, 2013). This self-advantage for vision-only speech recognition is consistent with the common-coding hypothesis (Prinz, European Journal of Cognitive Psychology, 9, 129-154, 1997), which posits (1) that observing an action activates the same motor plan representation as actually performing that action and (2) that observing one's own actions activates motor plan representations more than the others' actions because of greater congruity between percepts and corresponding motor plans. The present study extends this line of research to audiovisual speech recognition by examining whether there is a self-advantage when the visual signal is added to the auditory signal under poor listening conditions. Participants were assigned to sub-groups for round-robin testing in which each participant was paired with every member of their subgroup, including themselves, serving as both talker and listener/observer. On average, the benefit participants obtained from the visual signal when they were the talker was greater than when the talker was someone else and also was greater than the benefit others obtained from observing as well as listening to them. Moreover, the self-advantage in audiovisual speech recognition was significant after statistically controlling for individual differences in both participants' ability to benefit from a visual speech signal and the extent to which their own visual speech signal benefited others. These findings are consistent with our previous finding of a self-advantage in lip reading and with the hypothesis of a common code for action perception and motor plan representation.

  20. Word Automaticity of Tree Automatic Scattered Linear Orderings Is Decidable

    CERN Document Server

    Huschenbett, Martin

    2012-01-01

    A tree automatic structure is a structure whose domain can be encoded by a regular tree language such that each relation is recognisable by a finite automaton processing tuples of trees synchronously. Words can be regarded as specific simple trees and a structure is word automatic if it is encodable using only these trees. The question naturally arises whether a given tree automatic structure is already word automatic. We prove that this problem is decidable for tree automatic scattered linear orderings. Moreover, we show that in case of a positive answer a word automatic presentation is computable from the tree automatic presentation.

  1. Functional analysis screening for problem behavior maintained by automatic reinforcement.

    Science.gov (United States)

    Querim, Angie C; Iwata, Brian A; Roscoe, Eileen M; Schlichenmeyer, Kevin J; Ortega, Javier Virués; Hurl, Kylee E

    2013-01-01

    A common finding in previous research is that problem behavior maintained by automatic reinforcement continues to occur in the alone condition of a functional analysis (FA), whereas behavior maintained by social reinforcement typically is extinguished. Thus, the alone condition may represent an efficient screening procedure when maintenance by automatic reinforcement is suspected. We conducted a series of 5-min alone (or no-interaction) probes for 30 cases of problem behavior and compared initial predictions of maintenance or extinction to outcomes obtained in subsequent FAs. Results indicated that data from the screening procedure accurately predicted that problem behavior was maintained by automatic reinforcement in 21 of 22 cases and by social reinforcement in 7 of 8 cases. Thus, results of the screening accurately predicted the function of problem behavior (social vs. automatic reinforcement) in 28 of 30 cases.

  2. Global Freedom of Speech

    DEFF Research Database (Denmark)

    Binderup, Lars Grassme

    2007-01-01

    , as opposed to a legal norm, that curbs exercises of the right to free speech that offend the feelings or beliefs of members from other cultural groups. The paper rejects the suggestion that acceptance of such a norm is in line with liberal egalitarian thinking. Following a review of the classical liberal...

  3. Speech and Hearing Therapy.

    Science.gov (United States)

    Sakata, Reiko; Sakata, Robert

    1978-01-01

    In the public school, the speech and hearing therapist attempts to foster child growth and development through the provision of services basic to awareness of self and others, management of personal and social interactions, and development of strategies for coping with the handicap. (MM)

  4. Perceptual learning in speech

    NARCIS (Netherlands)

    Norris, D.; McQueen, J.M.; Cutler, A.

    2003-01-01

    This study demonstrates that listeners use lexical knowledge in perceptual learning of speech sounds. Dutch listeners first made lexical decisions on Dutch words and nonwords. The final fricative of 20 critical words had been replaced by an ambiguous sound, between [f] and [s]. One group of listener

  5. Speech and Language Delay

    Science.gov (United States)

    ... home affect my child’s language and speech? The brain has to work harder to interpret and use 2 languages, so it may take longer for children to start using either one or both of the languages they’re learning. It’s not unusual for a bilingual child to ...

  6. Mandarin Visual Speech Information

    Science.gov (United States)

    Chen, Trevor H.

    2010-01-01

    While the auditory-only aspects of Mandarin speech are heavily-researched and well-known in the field, this dissertation addresses its lesser-known aspects: The visual and audio-visual perception of Mandarin segmental information and lexical-tone information. Chapter II of this dissertation focuses on the audiovisual perception of Mandarin…

  7. Speech After Banquet

    Science.gov (United States)

    Yang, Chen Ning

    2013-05-01

    I am usually not so short of words, but the previous speeches have rendered me really speechless. I have known and admired the eloquence of Freeman Dyson, but I did not know that there is a hidden eloquence in my colleague George Sterman...

  8. Speech disfluency in centenarians.

    Science.gov (United States)

    Searl, Jeffrey P; Gabel, Rodney M; Fulks, J Steven

    2002-01-01

    Other than a single case presentation of a 105-year-old female, no other studies have addressed the speech fluency characteristics of centenarians. The purpose of this study was to provide descriptive information on the fluency characteristics of speakers between the ages of 100-103 years. Conversational speech samples from seven speakers were evaluated for the frequency and types of disfluencies and speech rate. The centenarian speakers had a disfluency rate similar to that reported for 70-, 80-, and early 90-year-olds. The types of disfluencies observed also were similar to those reported for younger elderly speakers (primarily whole word/phrase, or formulative fluency breaks). Finally, the speech rate data for the current group of speakers supports prior literature reports of a slower rate with advancing age, but extends the finding to centenarians. As a result of this activity, participants will be able to: (1) describe the frequency of disfluency breaks and the types of disfluencies exhibited by centenarian speakers, (2) describe the mean and range of speaking rates in centenarians, and (3) compare the present findings for centenarians to the fluency and speaking rate characteristics reported in the literature.

  10. The Commercial Speech Doctrine.

    Science.gov (United States)

    Luebke, Barbara F.

    In its 1942 ruling in the "Valentine vs. Christensen" case, the Supreme Court established the doctrine that commercial speech is not protected by the First Amendment. In 1975, in the "Bigelow vs. Virginia" case, the Supreme Court took a decisive step toward abrogating that doctrine, by ruling that advertising is not stripped of…

  11. Parameter estimation of labial movements in speech production: implications for speech motor control.

    Science.gov (United States)

    Hinton, V A; Robey, R R

    1995-08-01

    Central to theories of speech motor control are estimates of the magnitudes of lip activity expressed in terms of central tendency, variability, and interrelatedness. In fact, the tenability of each of two competing theories of motor control for speech production rests solely on the observation of the predicted direction of the correlation coefficient (one positive and one negative) that indexes the relationship of concurrent lip activity. Each theory, however, predicts a relationship that is the complete opposite of the relationship predicted by the other. That is, one theory proposes that the labial system functions on the basis of complementary variation, whereas the other assumes positive covariation, or complementary modulation. In apparent contradiction, each prediction has been observed under laboratory conditions. The explanation for this apparent contradiction resides in the small sample sizes upon which each estimate was based. The minimum number of observations necessary to achieve accurate estimates of lip displacement parameters has remained unclear. This paper addresses three fundamental questions: (a) how many observations of on-task behavior are necessary to accurately estimate mean and variance values for the magnitude of upper lip displacement in a speech production experiment? (b) what is the analogous number of observations for estimating the same values of lower lip displacement (together with the mandible) in the same context? and (c) how many observations are necessary to accurately estimate the correlation coefficient indexing the relationship of lip displacements during the production of speech? Answers to these questions are accomplished through a review of estimator properties, a Monte Carlo computer simulation, and laboratory observations. (ABSTRACT TRUNCATED AT 250 WORDS)
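
    A minimal Monte Carlo sketch of the third question, showing how the sample correlation between upper- and lower-lip displacements stabilizes with sample size; the true correlation of 0.6 below is an arbitrary assumption, not the paper's estimate:

        import numpy as np

        rng = np.random.default_rng(0)

        def corr_interval(n, rho=0.6, trials=5000):
            """95% Monte Carlo interval of the sample correlation at size n."""
            cov = [[1.0, rho], [rho, 1.0]]
            rs = [np.corrcoef(*rng.multivariate_normal([0, 0], cov, n).T)[0, 1]
                  for _ in range(trials)]
            return np.percentile(rs, [2.5, 97.5])

        for n in (10, 30, 100, 300):                  # candidate sample sizes
            lo, hi = corr_interval(n)
            print(f"n={n:4d}: sample r in [{lo:+.2f}, {hi:+.2f}]")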

  12. Semi-automatic object tracking in video sequences

    OpenAIRE

    Lecumberry, Federico; Pardo, Álvaro

    2005-01-01

    A method is presented for semi-automatic object tracking in video sequences using multiple features and a method for probabilistic relaxation to improve the tracking results, producing smooth and accurate tracked borders. Starting from a given initial position of the object in the first frame, the proposed method automatically tracks the object in the sequence, modelling the a posteriori probabilities of a set of features such as color, position and motion, depth, etc.

  13. Direct and indirect measures of speech articulator motions using low power EM sensors

    Energy Technology Data Exchange (ETDEWEB)

    Barnes, T; Burnett, G; Gable, T; Holzrichter, J F; Ng, L

    1999-05-12

    Low power electromagnetic (EM) wave sensors can measure general properties of human speech articulator motions as speech is produced. See Holzrichter, Burnett, Ng, and Lea, J. Acoust. Soc. Am. 103 (1) 622 (1998). Experiments have demonstrated extremely accurate pitch measurements (< 1 Hz per pitch cycle) and accurate detection of the onset of voiced speech. Recent measurements of pressure-induced tracheal motions enable very good spectral and amplitude estimates of a voiced excitation function. The use of the measured excitation functions and pitch synchronous processing enables the determination, for each pitch cycle, of an accurate transfer function and, indirectly, of the corresponding articulator motions. In addition, direct measurements have been made of EM wave reflections from articulator interfaces, including jaw, tongue, and palate, simultaneously with acoustic and glottal open/close signals. While several types of EM sensors are suitable for speech articulator measurements, the homodyne sensor has been found to provide good spatial and temporal resolution for several applications.
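
    A minimal sketch of the pitch-synchronous idea: given a measured excitation cycle e(t) and the simultaneous acoustic cycle s(t), a transfer function can be estimated by regularized spectral division. The windowing, zero-padding, and regularization constant are assumptions, not the authors' procedure:

        import numpy as np

        def cycle_transfer_function(speech_cycle, excitation_cycle, eps=1e-8):
            """H(f) ~= S(f)/E(f) for one pitch cycle (equal-length cycles)."""
            n = 2 ** (int(np.ceil(np.log2(len(speech_cycle)))) + 1)  # FFT size
            S = np.fft.rfft(speech_cycle * np.hanning(len(speech_cycle)), n)
            E = np.fft.rfft(excitation_cycle * np.hanning(len(excitation_cycle)), n)
            return S * np.conj(E) / (np.abs(E) ** 2 + eps)  # regularized division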

  14. Preferred real-ear insertion gain on a commercial hearing aid at different speech and noise levels.

    Science.gov (United States)

    Kuk, F K; Harper, T; Doubek, K

    1994-03-01

    In the present study, we measured preferred real-ear insertion gain (REIG) under different levels of speech and noise to assess whether current automatic gain control (AGC) and automatic signal processing (ASP) hearing aids are operating optimally. Preferred REIG for optimal speech clarity was determined under seven speech and noise conditions. In four conditions, speech (discourse passages) was varied from 55 dB SPL to 85 dB SPL in 10-dB steps at a fixed signal-to-noise ratio (S/N) of +5. In the remaining conditions, speech was fixed at 65 dB SPL while the noise level was varied in 5-dB steps to yield S/Ns from +10 to -5. The results showed that subjects selected less gain as speech or noise levels were increased. In general, less overall gain was selected as speech level was increased, and less overall gain, especially in the low-frequency region, was selected as the S/N ratio became progressively poorer. These results are discussed in relation to how hearing aids with adaptive frequency/gain responses should respond to varying input levels to achieve optimal clarity of speech.

  15. Conversation, speech acts, and memory.

    Science.gov (United States)

    Holtgraves, Thomas

    2008-03-01

    Speakers frequently have specific intentions that they want others to recognize (Grice, 1957). These specific intentions can be viewed as speech acts (Searle, 1969), and I argue that they play a role in long-term memory for conversation utterances. Five experiments were conducted to examine this idea. Participants in all experiments read scenarios ending with either a target utterance that performed a specific speech act (brag, beg, etc.) or a carefully matched control. Participants were more likely to falsely recall and recognize speech act verbs after having read the speech act version than after having read the control version, and the speech act verbs served as better recall cues for the speech act utterances than for the controls. Experiment 5 documented individual differences in the encoding of speech act verbs. The results suggest that people recognize and retain the actions that people perform with their utterances and that this is one of the organizing principles of conversation memory.

  16. Metaheuristic applications to speech enhancement

    CERN Document Server

    Kunche, Prajna

    2016-01-01

    This book serves as a basic reference for those interested in the application of metaheuristics to speech enhancement. The major goal of the book is to explain the basic concepts of optimization methods and their use in heuristic optimization in speech enhancement to scientists, practicing engineers, and academic researchers in speech processing. The authors discuss why it has been a challenging problem for researchers to develop new enhancement algorithms that aid in the quality and intelligibility of degraded speech. They present powerful optimization methods to speech enhancement that can help to solve the noise reduction problems. Readers will be able to understand the fundamentals of speech processing as well as the optimization techniques, how the speech enhancement algorithms are implemented by utilizing optimization methods, and will be given the tools to develop new algorithms. The authors also provide a comprehensive literature survey regarding the topic.

  17. Towards Contactless Silent Speech Recognition Based on Detection of Active and Visible Articulators Using IR-UWB Radar.

    Science.gov (United States)

    Shin, Young Hoon; Seo, Jiwon

    2016-10-29

    People with hearing or speaking disabilities are deprived of the benefits of conventional speech recognition technology because it is based on acoustic signals. Recent research has focused on silent speech recognition systems that are based on the motions of a speaker's vocal tract and articulators. Because most silent speech recognition systems use contact sensors that are very inconvenient to users or optical systems that are susceptible to environmental interference, a contactless and robust solution is hence required. Toward this objective, this paper presents a series of signal processing algorithms for a contactless silent speech recognition system using an impulse radio ultra-wide band (IR-UWB) radar. The IR-UWB radar is used to remotely and wirelessly detect motions of the lips and jaw. In order to extract the necessary features of lip and jaw motions from the received radar signals, we propose a feature extraction algorithm. The proposed algorithm noticeably improved speech recognition performance compared to the existing algorithm during our word recognition test with five speakers. We also propose a speech activity detection algorithm to automatically select speech segments from continuous input signals. Thus, speech recognition processing is performed only when speech segments are detected. Our testbed consists of commercial off-the-shelf radar products, and the proposed algorithms are readily applicable without designing specialized radar hardware for silent speech processing.

  18. Towards Contactless Silent Speech Recognition Based on Detection of Active and Visible Articulators Using IR-UWB Radar

    Directory of Open Access Journals (Sweden)

    Young Hoon Shin

    2016-10-01

    People with hearing or speaking disabilities are deprived of the benefits of conventional speech recognition technology because it is based on acoustic signals. Recent research has focused on silent speech recognition systems that are based on the motions of a speaker’s vocal tract and articulators. Because most silent speech recognition systems use contact sensors that are very inconvenient to users or optical systems that are susceptible to environmental interference, a contactless and robust solution is hence required. Toward this objective, this paper presents a series of signal processing algorithms for a contactless silent speech recognition system using an impulse radio ultra-wide band (IR-UWB) radar. The IR-UWB radar is used to remotely and wirelessly detect motions of the lips and jaw. In order to extract the necessary features of lip and jaw motions from the received radar signals, we propose a feature extraction algorithm. The proposed algorithm noticeably improved speech recognition performance compared to the existing algorithm during our word recognition test with five speakers. We also propose a speech activity detection algorithm to automatically select speech segments from continuous input signals. Thus, speech recognition processing is performed only when speech segments are detected. Our testbed consists of commercial off-the-shelf radar products, and the proposed algorithms are readily applicable without designing specialized radar hardware for silent speech processing.

  19. Compliance, quality of life and quantitative voice quality aspects of hands-free speech.

    NARCIS (Netherlands)

    Coul, B.M.R. op de; Ackerstaff, A.H.; As-Brooks, C.J. van; Hoogen, F.J.A. van den; Meeuwis, C.A.; Manni, J.J.; Hilgers, F.J.M.

    2005-01-01

    CONCLUSIONS: With the use of a new automatic stoma valve (ASV) it appears possible to rehabilitate patients who have previously been unsuccessful in acquiring hands-free speech. As well as making daily ASV use possible for an additional group of patients, this new device was also appreciated by many

  20. The role of automated speech and audio analysis in semantic multimedia annotation

    NARCIS (Netherlands)

    de Jong, Franciska M.G.; Ordelman, Roeland J.F.; van Hessen, Adrianus J.

    This paper overviews the various ways in which automatic speech and audio analysis can be deployed to enhance the semantic annotation of multimedia content, and as a consequence to improve the effectiveness of conceptual access tools. A number of techniques will be presented, including the alignment

  1. Speech-based recognition of self-reported and observed emotion in a dimensional space

    NARCIS (Netherlands)

    Truong, Khiet P.; Leeuwen, van David A.; Jong, de Franciska M.G.

    2012-01-01

    The differences between self-reported and observed emotion have only marginally been investigated in the context of speech-based automatic emotion recognition. We address this issue by comparing self-reported emotion ratings to observed emotion ratings and look at how differences between these two t

  2. Combining Knowledge Sources to Reorder N-Best Speech Hypothesis Lists

    CERN Document Server

    Rayner, M; Digalakis, V; Price, P; Rayner, Manny; Carter, David; Digalakis, Vassilios; Price, Patti

    1994-01-01

    A simple and general method is described that can combine different knowledge sources to reorder N-best lists of hypotheses produced by a speech recognizer. The method is automatically trainable, acquiring information from both positive and negative examples. Experiments are described in which it was tested on a 1000-utterance sample of unseen ATIS data.

  3. Measuring Prevalence of Other-Oriented Transactive Contributions Using an Automated Measure of Speech Style Accommodation

    Science.gov (United States)

    Gweon, Gahgene; Jain, Mahaveer; McDonough, John; Raj, Bhiksha; Rose, Carolyn P.

    2013-01-01

    This paper contributes to a theory-grounded methodological foundation for automatic collaborative learning process analysis. It does this by illustrating how insights from the social psychology and sociolinguistics of speech style provide a theoretical framework to inform the design of a computational model. The purpose of that model is to detect…

  4. The Promise of NLP and Speech Processing Technologies in Language Assessment

    Science.gov (United States)

    Chapelle, Carol A.; Chung, Yoo-Ree

    2010-01-01

    Advances in natural language processing (NLP) and automatic speech recognition and processing technologies offer new opportunities for language testing. Despite their potential uses on a range of language test item types, relatively little work has been done in this area, and it is therefore not well understood by test developers, researchers or…

  5. Speech data collection in an under-resourced language within a multilingual context

    CSIR Research Space (South Africa)

    Molapo, B

    2014-05-01

    In this paper, we present an end-to-end solution to the development of an automatic speech recognition (ASR) system in typical under-resourced languages, where the target language is likely to be influenced by one or more embedded foreign languages. We...

  6. Dealing with Phrase Level Co-Articulation (PLC) in speech recognition: a first approach

    NARCIS (Netherlands)

    Ordelman, Roeland J.F.; van Hessen, Adrianus J.; van Leeuwen, David A.; Robinson, Tony; Renals, Steve

    1999-01-01

    Whereas nowadays within-word co-articulation effects are usually sufficiently dealt with in automatic speech recognition, this is not always the case with phrase level co-articulation effects (PLC). This paper describes a first approach in dealing with phrase level co-articulation by applying these

  7. Transcribe Your Class: Using Speech Recognition to Improve Access for At-Risk Students

    Science.gov (United States)

    Bain, Keith; Lund-Lucas, Eunice; Stevens, Janice

    2012-01-01

    Through a project supported by Canada's Social Development Partnerships Program, a team of leading National Disability Organizations, universities, and industry partners are piloting a prototype Hosted Transcription Service that uses speech recognition to automatically create multimedia transcripts that can be used by students for study purposes.…

  8. Relationship between speech motor control and speech intelligibility in children with speech sound disorders.

    Science.gov (United States)

    Namasivayam, Aravind Kumar; Pukonen, Margit; Goshulak, Debra; Yu, Vickie Y; Kadis, Darren S; Kroll, Robert; Pang, Elizabeth W; De Nil, Luc F

    2013-01-01

    The current study was undertaken to investigate the impact of speech motor issues on the speech intelligibility of children with moderate to severe speech sound disorders (SSD) within the context of the PROMPT intervention approach. The word-level Children's Speech Intelligibility Measure (CSIM), the sentence-level Beginner's Intelligibility Test (BIT) and tests of speech motor control and articulation proficiency were administered to 12 children (3:11 to 6:7 years) before and after PROMPT therapy. PROMPT treatment was provided for 45 min twice a week for 8 weeks. Twenty-four naïve adult listeners aged 22-46 years judged the intelligibility of the words and sentences. For CSIM, each time a recorded word was played the listeners were asked to look at a list of 12 words (multiple-choice format) and circle the word; for BIT sentences, the listeners were asked to write down everything they heard. Words correctly circled (CSIM) or transcribed (BIT) were averaged across three naïve judges to calculate percentage speech intelligibility. Speech intelligibility at both the word and sentence level was significantly correlated with speech motor control, but not articulatory proficiency. Further, the severity of speech motor planning and sequencing issues may potentially be a limiting factor in connected speech intelligibility and highlights the need to target these issues early and directly in treatment. The reader will be able to: (1) outline the advantages and disadvantages of using word- and sentence-level speech intelligibility tests; (2) describe the impact of speech motor control and articulatory proficiency on speech intelligibility; and (3) describe how speech motor control and speech intelligibility data may provide critical information to aid treatment planning.

  9. A Mobile Phone based Speech Therapist

    OpenAIRE

    Pandey, Vinod K.; Pande, Arun; Kopparapu, Sunil Kumar

    2016-01-01

    Patients with articulatory disorders often have difficulty in speaking. These patients need several speech therapy sessions to enable them speak normally. These therapy sessions are conducted by a specialized speech therapist. The goal of speech therapy is to develop good speech habits as well as to teach how to articulate sounds the right way. Speech therapy is critical for continuous improvement to regain normal speech. Speech therapy sessions require a patient to travel to a hospital or a ...

  10. Inner Speech in People with Aphasia

    Directory of Open Access Journals (Sweden)

    William Hayward

    2014-05-01

    Here, we have demonstrated that subjective reports of IS are meaningful in at least some individuals with aphasia. Additional research is needed to confirm the degree to which self-reported IS accurately reflects phonological access, as well as to determine which processes of word retrieval and self-monitoring are needed for reliable insights into inner speech and how specific language deficits interact with these insights. Examining insight into IS could inform our understanding of the psychological and neural bases of word retrieval as well as provide a novel tool for early prognosis and individualized aphasia treatment.

  11. Automatic River Network Extraction from LIDAR Data

    Science.gov (United States)

    Maderal, E. N.; Valcarcel, N.; Delgado, J.; Sevilla, C.; Ojeda, J. C.

    2016-06-01

    The National Geographic Institute of Spain (IGN-ES) has launched a new production system for automatic river network extraction for the Geospatial Reference Information (GRI) within the hydrography theme. The goal is to obtain an accurate and updated river network, extracted as automatically as possible. For this, IGN-ES has full LiDAR coverage for the whole Spanish territory with a density of 0.5 points per square meter. To implement this work, the technical feasibility was validated and a methodology was developed to automate each production phase: generation of hydrological terrain models with a 2 meter grid size, and river network extraction combining hydrographic criteria (topographic network) and hydrological criteria (flow accumulation river network); finally, production was launched. The key points of this work have been managing a big data environment of more than 160,000 LiDAR data files, with the infrastructure to store (up to 40 Tb between results and intermediate files) and process them using local virtualization and the Amazon Web Service (AWS), which allowed this automatic production to be completed within 6 months; the stability of the software (TerraScan-TerraSolid, GlobalMapper-Blue Marble, FME-Safe, ArcGIS-Esri); and, finally, the management of human resources. The result of this production has been an accurate automatic river network extraction for the whole country with a significant improvement in the altimetric component of the 3D linear vector. This article presents the technical feasibility, the production methodology, the automatic river network extraction production and its advantages over traditional vector extraction systems.
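
    As an illustration of the hydrological criterion named above (the production system itself uses the TerraScan/GlobalMapper/FME/ArcGIS tool chain), a minimal D8 flow-accumulation sketch on a toy DEM grid follows; the channel threshold is an arbitrary assumption:

        import numpy as np

        # D8 routing: each cell drains to its steepest-descent neighbour.
        OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                   (0, 1), (1, -1), (1, 0), (1, 1)]

        def flow_accumulation(dem):
            """Number of cells draining through each cell (D8 routing)."""
            rows, cols = dem.shape
            acc = np.ones(dem.shape, dtype=np.int64)      # cell drains itself
            for idx in np.argsort(dem, axis=None)[::-1]:  # high ground first
                r, c = divmod(int(idx), cols)
                drops = [(dem[r, c] - dem[r + dr, c + dc], dr, dc)
                         for dr, dc in OFFSETS
                         if 0 <= r + dr < rows and 0 <= c + dc < cols]
                drop, dr, dc = max(drops)                 # steepest descent
                if drop > 0:
                    acc[r + dr, c + dc] += acc[r, c]
            return acc

        dem = np.array([[5., 4., 3.], [4., 3., 2.], [3., 2., 1.]])
        channels = flow_accumulation(dem) >= 3            # crude channel mask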

  12. Analytical Study on Fundamental Frequency Contours of Thai Expressive Speech Using Fujisaki's Model

    Directory of Open Access Journals (Sweden)

    Suphattharachai Chomphan

    2010-01-01

    Problem statement: In spontaneous speech communication, prosody is an important factor that must be taken into account, since prosody affects not only the naturalness but also the intelligibility of speech. Focusing on the synthesis of Thai expressive speech, a number of systems have been developed over the years. However, expressive speech with various speaking styles has not been accomplished. To achieve the generation of expressive speech, we need to model the fundamental frequency (F0) contours accurately to preserve the speech prosody. Approach: Therefore this study proposes an analysis of model parameters for Thai speech prosody with three speaking styles and two genders, which is a preliminary work for speech synthesis. Fujisaki's model, a powerful tool for modeling the F0 contour, has been adopted, while the speaking styles of happiness, sadness and reading have been considered. The seven parameters derived from Fujisaki's model are as follows. The first parameter is the baseline frequency, which is the lowest level of the F0 contour. The second and third parameters are the numbers of phrase commands and tone commands, which reflect the frequencies of surges of the utterance at the global and local levels, respectively. The fourth and fifth parameters are the phrase command and tone command durations, which reflect the speed of speaking and the length of a syllable, respectively. The sixth and seventh parameters are the amplitudes of the phrase command and tone command, which reflect the energy of the global speech and the energy of the local syllable. Results: In the experiments, each speaking style includes 200 samples of one sentence with male and female speech. Therefore our speech database contains 1200 utterances in total. The results show that most of the proposed parameters can distinguish the three kinds of speaking styles explicitly. Conclusion: From this finding, there is strong evidence to further apply the successful parameters in the speech synthesis systems or
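
    For reference, a standard formulation of Fujisaki's model (reproduced from the general literature, not from this paper) superposes phrase and tone (accent) command responses on a baseline in the log-F0 domain:

        \ln F_0(t) = \ln F_b
            + \sum_{i=1}^{I} A_{p i}\, G_p(t - T_{0 i})
            + \sum_{j=1}^{J} A_{a j}\, \bigl[ G_a(t - T_{1 j}) - G_a(t - T_{2 j}) \bigr]

        G_p(t) = \alpha^2 t\, e^{-\alpha t} \ (t \ge 0), \qquad
        G_a(t) = \min\bigl[ 1 - (1 + \beta t)\, e^{-\beta t},\ \gamma \bigr] \ (t \ge 0)

    Here F_b is the baseline frequency (the first parameter above), I and J are the numbers of phrase and tone commands (second and third), the command onset/offset times give the durations (fourth and fifth), and A_{pi} and A_{aj} are the command amplitudes (sixth and seventh); both G functions are zero for t < 0.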

  13. Comparative analysis of gait and speech in Parkinson's disease: hypokinetic or dysrhythmic disorders?

    Science.gov (United States)

    Cantiniaux, Stéphanie; Vaugoyeau, Marianne; Robert, Danièle; Horrelou-Pitek, Christine; Mancini, Julien; Witjas, Tatiana; Azulay, Jean-Philippe

    2010-02-01

    Gait and speech are automatic motor activities which are frequently impaired in Parkinson's disease. Obvious clinical similarities exist between these disorders but have never been investigated. We propose to determine whether there exist any common features in Parkinson's disease between spatiotemporal gait disorders and temporal speech disorders. Gait and speech were analysed in 11 Parkinsonian patients (PP) undergoing deep-brain stimulation of the subthalamic nucleus (STN-DBS) and 11 control subjects under three conditions of velocity (natural, slow and fast). The patients were tested with and without l-dopa and with the stimulator ON or OFF. Locomotor parameters were recorded using an optoelectronic system. Speech parameters were recorded with a headphone while subjects were reading a short paragraph. The results confirmed that PP walk and read more slowly than controls. The patients' difficulties in modulating walking and speech velocities seem to be due mainly to an inability to internally control the step length and the interpause-speech duration (ISD). STN-DBS and levodopa increased patients' walking velocity by increasing the step length. STN-DBS and levodopa had no effect on speech velocity but restored the patients' ability to modulate the ISD. The walking cadence and the speech index of rhythmicity tended to be lower in patients and were not significantly improved by STN-DBS or levodopa. Speech and walking velocity, as well as ISD and step length, were correlated in both groups. Negative correlations between the speech index of rhythmicity and walking cadence were observed in both groups. A similar fundamental hypokinetic impairment, and probably a similar rhythmic factor, affected the patients' speech and gait. These results suggest a similar physiopathological process in both walking and speaking dysfunction.

  14. An acoustic feature-based similarity scoring system for speech rehabilitation assistance.

    Science.gov (United States)

    Syauqy, Dahnial; Wu, Chao-Min; Setyawati, Onny

    2016-08-01

    The purpose of this study is to develop a tool to assist speech therapy and rehabilitation, focused on automatic scoring based on comparison of the patient's speech with normal speech on several aspects including pitch, vowels, voiced-unvoiced segments, strident fricatives and sound intensity. Pitch estimation employed a cepstrum-based algorithm for its robustness; vowel classification used a multilayer perceptron (MLP) to classify vowels from pitch and formants; and strident fricative detection was based on the major peak spectral intensity, its location and the existence of pitch in the segment. In order to evaluate the performance of the system, this study analyzed eight patients' speech recordings (four males, four females; aged 4-58 years), which had been recorded in a previous study in cooperation with Taipei Veterans General Hospital and Taoyuan General Hospital. The experimental results for the pitch algorithm showed that the cepstrum method had a 5.3% gross pitch error over a total of 2086 frames. For the vowel classification algorithm, the MLP method provided 93% accuracy (men), 87% (women) and 84% (children). In total, 156 of the tool's grading results (81%) were consistent with the 192 audio and visual observations made by four experienced respondents. Implication for Rehabilitation: Difficulties in communication may limit the ability of a person to transfer and exchange information. The fact that speech is one of the primary means of communication has encouraged the need for speech diagnosis and rehabilitation. Advances in technology in computer-assisted speech therapy (CAST) improve the quality and time efficiency of the diagnosis and treatment of these disorders. The present study attempted to develop a tool to assist speech therapy and rehabilitation, providing a simple interface so that assessment can be done even by the patient himself without particular knowledge of speech processing while at the
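
    A minimal sketch of the cepstrum-based pitch estimator named above; the window choice and the 60-400 Hz search band are assumptions, not the paper's settings:

        import numpy as np

        def cepstral_pitch(frame, fs, fmin=60.0, fmax=400.0):
            """Pitch from the real cepstrum: a peak near quefrency 1/F0."""
            spectrum = np.fft.rfft(frame * np.hamming(len(frame)))
            cepstrum = np.fft.irfft(np.log(np.abs(spectrum) + 1e-12))
            lo, hi = int(fs / fmax), int(fs / fmin)  # quefrency band (samples)
            return fs / (lo + np.argmax(cepstrum[lo:hi]))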

  15. Speech Motor Control in Fluent and Dysfluent Speech Production of an Individual with Apraxia of Speech and Broca's Aphasia

    Science.gov (United States)

    van Lieshout, Pascal H. H. M.; Bose, Arpita; Square, Paula A.; Steele, Catriona M.

    2007-01-01

    Apraxia of speech (AOS) is typically described as a motor-speech disorder with clinically well-defined symptoms, but without a clear understanding of the underlying problems in motor control. A number of studies have compared the speech of subjects with AOS to the fluent speech of controls, but only a few have included speech movement data and if…

  16. Aero-tactile integration in speech perception

    Science.gov (United States)

    Gick, Bryan; Derrick, Donald

    2013-01-01

    Visual information from a speaker’s face can enhance [1] or interfere with [2] accurate auditory perception. This integration of information across auditory and visual streams has been observed in functional imaging studies [3,4], and has typically been attributed to the frequency and robustness with which perceivers jointly encounter event-specific information from these two modalities [5]. Adding the tactile modality has long been considered a crucial next step in understanding multisensory integration. However, previous studies have found an influence of tactile input on speech perception only under limited circumstances, either where perceivers were aware of the task [6,7] or where they had received training to establish a cross-modal mapping [8-10]. Here we show that perceivers integrate naturalistic tactile information during auditory speech perception without previous training. Drawing on the observation that some speech sounds produce tiny bursts of aspiration (such as English ‘p’) [11], we applied slight, inaudible air puffs on participants’ skin at one of two locations: the right hand or the neck. Syllables heard simultaneously with cutaneous air puffs were more likely to be heard as aspirated (for example, causing participants to mishear ‘b’ as ‘p’). These results demonstrate that perceivers integrate event-relevant tactile information in auditory perception in much the same way as they do visual information. PMID:19940925

  17. Objective speech quality measures for Internet telephony

    Science.gov (United States)

    Hall, Timothy A.

    2001-07-01

    Measuring voice quality for telephony is not a new problem. However, packet-switched, best-effort networks such as the Internet present significant new challenges for the delivery of real-time voice traffic. Unlike the circuit-switched PSTN, Internet protocol (IP) networks guarantee neither sufficient bandwidth for the voice traffic nor a constant, minimal delay. Dropped packets and varying delays introduce distortions not found in traditional telephony. In addition, if a low bitrate codec is used in voice over IP (VoIP) to achieve a high compression ratio, the original waveform can be significantly distorted. These new potential sources of signal distortion present significant challenges for objectively measuring speech quality. Measurement techniques designed for the PSTN may not perform well in VoIP environments. Our objective is to find a speech quality metric that accurately predicts subjective human perception under the conditions present in VoIP systems. To do this, we compared three types of measures: perceptually weighted distortion measures such as enhanced modified Bark spectral distance (EMBSD) and measuring normalizing blocks (MNB), word-error rates of continuous speech recognizers, and the ITU E-model. We tested the performance of these measures under conditions typical of a VoIP system. We found that the E-model had the highest correlation with mean opinion scores (MOS). The E-model is well-suited for online monitoring because it does not use the original (undistorted) signal to compute its quality metric and because it is computationally simple.
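
    For concreteness, a minimal sketch of the E-model (ITU-T G.107): a rating factor R is computed by subtracting impairment terms from a base value and then mapped to an estimated MOS. The impairment values used below are illustrative placeholders, not calibrated figures:

        def e_model_r(r0=93.2, i_s=1.4, i_d=7.9, i_e_eff=11.0, advantage=0.0):
            """R = R0 - Is - Id - Ie,eff + A (simplified G.107 form)."""
            return r0 - i_s - i_d - i_e_eff + advantage

        def r_to_mos(r):
            """Standard G.107 mapping from the R factor to estimated MOS."""
            if r < 0:
                return 1.0
            if r > 100:
                return 4.5
            return 1.0 + 0.035 * r + 7e-6 * r * (r - 60.0) * (100.0 - r)

        print(r_to_mos(e_model_r()))   # ~3.7 with the placeholder impairments

    Notably, the R factor is computed from network and codec parameters rather than from the original signal, which is why the E-model suits online monitoring.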

  18. Semi-automatic knee cartilage segmentation

    Science.gov (United States)

    Dam, Erik B.; Folkesson, Jenny; Pettersen, Paola C.; Christiansen, Claus

    2006-03-01

    Osteoarthritis (OA) is a very common age-related cause of pain and reduced range of motion. A central effect of OA is wear-down of the articular cartilage that otherwise ensures smooth joint motion. Quantification of cartilage breakdown is central to monitoring disease progression, and cartilage segmentation is therefore required. Recent advances allow automatic cartilage segmentation with high accuracy in most cases. However, the automatic methods still fail in some problematic cases. For clinical studies, even though a few failing cases will be averaged out in the overall results, they reduce the mean accuracy and precision and thereby necessitate larger/longer studies. Since severe OA cases are often the most problematic for the automatic methods, there is even a risk that the quantification will introduce a bias into the results. Therefore, interactive inspection and correction of these problematic cases is desirable. For diagnosis of individuals, this is even more crucial, since the diagnosis will otherwise simply fail. We introduce and evaluate a semi-automatic cartilage segmentation method combining an automatic pre-segmentation with an interactive step that allows inspection and correction. The automatic step consists of voxel classification based on supervised learning. The interactive step combines a watershed transformation of the original scan with the posterior probability map from the classification step at sub-voxel precision. We evaluate the method on the task of segmenting the tibial cartilage sheet from low-field magnetic resonance imaging (MRI) of knees. The evaluation shows that the combined method allows accurate and highly reproducible correction of the segmentation of even the worst cases in approximately ten minutes of interaction.
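
    The two-step scheme described above lends itself to a compact sketch: an automatic classifier supplies a per-voxel posterior probability map, and user-placed seeds then drive a marker-based watershed for correction. The snippet below is a minimal illustration under assumed inputs (precomputed per-voxel feature vectors and seed coordinates from user clicks); it uses a kNN classifier and, for simplicity, runs the watershed on the inverted posterior map alone, whereas the paper's interactive step combines a watershed of the original scan with the posterior map at sub-voxel precision.

    ```python
    # Minimal sketch of semi-automatic segmentation: supervised voxel
    # classification (automatic step) plus seed-driven watershed
    # correction (interactive step). Inputs are assumed for illustration.

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier
    from skimage.segmentation import watershed

    def presegment(features: np.ndarray, train_x: np.ndarray,
                   train_y: np.ndarray) -> np.ndarray:
        """Automatic step: return the posterior probability of the
        cartilage class per voxel (last axis of features = feature dim)."""
        clf = KNeighborsClassifier(n_neighbors=15).fit(train_x, train_y)
        flat = features.reshape(-1, features.shape[-1])
        posterior = clf.predict_proba(flat)[:, 1]    # class 1 = cartilage
        return posterior.reshape(features.shape[:-1])

    def correct(posterior: np.ndarray, fg_seeds: np.ndarray,
                bg_seeds: np.ndarray) -> np.ndarray:
        """Interactive step: user-clicked seed voxels drive a watershed
        on the inverted posterior, so flooding follows class probability."""
        markers = np.zeros(posterior.shape, dtype=np.int32)
        markers[tuple(fg_seeds.T)] = 1   # clicks inside cartilage
        markers[tuple(bg_seeds.T)] = 2   # clicks in background
        return watershed(1.0 - posterior, markers) == 1
    ```

    In an interactive tool the seed lists would come from mouse clicks on problematic slices, with the watershed re-run after each correction; it is this cheap correction loop that keeps the per-case interaction time on the order of minutes.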

  19. Automatic Program Development

    DEFF Research Database (Denmark)

    Automatic Program Development is a tribute to Robert Paige (1947-1999), our accomplished and respected colleague, and moreover our good friend, whose untimely passing was a loss to our academic and research community. We have collected the revised, updated versions of the papers published in his honor in the Higher-Order and Symbolic Computation Journal in the years 2003 and 2005. Among them there are two papers by Bob: (i) a retrospective view of his research lines, and (ii) a proposal for future studies in the area of automatic program derivation. The book also includes some papers by members of the IFIP Working Group 2.1, of which Bob was an active member. All papers are related to some of the research interests of Bob and, in particular, to the transformational development of programs and their algorithmic derivation from formal specifications. Automatic Program Development offers…

  20. Two distinct auditory-motor circuits for monitoring speech production as revealed by content-specific suppression of auditory cortex.

    Science.gov (United States)

    Ylinen, Sari; Nora, Anni; Leminen, Alina; Hakala, Tero; Huotilainen, Minna; Shtyrov, Yury; Mäkelä, Jyrki P; Service, Elisabet

    2015-06-01

    Speech production, both overt and covert, down-regulates the activation of auditory cortex. This is thought to be due to forward prediction of the sensory consequences of speech, contributing to a feedback control mechanism for speech production. Critically, however, these regulatory effects should be specific to speech content to enable accurate speech monitoring. To determine the extent to which such forward prediction is content-specific, we recorded the brain's neuromagnetic responses to heard multisyllabic pseudowords during covert rehearsal in working memory, contrasted with a control task. The cortical auditory processing of target syllables was significantly suppressed during rehearsal compared with control, but only when they matched the rehearsed items. This critical specificity to speech content enables accurate speech monitoring by forward prediction, as proposed by current models of speech production. The one-to-one phonological motor-to-auditory mappings also appear to serve the maintenance of information in phonological working memory. Further findings of right-hemispheric suppression in the case of whole-item matches and left-hemispheric enhancement for last-syllable mismatches suggest that speech production is monitored by 2 auditory-motor circuits operating on different timescales: Finer grain in the left versus coarser grain in the right hemisphere. Taken together, our findings provide hemisphere-specific evidence of the interface between inner and heard speech.