WorldWideScience

Sample records for accurate automatic speech

  1. Techniques for automatic speech recognition

    Moore, R. K.

    1983-05-01

    A brief insight into some of the algorithms that lie behind current automatic speech recognition systems is provided. Early phonetically based approaches were not particularly successful, due mainly to a lack of appreciation of the problems involved. These problems are summarized, and various recognition techniques are reviewed in the context of the solutions that they provide. It is pointed out that the majority of currently available speech recognition equipment employs a "whole-word" pattern matching approach which, although relatively simple, has proved particularly successful in its ability to recognize speech. The concept of time-normalization plays a central role in this type of recognition process, and a family of such algorithms is described in detail. The technique of dynamic time warping is not only capable of providing good performance for isolated word recognition, but can also be extended to the recognition of connected speech (thereby removing one of the most severe limitations of early speech recognition equipment).
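
    As a concrete illustration of the time-normalization idea, the following minimal sketch (not from the paper) computes a dynamic time warping distance between two feature sequences; frame-level costs are Euclidean and the classic three-step recursion is assumed:

      import numpy as np

      def dtw_distance(x, y):
          """Dynamic time warping distance between two feature sequences.
          x, y: arrays of shape (n_frames, n_features). Local cost is
          Euclidean; moves are the classic diagonal/horizontal/vertical."""
          n, m = len(x), len(y)
          D = np.full((n + 1, m + 1), np.inf)
          D[0, 0] = 0.0
          for i in range(1, n + 1):
              for j in range(1, m + 1):
                  cost = np.linalg.norm(x[i - 1] - y[j - 1])
                  D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
          return D[n, m]

      # Toy usage: a "template" matched against a time-stretched version.
      template = np.sin(np.linspace(0, 3, 30))[:, None]
      utterance = np.sin(np.linspace(0, 3, 45))[:, None]
      print(dtw_distance(template, utterance))   # small despite the length mismatch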

  2. Automatic Speech Segmentation Based on HMM

    M. Kroul

    2007-01-01

    This contribution deals with the problem of automatic phoneme segmentation using HMMs. Automation of the speech segmentation task is important for applications where large amounts of data need to be processed, so manual segmentation is out of the question. In this paper we focus on the automatic segmentation of recordings that will be used to create a database of triphone synthesis units. For speech synthesis, speech unit quality is a crucial aspect, so maximal accuracy in segmentation is ...
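
    The workhorse of HMM-based segmentation is forced alignment: since the phone sequence of each recording is known, a Viterbi pass over a left-to-right model assigns every frame to a phone, and boundaries fall where the assignment changes. A simplified sketch, assuming per-frame emission log-probabilities from already-trained models and fixed self/next transition log-probabilities (both assumptions mine, not from the paper):

      import numpy as np

      def force_align(emission_logp, self_logp=-0.1, next_logp=-2.3):
          """Viterbi alignment of frames to a known left-to-right phone
          sequence. emission_logp[t, k] = log P(frame t | phone k of the
          transcript). Returns the phone index per frame; boundaries are
          where that index changes."""
          T, K = emission_logp.shape
          delta = np.full((T, K), -np.inf)
          back = np.zeros((T, K), dtype=int)
          delta[0, 0] = emission_logp[0, 0]            # alignment starts in phone 0
          for t in range(1, T):
              for k in range(K):
                  stay = delta[t - 1, k] + self_logp
                  move = delta[t - 1, k - 1] + next_logp if k > 0 else -np.inf
                  if move > stay:
                      delta[t, k] = move + emission_logp[t, k]
                      back[t, k] = k - 1
                  else:
                      delta[t, k] = stay + emission_logp[t, k]
                      back[t, k] = k
          path = [K - 1]                               # must end in the last phone
          for t in range(T - 1, 0, -1):
              path.append(back[t, path[-1]])
          return path[::-1]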

  3. Dynamic Automatic Noisy Speech Recognition System (DANSR)

    Paul, Sheuli

    2014-01-01

    In this thesis we studied and investigated a very common but long-standing noise problem, and we provide a solution to it. The task is to deal with different types of noise that occur simultaneously, which we call hybrid noise. Although there are individual solutions for specific noise types, one cannot simply combine them, because each solution affects the speech as a whole. We developed an automatic speech recognition system DANSR (Dynamic Automatic Noisy Speech Recognition System) for hybri...

  4. Automatic speech recognition a deep learning approach

    Yu, Dong

    2015-01-01

    This book summarizes recent advances in the field of automatic speech recognition, with a focus on discriminative and hierarchical models. It is the first automatic speech recognition book to include comprehensive coverage of recent developments such as conditional random fields and deep learning techniques. It presents the insights and theoretical foundations of a series of recent models, including the conditional random field, semi-Markov and hidden conditional random fields, the deep neural network, the deep belief network, and deep stacking models for sequential learning. It also discusses practical considerations in using these models for both acoustic and language modeling in continuous speech recognition.

  5. Personality in speech assessment and automatic classification

    Polzehl, Tim

    2015-01-01

    This work combines interdisciplinary knowledge and experience from research fields of psychology, linguistics, audio-processing, machine learning, and computer science. The work systematically explores a novel research topic devoted to automated modeling of personality expression from speech. For this aim, it introduces a novel personality assessment questionnaire and presents the results of extensive labeling sessions to annotate the speech data with personality assessments. It provides estimates of the Big 5 personality traits, i.e. openness, conscientiousness, extroversion, agreeableness, and neuroticism. Based on a database built on the questionnaire, the book presents models to tell apart different personality types or classes from speech automatically.

  6. Development of a System for Automatic Recognition of Speech

    Michal Kuba

    2003-01-01

    Full Text Available The article gives a review of research on the processing and automatic recognition of speech signals (ARR) at the Department of Telecommunications of the Faculty of Electrical Engineering, University of Žilina. On-going research is oriented to speech parametrization using 2-dimensional cepstral analysis, and to the application of HMMs and neural networks for speech recognition in the Slovak language. The article summarizes the achieved results and outlines the future orientation of our research in automatic speech recognition.

  7. Automatic Phonetic Transcription for Danish Speech Recognition

    Kirkedal, Andreas Søeborg

    Phonetic dictionaries are expensive to acquire and expensive to create. For languages with productive compounding or for agglutinative languages, like German and Finnish respectively, phonetic dictionaries are also hard to maintain. For this reason, automatic phonetic transcription tools have been produced for many languages. The quality of automatic phonetic transcriptions varies greatly with respect to language and transcription strategy. For some languages, where the difference between the graphemic and phonetic representations is small, graphemic transcriptions can be used to create ASR systems with acceptable performance. In other languages ... syllabication, stød and several other suprasegmental features (Kirkedal, 2013). Simplifying the transcriptions by filtering out the symbols for suprasegmental features in a post-processing step produces a format that is suitable for ASR purposes. eSpeak is an open source speech synthesizer originally created...

  8. a New Structure for Automatic Speech Recognition

    Duchnowski, Paul

    Speech is a wideband signal with cues identifying a particular element distributed across frequency. To capture these cues, most ASR systems analyze the speech signal into spectral (or spectrally-derived) components prior to recognition. Traditionally, these components are integrated across frequency to form a vector of "acoustic evidence" on which a decision by the ASR system is based. This thesis develops an alternate approach, post-labeling integration. In this scheme, tentative decisions, or labels, of the identity of a given speech element are assigned in parallel by sub-recognizers, each operating on a band-limited portion of the speech waveform. Outputs of these independent channels are subsequently combined (integrated) to render the final decision. Remarkably good recognition of bandlimited nonsense syllables by humans leads to the consideration of this method. It also allows potentially more accurate parameterization of the speech waveform and simultaneously robust estimation of parameter probabilities. The algorithm also represents an attempt to make explicit use of redundancies in speech. Three basic methods of parameterizing the bandlimited input of the sub-recognizers were considered, focusing respectively on LPC and cepstrum coefficients, and parameters based on the autocorrelation function. Four sub-recognizers were implemented as discrete Hidden Markov Model (HMM) systems. A Maximum A Posteriori (MAP) hypothesis-testing approach was applied to the problem of integrating the individual sub-recognizer decisions on a frame-by-frame basis. Final segmentation was achieved by a secondary HMM. Five methods of estimating the probabilities necessary for MAP integration were tested. The proposed structure was applied to the task of phonetic, speaker-independent, continuous speech recognition. Performance for several combinations of parameterization schemes and integration methods was measured. The best score of 58.5% on a 39 phone alphabet is roughly
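
    A minimal sketch of the frame-level MAP integration step (my reconstruction, not the thesis code): assuming the band observations are conditionally independent given the label, the per-band label posteriors multiply, with the label prior divided out once per extra band:

      import numpy as np

      def map_integrate(band_posteriors, prior):
          """Frame-level MAP fusion of per-band label posteriors.
          band_posteriors: array (n_bands, n_labels) of P(label | band b);
          prior: array (n_labels,) of label priors. Under conditional
          independence, P(label | all bands) is proportional to
          prod_b P(label | band b) / prior**(n_bands - 1)."""
          log_post = np.sum(np.log(band_posteriors), axis=0)
          log_post -= (len(band_posteriors) - 1) * np.log(prior)
          return int(np.argmax(log_post))

      # Toy usage: four bands, three candidate labels, uniform prior.
      posts = np.array([[0.6, 0.3, 0.1],
                        [0.5, 0.4, 0.1],
                        [0.2, 0.7, 0.1],
                        [0.5, 0.2, 0.3]])
      print(map_integrate(posts, np.full(3, 1 / 3)))   # -> 0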

  9. Punjabi Automatic Speech Recognition Using HTK

    Mohit Dua

    2012-07-01

    Full Text Available This paper aims to discuss the implementation of an isolated word Automatic Speech Recognition (ASR) system for an Indian regional language, Punjabi. The HTK toolkit based on the Hidden Markov Model (HMM), a statistical approach, is used to develop the system. Initially the system is trained for 115 distinct Punjabi words by collecting data from eight speakers, and then it is tested by using samples from six speakers in real-time environments. To make the system more interactive and fast, a GUI has been developed using the Java platform for implementing the testing module. The paper also describes the role of each HTK tool used in the various phases of system development, by presenting a detailed architecture of an ASR system developed using HTK library modules and tools. The experimental results show that the overall system performance is 95.63% and 94.08%.

  10. Confidence Measures for Automatic and Interactive Speech Recognition

    Sánchez Cortina, Isaías

    2016-01-01

    This thesis contributes to the field of Automatic Speech Recognition (ASR), and particularly to Interactive Speech Transcription (IST) and Confidence Measures (CM) for ASR. The main goals of this thesis can be summarised as follows: 1. To design IST methods and tools to tackle the problem of improving automatically generated transcripts. 2. To assess the designed IST methods and tools on real-life transcription tasks in large educational repositories of vide...

  11. Disordered Speech Assessment Using Automatic Methods Based on Quantitative Measures

    Shrivastav Rahul

    2005-01-01

    Full Text Available Speech quality assessment methods are necessary for evaluating and documenting treatment outcomes of patients suffering from degraded speech due to Parkinson's disease, stroke, or other disease processes. Subjective methods of speech quality assessment are more accurate and more robust than objective methods but are time-consuming and costly. We propose a novel objective measure of speech quality assessment that builds on traditional speech processing techniques such as dynamic time warping (DTW) and the Itakura-Saito (IS) distortion measure. Initial results show that our objective measure correlates well with the more expensive subjective methods.
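
    For reference, the IS distortion between two power spectra has the closed form d(P, Q) = mean(P/Q - log(P/Q) - 1); a minimal sketch of that formula (an assumed form for illustration, not the authors' code):

      import numpy as np

      def itakura_saito(p_ref, p_test, eps=1e-12):
          """Itakura-Saito distortion between two power spectra.
          Zero iff the spectra match; asymmetric, in line with its use
          as a perceptually motivated spectral distortion measure."""
          ratio = (p_ref + eps) / (p_test + eps)
          return float(np.mean(ratio - np.log(ratio) - 1.0))

      # Toy usage: distortion grows as the test spectrum deviates.
      f = np.linspace(0, np.pi, 128)
      clean = 1.0 / np.abs(1 - 0.9 * np.exp(1j * f)) ** 2   # AR(1) power spectrum
      print(itakura_saito(clean, clean))          # 0.0
      print(itakura_saito(clean, clean * 2.0))    # > 0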

  12. Automatic speech recognition for radiological reporting

    Large vocabulary speech recognition, its techniques and its software and hardware technology are being developed, aimed at providing the office user with a tool that could significantly improve both the quantity and quality of his work: the dictation machine, which allows memos and documents to be input using voice and a microphone instead of fingers and a keyboard. The IBM Rome Science Center, together with the IBM Research Division, has built a prototype recognizer that accepts sentences in natural language from a 20,000-word Italian vocabulary. The unit runs on a personal computer equipped with special hardware capable of providing all the necessary computing power. The first laboratory experiments yielded very interesting results and pointed out system characteristics that make its use possible in operational environments. To this purpose, the dictation of medical reports was considered a suitable application. In cooperation with the 2nd Radiology Department of S. Maria della Misericordia Hospital (Udine, Italy), the system was tried out by radiology department doctors during their everyday work. The doctors were able to dictate their reports directly to the unit. The text appeared immediately on the screen, and any errors could be corrected either by voice or by using the keyboard. At the end of report dictation, the doctors could both print and archive the text. The report could also be forwarded to the hospital information system, where the latter was available. Our results have been very encouraging: the system proved to be robust, simple to use, and accurate (over 95% average recognition rate). The experiment was valuable for suggestions and comments, and its results are useful for system evolution towards improved system management and efficiency.

  13. Modelling Errors in Automatic Speech Recognition for Dysarthric Speakers

    Caballero Morales, Santiago Omar; Cox, Stephen J.

    2009-12-01

    Dysarthria is a motor speech disorder characterized by weakness, paralysis, or poor coordination of the muscles responsible for speech. Although automatic speech recognition (ASR) systems have been developed for disordered speech, factors such as low intelligibility and limited phonemic repertoire decrease speech recognition accuracy, making conventional speaker adaptation algorithms perform poorly on dysarthric speakers. In this work, rather than adapting the acoustic models, we model the errors made by the speaker and attempt to correct them. For this task, two techniques have been developed: (1) a set of "metamodels" that incorporate a model of the speaker's phonetic confusion matrix into the ASR process; (2) a cascade of weighted finite-state transducers at the confusion matrix, word, and language levels. Both techniques attempt to correct the errors made at the phonetic level and make use of a language model to find the best estimate of the correct word sequence. Our experiments show that both techniques outperform standard adaptation techniques.

  14. Automatic Speech Recognition for Human-Machine Interaction

    Biundo, Giuseppina; Grassi Pauletti, Sara; Ansorge, Michael; Farine, Pierre-André

    2005-01-01

    Since the sixties, movies such as “2001: A Space Odyssey” have familiarized us with the idea of computers that can speak and hear just as a human being does. Automatic speech recognition (ASR) is the technology that allows machines to interpret human speech (i.e. to answer the question: What is being said?). The machine “speaks back”, either by playing pre-recorded messages or by using text-to-speech (TTS) technology.

  15. Automatic speech signal segmentation based on the innovation adaptive filter

    Makowski Ryszard

    2014-06-01

    Full Text Available Speech segmentation is an essential stage in designing automatic speech recognition systems, and one can find several algorithms proposed in the literature. It is a difficult problem, as speech is immensely variable. The aim of the authors' studies was to design an algorithm that could be employed at the stage of automatic speech recognition. This would make it possible to avoid some problems related to speech signal parametrization. Posing the problem in such a way requires the algorithm to be capable of working in real time. The only such algorithm was proposed by Tyagi et al. (2006), and it is a modified version of Brandt's algorithm. The article presents a new algorithm for unsupervised automatic speech signal segmentation. It performs segmentation without access to information about the phonetic content of the utterances, relying exclusively on second-order statistics of the speech signal. The starting point for the proposed method is the time-varying Schur coefficients of an innovation adaptive filter. The Schur algorithm is known to be fast, precise, stable and capable of rapidly tracking changes in second-order signal statistics. A transition from one phoneme to another in the speech signal always indicates a change in signal statistics caused by vocal tract changes. In order to allow for the properties of human hearing, detection of inter-phoneme boundaries is performed based on statistics defined on the mel spectrum determined from the reflection coefficients. The paper presents the structure of the algorithm, defines its properties, lists parameter values, describes detection efficiency results, and compares them with those for another algorithm. The obtained segmentation results are satisfactory.
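
    A rough sketch of the underlying idea, as a batch stand-in for the paper's adaptive Schur filter: compute reflection (Schur/PARCOR) coefficients per frame via Levinson-Durbin, then flag frames where the coefficients jump as candidate phoneme boundaries. The mel-spectrum statistics the paper actually uses are omitted here:

      import numpy as np

      def reflection_coeffs(frame, order=8):
          """PARCOR/reflection coefficients via Levinson-Durbin; these
          equal the Schur coefficients of the frame's autocorrelation."""
          r = np.correlate(frame, frame, mode="full")[len(frame) - 1:][:order + 1]
          a = np.zeros(order + 1)
          k = np.zeros(order)
          err = r[0]
          for m in range(1, order + 1):
              acc = r[m] - np.dot(a[1:m], r[m - 1:0:-1])
              k[m - 1] = acc / err
              a_new = a.copy()
              a_new[m] = k[m - 1]
              a_new[1:m] = a[1:m] - k[m - 1] * a[m - 1:0:-1]
              a = a_new
              err *= 1.0 - k[m - 1] ** 2
          return k

      def boundary_scores(signal, frame_len=256, hop=128, order=8):
          """Distance between reflection coefficients of adjacent frames;
          peaks mark candidate phoneme boundaries."""
          frames = [signal[i:i + frame_len]
                    for i in range(0, len(signal) - frame_len, hop)]
          ks = np.array([reflection_coeffs(f, order) for f in frames])
          return np.linalg.norm(np.diff(ks, axis=0), axis=1)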

  16. Automatic discrimination between laughter and speech

    Truong, K.; Leeuwen, D. van

    2007-01-01

    Emotions can be recognized by audible paralinguistic cues in speech. By detecting these paralinguistic cues that can consist of laughter, a trembling voice, coughs, changes in the intonation contour etc., information about the speaker’s state and emotion can be revealed. This paper describes the dev

  17. Speaker-Machine Interaction in Automatic Speech Recognition. Technical Report.

    Makhoul, John I.

    The feasibility and limitations of speaker adaptation in improving the performance of a "fixed" (speaker-independent) automatic speech recognition system were examined. A fixed vocabulary of 55 syllables is used in the recognition system which contains 11 stops and fricatives and five tense vowels. The results of an experiment on speaker…

  18. Automatic Emotion Recognition in Speech: Possibilities and Significance

    Milana Bojanić

    2009-12-01

    Full Text Available Automatic speech recognition and spoken language understanding are crucial steps towards natural human-machine interaction. The main task of the speech communication process is the recognition of the word sequence, but the recognition of prosody, emotion and stress tags may be of particular importance as well. This paper discusses the possibilities of recognizing emotion from the speech signal in order to improve ASR, and also provides an analysis of acoustic features that can be used for the detection of the speaker's emotion and stress. The paper also provides a short overview of emotion and stress classification techniques. The importance and place of emotional speech recognition is shown in the domain of human-computer interactive systems and the transaction communication model. Directions for future work are given at the end of this work.

  19. On Automatic Voice Casting for Expressive Speech: Speaker Recognition vs. Speech Classification

    Obin, Nicolas; Roebel, Axel; Bachman, Grégoire

    2014-01-01

    This paper presents the first large-scale automatic voice casting system, and explores the adaptation of speaker recognition techniques to measure voice similarities. The proposed system is based on the representation of a voice by classes (e.g., age/gender, voice quality, emotion). First, a multi-label system is used to classify speech into classes. Then, the output probabilities for each class are concatenated to form a vector that represents the vocal signature of a speech recording. Final...

  20. Mixed Bayesian Networks with Auxiliary Variables for Automatic Speech Recognition

    Stephenson, Todd Andrew; Magimai.-Doss, Mathew; Bourlard, Hervé

    2001-01-01

    Standard hidden Markov models (HMMs), as used in automatic speech recognition (ASR), calculate their emission probabilities by an artificial neural network (ANN) or a Gaussian distribution conditioned on the hidden state variable, considering the emissions independent of any other variable in the model. Recent work showed the benefit of conditioning the emission distributions on a discrete auxiliary variable, which is observed in training and hidden in recognition. Related work has shown the ...

  1. Automatic Speech Signal Analysis for Clinical Diagnosis and Assessment of Speech Disorders

    Baghai-Ravary, Ladan

    2013-01-01

    Automatic Speech Signal Analysis for Clinical Diagnosis and Assessment of Speech Disorders provides a survey of methods designed to aid clinicians in the diagnosis and monitoring of speech disorders such as dysarthria and dyspraxia, with an emphasis on the signal processing techniques, statistical validity of the results presented in the literature, and the appropriateness of methods that do not require specialized equipment, rigorously controlled recording procedures or highly skilled personnel to interpret results. Such techniques offer the promise of a simple and cost-effective, yet objective, assessment of a range of medical conditions, which would be of great value to clinicians. The ideal scenario would begin with the collection of examples of the clients’ speech, either over the phone or using portable recording devices operated by non-specialist nursing staff. The recordings could then be analyzed initially to aid diagnosis of conditions, and subsequently to monitor the clients’ progress and res...

  2. The benefit obtained from visually displayed text from an automatic speech recognizer during listening to speech presented in noise

    Zekveld, A.A.; Kramer, S.E.; Kessens, J.M.; Vlaming, M.S.M.G.; Houtgast, T.

    2008-01-01

    OBJECTIVES: The aim of this study was to evaluate the benefit that listeners obtain from visually presented output from an automatic speech recognition (ASR) system during listening to speech in noise. DESIGN: Auditory-alone and audiovisual speech reception thresholds (SRTs) were measured. The SRT i

  3. Modelling Errors in Automatic Speech Recognition for Dysarthric Speakers

    Santiago Omar Caballero Morales

    2009-01-01

    Full Text Available Dysarthria is a motor speech disorder characterized by weakness, paralysis, or poor coordination of the muscles responsible for speech. Although automatic speech recognition (ASR) systems have been developed for disordered speech, factors such as low intelligibility and limited phonemic repertoire decrease speech recognition accuracy, making conventional speaker adaptation algorithms perform poorly on dysarthric speakers. In this work, rather than adapting the acoustic models, we model the errors made by the speaker and attempt to correct them. For this task, two techniques have been developed: (1) a set of “metamodels” that incorporate a model of the speaker's phonetic confusion matrix into the ASR process; (2) a cascade of weighted finite-state transducers at the confusion matrix, word, and language levels. Both techniques attempt to correct the errors made at the phonetic level and make use of a language model to find the best estimate of the correct word sequence. Our experiments show that both techniques outperform standard adaptation techniques.

  4. Automatic audiovisual integration in speech perception.

    Gentilucci, Maurizio; Cattaneo, Luigi

    2005-11-01

    Two experiments aimed to determine whether features of both the visual and acoustical inputs are always merged into the perceived representation of speech and whether this audiovisual integration is based on either cross-modal binding functions or on imitation. In a McGurk paradigm, observers were required to repeat aloud a string of phonemes uttered by an actor (acoustical presentation of phonemic string) whose mouth, in contrast, mimicked pronunciation of a different string (visual presentation). In a control experiment participants read the same printed strings of letters. This condition aimed to analyze the pattern of voice and the lip kinematics controlling for imitation. In the control experiment and in the congruent audiovisual presentation, i.e. when the articulation mouth gestures were congruent with the emission of the string of phones, the voice spectrum and the lip kinematics varied according to the pronounced strings of phonemes. In the McGurk paradigm the participants were unaware of the incongruence between visual and acoustical stimuli. The acoustical analysis of the participants' spoken responses showed three distinct patterns: the fusion of the two stimuli (the McGurk effect), repetition of the acoustically presented string of phonemes, and, less frequently, of the string of phonemes corresponding to the mouth gestures mimicked by the actor. However, the analysis of the latter two responses showed that the formant 2 of the participants' voice spectra always differed from the value recorded in the congruent audiovisual presentation. It approached the value of the formant 2 of the string of phonemes presented in the other modality, which was apparently ignored. The lip kinematics of the participants repeating the string of phonemes acoustically presented were influenced by the observation of the lip movements mimicked by the actor, but only when pronouncing a labial consonant. The data are discussed in favor of the hypothesis that features of both

  5. Speech Acquisition and Automatic Speech Recognition for Integrated Spacesuit Audio Systems

    Huang, Yiteng; Chen, Jingdong; Chen, Shaoyan

    2010-01-01

    A voice-command human-machine interface system has been developed for spacesuit extravehicular activity (EVA) missions. A multichannel acoustic signal processing method has been created for distant speech acquisition in noisy and reverberant environments. This technology reduces noise by exploiting differences in the statistical nature of signal (i.e., speech) and noise that exist in the spatial and temporal domains. As a result, the automatic speech recognition (ASR) accuracy can be improved to the level at which crewmembers would find the speech interface useful. The developed speech human/machine interface will enable both crewmember usability and operational efficiency. It offers a fast rate of data/text entry, small overall size, and light weight. In addition, this design frees the hands and eyes of a suited crewmember. The system components and steps include beamforming/multi-channel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, model adaptation, ASR HMM (Hidden Markov Model) training, and ASR decoding. A state-of-the-art phoneme recognizer can obtain an accuracy rate of 65 percent when the training and testing data are free of noise. When it is used in spacesuits, the rate drops to about 33 percent. With the developed microphone array speech-processing technologies, the performance is improved and the phoneme recognition accuracy rate rises to 44 percent. The recognizer can be further improved by combining the microphone array and HMM model adaptation techniques and using speech samples collected from inside spacesuits. In addition, arithmetic complexity models for the major HMM-based ASR components were developed. They can help real-time ASR system designers select proper tasks when faced with constraints in computational resources.
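
    The simplest building block of the multichannel front end described above is a delay-and-sum beamformer; a toy sketch (steering delays assumed known, which in practice would come from source localization):

      import numpy as np

      def delay_and_sum(mics, delays_samples):
          """Minimal delay-and-sum beamformer. mics: array (n_mics,
          n_samples); delays_samples: per-channel integer delay toward
          the talker. Channels are shifted so the speech wavefront
          aligns, then averaged: coherent speech adds up while diffuse
          noise partially cancels (np.roll wraps at the edges, which is
          fine for a sketch)."""
          out = np.zeros(mics.shape[1])
          for ch, d in zip(mics, delays_samples):
              out += np.roll(ch, -d)
          return out / mics.shape[0]

      # Toy usage: mic 2 hears the source 3 samples late, plus noise.
      rng = np.random.default_rng(0)
      s = rng.standard_normal(16000)                    # stand-in for speech
      noise = 0.5 * rng.standard_normal((2, 16000))     # independent per mic
      mics = np.stack([s, np.roll(s, 3)]) + noise
      enhanced = delay_and_sum(mics, [0, 3])            # ~3 dB SNR gain here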

  6. Studies on inter-speaker variability in speech and its application in automatic speech recognition

    S Umesh

    2011-10-01

    In this paper, we give an overview of the problem of inter-speaker variability and its study in many diverse areas of speech signal processing. We first give an overview of vowel-normalization studies that minimize variations in the acoustic representation of vowel realizations by different speakers. We then describe the universal-warping approach to speaker normalization which unifies many of the vowel normalization approaches and also shows the relation between speech production, perception and auditory processing. We then address the problem of inter-speaker variability in automatic speech recognition (ASR) and describe techniques that are used to reduce these effects and thereby improve the performance of speaker-independent ASR systems.

  7. Robust Automatic Speech Recognition in Impulsive Noise Environment

    DING Pei; CAO Zhigang

    2005-01-01

    This paper presents an efficient method to directly suppress the effect of impulsive noise for robust Automatic Speech Recognition (ASR). In this method, according to the noise sensitivity of each feature dimension, the observation vectors are divided into several parts, each of which is assigned a proper threshold. In the recognition stage, the unreliable probability preponderance of incorrect competing paths caused by impulsive noise is eliminated by flooring the observation probability (FOP) of each feature sub-vector at the Gaussian mixture level, so that the correct path recovers its priority of being chosen in decoding. Experimental results demonstrate that the proposed method can significantly improve recognition accuracy both in machine-gun noise and in simulated impulsive noise environments, while maintaining high performance for clean speech recognition.
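
    A minimal sketch of the flooring step (assumed form; in the paper the thresholds are set per sub-vector according to its noise sensitivity):

      import numpy as np

      def floored_log_likelihood(logp_subvectors, log_floor):
          """Flooring observation probability (FOP), sketched. An
          impulsive-noise burst can drive one sub-vector's likelihood to
          an extreme low; clamping it at the floor keeps a single
          corrupted stream from dominating the frame score, so the
          correct path stays competitive in decoding."""
          return float(np.sum(np.maximum(logp_subvectors, log_floor)))

      # Toy usage: one of three sub-vectors hit by an impulse.
      print(floored_log_likelihood(np.array([-4.2, -55.0, -3.1]),
                                   np.full(3, -10.0)))   # ~ -17.3 rather than -62.3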

  8. Computer-based automatic finger- and speech-tracking system.

    Breidegard, Björn

    2007-11-01

    This article presents the first technology ever for online registration and interactive and automatic analysis of finger movements during tactile reading (Braille and tactile pictures). Interactive software has been developed for registration (with two cameras and a microphone), MPEG-2 video compression and storage on disk or DVD as well as an interactive analysis program to aid human analysis. An automatic finger-tracking system has been implemented which also semiautomatically tracks the reading aloud speech on the syllable level. This set of tools opens the way for large scale studies of blind people reading Braille or tactile images. It has been tested in a pilot project involving congenitally blind subjects reading texts and pictures. PMID:18183897

  9. Speech recognition for embedded automatic positioner for laparoscope

    Chen, Xiaodong; Yin, Qingyun; Wang, Yi; Yu, Daoyin

    2014-07-01

    In this paper a novel speech recognition methodology based on the Hidden Markov Model (HMM) is proposed for an embedded Automatic Positioner for Laparoscope (APL), which includes a fixed-point ARM processor as its core. The APL system is designed to assist the doctor in laparoscopic surgery by implementing the specific doctor's vocal control of the laparoscope. Real-time response to voice commands calls for a more efficient speech recognition algorithm for the APL. In order to reduce computation cost without significant loss in recognition accuracy, both arithmetic and algorithmic optimizations are applied in the presented method. First, relying mostly on arithmetic optimizations, a fixed-point front end for speech feature analysis is built to match the ARM processor's characteristics. Then a fast likelihood computation algorithm is used to reduce the computational complexity of the HMM-based recognition algorithm. The experimental results show that the method keeps recognition time within 0.5 s while achieving accuracy higher than 99%, demonstrating its ability to achieve real-time vocal control of the APL.

  10. Automatic Speech Recognition Systems for the Evaluation of Voice and Speech Disorders in Head and Neck Cancer

    Andreas Maier

    2010-01-01

    Full Text Available In patients suffering from head and neck cancer, speech intelligibility is often restricted. For assessment and outcome measurements, automatic speech recognition systems have previously been shown to be appropriate for objective and quick evaluation of intelligibility. In this study we investigate the applicability of the method to speech disorders caused by head and neck cancer. Intelligibility was quantified by speech recognition on recordings of a standard text read by 41 German laryngectomized patients with cancer of the larynx or hypopharynx and 49 German patients who had suffered from oral cancer. The speech recognition provides the percentage of correctly recognized words of a sequence, that is, the word recognition rate. Automatic evaluation was compared to perceptual ratings by a panel of experts and to an age-matched control group. Both patient groups showed significantly lower word recognition rates than the control group. Automatic speech recognition yielded word recognition rates which complied with experts' evaluation of intelligibility on a significant level. Automatic speech recognition serves as a good means with low effort to objectify and quantify the most important aspect of pathologic speech—the intelligibility. The system was successfully applied to voice and speech disorders.

  11. Post-error Correction in Automatic Speech Recognition Using Discourse Information

    Kang, S.; Kim, J.-H.; Seo, J.

    2014-01-01

    Overcoming speech recognition errors in the field of human-computer interaction is important in ensuring a consistent user experience. This paper proposes a semantic-oriented post-processing approach for the correction of errors in speech recognition. The novelty of the model proposed here is that it re-ranks the n-best hypothesis of speech recognition based on the user's intention, which is analyzed from previous discourse information, while conventional automatic speech reco...

  12. Assessing the efficacy of benchmarks for automatic speech accent recognition

    Benjamin Bock

    2015-08-01

    Full Text Available Speech accents can possess valuable information about the speaker, and can be used in intelligent multimedia-based human-computer interfaces. The performance of algorithms for automatic classification of accents is often evaluated using audio datasets that include recording samples of different people, representing different accents. Here we describe a method that can detect bias in accent datasets, and apply the method to two accent identification datasets to reveal the existence of dataset bias, meaning that the datasets can be classified with accuracy higher than random even if the tested algorithm has no ability to analyze speech accent. We used the datasets by separating one second of silence from the beginning of each audio sample, such that the one-second sample did not contain voice, and therefore no information about the accent. An audio classification method was then applied to the datasets of silent audio samples, and provided classification accuracy significantly higher than random. These results indicate that the performance of accent classification algorithms measured using some accent classification benchmarks can be biased, and can be driven by differences in the background noise rather than the auditory features of the accents.
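
    The bias test is easy to reproduce in outline: cut the voice-free first second from every recording, featurize it, and check whether a classifier beats the majority-class rate. A sketch under those assumptions, using scikit-learn (the authors' exact audio classifier differs):

      import numpy as np
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score

      def bias_check(silence_clips, labels, n_bins=64):
          """silence_clips: list of 1-second arrays cut from the start of
          each recording (no speech); labels: integer accent class per
          recording. Feature: coarse average log-spectrum per clip. If a
          classifier beats chance here, the benchmark can be 'solved'
          from background/channel noise alone."""
          feats = []
          for clip in silence_clips:
              spec = np.abs(np.fft.rfft(clip)) ** 2
              bands = np.array_split(spec, n_bins)        # pool into coarse bands
              feats.append([np.log(b.mean() + 1e-12) for b in bands])
          scores = cross_val_score(LogisticRegression(max_iter=1000),
                                   np.array(feats), np.array(labels), cv=5)
          chance = np.bincount(np.asarray(labels)).max() / len(labels)
          return scores.mean(), chance   # mean accuracy vs. majority-class rate

      # Toy usage: two "accents" whose silence differs only in noise level.
      rng = np.random.default_rng(0)
      labels = [0, 1] * 20
      clips = [rng.standard_normal(16000) * (1.0 if y == 0 else 1.3) for y in labels]
      print(bias_check(clips, labels))   # well above 0.5 -> biased benchmark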

  13. Automatic speech recognition for report generation in computed tomography

    Purpose: A study was performed to compare the performance of automatic speech recognition (ASR) with conventional transcription. Materials and Methods: 100 CT reports were generated by using ASR and 100 CT reports were dictated and written by medical transcriptionists. The time for dictation and correction of errors by the radiologist was assessed and the type of mistakes was analysed. The text recognition rate was calculated in both groups and the average time between completion of the imaging study by the technologist and generation of the written report was assessed. A commercially available speech recognition technology (ASKA Software, IBM Via Voice) running on a personal computer was used. Results: The time for the dictation using digital voice recognition was 9.4±2.3 min compared to 4.5±3.6 min with an ordinary Dictaphone. The text recognition rate was 97% with digital voice recognition and 99% with medical transcriptionists. The average time from imaging completion to written report finalisation was reduced from 47.3 hours with medical transcriptionists to 12.7 hours with ASR. The analysis of misspellings demonstrated (ASR vs. medical transcriptionists): 3 vs. 4 for syntax errors, 0 vs. 37 orthographic mistakes, 16 vs. 22 mistakes in substance and 47 vs. erroneously applied terms. Conclusions: The use of digital voice recognition as a replacement for medical transcription is recommendable when an immediate availability of written reports is necessary. (orig.)

  14. Difficulties in Automatic Speech Recognition of Dysarthric Speakers and Implications for Speech-Based Applications Used by the Elderly: A Literature Review

    Young, Victoria; Mihailidis, Alex

    2010-01-01

    Despite their growing presence in home computer applications and various telephony services, commercial automatic speech recognition technologies are still not easily employed by everyone; especially individuals with speech disorders. In addition, relatively little research has been conducted on automatic speech recognition performance with older…

  15. Can automatic speech transcripts be used for large scale TV stream description and structuring?

    Guinaudeau, Camille; Gravier, Guillaume; Sébillot, Pascale

    2009-01-01

    International audience The increasing quantity of TV material requires methods to help users navigate such data streams. Automatically associating a short textual description with each program in a stream is a first stage for navigation or structuring tasks. Speech contained in TV broadcasts--accessible by means of automatic speech recognition systems in the absence of closed captions--is a highly valuable semantic clue that might be used to link existing textual descriptions such as program...

  16. Developing and Evaluating an Oral Skills Training Website Supported by Automatic Speech Recognition Technology

    Chen, Howard Hao-Jan

    2011-01-01

    Oral communication ability has become increasingly important to many EFL students. Several commercial software programs based on automatic speech recognition (ASR) technologies are available but their prices are not affordable for many students. This paper will demonstrate how the Microsoft Speech Application Software Development Kit (SASDK), a…

  17. Man-system interface based on automatic speech recognition: integration to a virtual control desk

    This work reports the implementation of a man-system interface based on automatic speech recognition, and its integration into a virtual nuclear power plant control desk. The latter is aimed at reproducing a real control desk using virtual reality technology, for operator training and ergonomic evaluation purposes. An automatic speech recognition system was developed to serve as a new interface with users, substituting for the computer keyboard and mouse. Users can operate this virtual control desk in front of a computer monitor or a projection screen through spoken commands. The automatic speech recognition interface developed is based on a well-known signal processing technique named cepstral analysis, and on artificial neural networks. The speech recognition interface is described, along with its integration with the virtual control desk, and results are presented. (author)

  18. Noise robust automatic speech recognition with adaptive quantile based noise estimation and speech band emphasizing filter bank

    Bonde, Casper Stork; Graversen, Carina; Gregersen, Andreas Gregers;

    2005-01-01

    An important topic in Automatic Speech Recognition (ASR) is to reduce the effect of noise, in particular when a mismatch exists between the training and application conditions. Many noise robustness schemes within the feature processing domain use as a prerequisite a noise estimate prior to the appearance of the speech signal, which requires noise-robust voice activity detection and assumptions of stationary noise. However, both of these requirements are often not met, and it is therefore of particular interest to investigate methods like the Quantile Based Noise Estimation (QBNE) method, which estimates the noise during speech and non-speech sections without the use of a voice activity detector. While the standard QBNE method uses a fixed pre-defined quantile across all frequency bands, this paper suggests adaptive QBNE (AQBNE), which adapts the quantile individually to each frequency band...
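
    In outline, standard QBNE reduces to taking a per-band quantile of the spectrogram over time; a sketch of that simplified form (the adaptive variant would pick the quantile per band from the data):

      import numpy as np

      def qbne(spectrogram, q=0.5):
          """Quantile Based Noise Estimation, standard form, sketched.
          spectrogram: array (n_frames, n_freq_bins) of magnitude or
          power values. Because speech is absent in each band a good
          fraction of the time, a low-to-middle quantile of each band's
          trajectory tracks the noise floor without any voice activity
          detection."""
          return np.quantile(spectrogram, q, axis=0)   # one estimate per band

      # Toy usage: noise floor per band, "speech" active in 20% of frames.
      rng = np.random.default_rng(1)
      spec = np.abs(rng.standard_normal((500, 64))) + 1.0
      spec[rng.random(500) < 0.2] += 8.0
      print(qbne(spec, q=0.5)[:4])   # tracks the floor, not the bursts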

  19. Machine learning based sample extraction for automatic speech recognition using dialectal Assamese speech.

    Agarwalla, Swapna; Sarma, Kandarpa Kumar

    2016-06-01

    Automatic Speaker Recognition (ASR) and related issues are continuously evolving as inseparable elements of Human Computer Interaction (HCI). With the assimilation of emerging concepts like big data and the Internet of Things (IoT) as extended elements of HCI, ASR techniques are found to be passing through a paradigm shift. Of late, learning-based techniques have started to receive greater attention from research communities related to ASR, owing to the fact that the former possess a natural ability to mimic biological behavior and in that way aid ASR modeling and processing. The current learning-based ASR techniques are found to be evolving further with the incorporation of big data and IoT-like concepts. Here, in this paper, we report certain approaches based on machine learning (ML) used for the extraction of relevant samples from a big data space, and apply them to ASR using certain soft computing techniques for Assamese speech with dialectal variations. A class of ML techniques comprising the basic Artificial Neural Network (ANN) in feedforward (FF) and Deep Neural Network (DNN) forms, using raw speech, extracted features and frequency domain forms, are considered. The Multi Layer Perceptron (MLP) is configured with inputs in several forms to learn class information obtained using clustering and manual labeling. DNNs are also used to extract specific sentence types. Initially, from a large storage, relevant samples are selected and assimilated. Next, a few conventional methods are used for feature extraction of a few selected types. The features comprise both spectral and prosodic types. These are applied to Recurrent Neural Network (RNN) and Fully Focused Time Delay Neural Network (FFTDNN) structures to evaluate their performance in recognizing mood, dialect, speaker and gender variations in dialectal Assamese speech. The system is tested under several background noise conditions by considering the recognition rates (obtained using confusion matrices and manually) and computation time

  20. Physiologically Motivated Feature Extraction for Robust Automatic Speech Recognition

    Ibrahim Missaoui; Zied Lachiri

    2016-01-01

    In this paper, a new method is presented to extract robust speech features in the presence of external noise. The proposed method, based on two-dimensional Gabor filters, takes into account the spectro-temporal modulation frequencies and also limits the redundancy at the feature level. The performance of the proposed feature extraction method was evaluated on isolated speech words which are extracted from the TIMIT corpus and corrupted by background noise. The evaluation results demonstrate that ...

  1. Correcting Automatic Speech Recognition Errors in Real Time

    Wald, M; Boulain, P; Bell, J.; Doody, K; Gerrard, J

    2007-01-01

    Lectures can be digitally recorded and replayed to provide multimedia revision material for students who attended the class and a substitute learning experience for students unable to attend. Deaf and hard of hearing people can find it difficult to follow speech through hearing alone or to take notes while they are lip-reading or watching a sign-language interpreter. Synchronising the speech with text captions can ensure deaf students are not disadvantaged and assist all learners to search fo...

  2. Studies in automatic speech recognition and its application in aerospace

    Taylor, Michael Robinson

    Human communication is characterized in terms of the spectral and temporal dimensions of speech waveforms. Electronic speech recognition strategies based on Dynamic Time Warping and Markov Model algorithms are described and typical digit recognition error rates are tabulated. The application of Direct Voice Input (DVI) as an interface between man and machine is explored within the context of civil and military aerospace programmes. Sources of physical and emotional stress affecting speech production within military high performance aircraft are identified. Experimental results are reported which quantify fundamental frequency and coarse temporal dimensions of male speech as a function of the vibration, linear acceleration and noise levels typical of aerospace environments; preliminary indications of acoustic phonetic variability reported by other researchers are summarized. Connected whole-word pattern recognition error rates are presented for digits spoken under controlled Gz sinusoidal whole-body vibration. Correlations are made between significant increases in recognition error rate and resonance of the abdomen-thorax and head subsystems of the body. The phenomenon of vibrato style speech produced under low frequency whole-body Gz vibration is also examined. Interactive DVI system architectures and avionic data bus integration concepts are outlined together with design procedures for the efficient development of pilot-vehicle command and control protocols.

  3. Physiologically Motivated Feature Extraction for Robust Automatic Speech Recognition

    Ibrahim Missaoui

    2016-04-01

    Full Text Available In this paper, a new method is presented to extract robust speech features in the presence of external noise. The proposed method, based on two-dimensional Gabor filters, takes into account the spectro-temporal modulation frequencies and also limits the redundancy at the feature level. The performance of the proposed feature extraction method was evaluated on isolated speech words which are extracted from the TIMIT corpus and corrupted by background noise. The evaluation results demonstrate that the proposed feature extraction method outperforms classic methods such as Perceptual Linear Prediction, Linear Predictive Coding, Linear Prediction Cepstral coefficients and Mel Frequency Cepstral Coefficients.

  4. Evaluating Automatic Speech Recognition-Based Language Learning Systems: A Case Study

    van Doremalen, Joost; Boves, Lou; Colpaert, Jozef; Cucchiarini, Catia; Strik, Helmer

    2016-01-01

    The purpose of this research was to evaluate a prototype of an automatic speech recognition (ASR)-based language learning system that provides feedback on different aspects of speaking performance (pronunciation, morphology and syntax) to students of Dutch as a second language. We carried out usability reviews, expert reviews and user tests to…

  5. Fusing Eye-gaze and Speech Recognition for Tracking in an Automatic Reading Tutor

    Rasmussen, Morten Højfeldt; Tan, Zheng-Hua

    2013-01-01

    In this paper we present a novel approach for automatically tracking the reading progress using a combination of eye-gaze tracking and speech recognition. The two are fused by first generating word probabilities based on eye-gaze information and then using these probabilities to augment the...

  6. Assessment of Severe Apnoea through Voice Analysis, Automatic Speech, and Speaker Recognition Techniques

    Fernández Pozo, Rubén; Blanco Murillo, Jose Luis; Hernández Gómez, Luis; López Gonzalo, Eduardo; Alcázar Ramírez, José; Toledano, Doroteo T.

    2009-12-01

    This study is part of an ongoing collaborative effort between the medical and the signal processing communities to promote research on applying standard Automatic Speech Recognition (ASR) techniques for the automatic diagnosis of patients with severe obstructive sleep apnoea (OSA). Early detection of severe apnoea cases is important so that patients can receive early treatment. Effective ASR-based detection could dramatically cut medical testing time. Working with a carefully designed speech database of healthy and apnoea subjects, we describe an acoustic search for distinctive apnoea voice characteristics. We also study abnormal nasalization in OSA patients by modelling vowels in nasal and nonnasal phonetic contexts using Gaussian Mixture Model (GMM) pattern recognition on speech spectra. Finally, we present experimental findings regarding the discriminative power of GMMs applied to severe apnoea detection. We have achieved an 81% correct classification rate, which is very promising and underpins the interest in this line of inquiry.
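
    The GMM pattern-recognition step can be sketched as one mixture per class, with classification by average frame log-likelihood (scikit-learn; the feature shapes below are hypothetical, and the study models vowels in nasal and non-nasal contexts rather than raw frames):

      import numpy as np
      from sklearn.mixture import GaussianMixture

      def fit_class_gmms(feats_healthy, feats_apnoea, n_components=8):
          """Fit one Gaussian mixture per class on speech feature frames,
          arrays of shape (n_frames, n_features) pooled per group."""
          g_h = GaussianMixture(n_components, covariance_type="diag").fit(feats_healthy)
          g_a = GaussianMixture(n_components, covariance_type="diag").fit(feats_apnoea)
          return g_h, g_a

      def classify(utterance_feats, g_h, g_a):
          """Assign the class whose GMM gives the higher average frame
          log-likelihood for this utterance."""
          return "apnoea" if g_a.score(utterance_feats) > g_h.score(utterance_feats) else "healthy"

      # Toy usage with synthetic two-class features.
      rng = np.random.default_rng(0)
      g_h, g_a = fit_class_gmms(rng.normal(0, 1, (500, 6)), rng.normal(1, 1, (500, 6)))
      print(classify(rng.normal(1, 1, (40, 6)), g_h, g_a))   # -> "apnoea"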

  7. Objective automatic assessment of rehabilitative speech treatment in Parkinson's disease

    Tsanas, A; Little, M.A.; Fox, C.; Ramig, L O

    2014-01-01

    Vocal performance degradation is a common symptom for the vast majority of Parkinson's disease (PD) subjects, who typically follow personalized one-to-one periodic rehabilitation meetings with speech experts over a long-term period. Recently, a novel computer program called Lee Silverman voice treatment (LSVT) Companion was developed to allow PD subjects to independently progress through a rehabilitative treatment session. This study is part of the assessment of the LSVT Companion, aiming to ...

  8. Automatic classification and accurate size measurement of blank mask defects

    Bhamidipati, Samir; Paninjath, Sankaranarayanan; Pereira, Mark; Buck, Peter

    2015-07-01

    A blank mask and its preparation stages, such as cleaning or resist coating, play an important role in the eventual yield obtained by using it. The impact analysis of blank mask defects directly depends on the amount of available information, such as the number of defects observed, their accurate locations and sizes. Mask usability qualification at the start of the preparation process is crudely based on the number of defects. Similarly, defect information such as size is sought to estimate eventual defect printability on the wafer. Tracking of defect characteristics, specifically size and shape, across multiple stages can further be indicative of process-related information such as cleaning or coating process efficiencies. At the first level, inspection machines address the requirement of defect characterization by detecting and reporting relevant defect information. The analysis of this information, though, is still largely a manual process. With advancing technology nodes and shrinking half-pitch sizes, a large number of defects are observed, and the detailed knowledge associated with them makes the manual defect review process an arduous task, in addition to adding sensitivity to human errors. In cases where the defect information reported by the inspection machine is not sufficient, mask shops rely on other tools. Use of CD-SEM tools is one such option. However, these additional steps translate into increased costs. The Calibre NxDAT based MDPAutoClassify tool provides an automated software alternative to the manual defect review process. Working on defect images generated by inspection machines, the tool extracts and reports additional information such as defect location, useful for defect avoidance[4][5]; defect size, useful in estimating defect printability; and defect nature, e.g. particle, scratch, resist void, etc., useful for process monitoring. The tool makes use of smart and elaborate post-processing algorithms to achieve this. Their elaborateness is a consequence of the variety and

  9. Post-error Correction in Automatic Speech Recognition Using Discourse Information

    KANG, S.

    2014-05-01

    Full Text Available Overcoming speech recognition errors in the field of human-computer interaction is important in ensuring a consistent user experience. This paper proposes a semantic-oriented post-processing approach for the correction of errors in speech recognition. The novelty of the model proposed here is that it re-ranks the n-best hypothesis of speech recognition based on the user's intention, which is analyzed from previous discourse information, while conventional automatic speech recognition systems focus only on acoustic and language model scores for the current sentence. The proposed model successfully reduces the word error rate and semantic error rate by 3.65% and 8.61%, respectively.

  10. Suprasegmental Duration Modelling with Elastic Constraints in Automatic Speech Recognition

    Molloy, Laurence; Isard, Stephen

    1998-01-01

    In this paper a method of integrating a model of suprasegmental duration with an HMM-based recogniser at the post-processing level is presented. The N-best utterance output is rescored using a suitable linear combination of acoustic log-likelihood (provided by a set of tied-state triphone HMMs) and duration log-likelihood (provided by a set of durational models). The durational model used in the post-processing imposes syllable-level elastic constraints on the durational behaviour of speech se...
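
    The rescoring itself is a one-liner; a sketch with made-up numbers (the combination weight would be tuned on held-out data):

      def rescore_nbest(nbest, weight=0.3):
          """Re-rank an N-best list with a linear score combination.
          nbest: list of (words, acoustic_logp, duration_logp) tuples,
          where duration_logp would come from a suprasegmental duration
          model (here just given numbers)."""
          return max(nbest, key=lambda h: h[1] + weight * h[2])

      # Toy usage: the duration model overturns a narrow acoustic win.
      nbest = [("recognize speech", -120.0, -12.0),
               ("wreck a nice beach", -119.5, -25.0)]
      print(rescore_nbest(nbest)[0])   # -> "recognize speech"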

  11. Arabic Language Learning Assisted by Computer, based on Automatic Speech Recognition

    Terbeh, Naim

    2012-01-01

    This work consists of creating a Computer Assisted Language Learning (CALL) system based on an Automatic Speech Recognition (ASR) system for the Arabic language, using the CMU Sphinx3 tool [1] and the HMM approach. For this work, we constructed a corpus of six hours of speech recordings from nine speakers. Robustness to noise is one of the grounds for choosing the HMM approach [2]. The results achieved are encouraging, given that our corpus comprises only nine speakers, and they open the door to further improvement work.

  12. Development an Automatic Speech to Facial Animation Conversion for Improve Deaf Lives

    S. Hamidreza Kasaei

    2011-05-01

    Full Text Available In this paper, we propose the design and initial implementation of a robust system which automatically translates voice into text and text into sign language animations. Sign language translation systems could significantly improve deaf lives, especially in communication, exchange of information, and employment, by employing machines to translate conversations from one language to another. Therefore, considering these points, it seems necessary to study speech recognition. Usually, voice recognition algorithms address three major challenges. The first is extracting features from speech; the second is recognition when only a limited sound gallery is available; and the final challenge is moving from speaker-dependent to speaker-independent voice recognition. Extracting features from speech is an important stage in our method. Different procedures are available for extracting features from speech; one of the most common, used in speech recognition systems, is Mel-Frequency Cepstral Coefficients (MFCCs). The algorithm starts with preprocessing and signal conditioning. Next, features are extracted from the speech using cepstral coefficients. The result of this process is then sent to the segmentation part. Finally, the recognition part recognizes the words, and the recognized words are converted to facial animation. The project is still in progress and some new interesting methods are described in the current report.
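
    The MFCC front end described above is available off the shelf; a sketch using librosa (a third-party library assumed to be installed; the 0.97 pre-emphasis coefficient is a common default, not necessarily the authors' choice):

      import numpy as np
      import librosa   # third-party; assumed available

      def extract_mfcc(path, n_mfcc=13):
          """MFCC front end as outlined in the abstract: load audio,
          apply simple pre-emphasis, then compute per-frame MFCCs."""
          y, sr = librosa.load(path, sr=16000)          # resample to 16 kHz
          y = np.append(y[0], y[1:] - 0.97 * y[:-1])    # pre-emphasis
          return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T   # (frames, n_mfcc)

      # feats = extract_mfcc("recording.wav")   # hypothetical file path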

  13. Automatic analysis of slips of the tongue: Insights into the cognitive architecture of speech production.

    Goldrick, Matthew; Keshet, Joseph; Gustafson, Erin; Heller, Jordana; Needle, Jeremy

    2016-04-01

    Traces of the cognitive mechanisms underlying speaking can be found within subtle variations in how we pronounce sounds. While speech errors have traditionally been seen as categorical substitutions of one sound for another, acoustic/articulatory analyses show they partially reflect the intended sound. When "pig" is mispronounced as "big," the resulting /b/ sound differs from correct productions of "big," moving towards intended "pig", revealing the role of graded sound representations in speech production. Investigating the origins of such phenomena requires detailed estimation of speech sound distributions; this has been hampered by reliance on subjective, labor-intensive manual annotation. Computational methods can address these issues by providing for objective, automatic measurements. We develop a novel high-precision computational approach, based on a set of machine learning algorithms, for measurement of elicited speech. The algorithms are trained on existing manually labeled data to detect and locate linguistically relevant acoustic properties with high accuracy. Our approach is robust, is designed to handle mis-productions, and overall matches the performance of expert coders. It allows us to analyze a very large dataset of speech errors (containing far more errors than the total in the existing literature), illuminating properties of speech sound distributions previously impossible to reliably observe. We argue that this provides novel evidence that two sources both contribute to deviations in speech errors: planning processes specifying the targets of articulation and articulatory processes specifying the motor movements that execute this plan. These findings illustrate how a much richer picture of speech provides an opportunity to gain novel insights into language processing. PMID:26779665

  14. Automatic evaluation of speech rhythm instability and acceleration in dysarthrias associated with basal ganglia dysfunction

    Jan Rusz

    2015-07-01

    Full Text Available Speech rhythm abnormalities are commonly present in patients with different neurodegenerative disorders. These alterations are hypothesized to be a consequence of disruption to the basal ganglia circuitry involving dysfunction of motor planning, programming and execution, which can be detected by a syllable repetition paradigm. Therefore, the aim of the present study was to design a robust signal processing technique that allows the automatic detection of spectrally-distinctive nuclei of syllable vocalizations and to determine speech features that represent rhythm instability and acceleration. A further aim was to elucidate specific patterns of dysrhythmia across various neurodegenerative disorders that share disruption of basal ganglia function. Speech samples based on repetition of the syllable /pa/ at a self-determined steady pace were acquired from 109 subjects, including 22 with Parkinson's disease (PD), 11 with progressive supranuclear palsy (PSP), 9 with multiple system atrophy (MSA), 24 with ephedrone-induced parkinsonism (EP), 20 with Huntington's disease (HD), and 23 healthy controls. Subsequently, an algorithm for the automatic detection of syllables as well as features representing rhythm instability and rhythm acceleration were designed. The proposed detection algorithm was able to correctly identify syllables and remove erroneous detections due to excessive inspiration and non-speech sounds with a very high accuracy of 99.6%. Instability of vocal pace performance was observed in the PSP, MSA, EP and HD groups. Significantly increased pace acceleration was observed only in the PD group. Although not significant, a tendency for pace acceleration was observed also in the PSP and MSA groups. Our findings underline the crucial role of the basal ganglia in the execution and maintenance of automatic speech motor sequences. We envisage the current approach to become the first step towards the development of acoustic technologies allowing automated assessment of rhythm
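
    Once syllable onsets are detected, instability and acceleration features can be computed from the inter-syllable intervals; the sketch below uses the coefficient of variation and a linear-fit slope as illustrative stand-ins for the paper's exact definitions:

      import numpy as np

      def rhythm_features(onsets):
          """Rhythm features from syllable onset times (seconds).
          Instability: coefficient of variation of inter-syllable
          intervals. Acceleration: slope of a linear fit of interval
          length against time (negative slope = speeding up). Both are
          stand-ins, not the paper's definitions."""
          intervals = np.diff(onsets)
          instability = intervals.std() / intervals.mean()
          slope = np.polyfit(onsets[1:], intervals, 1)[0]
          return instability, slope

      # Toy usage: a speaker who starts at 2 Hz and gradually accelerates.
      t, times = 0.0, []
      for i in range(30):
          times.append(t)
          t += 0.5 * (0.98 ** i)         # each interval 2% shorter
      inst, acc = rhythm_features(np.array(times))
      print(f"instability={inst:.3f}, acceleration={acc:.4f} s/s")   # acc < 0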

  15. An exploration of the potential of Automatic Speech Recognition to assist and enable receptive communication in higher education

    Mike Wald

    2006-12-01

    Full Text Available The potential use of Automatic Speech Recognition to assist receptive communication is explored. The opportunities and challenges that this technology presents to students and staff are also discussed and evaluated: providing captioning of speech online or in classrooms for deaf or hard of hearing students, and assisting blind, visually impaired or dyslexic learners to read and search learning material more readily by augmenting synthetic speech with naturally recorded real speech. The automatic provision of online lecture notes, synchronised with speech, enables staff and students to focus on learning and teaching issues, while also benefiting learners unable to attend the lecture or who find it difficult or impossible to take notes at the same time as listening, watching and thinking.

  16. A HYBRID METHOD FOR AUTOMATIC SPEECH RECOGNITION PERFORMANCE IMPROVEMENT IN REAL WORLD NOISY ENVIRONMENT

    Urmila Shrawankar

    2013-01-01

    Full Text Available It is a well known fact that speech recognition systems perform well when used in conditions similar to those used to train the acoustic models. However, mismatches degrade the performance. In an adverse environment it is very difficult to predict the category of noise in advance in the case of real-world environmental noise, and difficult to achieve environmental robustness. After a rigorous experimental study, it was observed that no single method is available that both cleans noisy speech corrupted by real natural environmental (mixed) noise and preserves its quality. It was also observed that back-end techniques alone are not sufficient to improve the performance of a speech recognition system. It is necessary to implement performance improvement techniques at every step of the back-end as well as the front-end of the Automatic Speech Recognition (ASR) model. Current recognition systems solve this problem using a technique called adaptation. This study presents an experimental study with two aims: the first is to implement a hybrid method that cleans the speech signal as much as possible using all combinations of filters and enhancement techniques. The second is to develop a method for training all categories of noise that can adapt the acoustic models to a new environment, helping to improve the performance of the speech recognizer under real-world environmental mismatched conditions. This experiment confirms that hybrid adaptation methods improve ASR performance on both levels: Signal-to-Noise Ratio (SNR) improvement as well as word recognition accuracy in real-world noisy environments.

  17. Automatic transcription of continuous speech into syllable-like units for Indian languages

    G Lakshmi Sarada; A Lakshmi; Hema A Murthy; T Nagarajan

    2009-04-01

    The focus of this paper is to automatically segment and label continuous speech signals into syllable-like units for Indian languages. In this approach, the continuous speech signal is first automatically segmented into syllable-like units using a group delay based algorithm. Similar syllable segments are then grouped together using an unsupervised and incremental training (UIT) technique. Isolated-style HMM models are generated for each of the clusters during training. During testing, the speech signal is segmented into syllable-like units which are then tested against the HMMs obtained during training. This results in a syllable recognition performance of 42.6% and 39.94% for Tamil and Telugu, respectively. A new feature extraction technique that uses features extracted from multiple frame sizes and frame rates during both training and testing is explored for the syllable recognition task. This results in a recognition performance of 48.7% and 45.36% for Tamil and Telugu, respectively. The performance of segmentation followed by labelling is superior to that of a flat-start syllable recogniser (27.8% and 28.8% for Tamil and Telugu, respectively).

  18. A perception system for accurate automatic control of an articulated bus

    Salinas, Carlota; Montes, Héctor; Armada, Manuel

    2010-01-01

    This paper describes the perception system for an automatic articulated bus where accurate trajectory tracking is desired. Among the most promising autonomous or semi-autonomous transportation systems, the articulated bus is an interesting low-cost and user-friendly option. This platform involves a mobile vehicle and a private circuit inside CSIC premises. The perception system presented in this work, based on a 2D laser scanner as its prime sensor, generates local ...

  19. Deformable meshes for medical image segmentation accurate automatic segmentation of anatomical structures

    Kainmueller, Dagmar

    2014-01-01

    Segmentation of anatomical structures in medical image data is an essential task in clinical practice. Dagmar Kainmueller introduces methods for accurate fully automatic segmentation of anatomical structures in 3D medical image data. The author's core methodological contribution is a novel deformation model that overcomes limitations of state-of-the-art Deformable Surface approaches, hence allowing for accurate segmentation of tip- and ridge-shaped features of anatomical structures. As for practical contributions, she proposes application-specific segmentation pipelines for a range of anatomical structures.

  20. Towards an Intelligent Acoustic Front End for Automatic Speech Recognition: Built-in Speaker Normalization

    Umit H. Yapanel

    2008-08-01

    Full Text Available A proven method for achieving effective automatic speech recognition (ASR) in the presence of speaker differences is to perform acoustic feature speaker normalization. More effective speaker normalization methods are needed which require limited computing resources for real-time performance. The most popular speaker normalization technique is vocal-tract length normalization (VTLN), despite the fact that it is computationally expensive. In this study, we propose a novel online VTLN algorithm entitled built-in speaker normalization (BISN), where normalization is performed on-the-fly within a newly proposed PMVDR acoustic front end. The novel algorithmic aspect is that in conventional front-end processing with PMVDR and VTLN, two separate warping phases are needed, while in the proposed BISN method only a single speaker-dependent warp is used to achieve both the PMVDR perceptual warp and the VTLN warp simultaneously. This improved integration unifies the nonlinear warping performed in the front end and reduces computational requirements, thereby offering advantages for real-time ASR systems. Evaluations are performed for (i) an in-car extended digit recognition task, where an on-the-fly BISN implementation reduces the relative word error rate (WER) by 24%, and (ii) a diverse noisy speech task (SPINE 2), where the relative WER improvement was 9%, both relative to the baseline speaker normalization method.
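
    As a hedged illustration of the conventional warping step that BISN folds into the front end, the sketch below applies a piecewise-linear VTLN warp to filter-bank center frequencies; the 7/8 breakpoint convention and the function names are assumptions, not the paper's exact formulation.

        import numpy as np

        def vtln_warp(f, alpha, f_max):
            """Warp frequency f by factor alpha while keeping f_max fixed."""
            f = np.asarray(f, dtype=float)
            f_b = (7.0 / 8.0) * f_max / max(alpha, 1.0)  # breakpoint frequency
            low = alpha * f                              # linear segment
            # Upper segment: line from (f_b, alpha*f_b) up to (f_max, f_max).
            high = alpha * f_b + (f_max - alpha * f_b) * (f - f_b) / (f_max - f_b)
            return np.where(f <= f_b, low, high)

        # Warp a set of mel-filter center frequencies for one speaker.
        centers = np.linspace(100.0, 8000.0, 24)
        print(vtln_warp(centers, alpha=1.1, f_max=8000.0))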

  2. Accurate and automatic extrinsic calibration method for blade measurement system integrated by different optical sensors

    He, Wantao; Li, Zhongwei; Zhong, Kai; Shi, Yusheng; Zhao, Can; Cheng, Xu

    2014-11-01

    Fast and precise 3D inspection systems are in great demand in modern manufacturing processes. At present, the available sensors have their own pros and cons, and an omnipotent sensor that can handle complex inspection tasks in an accurate and effective way hardly exists. The prevailing solution is to integrate multiple sensors and take advantage of their individual strengths. To obtain a holistic 3D profile, the data from the different sensors should be registered into a coherent coordinate system. However, some complex-shaped objects such as blades have thin-wall features, for which the ICP registration method becomes unstable. It is therefore very important to calibrate the extrinsic parameters of each sensor in the integrated measurement system. This paper proposes an accurate and automatic extrinsic parameter calibration method for a blade measurement system integrating different optical sensors. In this system, a fringe projection sensor (FPS) and a conoscopic holography sensor (CHS) are integrated into a multi-axis motion platform, and the sensors can be optimally moved to any desired position at the object's surface. In order to simplify the calibration process, a special calibration artifact is designed according to the characteristics of the two sensors. An automatic registration procedure based on correlation and segmentation is used to roughly align the artifact datasets obtained by the FPS and CHS without any manual operation or data pre-processing, and the Generalized Gauss-Markov model is then used to estimate the optimal transformation parameters. The experiments show the measurement result of a blade, where several sampled patches are merged into one point cloud, verifying the performance of the proposed method.

  3. The effect of automatic gain control structure and release time on cochlear implant speech intelligibility.

    Phyu P Khing

    Full Text Available Nucleus cochlear implant systems incorporate a fast-acting front-end automatic gain control (AGC), sometimes called a compression limiter. The objective of the present study was to determine the effect of replacing the front-end compression limiter with a newly proposed envelope profile limiter. A secondary objective was to investigate the effect of AGC speed on cochlear implant speech intelligibility. The envelope profile limiter was located after the filter bank and reduced the gain when the largest of the filter bank envelopes exceeded the compression threshold. The compression threshold was set equal to the saturation level of the loudness growth function (i.e. the envelope level that mapped to the maximum comfortable current level), ensuring that no envelope clipping occurred. To preserve the spectral profile, the same gain was applied to all channels. Experiment 1 compared sentence recognition with the front-end limiter and with the envelope profile limiter, each with two release times (75 and 625 ms). Six implant recipients were tested in quiet and in four-talker babble noise, at a high presentation level of 89 dB SPL. Overall, release time had a larger effect than the AGC type. With both AGC types, speech intelligibility was lower for the 75 ms release time than for the 625 ms release time. With the shorter release time, the envelope profile limiter provided higher group mean scores than the front-end limiter in quiet, but there was no significant difference in noise. Experiment 2 measured sentence recognition in noise as a function of presentation level, from 55 to 89 dB SPL. The envelope profile limiter with 625 ms release time yielded better scores than the front-end limiter with 75 ms release time. A take-home study showed no clear pattern of preferences. It is concluded that the envelope profile limiter is a feasible alternative to a front-end compression limiter.
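
    To make the release-time comparison concrete, here is a hedged sketch of a simple compression limiter with a fast attack and a configurable release; the threshold and time constants are placeholder values, not the Nucleus processor's actual parameters.

        import numpy as np

        def agc_limiter(x, fs, threshold=0.3, attack_ms=5.0, release_ms=625.0):
            att = np.exp(-1.0 / (fs * attack_ms / 1000.0))
            rel = np.exp(-1.0 / (fs * release_ms / 1000.0))
            env, gains = 0.0, np.empty_like(x)
            for n, a in enumerate(np.abs(x)):
                coef = att if a > env else rel    # fast attack, slow release
                env = coef * env + (1.0 - coef) * a
                gains[n] = min(1.0, threshold / env) if env > 0 else 1.0
            return x * gains

        # With release_ms=75 the gain recovers quickly after peaks; with 625 ms
        # it stays reduced longer, which is the contrast studied above.
        y = agc_limiter(0.5 * np.random.randn(16000), fs=16000, release_ms=75.0)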

  4. Morpho-syntactic post-processing of N-best lists for improved French automatic speech recognition

    Huet, Stéphane; Gravier, Guillaume; Sébillot, Pascale

    2010-01-01

    Abstract Many automatic speech recognition (ASR) systems rely solely on pronunciation dictionaries and language models to take into account information about language. Implicitly, morphology and syntax are to a certain extent embedded in the language models, but the richness of such linguistic knowledge is not exploited. This paper studies the use of morpho-syntactic (MS) information in a post-processing stage of an ASR system, by reordering N-best lists. Each sentence hypothesis ...

  5. A new automatic blood pressure kit auscultates for accurate reading with a smartphone

    Wu, Hongjun; Wang, Bingjian; Zhu, Xinpu; Chu, Guang; Zhang, Zhi

    2016-01-01

    Abstract The accuracy of the widely used oscillometric automated blood pressure (BP) monitor has been continuously questioned. A novel BP kit named Accutension, which adopts the Korotkoff auscultation method, was therefore devised. Accutension works with a miniature microphone, a pressure sensor, and a smartphone. The BP values are automatically displayed on the smartphone screen through the installed App. Data recorded in the phone can be played back and reconfirmed after measurement. They can also be uploaded and saved to the iCloud. The accuracy and consistency of this novel electronic auscultatory sphygmomanometer were preliminarily verified here. Thirty-two subjects were included and 82 qualified readings were obtained. The mean differences ± SD for systolic and diastolic BP readings between Accutension and the mercury sphygmomanometer were 0.87 ± 2.86 and −0.94 ± 2.93 mm Hg. Agreements between Accutension and the mercury sphygmomanometer were highly significant for systolic (ICC = 0.993, 95% confidence interval (CI): 0.989–0.995) and diastolic (ICC = 0.987, 95% CI: 0.979–0.991) readings. In conclusion, Accutension worked accurately based on our pilot study data. The difference was acceptable. ICC and Bland–Altman plot charts showed good agreement with manual measurements. Systolic readings of Accutension were slightly higher than those of manual measurement, while diastolic readings were slightly lower. One possible reason is that Accutension captures the first and last Korotkoff sounds more sensitively than the human ear during manual measurement and avoids missing sounds, so that it may be more accurate than the traditional mercury sphygmomanometer. By documenting and analyzing trends in BP values, Accutension helps the management of hypertension and therefore contributes to mobile health services. PMID:27512876

  6. Robust Automatic Speech Recognition Features using Complex Wavelet Packet Transform Coefficients

    TjongWan Sen

    2009-11-01

    Full Text Available To improve the performance of phoneme-based Automatic Speech Recognition (ASR) in noisy environments, we developed a new technique that adds robustness to clean phoneme features. These robust features are obtained from Complex Wavelet Packet Transform (CWPT) coefficients. Since the CWPT coefficients represent all the different frequency bands of the input signal, decomposing the input signal into a complete CWPT tree covers all frequencies involved in the recognition process. For time-overlapping signals with different frequency contents, e.g. a phoneme signal with noise, the CWPT coefficients are the combination of the CWPT coefficients of the phoneme signal and those of the noise. The CWPT coefficients of the phoneme signal change according to the frequency components contained in the noise. Since the number of phonemes in every language is relatively small (limited) and already well known, one can easily derive principal component vectors from a clean training dataset using Principal Component Analysis (PCA). These principal component vectors can then be used to add robustness and minimize noise effects in the testing phase. Simulation results, using Alpha Numeric 4 (AN4) from Carnegie Mellon University and NOISEX-92 examples from Rice University, showed that this new technique can be used as a feature extractor that improves the robustness of phoneme-based ASR systems in various adverse noisy conditions while still preserving performance in clean environments.
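
    A small sketch of the robustness idea as described: learn a principal subspace from clean features and back-project noisy test features onto it to suppress noise-dominated directions. The dimensions and data here are synthetic placeholders, not actual CWPT coefficients.

        import numpy as np
        from sklearn.decomposition import PCA

        rng = np.random.default_rng(0)
        clean = rng.normal(size=(2000, 64))    # stand-in for clean feature rows
        pca = PCA(n_components=16).fit(clean)  # clean principal subspace

        noisy = clean[:5] + 0.5 * rng.normal(size=(5, 64))
        denoised = pca.inverse_transform(pca.transform(noisy))  # back-projection
        print(np.linalg.norm(noisy - denoised, axis=1))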

  7. Automatic speech recognizer based on the Spanish spoken in Valdivia, Chile

    Sanchez, Maria L.; Poblete, Victor H.; Sommerhoff, Jorge

    2001-05-01

    The performance of an automatic speech recognizer is affected by the training process (dependent on or independent of the speaker) and the size of the vocabulary. The language used in this study was the Spanish spoken in the city of Valdivia, Chile. A representative sample of 14 students and six professionals, all natives of Valdivia (ten women and ten men), was used in the study; participants ranged in age between 20 and 30 years. Two systems were programmed based on the classical principles: digitalizing, end-point detection, linear prediction coding, cepstral coefficients, dynamic time warping, and a final decision stage with a previous training step: (i) speaker-dependent (15 words: five colors and ten numbers), (ii) speaker-independent (30 words: ten verbs, ten nouns, and ten adjectives). A simple didactic application, with options to choose colors, numbers and drawings of the verbs, nouns and adjectives, was designed to be used with a personal computer. In both programs, the tests carried out showed a tendency towards errors in short monosyllabic words like "flor" and "sol." The best results were obtained for words with three syllables like "disparar" and "mojado." [Work supported by Proyecto DID UACh N S-200278.]

  8. Rapid and automatic speech-specific learning mechanism in human neocortex.

    Kimppa, Lilli; Kujala, Teija; Leminen, Alina; Vainio, Martti; Shtyrov, Yury

    2015-09-01

    A unique feature of the human communication system is our ability to rapidly acquire new words and build large vocabularies. However, its neurobiological foundations remain largely unknown. In an electrophysiological study optimally designed to probe this rapid formation of new word memory circuits, we employed acoustically controlled novel word-forms incorporating native and non-native speech sounds, while manipulating the subjects' attention on the input. We found a robust index of neurolexical memory-trace formation: a rapid enhancement of the brain's activation elicited by novel words during a short (~30 min) perceptual exposure, underpinned by fronto-temporal cortical networks, and, importantly, correlated with behavioural learning outcomes. Crucially, this neural memory trace build-up took place regardless of focused attention on the input or any pre-existing or learnt semantics. Furthermore, it was found only for stimuli with native-language phonology, but not for acoustically closely matching non-native words. These findings demonstrate a specialised cortical mechanism for rapid, automatic and phonology-dependent formation of neural word memory circuits. PMID:26074199

  9. Automatic Speech Recognition Using Template Model for Man-Machine Interface

    Mishra, Neema; Shrawankar, Urmila; Thakare, V. M

    2013-01-01

    Speech is a natural form of communication for human beings, and computers with the ability to understand speech and speak with a human voice are expected to contribute to the development of more natural man-machine interfaces. Computers with this kind of ability are gradually becoming a reality, through the evolution of speech recognition technologies. Speech is an important mode of interaction with computers. In this paper, feature extraction is implemented using the well-known Mel-Frequenc...

  10. Fast and Accurate Recurrent Neural Network Acoustic Models for Speech Recognition

    SAK, Haşim; Senior, Andrew; Rao, Kanishka; Beaufays, Françoise

    2015-01-01

    We have recently shown that deep Long Short-Term Memory (LSTM) recurrent neural networks (RNNs) outperform feed forward deep neural networks (DNNs) as acoustic models for speech recognition. More recently, we have shown that the performance of sequence trained context dependent (CD) hidden Markov model (HMM) acoustic models using such LSTM RNNs can be equaled by sequence trained phone models initialized with connectionist temporal classification (CTC). In this paper, we present techniques tha...

  11. Novel Techniques for Dialectal Arabic Speech Recognition

    Elmahdy, Mohamed; Minker, Wolfgang

    2012-01-01

    Novel Techniques for Dialectal Arabic Speech Recognition describes approaches to improve automatic speech recognition for dialectal Arabic. Since speech resources for dialectal Arabic speech recognition are very sparse, the authors describe how existing Modern Standard Arabic (MSA) speech data can be applied to dialectal Arabic speech recognition, while assuming that MSA is always a second language for all Arabic speakers. In this book, Egyptian Colloquial Arabic (ECA) has been chosen as a typical Arabic dialect. ECA is the first-ranked Arabic dialect in terms of number of speakers, and a high-quality ECA speech corpus with accurate phonetic transcription has been collected. MSA acoustic models were trained using news broadcast speech. In order to cross-lingually use MSA in dialectal Arabic speech recognition, the authors have normalized the phoneme sets for MSA and ECA. After this normalization, they have applied state-of-the-art acoustic model adaptation techniques like Maximum Likelihood Linear Regression (MLLR) and M...

  12. Subjective Quality Measurement of Speech Its Evaluation, Estimation and Applications

    Kondo, Kazuhiro

    2012-01-01

    It is becoming crucial to accurately estimate and monitor speech quality in various ambient environments to guarantee high quality speech communication. This practical hands-on book shows speech intelligibility measurement methods so that the readers can start measuring or estimating speech intelligibility of their own system. The book also introduces subjective and objective speech quality measures, and describes in detail speech intelligibility measurement methods. It introduces a diagnostic rhyme test which uses rhyming word-pairs, and includes: An investigation into the effect of word familiarity on speech intelligibility. Speech intelligibility measurement of localized speech in virtual 3-D acoustic space using the rhyme test. Estimation of speech intelligibility using objective measures, including the ITU standard PESQ measures, and automatic speech recognizers.

  13. Contribution to automatic speech recognition. Analysis of the direct acoustical signal. Recognition of isolated words and phoneme identification

    This report deals with the acoustical-phonetic step of automatic speech recognition. The parameters used are the extrema of the acoustical signal (coded in amplitude and duration). This coding method, whose properties are described, is simple and well adapted to digital processing. The quality and the intelligibility of the coded signal after reconstruction are particularly satisfactory. An experiment on the automatic recognition of isolated words has been carried out using this coding system. We have designed a filtering algorithm operating on the parameters of the coding, from which the characteristics of the formants can be derived under certain conditions, which are discussed. Using these characteristics, the identification of a large part of the phonemes for a given speaker was achieved. Carrying on these studies required the development of a particular real-time processing methodology, which allowed immediate evaluation of program improvements. Such processing on temporal coding of the acoustical signal is extremely powerful and could, used in connection with other methods, represent an efficient tool for the automatic processing of speech. (author)

  14. Combining Statistical Parameteric Speech Synthesis and Unit-Selection for Automatic Voice Cloning

    Aylett, Matthew; Yamagishi, Junichi

    2008-01-01

    The ability to use the recorded audio of a subject's voice to produce an open-domain synthesis system has generated much interest both in academic research and in commercial speech technology. The ability to produce synthetic versions of a subject's voice has potential commercial applications, such as virtual celebrity actors, or potential clinical applications, such as offering a synthetic replacement voice in the case of a laryngectomy. Recent developments in HMM-based speech synthesis have ...

  15. Estimation of phoneme-specific HMM topologies for the automatic recognition of dysarthric speech.

    Caballero-Morales, Santiago-Omar

    2013-01-01

    Dysarthria is a frequently occurring motor speech disorder which can be caused by neurological trauma, cerebral palsy, or degenerative neurological diseases. Because dysarthria affects phonation, articulation, and prosody, spoken communication of dysarthric speakers gets seriously restricted, affecting their quality of life and confidence. Assistive technology has led to the development of speech applications to improve the spoken communication of dysarthric speakers. In this field, this paper presents an approach to improve the accuracy of HMM-based speech recognition systems. Because phonatory dysfunction is a main characteristic of dysarthric speech, the phonemes of a dysarthric speaker are affected at different levels. Thus, the approach consists in finding the most suitable type of HMM topology (Bakis, Ergodic) for each phoneme in the speaker's phonetic repertoire. The topology is further refined with a suitable number of states and Gaussian mixture components for acoustic modelling. This represents a difference when compared with studies where a single topology is assumed for all phonemes. Finding the suitable parameters (topology and mixture components) is performed with a Genetic Algorithm (GA). Experiments with a well-known dysarthric speech database showed statistically significant improvements of the proposed approach when compared with the single-topology approach, even for speakers with severe dysarthria. PMID:24222784

  16. Dynamic time warping applied to detection of confusable word pairs in automatic speech recognition

    Anguita Ortega, Jan; Hernando Pericás, Francisco Javier

    2005-01-01

    In this paper we present a method to predict if two words are likely to be confused by an Automatic Speech Recognition (ASR) system. This method is based on the classical Dynamic Time Warping (DTW) technique. This technique, which is usually used in ASR to measure the distance between two speech signals, is used here to calculate the distance between two words. With this distance the words are classified as confusable or not confusable using a threshold. We have te...
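
    Since the paper builds on the classical DTW distance, a compact reference implementation may help; in practice the inputs would be per-frame feature vectors (e.g. MFCCs) of the two words, and the confusability threshold is tuned on data.

        import numpy as np

        def dtw_distance(a, b):
            """a, b: (n, d) and (m, d) arrays of feature frames."""
            n, m = len(a), len(b)
            D = np.full((n + 1, m + 1), np.inf)
            D[0, 0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    cost = np.linalg.norm(a[i - 1] - b[j - 1])
                    D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
            return D[n, m]

        # Two words are flagged as confusable when their distance falls below a threshold.
        rng = np.random.default_rng(0)
        print(dtw_distance(rng.normal(size=(40, 13)), rng.normal(size=(35, 13))))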

  17. Silent Speech Interfaces

    Denby, B; Schultz, T.; Honda, K.; Hueber, T.; Gilbert, J.M.; Brumberg, J.S.

    2010-01-01

    Abstract The possibility of speech processing in the absence of an intelligible acoustic signal has given rise to the idea of a 'silent speech' interface, to be used as an aid for the speech handicapped, or as part of a communications system operating in silence-required or high-background-noise environments. The article first outlines the emergence of the silent speech interface from the fields of speech production, automatic speech processing, speech pathology research, and telec...

  18. Creating Accessible Educational Multimedia through Editing Automatic Speech Recognition Captioning in Real Time

    Wald, M

    2006-01-01

    Lectures can be digitally recorded and replayed to provide multimedia revision material for students who attended the class and a substitute learning experience for students unable to attend. Deaf and hard of hearing people can find it difficult to follow speech through hearing alone or to take notes while they are lip-reading or watching a sign-language interpreter. Notetakers can only summarise what is being said while qualified sign language interpreters with a good understanding of the re...

  19. Captioning for Deaf and Hard of Hearing People by Editing Automatic Speech Recognition in Real Time

    Wald, M

    2006-01-01

    Deaf and hard of hearing people can find it difficult to follow speech through hearing alone or to take notes when lip-reading or watching a sign-language interpreter. Notetakers summarise what is being said while qualified sign language interpreters with a good understanding of the relevant higher education subject content are in very scarce supply. Real time captioning/transcription is not normally available in UK higher education because of the shortage of real time stenographers. Lectures...

  20. Automatic and Accurate Conflation of Different Road-Network Vector Data towards Multi-Modal Navigation

    Meng Zhang

    2016-05-01

    Full Text Available With the rapid improvement of geospatial data acquisition and processing techniques, a variety of geospatial databases from public or private organizations have become available. Quite often, one dataset may be superior to other datasets in one, but not all, aspects. In Germany, for instance, there were three major road network vector datasets, viz. Tele Atlas (which is now “TOMTOM”), NAVTEQ (which is now “here”), and ATKIS. However, none of them was qualified for the purpose of multi-modal navigation (e.g., driving + walking): Tele Atlas and NAVTEQ consist of comprehensive routing-relevant information, but many pedestrian ways are missing; ATKIS covers more pedestrian areas but the road objects are not fully attributed. To satisfy the requirements of multi-modal navigation, an automatic approach has been proposed to conflate different road networks together, which involves five routines: (a) road-network matching between datasets; (b) identification of the pedestrian ways; (c) geometric transformation to eliminate geometric inconsistency; (d) topologic remodeling of the conflated road network; and (e) error checking and correction. The proposed approach demonstrates high performance in a number of large test areas and has therefore been successfully utilized for real-world data production in the whole region of Germany. As a result, the conflated road network allows the multi-modal navigation of “driving + walking”.

  1. A FAST AND ACCURATE METHOD FOR AUTOMATIC CORONARY ARTERIAL TREE EXTRACTION IN ANGIOGRAMS

    Rohollah Moosavi Tayebi

    2014-01-01

    Full Text Available Coronary arterial tree extraction in angiograms is an essential component of every cardiac image processing system. Once physicians decide to examine the coronary arteries in x-ray angiograms, the extraction must be done precisely, quickly, automatically, and for the whole arterial tree, to aid diagnosis or treatment during cardiac surgical operations. This application is very helpful for the surgeon in deciding on the target vessels prior to coronary artery bypass graft surgery. Several techniques and algorithms have been proposed for extracting coronary arteries in angiograms. However, most of them suffer from disadvantages such as time complexity, low accuracy, extracting only parts of the main arteries instead of the full coronary arterial tree, the need for manual segmentation, and the appearance of artifacts. This study presents a new method for extracting the whole coronary arterial tree in angiography images using the Starlet wavelet transform. To this end, we first remove noise from the raw angiograms and then sharpen the coronary arteries. The coronary arterial tree is then extracted by applying a modified Starlet wavelet transform, and afterwards the residual noise and artifacts are cleaned. For evaluation, we measured the proposed method's performance on our data set of 4932 Left Coronary Artery (LCA) and Right Coronary Artery (RCA) angiograms and compared it with some state-of-the-art approaches. The proposed method shows much higher accuracy (96% for LCA and 97% for RCA), higher sensitivity (86% for LCA and 89% for RCA), higher specificity (98% for LCA and 99% for RCA) and also higher precision (87% for LCA and 93% for RCA).

  2. Speech recognition and understanding

    Vintsyuk, T.K.

    1983-05-01

    This article discusses the automatic processing of speech signals with the aim of finding a sequence of words (speech recognition) or a concept (speech understanding) being transmitted by the speech signal. The goal of the research is to develop an automatic typewriter that will automatically edit and type text under voice control. A dynamic programming method is proposed in which all possible class signals are stored, after which the presented signal is compared to all the stored signals during the recognition phase. Topics considered include element-by-element recognition of words of speech, learning speech recognition, phoneme-by-phoneme speech recognition, the recognition of connected speech, understanding connected speech, and prospects for designing speech recognition and understanding systems. An application of the compositional dynamic programming method to the solution of basic problems in the recognition and understanding of speech is presented.

  3. Fast, automatic, and accurate catheter reconstruction in HDR brachytherapy using an electromagnetic 3D tracking system

    Poulin, Eric; Racine, Emmanuel; Beaulieu, Luc, E-mail: Luc.Beaulieu@phy.ulaval.ca [Département de physique, de génie physique et d’optique et Centre de recherche sur le cancer de l’Université Laval, Université Laval, Québec, Québec G1V 0A6, Canada and Département de radio-oncologie et Axe Oncologie du Centre de recherche du CHU de Québec, CHU de Québec, 11 Côte du Palais, Québec, Québec G1R 2J6 (Canada); Binnekamp, Dirk [Integrated Clinical Solutions and Marketing, Philips Healthcare, Veenpluis 4-6, Best 5680 DA (Netherlands)

    2015-03-15

    Purpose: In high dose rate brachytherapy (HDR-B), current catheter reconstruction protocols are relatively slow and error prone. The purpose of this technical note is to evaluate the accuracy and the robustness of an electromagnetic (EM) tracking system for automated and real-time catheter reconstruction. Methods: For this preclinical study, a total of ten catheters were inserted in gelatin phantoms with different trajectories. Catheters were reconstructed using an 18G biopsy needle, used as an EM stylet and equipped with a miniaturized sensor, and the second-generation Aurora® Planar Field Generator from Northern Digital Inc. The Aurora EM system provides position and orientation values with precisions of 0.7 mm and 0.2°, respectively. Phantoms were also scanned using a μCT (GE Healthcare) and a Philips Big Bore clinical computed tomography (CT) system with spatial resolutions of 89 μm and 2 mm, respectively. Reconstructions using the EM stylet were compared to μCT and CT. To assess the robustness of the EM reconstruction, five catheters were reconstructed twice and compared. Results: Reconstruction time for one catheter was 10 s, leading to a total reconstruction time of less than 3 min for a typical 17-catheter implant. When compared to the μCT, the mean EM tip identification error was 0.69 ± 0.29 mm while the CT error was 1.08 ± 0.67 mm. The mean 3D distance error was found to be 0.66 ± 0.33 mm and 1.08 ± 0.72 mm for the EM and CT, respectively. EM 3D catheter trajectories were found to be more accurate. A maximum difference of less than 0.6 mm was found between successive EM reconstructions. Conclusions: The EM reconstruction was found to be more accurate and precise than the conventional methods used for catheter reconstruction in HDR-B. This approach can be applied to any type of catheter and applicator.

  5. Using automatic speech processing to study French oral vowels Contributions du traitement automatique de la parole à l'étude des voyelles orales du français

    Martine Adda-Decker

    2009-10-01

    Full Text Available Automatic speech processing methods and tools can contribute to shedding light on many issues relating to phonemic variability in speech. The processing of huge amounts of speech thus allows main tendencies to be extracted, whose detailed interpretation then requires both linguistic and methodological insights. The experimental study focuses on the variability of French oral vowels in the PFC and ESTER corpora, which are widely used both by linguists and by researchers in automatic speech processing. Duration and formant measures illustrate global variations depending on different parameters, including speech style, syllable position and the speakers' regional origins. The last part addresses the phonetic realization of close-mid front vowels, using automatic classification in a Bayesian framework.
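
    One plausible way to obtain such duration and formant measures automatically is LPC analysis of a vowel segment, taking formant candidates from the angles of the LPC polynomial roots; this sketch is an assumption about tooling (librosa), not the study's actual pipeline, and the synthetic "vowel" merely stands in for real corpus audio.

        import numpy as np
        import librosa

        def vowel_formants(segment, sr, order=8):
            a = librosa.lpc(segment, order=order)        # LPC coefficients
            roots = [r for r in np.roots(a) if np.imag(r) > 0]
            freqs = sorted(np.angle(roots) * sr / (2 * np.pi))
            return [f for f in freqs if f > 90.0][:3]    # F1-F3 candidates

        sr = 16000
        t = np.arange(0, 0.2, 1 / sr)                    # 200 ms "vowel" duration
        y = np.sin(2 * np.pi * 500 * t) + 0.6 * np.sin(2 * np.pi * 1500 * t)
        y += 0.01 * np.random.default_rng(0).normal(size=t.size)
        print(vowel_formants(y, sr))                     # expect ~500 and ~1500 Hz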

  7. Optimizing Automatic Speech Recognition for Low-Proficient Non-Native Speakers

    Catia Cucchiarini

    2010-01-01

    Full Text Available Computer-Assisted Language Learning (CALL) applications for improving the oral skills of low-proficiency learners have to cope with non-native speech that is particularly challenging. Since unconstrained non-native ASR is still problematic, a possible solution is to elicit constrained responses from the learners. In this paper, we describe experiments aimed at selecting utterances from lists of responses. The first experiment on utterance selection indicates that the decoding process can be improved by optimizing the language model and the acoustic models, thus reducing the utterance error rate from 29–26% to 10–8%. Since giving feedback on incorrectly recognized utterances is confusing, we verify the correctness of the utterance before providing feedback. The results of the second experiment on utterance verification indicate that combining duration-related features with a likelihood ratio (LR) yields an equal error rate (EER) of 10.3%, which is significantly better than the EER for the other measures in isolation.
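
    For readers unfamiliar with the metric, this is how an equal error rate such as the reported 10.3% can be computed from verification scores; the score distributions below are synthetic.

        import numpy as np
        from sklearn.metrics import roc_curve

        rng = np.random.default_rng(1)
        genuine = rng.normal(1.0, 1.0, 500)    # scores for correct utterances
        impostor = rng.normal(-1.0, 1.0, 500)  # scores for incorrect utterances

        y_true = np.r_[np.ones_like(genuine), np.zeros_like(impostor)]
        fpr, tpr, _ = roc_curve(y_true, np.r_[genuine, impostor])
        fnr = 1.0 - tpr
        eer = fpr[np.nanargmin(np.abs(fpr - fnr))]  # operating point where FPR ~= FNR
        print(f"EER ~ {eer:.3f}")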

  8. Automatic Understanding of Spontaneous Arabic Speech --- A Numerical Model Compréhension automatique de la parole arabe spontanée --- Une modélisation numérique

    Anis Zouaghi

    2009-01-01

    Full Text Available This work is part of a large research project entitled "Oreillodule" aimed at developing tools for automatic speech recognition, translation, and synthesis for the Arabic language. Our attention has mainly been focused on the semantic analyzer developed for the automatic comprehension of standard spontaneous Arabic speech. The findings on the effectiveness of the semantic decoder are quite satisfactory.

  9. Call recognition and individual identification of fish vocalizations based on automatic speech recognition: An example with the Lusitanian toadfish.

    Vieira, Manuel; Fonseca, Paulo J; Amorim, M Clara P; Teixeira, Carlos J C

    2015-12-01

    The study of acoustic communication in animals often requires not only the recognition of species-specific acoustic signals but also the identification of individual subjects, all in a complex acoustic background. Moreover, when very long recordings are to be analyzed, automatic recognition and identification processes are invaluable tools to extract the relevant biological information. A pattern recognition methodology based on hidden Markov models is presented, inspired by successful results obtained with the most widely known and complex acoustical communication signal: human speech. This methodology was applied here for the first time to the detection and recognition of fish acoustic signals, specifically in a stream of round-the-clock recordings of Lusitanian toadfish (Halobatrachus didactylus) in their natural estuarine habitat. The results show that this methodology is able not only to detect the mating sounds (boatwhistles) but also to identify individual male toadfish, reaching an identification rate of ca. 95%. Moreover, this method also proved to be a powerful tool to assess signal durations in large data sets. However, the system failed to recognize other sound types. PMID:26723348
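
    A toy sketch of the general scheme described above, using the hmmlearn package: one Gaussian HMM per sound class (or per individual), with classification by maximum log-likelihood. The features here are random placeholders for real acoustic frames.

        import numpy as np
        from hmmlearn import hmm

        rng = np.random.default_rng(2)
        train_data = {
            "boatwhistle": [rng.normal(0.0, 1.0, (80, 6)) for _ in range(5)],
            "other":       [rng.normal(2.0, 1.0, (80, 6)) for _ in range(5)],
        }

        models = {}
        for label, seqs in train_data.items():
            m = hmm.GaussianHMM(n_components=3, covariance_type="diag", n_iter=25)
            models[label] = m.fit(np.vstack(seqs), lengths=[len(s) for s in seqs])

        test = rng.normal(0.0, 1.0, (80, 6))  # an unseen "boatwhistle" sequence
        print(max(models, key=lambda lbl: models[lbl].score(test)))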

  10. Ranking of predictor variables based on effect size criterion provides an accurate means of automatically classifying opinion column articles

    Legara, Erika Fille; Monterola, Christopher; Abundo, Cheryl

    2011-01-01

    We demonstrate an accurate procedure based on linear discriminant analysis that allows automatic authorship classification of opinion column articles. First, we extract the following stylometric features of 157 column articles from four authors: statistics on high-frequency words, number of words per sentence, and number of sentences per paragraph. Then, by systematically ranking these features based on an effect size criterion, we show that we can achieve an average classification accuracy of 93% for the test set. In comparison, frequency-based ranking has an average accuracy of 80%. The highest possible average classification accuracy on our data relying merely on chance is ∼31%. By carrying out a sensitivity analysis, we show that the effect size criterion is superior to frequency ranking because there exist low-frequency words that contribute significantly to successful author discrimination. Consistent results are seen when the procedure is applied to classifying the undisputed Federalist papers of Alexander Hamilton and James Madison. To the best of our knowledge, this work is the first attempt at classifying opinion column articles, which, by virtue of being shorter in length (as compared to novels or short stories), are more prone to over-fitting issues. The near-perfect classification for the longer papers supports this claim. Our results provide an important insight on authorship attribution that has been overlooked in previous studies: ranking discriminant variables based on word frequency counts is not necessarily an optimal procedure.
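
    A hedged sketch of the two-step procedure: rank stylometric features by an effect-size criterion (Cohen's d is used here as one plausible choice) and feed only the top-ranked features to linear discriminant analysis; the data is synthetic and two-class for brevity.

        import numpy as np
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

        rng = np.random.default_rng(3)
        X_a = rng.normal(0.0, 1.0, (40, 30))  # feature rows for author A
        X_b = rng.normal(0.4, 1.0, (40, 30))  # author B differs subtly

        def cohens_d(x, y):
            s = np.sqrt((x.var(ddof=1) + y.var(ddof=1)) / 2.0)
            return abs(x.mean() - y.mean()) / s

        d = np.array([cohens_d(X_a[:, j], X_b[:, j]) for j in range(X_a.shape[1])])
        top = np.argsort(d)[::-1][:10]        # keep the 10 largest effect sizes

        X = np.vstack([X_a, X_b])[:, top]
        y = np.r_[np.zeros(40), np.ones(40)]
        print(LinearDiscriminantAnalysis().fit(X, y).score(X, y))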

  11. Security and Hyper-accurate Positioning Monitoring with Automatic Dependent Surveillance-Broadcast (ADS-B) Project

    National Aeronautics and Space Administration — Lightning Ridge Technologies, working in collaboration with The Innovation Laboratory, Inc., extend Automatic Dependent Surveillance Broadcast (ADS-B) into a safe,...

  12. Security and Hyper-accurate Positioning Monitoring with Automatic Dependent Surveillance-Broadcast (ADS-B) Project

    National Aeronautics and Space Administration — Lightning Ridge Technologies, LLC, working in collaboration with The Innovation Laboratory, Inc., extend Automatic Dependent Surveillance-Broadcast (ADS-B) into a...

  13. Automatic pose initialization for accurate 2D/3D registration applied to abdominal aortic aneurysm endovascular repair

    Miao, Shun; Lucas, Joseph; Liao, Rui

    2012-02-01

    Minimally invasive abdominal aortic aneurysm (AAA) stenting can be greatly facilitated by overlaying the preoperative 3-D model of the abdominal aorta onto the intra-operative 2-D X-ray images. Accurate 2-D/3-D registration in 3-D space makes the 2-D/3-D overlay robust to changes of C-Arm angulation. So far, 2-D/3-D registration methods based on simulated X-ray projection images using multiple image planes have been shown to be able to provide satisfactory 3-D registration accuracy. However, one drawback of intensity-based 2-D/3-D registration methods is that the similarity measure is usually highly non-convex and hence the optimizer can easily be trapped in local minima. User interaction is therefore often needed in the initialization of the position of the 3-D model in order to get a successful 2-D/3-D registration. In this paper, a novel 3-D pose initialization technique is proposed, as an extension of our previously proposed bi-plane 2-D/3-D registration method for AAA intervention [4]. The proposed method detects vessel bifurcation points and the spine centerline in both 2-D and 3-D images, and utilizes this landmark information to bring the 3-D volume into a 15 mm capture range. The proposed landmark detection method was validated on a real dataset, and is shown to provide a good initialization for the 2-D/3-D registration in [4], thus making the workflow fully automatic.

  14. A fully automatic tool to perform accurate flood mapping by merging remote sensing imagery and ancillary data

    D'Addabbo, Annarita; Refice, Alberto; Lovergine, Francesco; Pasquariello, Guido

    2016-04-01

    Flooding is one of the most frequent and expansive natural hazards. High-resolution flood mapping is an essential step in the monitoring and prevention of inundation hazard, both to gain insight into the processes involved in the generation of flooding events and for the practical purpose of precisely assessing inundated areas. Remote sensing data are recognized to be useful in this respect, thanks to the high resolution and regular revisit schedules of state-of-the-art satellites, which moreover offer a synoptic overview of the extent of flooding. In particular, Synthetic Aperture Radar (SAR) data present several favorable characteristics for flood mapping, such as their relative insensitivity to the meteorological conditions during acquisitions, as well as the possibility of acquiring independently of solar illumination, thanks to the active nature of the radar sensors [1]. However, flood scenarios are typical examples of complex situations in which different factors have to be considered to provide an accurate and robust interpretation of the situation on the ground: the presence of many land cover types, each one with a particular signature in the presence of flood, requires modelling the behavior of different objects in the scene in order to associate them with flood or no-flood conditions [2]. Generally, the fusion of multi-temporal, multi-sensor, multi-resolution and/or multi-platform Earth observation image data, together with other ancillary information, seems to have a key role in the pursuit of a consistent interpretation of complex scenes. In the case of flooding, distance from the river, terrain elevation, hydrologic information or some combination thereof can add useful information to remote sensing data. Suitable methods, able to manage and merge different kinds of data, are therefore particularly needed. In this work, a fully automatic tool, based on Bayesian Networks (BNs) [3] and able to perform data fusion, is presented. It supplies flood maps

  15. SPEECH PROCESSING –AN OVERVIEW

    A.INDUMATHI

    2012-06-01

    Full Text Available One of the earliest goals of speech processing was coding speech for efficient transmission. Later, the research spread to various areas like Automatic Speech Recognition (ASR), Speech Synthesis (TTS), Speech Enhancement, and Automatic Language Translation (ALT). Initially, ASR was used to recognize single words from a small vocabulary; later, many products were developed for continuous speech with large vocabularies. Speech Synthesis is used for synthesizing the speech corresponding to a given text, and provides a way to communicate for persons unable to speak. When Speech Synthesis is used together with ASR, it allows a complete two-way spoken interaction between humans and machines. Speech Enhancement techniques are applied to improve the quality of speech signals. Automatic Language Translation helps to convert one language into another. Basic concepts of speech processing are provided for beginners.

  16. Assessing the Performance of Automatic Speech Recognition Systems When Used by Native and Non-Native Speakers of Three Major Languages in Dictation Workflows

    Zapata, Julián; Kirkedal, Andreas Søeborg

    In this paper, we report on a two-part experiment aiming to assess and compare the performance of two types of automatic speech recognition (ASR) systems on two different computational platforms when used to augment dictation workflows. The experiment was performed with a sample of speakers of three major languages and with different linguistic profiles: non-native English speakers; non-native French speakers; and native Spanish speakers. The main objective of this experiment is to examine ASR performance in translation dictation (TD) and medical dictation (MD) workflows without manual transcription vs. with transcription. We discuss the advantages and drawbacks of a particular ASR approach on different computational platforms when used by various speakers of a given language, who may have different accents and levels of proficiency in that language, and who may have different levels of

  17. Full automatic fiducial marker detection on coil arrays for accurate instrumentation placement during MRI guided breast interventions

    Filippatos, Konstantinos; Boehler, Tobias; Geisler, Benjamin; Zachmann, Harald; Twellmann, Thorsten

    2010-02-01

    With its high sensitivity, dynamic contrast-enhanced MR imaging (DCE-MRI) of the breast is today one of the first-line tools for early detection and diagnosis of breast cancer, particularly in the dense breasts of young women. However, many relevant findings are very small or occult on targeted ultrasound images or mammography, so that MRI-guided biopsy is the only option for a precise histological work-up [1]. State-of-the-art software tools for computer-aided diagnosis of breast cancer in DCE-MRI data also offer means for image-based planning of biopsy interventions. One step in the MRI-guided biopsy workflow is the alignment of the patient position with the preoperative MR images. In these images, the location and orientation of the coil localization unit can be inferred from a number of fiducial markers, which for this purpose have to be manually or semi-automatically detected by the user. In this study, we propose a method for precise, fully automatic localization of fiducial markers, based on which a virtual localization unit can subsequently be placed in the image volume for the purpose of determining the parameters for needle navigation. The method is based on adaptive thresholding for separating breast tissue from background, followed by rigid registration of marker templates. In an evaluation of 25 clinical cases comprising 4 different commercial coil array models and 3 different MR imaging protocols, the method yielded a sensitivity of 0.96 at a false positive rate of 0.44 markers per case. The mean distance deviation between detected fiducial centers and ground truth annotated by a radiologist was 0.94 mm.

  18. Machines a Comprendre la Parole: Methodologie et Bilan de Recherche (Automatic Speech Recognition: Methodology and the State of the Research)

    Haton, Jean-Pierre

    1974-01-01

    Still no decisive result has been achieved in the automatic machine recognition of sentences of a natural language. Current research concentrates on developing algorithms for syntactic and semantic analysis. It is obvious that clues from all levels of perception have to be taken into account if a long-term solution is ever to be found. (Author/MSE)

  19. Accurate and Fully Automatic Hippocampus Segmentation Using Subject-Specific 3D Optimal Local Maps Into a Hybrid Active Contour Model.

    Zarpalas, Dimitrios; Gkontra, Polyxeni; Daras, Petros; Maglaveras, Nicos

    2014-01-01

    Assessing the structural integrity of the hippocampus (HC) is an essential step toward prevention, diagnosis, and follow-up of various brain disorders due to the implication of the structural changes of the HC in those disorders. In this respect, the development of automatic segmentation methods that can accurately, reliably, and reproducibly segment the HC has attracted considerable attention over the past decades. This paper presents an innovative 3-D fully automatic method to be used on top of the multiatlas concept for the HC segmentation. The method is based on a subject-specific set of 3-D optimal local maps (OLMs) that locally control the influence of each energy term of a hybrid active contour model (ACM). The complete set of the OLMs for a set of training images is defined simultaneously via an optimization scheme. At the same time, the optimal ACM parameters are also calculated. Therefore, heuristic parameter fine-tuning is not required. Training OLMs are subsequently combined, by applying an extended multiatlas concept, to produce the OLMs that are anatomically more suitable to the test image. The proposed algorithm was tested on three different and publicly available data sets. Its accuracy was compared with that of state-of-the-art methods demonstrating the efficacy and robustness of the proposed method. PMID:27170866

  20. Study on automatic prediction of sentential stress for Chinese Putonghua Text-to-Speech system with natural style

    SHAO Yanqiu; HAN Jiqing; ZHAO Yongzhen; LIU Ting

    2007-01-01

    Stress is an important parameter for prosody processing in speech synthesis. In this paper, we compare the acoustic features of neutral-tone syllables and strong-stress syllables with those of moderate-stress syllables, including pitch, syllable duration, intensity and pause length after the syllable. The relation between duration and pitch, as well as between the Third Tone (T3) and pitch, is also studied. Three stress prediction models based on ANNs, i.e. the acoustic model, the linguistic model and the mixed model, are presented for predicting Chinese sentential stress. The results show that the mixed model performs better than the other two models. In order to solve the problem of the diversity of manual labeling, an evaluation index of support ratio is proposed.

  1. A fast, accurate, and automatic 2D-3D image registration for image-guided cranial radiosurgery

    The authors developed a fast and accurate two-dimensional (2D)-three-dimensional (3D) image registration method to perform precise initial patient setup and frequent detection and correction of patient movement during image-guided cranial radiosurgery treatment. In this method, an approximate geometric relationship is first established to decompose a 3D rigid transformation in the 3D patient coordinate into in-plane transformations and out-of-plane rotations in two orthogonal 2D projections. Digitally reconstructed radiographs are generated offline from a preoperative computed tomography volume prior to treatment and used as the reference for patient position. A multiphase framework is designed to register the digitally reconstructed radiographs with the x-ray images periodically acquired during patient setup and treatment. The registration in each projection is performed independently; the results in the two projections are then combined and converted to a 3D rigid transformation by 2D-3D geometric backprojection. The in-plane transformation and the out-of-plane rotation are estimated using different search methods, including multiresolution matching, steepest descent minimization, and one-dimensional search. Two similarity measures, optimized pattern intensity and sum of squared difference, are applied at different registration phases to optimize accuracy and computation speed. Various experiments on an anthropomorphic head-and-neck phantom showed that, using fiducial registration as a gold standard, the registration errors were 0.33±0.16 mm (s.d.) in overall translation and 0.29 deg. ±0.11 deg. (s.d.) in overall rotation. The total targeting errors were 0.34±0.16 mm (s.d.), 0.40±0.2 mm (s.d.), and 0.51±0.26 mm (s.d.) for the targets at the distances of 2, 6, and 10 cm from the rotation center, respectively. The computation time was less than 3 s on a computer with an Intel Pentium 3.0 GHz dual processor.
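
    A minimal sketch of the sum-of-squared-difference similarity measure named above; the normalization step and array names are illustrative assumptions, and DRR generation and the search loops are out of scope.

```python
# Illustrative sketch of one of the two similarity measures used in the
# registration phases (sum of squared differences; lower = better match).
import numpy as np

def ssd(drr: np.ndarray, xray: np.ndarray) -> float:
    """Sum of squared intensity differences between a DRR and an x-ray."""
    a = (drr - drr.mean()) / drr.std()    # normalize to reduce exposure bias
    b = (xray - xray.mean()) / xray.std()
    return float(np.sum((a - b) ** 2))
```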

  2. Integrating prosodic information into a speech recogniser

    López Soto, María Teresa

    2001-01-01

    In the last decade there has been an increasing tendency to incorporate language engineering strategies into speech technology. This technique combines linguistic and mathematical information in different applications: machine translation, natural language processing, speech synthesis and automatic speech recognition (ASR). In the field of speech synthesis, this hybrid approach (linguistic and mathematical/statistical) has led to the design of efficient models for reproducin...

  3. Application of Perceptual Filtering Models to Noisy Speech Signals Enhancement

    Novlene Zoghlami

    2012-01-01

    This paper describes a new speech enhancement approach using perceptually based noise reduction. The proposed approach is based on the application of two perceptual filtering models to noisy speech signals: the gammatone and the gammachirp filter banks, with nonlinear resolution according to the equivalent rectangular bandwidth (ERB) scale. The perceptual filtering gives a number of subbands that are individually spectrally weighted and modified according to two different noise suppression rules. An accurate noise estimate is important for reducing the musical noise artifacts that appear in the processed speech after the classic subtractive process; in this context, we use continuous noise estimation algorithms. The performance of the proposed approach is evaluated on speech signals corrupted by real-world noises. Using objective tests based on the perceptual quality PESQ score and the quality ratings of signal distortion (SIG), noise distortion (BAK) and overall quality (OVRL), and a subjective test based on the quality rating of automatic speech recognition (ASR), we demonstrate that our speech enhancement approach using filter banks modeling the human auditory system outperforms conventional spectral modification algorithms in improving the quality and intelligibility of the enhanced speech signal.
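
    The gammatone/gammachirp analysis above rests on ERB-scale frequency spacing; the sketch below computes ERB-spaced center frequencies using the Glasberg and Moore formula. The function name and frequency range are illustrative, and the filtering itself is omitted.

```python
# Center frequencies equally spaced on the ERB-rate scale (Glasberg & Moore):
# ERB-rate(f) = 21.4 * log10(4.37 * f / 1000 + 1)
import numpy as np

def erb_center_frequencies(fmin: float, fmax: float, n: int) -> np.ndarray:
    """n filter-bank center frequencies, uniform on the ERB-rate scale."""
    erb = lambda f: 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)   # Hz -> ERB-rate
    inv = lambda e: (10 ** (e / 21.4) - 1.0) * 1000.0 / 4.37   # ERB-rate -> Hz
    return inv(np.linspace(erb(fmin), erb(fmax), n))

print(erb_center_frequencies(100, 8000, 24).round(1))
```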

  4. Pattern recognition in speech and language processing

    Chou, Wu

    2003-01-01

    Minimum Classification Error (MCE) Approach in Pattern Recognition, Wu Chou; Minimum Bayes-Risk Methods in Automatic Speech Recognition, Vaibhava Goel and William Byrne; A Decision Theoretic Formulation for Adaptive and Robust Automatic Speech Recognition, Qiang Huo; Speech Pattern Recognition Using Neural Networks, Shigeru Katagiri; Large Vocabulary Speech Recognition Based on Statistical Methods, Jean-Luc Gauvain; Toward Spontaneous Speech Recognition and Understanding, Sadaoki Furui; Speaker Authentication, Qi Li and Biing-Hwang Juang; HMMs for Language Processing Problems, Ri

  5. Speech Problems

    ... your treatment plan may include seeing a speech therapist , a person who is trained to treat speech disorders. How often you have to see the speech therapist will vary — you'll probably start out seeing ...

  6. Speech Segmentation Algorithm Based On Fuzzy Memberships

    Luis D. Huerta; Jose Antonio Huesca; Julio C. Contreras

    2010-01-01

    In this work, an automatic, text-independent speech segmentation algorithm was implemented. In the algorithm, the use of fuzzy memberships on each characteristic in different speech sub-bands is proposed; thus, the segmentation is performed in greater detail. Additionally, we tested with various speech signal frequencies and labelings, and we could observe how they affect the performance of the segmentation process at the phoneme level. The speech segmentation algorithm used is described. During th...

  7. Sparse representation in speech signal processing

    Lee, Te-Won; Jang, Gil-Jin; Kwon, Oh-Wook

    2003-11-01

    We review the sparse representation principle for processing speech signals. A transformation for encoding the speech signals is learned such that the resulting coefficients are as independent as possible. We use independent component analysis with an exponential prior to learn a statistical representation for speech signals. This representation leads to extremely sparse priors that can be used for encoding speech signals for a variety of purposes. We review applications of this method for speech feature extraction, automatic speech recognition and speaker identification. Furthermore, this method is also suited for tackling the difficult problem of separating two sounds given only a single microphone.
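
    A toy sketch in the spirit of this approach: learning a basis over short frames with ICA. The synthetic data, frame length, and component count are stand-ins, and sklearn's FastICA is used in place of the authors' exponential-prior variant.

```python
# Learn an ICA basis over short "speech frames" so that the resulting
# coefficients are as independent (and hence sparse) as possible.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
frames = rng.standard_normal((5000, 160))   # stand-in for 10 ms speech frames

ica = FastICA(n_components=40, random_state=0, max_iter=500)
codes = ica.fit_transform(frames)           # per-frame coefficients
basis = ica.mixing_                         # learned basis functions
print(codes.shape, basis.shape)             # (5000, 40) (160, 40)
```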

  8. Speech Development

    ... Spotlight Fundraising Ideas Vehicle Donation Volunteer Efforts Speech Development skip to submenu Parents & Individuals Information for Parents & Individuals Speech Development To download the PDF version of this factsheet, ...

  9. INTEGRATING MACHINE TRANSLATION AND SPEECH SYNTHESIS COMPONENT FOR ENGLISH TO DRAVIDIAN LANGUAGE SPEECH TO SPEECH TRANSLATION SYSTEM

    J. SANGEETHA

    2015-02-01

    This paper describes the interface between the machine translation and speech synthesis components in an English-to-Tamil speech-to-speech translation system. The speech translation system consists of three modules: automatic speech recognition, machine translation and text-to-speech synthesis. Many procedures for integrating speech recognition and machine translation have been proposed, but the speech synthesis component has not yet been evaluated. In this paper, we focus on the integration of machine translation and speech synthesis, and report a subjective evaluation to investigate the impact of the speech synthesis and machine translation components and of their integration. Here we implement a hybrid machine translation system (a combination of rule-based and statistical machine translation) and a concatenative syllable-based speech synthesis technique. In order to retain the naturalness and intelligibility of the synthesized speech, Auto Associative Neural Network (AANN) prosody prediction is used in this work. The results of this investigation demonstrate that the naturalness and intelligibility of the synthesized speech are strongly influenced by the fluency and correctness of the translated text.

  10. GesRec3D: a real-time coded gesture-to-speech system with automatic segmentation and recognition thresholding using dissimilarity measures

    Craven, Michael P; Curtis, K. Mervyn

    2004-01-01

    A complete microcomputer system is described, GesRec3D, which facilitates the data acquisition, segmentation, learning, and recognition of 3-dimensional arm gestures, with application as an Augmentative and Alternative Communication (AAC) aid for people with motor and speech disability. The gesture data is acquired from a Polhemus electro-magnetic tracker system, with sensors attached to the finger, wrist and elbow of one arm. Coded gestures are linked to user-defined text, to be spoken by a t...

  11. Feasibility of Technology Enabled Speech Disorder Screening.

    Duenser, Andreas; Ward, Lauren; Stefani, Alessandro; Smith, Daniel; Freyne, Jill; Morgan, Angela; Dodd, Barbara

    2016-01-01

    One in twenty Australian children suffers from a speech disorder. Early detection of such problems can significantly improve literacy and academic outcomes for these children, reduce health and educational burden and ongoing social costs. Here we present the development of a prototype and feasibility tests of a screening and decision support tool to assess speech disorders in young children. The prototype incorporates speech signal processing, machine learning and expert knowledge to automatically classify phonemes of normal and disordered speech. We discuss these results and our future work towards the development of a mobile tool to facilitate broad, early speech disorder screening by non-experts. PMID:27440284

  12. Survey On Speech Synthesis

    A. Indumathi

    2012-12-01

    The primary goal of this paper is to provide an overview of existing Text-To-Speech (TTS) techniques by highlighting their usage and advantages. First-generation techniques include formant synthesis and articulatory synthesis. Formant synthesis works by using individually controllable formant filters, which can be set to produce accurate estimations of the vocal-tract transfer function. Articulatory synthesis produces speech by directly modeling human articulator behavior. Second-generation techniques comprise concatenative synthesis and sinusoidal synthesis. Concatenative synthesis generates speech output by concatenating segments of recorded speech; in general, it produces natural-sounding synthesized speech. Sinusoidal synthesis uses a harmonic model and decomposes each frame into a set of harmonics of an estimated fundamental frequency. The model parameters are the amplitudes and periods of the harmonics; with these, the value of the fundamental can be changed while keeping the same basic spectral shape. In addition, third-generation techniques include Hidden Markov Model (HMM) synthesis and unit selection synthesis. HMM synthesis trains the parameter modules and produces high-quality speech. Finally, unit selection operates by selecting the best sequence of units from a large speech database that matches the specification.

  13. Automatic Speaker Recognition System

    Parul; Dubey, R. B.

    2012-12-01

    Spoken language is used by humans to convey many types of information. Primarily, speech conveys messages via words. Owing to advanced speech technologies, people's interactions with remote machines, such as phone banking, internet browsing, and secured information retrieval by voice, are becoming popular today. Speaker verification and speaker identification are important for authentication and verification in security applications. Speaker identification methods can be divided into text-independent and text-dependent. Speaker recognition is the process of automatically recognizing a speaker's voice on the basis of individual information included in the input speech waveform. It consists of comparing a speech signal from an unknown speaker to a set of stored data of known speakers; the process recognizes who has spoken by matching the input signal with pre-stored samples. This work focuses on improving the performance of speaker verification under noisy conditions.
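
    A minimal sketch of the compare-to-stored-samples idea using MFCC features and a DTW distance; the file paths, sampling rate, and use of librosa are illustrative assumptions rather than the paper's system.

```python
# Match an unknown utterance against enrolled speakers by accumulated
# DTW alignment cost over MFCC feature sequences.
import librosa
import numpy as np

def mfcc(path: str) -> np.ndarray:
    y, sr = librosa.load(path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

def distance(path_a: str, path_b: str) -> float:
    D, _ = librosa.sequence.dtw(X=mfcc(path_a), Y=mfcc(path_b))
    return float(D[-1, -1])  # total alignment cost (lower = more similar)

# Hypothetical usage with placeholder file names:
# speaker = min(enrolled_paths, key=lambda p: distance("unknown.wav", p))
```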

  14. WE-A-17A-10: Fast, Automatic and Accurate Catheter Reconstruction in HDR Brachytherapy Using An Electromagnetic 3D Tracking System

    Poulin, E; Racine, E; Beaulieu, L [CHU de Quebec - Universite Laval, Quebec, Quebec (Canada); Binnekamp, D [Integrated Clinical Solutions and Marketing, Philips Healthcare, Best, DA (Netherlands)

    2014-06-15

    Purpose: In high dose rate brachytherapy (HDR-B), current catheter reconstruction protocols are slow and error-prone. The purpose of this study was to evaluate the accuracy and robustness of an electromagnetic (EM) tracking system for improved catheter reconstruction in HDR-B protocols. Methods: For this proof of principle, a total of 10 catheters were inserted in gelatin phantoms with different trajectories. Catheters were reconstructed using a Philips-design 18G biopsy needle (used as an EM stylet) and the second generation Aurora Planar Field Generator from Northern Digital Inc. The Aurora EM system exploits alternating current technology and generates 3D points at 40 Hz. Phantoms were also scanned using a μCT (GE Healthcare) and a Philips Big Bore clinical CT system with resolutions of 0.089 mm and 2 mm, respectively. Reconstructions using the EM stylet were compared to μCT and CT. To assess the robustness of the EM reconstruction, 5 catheters were reconstructed twice and compared. Results: Reconstruction time for one catheter was 10 seconds or less. This would imply that for a typical clinical implant of 17 catheters, the total reconstruction time would be less than 3 minutes. When compared to the μCT, the mean EM tip identification error was 0.69 ± 0.29 mm while the CT error was 1.08 ± 0.67 mm. The mean 3D distance error was found to be 0.92 ± 0.37 mm and 1.74 ± 1.39 mm for the EM and CT, respectively. EM 3D catheter trajectories were found to be significantly more accurate (unpaired t-test, p < 0.05). A mean difference of less than 0.5 mm was found between successive EM reconstructions. Conclusion: The EM reconstruction was found to be faster, more accurate and more robust than the conventional methods used for catheter reconstruction in HDR-B. This approach can be applied to any type of catheter and applicator. We would like to disclose that the equipment used in this study comes from a collaboration with Philips Medical.

  15. Robust speech recognition using articulatory information

    Kirchhoff, Katrin

    1999-01-01

    Current automatic speech recognition systems make use of a single source of information about their input, viz. a preprocessed form of the acoustic speech signal, which encodes the time-frequency distribution of signal energy. The goal of this thesis is to investigate the benefits of integrating articulatory information into state-of-the art speech recognizers, either as a genuine alternative to standard acoustic representations, or as an additional source of information. Articulatory informa...

  16. Annotating Speech Corpus for Prosody Modeling in Indian Language Text to Speech Systems

    Kiruthiga S

    2012-01-01

    A spoken language system, whether a speech synthesis or a speech recognition system, starts with building a speech corpus. We give a detailed survey of issues and a methodology for selecting the appropriate speech unit in building a speech corpus for Indian language Text-to-Speech systems. The paper ultimately aims to improve the intelligibility of the synthesized speech in Text-to-Speech synthesis systems. To begin with, an appropriate text file should be selected for building the speech corpus. Then a corresponding speech file is generated and stored; this speech file is the phonetic representation of the selected text file. The speech file is processed at different levels, viz. paragraphs, sentences, phrases, words, syllables and phones. These are called the speech units of the file. Research has been done taking these units as the basic unit for processing. This paper analyses the research done using phones, diphones, triphones, syllables and polysyllables as the basic unit for speech synthesis, and also provides a recommended set of combinations for polysyllables. Concatenative speech synthesis involves the concatenation of these basic units to synthesize intelligible, natural-sounding speech. The speech units are annotated with relevant prosodic information about each unit, manually or automatically, based on an algorithm. The database consisting of the units along with their annotated information is called the annotated speech corpus. A clustering technique is used in the annotated speech corpus that provides a way to select the appropriate unit for concatenation, based on the lowest total join cost of the speech unit.

  17. Speech production in amplitude-modulated noise

    Macdonald, Ewen N; Raufer, Stefan

    2013-01-01

    The Lombard effect refers to the phenomenon where talkers automatically increase their level of speech in a noisy environment. While many studies have characterized how the Lombard effect influences different measures of speech production (e.g., F0, spectral tilt, etc.), few have investigated the...

  18. The role of speech in the user interface : perspective and application

    Abewusi, A.B.

    1994-01-01

    Consideration must be given to the implications of speech as a communication medium before deciding to use speech input or output in an interactive environment. There are several effective control strategies for improving the quality of speech. The utility of speech has been demonstrated by application to several illustrative problems, where it has proved effective despite all the limitations of synthetic speech output and automatic speech recognition systems. (Author's abstract)

  19. Current trends in multilingual speech processing

    Hervé Bourlard; John Dines; Mathew Magimai-Doss; Philip N Garner; David Imseng; Petr Motlicek; Hui Liang; Lakshmi Saheer; Fabio Valente

    2011-10-01

    In this paper, we describe recent work at Idiap Research Institute in the domain of multilingual speech processing and provide some insights into emerging challenges for the research community. Multilingual speech processing has been a topic of ongoing interest to the research community for many years and the field is now receiving renewed interest owing to two strong driving forces. Firstly, technical advances in speech recognition and synthesis are posing new challenges and opportunities to researchers. For example, discriminative features are seeing wide application by the speech recognition community, but additional issues arise when using such features in a multilingual setting. Another example is the apparent convergence of speech recognition and speech synthesis technologies in the form of statistical parametric methodologies. This convergence enables the investigation of new approaches to unified modelling for automatic speech recognition and text-to-speech synthesis (TTS) as well as cross-lingual speaker adaptation for TTS. The second driving force is the impetus being provided by both government and industry for technologies to help break down domestic and international language barriers, these also being barriers to the expansion of policy and commerce. Speech-to-speech and speech-to-text translation are thus emerging as key technologies at the heart of which lies multilingual speech processing.

  20. The Design and Implementation of the Chinese Part-of-speech Automatic Tagging System (汉语词性自动标注系统的设计与实现)

    王素格; 张水奎

    2001-01-01

    This paper presents the design and implementation of a Chinese part-of-speech automatic tagging system that combines statistics-based and rule-based tagging methods. It describes the overall structure of the system and the organization of the word tables it uses, including the ambiguous word table, the non-ambiguous word table, the tag set and the POS tagging rules. In particular, the sparse matrix and its storage method are described in detail.

  1. Fully Automated Assessment of the Severity of Parkinson's Disease from Speech.

    Bayestehtashk, Alireza; Asgari, Meysam; Shafran, Izhak; McNames, James

    2015-01-01

    For several decades now, there has been sporadic interest in automatically characterizing the speech impairment due to Parkinson's disease (PD). Most early studies were confined to quantifying a few speech features that were easy to compute. More recent studies have adopted a machine learning approach where a large number of potential features are extracted and the models are learned automatically from the data. In the same vein, here we characterize the disease using a relatively large cohort of 168 subjects, collected from multiple (three) clinics. We elicited speech using three tasks - the sustained phonation task, the diadochokinetic task and a reading task, all within a time budget of 4 minutes, prompted by a portable device. From these recordings, we extracted 1582 features for each subject using openSMILE, a standard feature extraction tool. We compared the effectiveness of three strategies for learning a regularized regression and find that ridge regression performs better than lasso and support vector regression for our task. We refine the feature extraction to capture pitch-related cues, including jitter and shimmer, more accurately using a time-varying harmonic model of speech. Our results show that the severity of the disease can be inferred from speech with a mean absolute error of about 5.5, explaining 61% of the variance and consistently well above chance across all clinics. Of the three speech elicitation tasks, we find that the reading task is significantly better at capturing cues than the diadochokinetic or sustained phonation tasks. In all, we have demonstrated that the data collection and inference can be fully automated, and the results show that speech-based assessment has promising practical application in PD. The techniques reported here are more widely applicable to other paralinguistic tasks in the clinical domain. PMID:25382935
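
    A sketch of the regularized-regression step under stated assumptions: the feature matrix and severity scores below are synthetic stand-ins for the openSMILE features and clinical ratings, and the hyperparameters are illustrative.

```python
# Ridge regression over a wide acoustic feature matrix, scored with
# cross-validated mean absolute error (as in the study's evaluation).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.standard_normal((168, 1582))                # 168 subjects x 1582 features
y = X[:, :10].sum(axis=1) + rng.normal(0, 1, 168)   # fake severity scores

model = make_pipeline(StandardScaler(), Ridge(alpha=10.0))
scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
print("MAE:", -scores.mean())
```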

  2. Speech Recognition on Mobile Devices

    Tan, Zheng-Hua; Lindberg, Børge

    2010-01-01

    The enthusiasm of deploying automatic speech recognition (ASR) on mobile devices is driven both by remarkable advances in ASR technology and by the demand for efficient user interfaces on such devices as mobile phones and personal digital assistants (PDAs). This chapter presents an overview of ASR...

  3. Epoch-based analysis of speech signals

    B Yegnanarayana; Suryakanth V Gangashetty

    2011-10-01

    Speech analysis is traditionally performed using short-time analysis to extract features in time and frequency domains. The window size for the analysis is fixed somewhat arbitrarily, mainly to account for the time varying vocal tract system during production. However, speech in its primary mode of excitation is produced due to impulse-like excitation in each glottal cycle. Anchoring the speech analysis around the glottal closure instants (epochs) yields significant benefits for speech analysis. Epoch-based analysis of speech helps not only to segment the speech signals based on speech production characteristics, but also helps in accurate analysis of speech. It enables extraction of important acoustic-phonetic features such as glottal vibrations, formants, instantaneous fundamental frequency, etc. Epoch sequence is useful to manipulate prosody in speech synthesis applications. Accurate estimation of epochs helps in characterizing voice quality features. Epoch extraction also helps in speech enhancement and multispeaker separation. In this tutorial article, the importance of epochs for speech analysis is discussed, and methods to extract the epoch information are reviewed. Applications of epoch extraction for some speech applications are demonstrated.

  4. Automatic translation among spoken languages

    Walter, Sharon M.; Costigan, Kelly

    1994-02-01

    The Machine Aided Voice Translation (MAVT) system was developed in response to the shortage of experienced military field interrogators with both foreign language proficiency and interrogation skills. Combining speech recognition, machine translation, and speech generation technologies, the MAVT accepts an interrogator's spoken English question and translates it into spoken Spanish. The spoken Spanish response of the potential informant can then be translated into spoken English. Potential military and civilian applications for automatic spoken language translation technology are discussed in this paper.

  5. Cued Speech: A visual communication mode for the Deaf society

    Heracleous, Panikos; Beautemps, Denis

    2010-01-01

    Cued Speech is a visual mode of communication that uses handshapes and placements in combination with the mouth movements of speech to make the phonemes of a spoken language look different from each other and clearly understandable to deaf individuals. The aim of Cued Speech is to overcome the problems of lip reading and thus enable deaf persons to wholly understand spoken language. In this study, automatic phoneme recognition in Cued Speech for French based on hidden Markov models (HMMs) is i...

  6. Perception of Speech Sounds in School-Aged Children with Speech Sound Disorders.

    Preston, Jonathan L; Irwin, Julia R; Turcios, Jacqueline

    2015-11-01

    Children with speech sound disorders may perceive speech differently than children with typical speech development. The nature of these speech differences is reviewed with an emphasis on assessing phoneme-specific perception for speech sounds that are produced in error. Category goodness judgment, or the ability to judge accurate and inaccurate tokens of speech sounds, plays an important role in phonological development. The software Speech Assessment and Interactive Learning System, which has been effectively used to assess preschoolers' ability to perform goodness judgments, is explored for school-aged children with residual speech errors (RSEs). However, data suggest that this particular task may not be sensitive to perceptual differences in school-aged children. The need for the development of clinical tools for assessment of speech perception in school-aged children with RSE is highlighted, and clinical suggestions are provided. PMID:26458198

  7. Speech characteristics in depression.

    Stassen, H H; Bomben, G; Günther, E

    1991-01-01

    This study examined the relationship between speech characteristics and psychopathology throughout the course of affective disturbances. Our sample comprised 20 depressive, hospitalized patients who had been selected according to the following criteria: (1) first admission; (2) long-term patient; (3) early entry into study; (4) late entry into study; (5) low scorer; (6) high scorer, and (7) distinct retarded-depressive symptomatology. Since our principal goal was to model the course of affective disturbances in terms of speech parameters, a total of 6 repeated measurements had been carried out over a 2-week period, including 3 different psychopathological instruments and speech recordings from automatic speech as well as from reading out loud. It turned out that neither applicability nor efficiency of single-parameter models depends in any way on the given, clinically defined subgroups. Likewise, no significant differences between the clinically defined subgroups showed up with regard to basic speech parameters, except for the fact that low scorers seemed to take their time when producing utterances (this in contrast to all other patients who, on the average, had a considerably shorter recording time). As to the relationship between psychopathology and speech parameters over time, we found significant correlations: (1) in 60% of cases between the apathic syndrome and energy/dynamics; (2) in 50% of cases between the retarded-depressive syndrome and energy/dynamics; (3) in 45% of cases between the apathic syndrome and mean vocal pitch, and (4) in 71% of low scorers between the somatic-depressive syndrome and time duration of pauses. All in all, single-parameter models turned out to cover only specific aspects of the individual courses of affective disturbances, thus arguing against a single approach that applies in general. PMID:1886971

  8. The Phase Spectra Based Feature for Robust Speech Recognition

    Abbasian ALI

    2009-07-01

    Speech recognition in adverse environments is one of the major issues in automatic speech recognition today. Most current speech recognition systems are highly efficient in ideal environments, but their performance degrades drastically in real environments because of noise-affected speech. In this paper, a new feature representation based on phase spectra and Perceptual Linear Prediction (PLP) is suggested, which can be used for robust speech recognition. It is shown that these new features can improve the performance of speech recognition not only in clean conditions but also at various noise levels, when compared to PLP features.

  9. Speech coding

    Ravishankar, C., Hughes Network Systems, Germantown, MD

    1998-05-08

    Speech is the predominant means of communication between human beings, and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage of the speech signal getting corrupted by noise, cross-talk and distortion. Long-haul transmissions, which use repeaters to compensate for the loss in signal strength on transmission links, also increase the associated noise and distortion. On the other hand, digital transmission is relatively immune to noise, cross-talk and distortion, primarily because of the capability to faithfully regenerate the digital signal at each repeater purely based on a binary decision. Hence the end-to-end performance of the digital link essentially becomes independent of the length and operating frequency bands of the link, and from a transmission point of view digital transmission has been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service provision point of view as well. Modern requirements have introduced the need for robust, flexible and secure services that can carry a multitude of signal types (such as voice, data and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term speech coding often refers to techniques that represent or code speech signals either directly as a waveform or as a set of parameters obtained by analyzing the speech signal. In either case, the codes are transmitted to the distant end, where speech is reconstructed or synthesized using the received set of codes. A more generic term that is often used interchangeably with speech coding is voice coding. This term is more generic in the sense that the

  10. HUMAN SPEECH EMOTION RECOGNITION

    Maheshwari Selvaraj

    2016-02-01

    Emotions play an extremely important role in human mental life. They are a medium for expressing one's perspective or mental state to others. Speech Emotion Recognition (SER) can be defined as the extraction of the emotional state of the speaker from his or her speech signal. There are a few universal emotions, including neutral, anger, happiness and sadness, which any intelligent system with finite computational resources can be trained to identify or synthesize as required. In this work, spectral and prosodic features are used for speech emotion recognition because both of these features carry emotional information. Mel-frequency cepstral coefficients (MFCCs) are one of the spectral features; fundamental frequency, loudness, pitch, speech intensity and glottal parameters are the prosodic features used to model different emotions. The potential features are extracted from each utterance for the computational mapping between emotions and speech patterns. Pitch can be detected from the selected features, and with it gender can be classified. A Support Vector Machine (SVM) is used to classify gender in this work. Radial basis function and back propagation networks are used to recognize the emotions based on the selected features, and it is shown that the radial basis function produces more accurate results for emotion recognition than the back propagation network.
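
    A minimal sketch of the SVM classification stage; the random features and labels below stand in for the MFCC/prosodic vectors and gender or emotion labels described above, and the kernel settings are illustrative.

```python
# RBF-kernel SVM over placeholder feature vectors, split into train/test.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.standard_normal((400, 20))             # stand-in feature vectors
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # stand-in binary labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_tr, y_tr)
print("accuracy:", clf.score(X_te, y_te))
```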

  11. Speech recognition from spectral dynamics

    Hynek Hermansky

    2011-10-01

    Information is carried in changes of a signal. The paper starts with revisiting Dudley's concept of the carrier nature of speech. It points to its close connection to modulation spectra of speech and argues against short-term spectral envelopes as dominant carriers of the linguistic information in speech. The history of spectral representations of speech is briefly discussed. Some of the history of gradual infusion of the modulation spectrum concept into automatic speech recognition (ASR) comes next, pointing to the relationship of modulation spectrum processing to well-accepted ASR techniques such as dynamic speech features or RelAtive SpecTrAl (RASTA) filtering. Next, the frequency domain perceptual linear prediction technique for deriving autoregressive models of temporal trajectories of spectral power in individual frequency bands is reviewed. Finally, posterior-based features, which allow for straightforward application of modulation frequency domain information, are described. The paper is tutorial in nature, aims at a historical global overview of attempts for using spectral dynamics in machine recognition of speech, and does not always provide enough detail of the described techniques. However, extensive references to earlier work are provided to compensate for the lack of detail in the paper.
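
    As a concrete instance of modulation-spectrum processing, the sketch below applies the classic RASTA band-pass filter to a log-energy trajectory; the coefficients follow Hermansky and Morgan (1994) up to a pure delay, and the input trajectory here is synthetic.

```python
# RASTA filtering of a per-band log-energy trajectory: a ramp-shaped FIR
# numerator differentiates, and a slow pole reintegrates, suppressing both
# very slow (channel) and very fast modulations.
import numpy as np
from scipy.signal import lfilter

b = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])  # numerator (ramp FIR)
a = np.array([1.0, -0.98])                        # slowly varying pole

log_energy = np.log(1.0 + np.abs(np.random.randn(200)))  # fake trajectory
rasta_trajectory = lfilter(b, a, log_energy)
```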

  12. Automatic differentiation bibliography

    Corliss, G.F. (comp.)

    1992-07-01

    This is a bibliography of work related to automatic differentiation. Automatic differentiation is a technique for the fast, accurate propagation of derivative values using the chain rule. It is neither symbolic nor numeric. Automatic differentiation is a fundamental tool for scientific computation, with applications in optimization, nonlinear equations, nonlinear least squares approximation, stiff ordinary differential equations, partial differential equations, continuation methods, and sensitivity analysis. This report is an updated version of the bibliography which originally appeared in Automatic Differentiation of Algorithms: Theory, Implementation, and Application.
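
    A tiny forward-mode automatic differentiation sketch with dual numbers, showing the chain-rule propagation of derivative values described above; the class and function names are illustrative.

```python
# Forward-mode AD: each value carries its derivative, and every operation
# propagates both via the chain rule (neither symbolic nor numeric).
from dataclasses import dataclass
import math

@dataclass
class Dual:
    val: float   # function value
    dot: float   # derivative value

    def __add__(self, o): return Dual(self.val + o.val, self.dot + o.dot)
    def __mul__(self, o): return Dual(self.val * o.val,
                                      self.val * o.dot + self.dot * o.val)

def sin(x: Dual) -> Dual:
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)  # chain rule

x = Dual(1.5, 1.0)      # seed dx/dx = 1
y = sin(x * x) + x      # f(x) = sin(x^2) + x
print(y.val, y.dot)     # f(1.5) and f'(1.5) = 2x*cos(x^2) + 1
```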

  13. Predicting Speech Intelligibility

    HINES, ANDREW

    2012-01-01

    Hearing impairment, and specifically sensorineural hearing loss, is an increasingly prevalent condition, especially amongst the ageing population. It occurs primarily as a result of damage to hair cells that act as sound receptors in the inner ear and causes a variety of hearing perception problems, most notably a reduction in speech intelligibility. Accurate diagnosis of hearing impairments is a time consuming process and is complicated by the reliance on indirect measurements based on patie...

  14. Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR

    Weninger, Felix; Erdogan, Hakan; Watanabe, Shinji; Vincent, Emmanuel; Le Roux, Jonathan; Hershey, John R.; Schuller, Björn

    2015-01-01

    We evaluate some recent developments in recurrent neural network (RNN) based speech enhancement in the light of noise-robust automatic speech recognition (ASR). The proposed framework is based on Long Short-Term Memory (LSTM) RNNs which are discriminatively trained according to an optimal speech reconstruction objective. We demonstrate that LSTM speech enhancement, even when used 'naïvely' as front-end processing, delivers competitive results on the CHiME-2 speech recognition task. Furtherm...

  15. Phonetic Alphabet for Speech Recognition of Czech

    Uhlir, J.; Psutka, J.; Nouza, J.

    1997-01-01

    In the paper we introduce and discuss an alphabet that has been proposed for phonemically oriented automatic speech recognition. The alphabet, denoted PAC (Phonetic Alphabet for Czech), consists of 48 basic symbols that allow for distinguishing all major events occurring in spoken Czech language. The symbols can be used both for phonetic transcription of Czech texts as well as for labeling recorded speech signals. For practical reasons, the alphabet occurs in two versions; one utilizes Cze...

  16. Unsupervised Topic Adaptation for Lecture Speech Retrieval

    Fujii, Atsushi; Itou, Katunobu; Akiba, Tomoyosi; Ishikawa, Tetsuya

    2004-01-01

    We are developing a cross-media information retrieval system, in which users can view specific segments of lecture videos by submitting text queries. To produce a text index, the audio track is extracted from a lecture video and a transcription is generated by automatic speech recognition. In this paper, to improve the quality of our retrieval system, we extensively investigate the effects of adapting acoustic and language models on speech recognition. We perform an MLLR-based method to adapt...

  17. An articulatorily constrained, maximum entropy approach to speech recognition and speech coding

    Hogden, J.

    1996-12-31

    Hidden Markov models (HMMs) are among the most popular tools for performing computer speech recognition. One of the primary reasons that HMMs typically outperform other speech recognition techniques is that the parameters used for recognition are determined by the data, not by preconceived notions of what the parameters should be. This makes HMMs better able to deal with intra- and inter-speaker variability despite the limited knowledge of how speech signals vary and despite the often limited ability to correctly formulate rules describing variability and invariance in speech. In fact, it is often the case that when HMM parameter values are constrained using the limited knowledge of speech, recognition performance decreases. However, the structure of an HMM has little in common with the mechanisms underlying speech production. Here, the author argues that by using probabilistic models that more accurately embody the process of speech production, he can create models that have all the advantages of HMMs, but that should more accurately capture the statistical properties of real speech samples--presumably leading to more accurate speech recognition. The model he will discuss uses the fact that speech articulators move smoothly and continuously. Before discussing how to use articulatory constraints, he will give a brief description of HMMs. This will allow him to highlight the similarities and differences between HMMs and the proposed technique.
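
    For reference, a minimal forward-algorithm sketch for the HMMs discussed above; the transition, emission, and initial probabilities are toy numbers, not values from the report.

```python
# HMM forward algorithm: accumulate P(observations | model) recursively.
import numpy as np

A = np.array([[0.7, 0.3], [0.4, 0.6]])   # state transition probabilities
B = np.array([[0.9, 0.1], [0.2, 0.8]])   # emission probs per symbol
pi = np.array([0.5, 0.5])                # initial state distribution
obs = [0, 1, 0]                          # observed symbol sequence

alpha = pi * B[:, obs[0]]                # initialization
for o in obs[1:]:
    alpha = (alpha @ A) * B[:, o]        # recursive forward update
print("P(observations | model) =", alpha.sum())
```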

  18. Hate speech

    Anne Birgitta Nilsen

    2014-03-01

    The manifesto of the Norwegian terrorist Anders Behring Breivik is based on the “Eurabia” conspiracy theory. This theory is a key starting point for hate speech amongst many right-wing extremists in Europe, but also has ramifications beyond these environments. In brief, proponents of the Eurabia theory claim that Muslims are occupying Europe and destroying Western culture, with the assistance of the EU and European governments. By contrast, members of Al-Qaeda and other extreme Islamists promote the conspiracy theory “the Crusade” in their hate speech directed against the West. Proponents of the latter theory argue that the West is leading a crusade to eradicate Islam and Muslims, a crusade that is similarly facilitated by their governments. This article presents analyses of texts written by right-wing extremists and Muslim extremists in an effort to shed light on how hate speech promulgates conspiracy theories in order to spread hatred and intolerance. The aim of the article is to contribute to a more thorough understanding of hate speech’s nature by applying rhetorical analysis. Rhetorical analysis is chosen because it offers a means of understanding the persuasive power of speech. It is thus a suitable tool to describe how hate speech works to convince and persuade. The concepts from rhetorical theory used in this article are ethos, logos and pathos. The concept of ethos is used to pinpoint factors that contributed to Osama bin Laden's impact, namely factors that lent credibility to his promotion of the conspiracy theory of the Crusade. In particular, Bin Laden projected common sense, good morals and good will towards his audience. He seemed to have coherent and relevant arguments; he appeared to possess moral credibility; and his use of language demonstrated that he wanted the best for his audience. The concept of pathos is used to define hate speech, since hate speech targets its audience's emotions. In hate speech it is the

  19. Recognizing intentions in infant-directed speech: evidence for universals.

    Bryant, Gregory A; Barrett, H Clark

    2007-08-01

    In all languages studied to date, distinct prosodic contours characterize different intention categories of infant-directed (ID) speech. This vocal behavior likely exists universally as a species-typical trait, but little research has examined whether listeners can accurately recognize intentions in ID speech using only vocal cues, without access to semantic information. We recorded native-English-speaking mothers producing four intention categories of utterances (prohibition, approval, comfort, and attention) as both ID and adult-directed (AD) speech, and we then presented the utterances to Shuar adults (South American hunter-horticulturalists). Shuar subjects were able to reliably distinguish ID from AD speech and were able to reliably recognize the intention categories in both types of speech, although performance was significantly better with ID speech. This is the first demonstration that adult listeners in an indigenous, nonindustrialized, and nonliterate culture can accurately infer intentions from both ID speech and AD speech in a language they do not speak. PMID:17680948

  20. Strategies for distant speech recognition in reverberant environments

    Delcroix, Marc; Yoshioka, Takuya; Ogawa, Atsunori; Kubo, Yotaro; Fujimoto, Masakiyo; Ito, Nobutaka; Kinoshita, Keisuke; Espi, Miquel; Araki, Shoko; Hori, Takaaki; Nakatani, Tomohiro

    2015-12-01

    Reverberation and noise are known to severely affect the automatic speech recognition (ASR) performance of speech recorded by distant microphones. Therefore, we must deal with reverberation if we are to realize high-performance hands-free speech recognition. In this paper, we review a recognition system that we developed at our laboratory to deal with reverberant speech. The system consists of a speech enhancement (SE) front-end that employs long-term linear prediction-based dereverberation followed by noise reduction. We combine our SE front-end with an ASR back-end that uses neural networks for acoustic and language modeling. The proposed system achieved top scores on the ASR task of the REVERB challenge. This paper describes the different technologies used in our system and presents detailed experimental results that justify our implementation choices and may provide hints for designing distant ASR systems.

  1. Speech Enhancement

    Benesty, Jacob; Jensen, Jesper Rindom; Christensen, Mads Græsbøll;

    Speech enhancement is a classical problem in signal processing, yet still largely unsolved. Two of the conventional approaches for solving this problem are linear filtering, like the classical Wiener filter, and subspace methods. These approaches have traditionally been treated as different classes..., and their performance bounded and assessed in terms of noise reduction and speech distortion. The book shows how various filter designs can be obtained in this framework, including the maximum SNR, Wiener, LCMV, and MVDR filters, and how these can be applied in various contexts, like in single...

  2. Speech enhancement

    Benesty, Jacob; Chen, Jingdong

    2006-01-01

    We live in a noisy world! In all applications (telecommunications, hands-free communications, recording, human-machine interfaces, etc.) that require at least one microphone, the signal of interest is usually contaminated by noise and reverberation. As a result, the microphone signal has to be "cleaned" with digital signal processing tools before it is played out, transmitted, or stored. This book is about speech enhancement. Different well-known and state-of-the-art methods for noise reduction, with one or multiple microphones, are discussed. By speech enhancement, we mean not only noise red

  3. Speech-enabled Computer-aided Translation

    Mesa-Lao, Bartolomé

    2014-01-01

    The present study has surveyed post-editor trainees’ views and attitudes before and after the introduction of speech technology as a front end to a computer-aided translation workbench. The aim of the survey was (i) to identify attitudes and perceptions among post-editor trainees before performing a post-editing task using automatic speech recognition (ASR); and (ii) to assess the degree to which post-editors’ attitudes and expectations to the use of speech technology changed after actually using it. The survey was based on two questionnaires: the first one administered before the...

  4. Towards robust speech acquisition using sensor arrays

    Maganti, Hari Krishna

    2007-01-01

    An integrated system approach was developed to address the problem of distant speech acquisition in multi-party meetings, using multiple microphones and cameras. Microphone array processing techniques have presented a potential alternative to close-talking microphones by providing speech enhancement through spatial filtering and directional discrimination. These techniques relied on accurate speaker locations for optimal performance. Tracking accurate speaker locations solely based on audio w...

  5. Toward Speech and Nonverbal Behaviors Integration for Humanoid Robot

    Wei Wang; Xiaodan Huang

    2012-01-01

    It is essential to integrate speech and nonverbal behaviors for a humanoid robot in human-robot interaction. This paper presents an approach using a multi-object genetic algorithm to match speech and behaviors automatically. Firstly, based on the humanoid robot's emotional state, we construct a hierarchical structure to link voice characteristics and nonverbal behaviors. Secondly, the behaviors corresponding to the speech are matched and integrated into an action sequence based on genetic algori...

  6. The Role of Visual Spatial Attention in Audiovisual Speech Perception

    Andersen, Tobias; Tiippana, K.; Laarni, J.; Kojo, I.; Sams, M.

    2008-01-01

    Auditory and visual information is integrated when perceiving speech, as evidenced by the McGurk effect in which viewing an incongruent talking face categorically alters auditory speech perception. Audiovisual integration in speech perception has long been considered automatic and pre-attentive but recent reports have challenged this view. Here we study the effect of visual spatial attention on the McGurk effect. By presenting a movie of two faces symmetrically displaced to each side of a cen...

  7. The Use of Speech Recognition Technology in Automotive Applications

    Gellatly, Andrew William

    1997-01-01

    The research objectives were (1) to perform a detailed review of the literature on speech recognition technology and the attentional demands of driving; (2) to develop decision tools that assist designers of in-vehicle systems; (3) to experimentally examine automatic speech recognition (ASR) design parameters, input modalities, and driver ages; and (4) to provide human factors recommendations for the use of speech recognition technology in automotive applicatio...

  8. Language and Speech Processing

    Mariani, Joseph

    2008-01-01

    Speech processing addresses various scientific and technological areas. It includes speech analysis and variable rate coding, in order to store or transmit speech. It also covers speech synthesis, especially from text, speech recognition, including speaker and language identification, and spoken language understanding. This book covers the following topics: how to realize speech production and perception systems, how to synthesize and understand speech using state-of-the-art methods in signal processing, pattern recognition, stochastic modelling computational linguistics and human factor studi

  9. Speech Clarity Index (Ψ): A Distance-Based Speech Quality Indicator and Recognition Rate Prediction for Dysarthric Speakers with Cerebral Palsy

    Kayasith, Prakasith; Theeramunkong, Thanaruk

    It is a tedious and subjective task to measure the severity of dysarthria by manually evaluating a speaker's speech using available standard assessment methods based on human perception. This paper presents an automated approach to assessing the speech quality of a dysarthric speaker with cerebral palsy. With the consideration of two complementary factors, speech consistency and speech distinction, a speech quality indicator called the speech clarity index (Ψ) is proposed as a measure of the speaker's ability to produce a consistent speech signal for a given word and distinct speech signals for different words. As an application, it can be used to assess speech quality and forecast the speech recognition rate for an individual dysarthric speaker before the exhaustive implementation of an automatic speech recognition system for that speaker. The effectiveness of Ψ as a speech recognition rate predictor is evaluated by rank-order inconsistency, correlation coefficient, and root-mean-square of difference. The evaluations were done by comparing its predicted recognition rates with those predicted by the standard methods, called the articulatory and intelligibility tests, based on two recognition systems (HMM and ANN). The results show that Ψ is a promising indicator for predicting the recognition rate of dysarthric speech. All experiments were done on a speech corpus composed of speech data from eight normal speakers and eight dysarthric speakers.

  10. Experimental comparison between speech transmission index, rapid speech transmission index, and speech intelligibility index.

    Larm, Petra; Hongisto, Valtteri

    2006-02-01

    During the acoustical design of, e.g., auditoria or open-plan offices, it is important to know how speech can be perceived in various parts of the room. Different objective methods have been developed to measure and predict speech intelligibility, and these have been extensively used in various spaces. In this study, two such methods were compared, the speech transmission index (STI) and the speech intelligibility index (SII). Also the simplification of the STI, the room acoustics speech transmission index (RASTI), was considered. These quantities are all based on determining an apparent speech-to-noise ratio on selected frequency bands and summing them using a specific weighting. For comparison, some data were needed on the possible differences of these methods resulting from the calculation scheme and also measuring equipment. Their prediction accuracy was also of interest. Measurements were made in a laboratory having adjustable noise level and absorption, and in a real auditorium. It was found that the measurement equipment, especially the selection of the loudspeaker, can greatly affect the accuracy of the results. The prediction accuracy of the RASTI was found acceptable, if the input values for the prediction are accurately known, even though the studied space was not ideally diffuse. PMID:16521772

  11. Speech coding

    Gersho, Allen

    1990-05-01

    Recent advances in algorithms and techniques for speech coding now permit high quality voice reproduction at remarkably low bit rates. The advent of powerful single-chip signal processors has made it cost effective to implement these new and sophisticated speech coding algorithms for many important applications in voice communication and storage. Some of the main ideas underlying the algorithms of major interest today are reviewed. The concept of removing redundancy by linear prediction is reviewed, first in the context of predictive quantization or DPCM. Then linear predictive coding, adaptive predictive coding, and vector quantization are discussed. The concepts of excitation coding via analysis-by-synthesis, vector sum excitation codebooks, and adaptive postfiltering are explained. The main ideas of vector excitation coding (VXC) or code-excited linear prediction (CELP) are presented. Finally, low-delay VXC coding and phonetic segmentation for VXC are described.
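
    A short sketch of the linear-prediction idea reviewed above: LPC coefficients obtained from the autocorrelation sequence via the Yule-Walker normal equations. The synthetic signal and model order are illustrative.

```python
# Remove redundancy by linear prediction: solve the Toeplitz normal
# equations R a = r for the predictor coefficients a.
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc(x: np.ndarray, order: int) -> np.ndarray:
    r = np.correlate(x, x, mode="full")[len(x) - 1:]   # autocorrelation lags
    return solve_toeplitz((r[:order], r[:order]), r[1:order + 1])

x = np.sin(0.3 * np.arange(400)) + 0.01 * np.random.randn(400)
a = lpc(x, order=10)   # predictor: x[n] ~ sum_k a[k] * x[n-1-k]
```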

  13. Emotion Recognition from Persian Speech with Neural Network

    Mina Hamidi

    2012-09-01

    In this paper, we report an effort towards automatic recognition of emotional states from continuous Persian speech. Due to the unavailability of an appropriate database in the Persian language for emotion recognition, we first built a database of emotional speech in Persian. This database consists of 2400 wave clips expressing anger, disgust, fear, sadness, happiness and neutral emotion. We then extract prosodic features, including features related to the pitch, intensity and global characteristics of the speech signal. Finally, we applied neural networks for automatic recognition of emotion. The resulting average accuracy was about 78%.

  14. Speech and Communication Disorders

    ... or understand speech. Causes include Hearing disorders and deafness Voice problems, such as dysphonia or those caused by cleft lip or palate Speech problems like stuttering Developmental disabilities Learning disorders Autism spectrum disorder Brain injury Stroke Some speech and ...

  15. Speech disorders - children

    ... of speech disorders may disappear on their own. Speech therapy may help with more severe symptoms or speech problems that do not improve. In therapy, the child will learn how to create certain sounds.

  16. Speech Enhancement based on Compressive Sensing Algorithm

    Sulong, Amart; Gunawan, Teddy S.; Khalifa, Othman O.; Chebil, Jalel

    2013-12-01

    Various methods for speech enhancement have been proposed over the years; an accurate design must focus primarily on quality and intelligibility. The approach proposed here uses compressive sensing (CS), a new paradigm for acquiring signals that is fundamentally different from uniform-rate digitization followed by compression, often used for transmission or storage. CS can reduce the number of degrees of freedom of a sparse/compressible signal by permitting only certain configurations of large and zero/small coefficients, and structured sparsity models. Therefore, CS provides a way of reconstructing a compressed version of the speech in the original signal by taking only a small number of linear and non-adaptive measurements. The performance of the overall algorithm is evaluated in terms of speech quality using informal listening tests and the Perceptual Evaluation of Speech Quality (PESQ) measure. Experimental results show that the CS algorithm performs very well on a wide range of speech tests and gives good performance for speech enhancement, with better noise suppression ability than conventional approaches and without obvious degradation of speech quality.
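
    A toy compressive-sensing recovery under stated assumptions: a sparse vector is measured with a random matrix and reconstructed with orthogonal matching pursuit, used here as a stand-in for whatever solver the paper employs; all sizes are illustrative.

```python
# Sparse signal, m << n random linear measurements, OMP reconstruction.
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
n, m, k = 256, 80, 8                      # length, measurements, sparsity
x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
Phi = rng.standard_normal((m, n)) / np.sqrt(m)   # measurement matrix
y = Phi @ x                               # m linear, non-adaptive measurements

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k, fit_intercept=False)
omp.fit(Phi, y)
print("recovery error:", np.linalg.norm(omp.coef_ - x))
```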

  17. Automatic sequences

    Haeseler, Friedrich

    2003-01-01

    Automatic sequences are sequences which are produced by a finite automaton. Although they are not random, they may look random. They are complicated in the sense of not being ultimately periodic; it may not be easy to name the rule by which a sequence is generated, yet such a rule always exists. The concept of automatic sequences has applications in algebra, number theory, finite automata and formal languages, and combinatorics on words. The text deals with different aspects of automatic sequences, in particular: a general introduction to automatic sequences; their basic (combinatorial) properties; the algebraic approach to automatic sequences; and geometric objects related to automatic sequences.
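    A classic concrete example is the Thue-Morse sequence, which is 2-automatic: a two-state automaton reading the base-2 digits of n emits the n-th term. A short Python version (our own illustration, not from the text):

```python
def thue_morse(n):
    """The n-th Thue-Morse term equals the parity of the number of
    1-bits in n; this is exactly what a two-state automaton computes
    while reading the binary digits of n."""
    return bin(n).count("1") % 2

print([thue_morse(n) for n in range(16)])
# [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0]
```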

  18. Modeling speech imitation and ecological learning of auditory-motor maps

    Claudia eCanevari; Leonardo eBadino; Alessandro eD'Ausilio; Luciano eFadiga; Giorgio eMetta

    2013-01-01

    Classical models of speech consider an antero-posterior distinction between perceptive and productive functions. However, the selective alteration of neural activity in speech motor centers, via transcranial magnetic stimulation, was shown to affect speech discrimination. On the automatic speech recognition (ASR) side, the recognition systems have classically relied solely on acoustic data, achieving rather good performance in optimal listening conditions. The main limitations of current ASR ...

  19. Commercial applications of speech interface technology: an industry at the threshold.

    Oberteuffer, J A

    1995-01-01

    Speech interface technology, which includes automatic speech recognition, synthetic speech, and natural language processing, is beginning to have a significant impact on business and personal computer use. Today, powerful and inexpensive microprocessors and improved algorithms are driving commercial applications in computer command, consumer, data entry, speech-to-text, telephone, and voice verification. Robust speaker-independent recognition systems for command and navigation in personal com...

  20. Segmentation of the speech signal based on changes in energy distribution in the spectrum

    Jassem, W.; Kudzdela, H.; Domagala, P.

    1983-08-01

    A simple algorithm is proposed for automatic phonetic segmentation of the acoustic speech signal on the MERA 303 desk-top minicomputer. The algorithm is verified with Polish linguistic material spoken by two subjects. The proposed algorithm correctly detects approximately 80 percent of the boundaries between enunciated segments, a result no worse than that obtained using more complex methods. Speech recognition programs are discussed as speech perception models, and the nature of categorical perception of human speech sounds is examined.
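    As a rough illustration of segmentation driven by changes in the spectral energy distribution, here is a toy sketch. The band edges, frame sizes, and threshold are invented for the example and are not the parameters of the MERA 303 implementation.

```python
import numpy as np

def energy_boundaries(y, sr=16000, frame=400, hop=160, thresh=0.3):
    """Toy segmentation: mark a boundary wherever the normalized
    distribution of energy across four frequency bands changes sharply
    between consecutive frames. A sketch of the idea, not the paper's
    algorithm."""
    edges = [0, 500, 1000, 2000, 4000]          # assumed band edges (Hz)
    freqs = np.fft.rfftfreq(frame, 1 / sr)
    bands = [(freqs >= lo) & (freqs < hi) for lo, hi in zip(edges, edges[1:])]
    prev, bounds = None, []
    for i in range(0, len(y) - frame, hop):
        spec = np.abs(np.fft.rfft(y[i:i + frame] * np.hanning(frame))) ** 2
        e = np.array([spec[b].sum() for b in bands])
        e = e / (e.sum() + 1e-12)               # energy distribution
        if prev is not None and np.abs(e - prev).sum() > thresh:
            bounds.append(i / sr)               # boundary time in seconds
        prev = e
    return bounds
```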

  1. A Research of Speech Emotion Recognition Based on Deep Belief Network and SVM

    Chenchen Huang

    2014-01-01

    Full Text Available Feature extraction is a very important part of speech emotion recognition. Addressing the feature extraction problem, this paper proposes a new method that uses DBNs to extract emotional features from the speech signal automatically. A five-layer DBN is trained to extract speech emotion features, and multiple consecutive frames are incorporated to form a high-dimensional feature. The features produced by the trained DBN are the input of a nonlinear SVM classifier, yielding a multiple-classifier speech emotion recognition system. The speech emotion recognition rate of the system reached 86.5%, which was 7% higher than that of the original method.
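    A rough analogue of this pipeline can be sketched with stacked restricted Boltzmann machines (the building blocks of a DBN) feeding an SVM. The layer sizes and hyperparameters below are placeholders, not the paper's five-layer configuration.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Stacked RBMs learn a feature representation of stacked consecutive
# frames; an SVM performs the final emotion classification.
# sklearn's BernoulliRBM expects inputs scaled to [0, 1].
pipeline = Pipeline([
    ("scale", MinMaxScaler()),
    ("rbm1", BernoulliRBM(n_components=256, learning_rate=0.05, n_iter=20)),
    ("rbm2", BernoulliRBM(n_components=64, learning_rate=0.05, n_iter=20)),
    ("svm", SVC(kernel="rbf", C=10.0)),
])

# hypothetical data: each row stacks several consecutive spectral frames
X = np.random.rand(200, 5 * 40)        # 200 segments, 5 frames x 40 bins
y = np.random.randint(0, 6, size=200)  # 6 emotion classes (placeholder)
pipeline.fit(X, y)
print("training accuracy:", pipeline.score(X, y))
```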

  2. A Comprehensive Noise Robust Speech Parameterization Algorithm Using Wavelet Packet Decomposition-Based Denoising and Speech Feature Representation Techniques

    Kotnik Bojan

    2007-01-01

    Full Text Available This paper concerns the problem of automatic speech recognition in noise-intense and adverse environments. The main goal of the proposed work is the definition, implementation, and evaluation of a novel noise robust speech signal parameterization algorithm. The proposed procedure is based on time-frequency speech signal representation using wavelet packet decomposition. A new modified soft thresholding algorithm based on time-frequency adaptive threshold determination was developed to efficiently reduce the level of additive noise in the input noisy speech signal. A two-stage Gaussian mixture model (GMM)-based classifier was developed to perform speech/nonspeech as well as voiced/unvoiced classification. An adaptive topology of the wavelet packet decomposition tree based on voiced/unvoiced detection was introduced to analyze voiced and unvoiced segments of the speech signal separately. The main feature vector consists of a combination of log-root compressed wavelet packet parameters and autoregressive parameters. The final output feature vector is produced using a two-stage feature vector postprocessing procedure. In the experimental framework, the noisy speech databases Aurora 2 and Aurora 3 were applied together with the corresponding standardized acoustical model training/testing procedures. The automatic speech recognition performance achieved using the proposed noise robust speech parameterization procedure was compared to the standardized mel-frequency cepstral coefficient (MFCC) feature extraction procedures ETSI ES 201 108 and ETSI ES 202 050.
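    The core denoising step, soft thresholding of wavelet packet coefficients, can be sketched as follows with PyWavelets. The fixed universal threshold used here is a simplification of the paper's time-frequency adaptive threshold, and the wavelet and depth are arbitrary choices.

```python
import numpy as np
import pywt

def wp_denoise(x, wavelet="db8", level=4):
    """Soft-threshold wavelet packet denoising, a simplified sketch of
    the idea (the paper adapts the threshold per time-frequency node
    and adapts the tree topology to voiced/unvoiced decisions)."""
    wp = pywt.WaveletPacket(x, wavelet=wavelet, mode="symmetric",
                            maxlevel=level)
    for node in wp.get_level(level, order="natural"):
        # robust noise estimate per node (median absolute deviation)
        sigma = np.median(np.abs(node.data)) / 0.6745
        t = sigma * np.sqrt(2 * np.log(len(node.data) + 1))
        node.data = pywt.threshold(node.data, t, mode="soft")
    return wp.reconstruct(update=False)[:len(x)]

noisy = np.sin(np.linspace(0, 20 * np.pi, 4096)) + 0.3 * np.random.randn(4096)
clean = wp_denoise(noisy)
```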

  3. Prediction Method of Speech Recognition Performance Based on HMM-based Speech Synthesis Technique

    Terashima, Ryuta; Yoshimura, Takayoshi; Wakita, Toshihiro; Tokuda, Keiichi; Kitamura, Tadashi

    We describe an efficient method that uses an HMM-based speech synthesis technique as a test pattern generator for evaluating the word recognition rate. With this method, the recognition rate for each word and speaker can be evaluated using synthesized speech. The parameter generation technique can be formulated as an algorithm that determines the speech parameter vector sequence O by maximizing P(O|Q,λ) given the model parameters λ and the state sequence Q, under a dynamic acoustic feature constraint. We conducted recognition experiments to illustrate the validity of the method. Approximately 100 speakers were used to train the speaker-dependent models for the speech synthesis used in these experiments, and the synthetic speech was generated as test patterns for the target speech recognizer. As a result, the recognition rate of the HMM-based synthesized speech shows a good correlation with the recognition rate of the actual speech. Furthermore, we find that our method can predict the speaker recognition rate with approximately 2% error on average. Therefore, the speaker recognition rate can be evaluated automatically using the proposed method.
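    The parameter-generation objective described here appears to be the standard HTS-style formulation; the equations below are a hedged reconstruction of it, with c the static feature sequence and W the matrix that appends dynamic (delta) features, rather than a transcription of the paper's derivation.

```latex
% Maximum-likelihood parameter generation under a dynamic-feature
% constraint (standard HTS-style form): choose the parameter sequence
% O maximizing the output probability given the model \lambda and
% state sequence Q, where O consists of statics and their deltas.
\hat{c} = \arg\max_{c} \; P(Wc \mid Q, \lambda)
% Setting the gradient of \log P to zero yields the linear system
W^{\top} \Sigma^{-1} W \, \hat{c} = W^{\top} \Sigma^{-1} \mu
% where \mu and \Sigma stack the means and covariances of the
% Gaussians along the state sequence Q.
```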

  4. Effective Prediction of Errors by Non-native Speakers Using Decision Tree for Speech Recognition-Based CALL System

    Wang, Hongcui; Kawahara, Tatsuya

    CALL (Computer Assisted Language Learning) systems using ASR (Automatic Speech Recognition) for second language learning have received increasing interest recently. However, it still remains a challenge to achieve high speech recognition performance, including accurate detection of erroneous utterances by non-native speakers. Conventionally, possible error patterns, based on linguistic knowledge, are added to the lexicon and language model, or to the ASR grammar network. However, this approach easily falls into a trade-off between error coverage and increased perplexity. To solve the problem, we propose a method based on a decision tree to learn effective prediction of errors made by non-native speakers. An experimental evaluation with a number of foreign students learning Japanese shows that the proposed method can effectively generate an ASR grammar network, given a target sentence, achieving both better coverage of errors and smaller perplexity, and resulting in a significant improvement in ASR accuracy.

  5. Multi-thread Parallel Speech Recognition for Mobile Applications

    LOJKA Martin

    2014-05-01

    Full Text Available In this paper, a server-based multi-thread large-vocabulary automatic speech recognition engine is described, along with practical application examples for Android OS and HTML5. The basic idea was to make speech recognition available to a full variety of applications for computers and especially for mobile devices. The speech recognition engine should be independent of commercial products and services (whose dictionaries cannot be modified). The use of third-party services can also pose security and privacy problems in specific applications, where unsecured audio data must not be sent to uncontrolled environments (voice data transferred to servers around the globe). Using our experience with speech recognition applications, we constructed a multi-thread server-based speech recognition solution with a simple application programming interface (API) to a speech recognition engine modified to the specific needs of a particular application.

  6. Phonetic Alphabet for Speech Recognition of Czech

    J. Uhlir

    1997-12-01

    Full Text Available In the paper we introduce and discuss an alphabet that has been proposed for phonemically oriented automatic speech recognition. The alphabet, denoted PAC (Phonetic Alphabet for Czech), consists of 48 basic symbols that allow all major events occurring in spoken Czech to be distinguished. The symbols can be used both for the phonetic transcription of Czech texts and for labeling recorded speech signals. For practical reasons, the alphabet comes in two versions; one uses Czech native characters and the other employs symbols similar to those used for English in the DARPA and NIST alphabets.

  7. Sentence Clustering Using Parts-of-Speech

    Richard Khoury

    2012-02-01

    Full Text Available Clustering algorithms are used in many Natural Language Processing (NLP) tasks. They have proven to be popular and effective tools for discovering groups of similar linguistic items. In this exploratory paper, we propose a new clustering algorithm to automatically cluster together similar sentences based on the sentences' part-of-speech syntax. The algorithm generates and merges clusters using a syntactic similarity metric based on a hierarchical organization of the parts-of-speech. We demonstrate the features of this algorithm by implementing it in a question type classification system, in order to determine the positive or negative impact of different changes to the algorithm.

  8. Post-editing through Speech Recognition

    Mesa-Lao, Bartolomé

    In the past couple of years automatic speech recognition (ASR) software has quietly created a niche for itself in many situations of our lives. Nowadays it can be found at the other end of customer-support hotlines, it is built into operating systems and it is offered as an alternative text...... the most popular computer-aided translation workbenches in the market (i.e. MemoQ) together with one of the most well-known ASR packages (i.e. Dragon Naturally Speaking from Nuance). Two data correction modes will be considered: a) keyboard vs. b) keyboard and speech combined. These two different ways...

  9. Robust coarticulatory modeling for continuous speech recognition

    Schwartz, R.; Chow, Y. L.; Dunham, M. O.; Kimball, O.; Krasner, M.; Kubala, F.; Makhoul, J.; Price, P.; Roucos, S.

    1986-10-01

    The purpose of this project is to perform research into algorithms for the automatic recognition of individual sounds or phonemes in continuous speech. The algorithms developed should be appropriate for understanding large-vocabulary continuous speech input and are to be made available to the Strategic Computing Program for incorporation in a complete word recognition system. This report describes progress to date in developing phonetic models that are appropriate for continuous speech recognition. In continuous speech, the acoustic realization of each phoneme depends heavily on the preceding and following phonemes: a process known as coarticulation. Thus, while there are relatively few phonemes in English (on the order of fifty or so), the number of possible different acoustic realizations is in the thousands. Therefore, to develop high-accuracy recognition algorithms, one may need to develop literally thousands of relatively distinct phonetic models to represent the various phonetic contexts adequately. Developing a large number of models usually necessitates having a large amount of speech to provide reliable estimates of the model parameters. The major contributions of this work are the development of: (1) a simple but powerful formalism for modeling phonemes in context; (2) robust training methods for the reliable estimation of model parameters by utilizing the available speech training data in a maximally effective way; and (3) efficient search strategies for phonetic recognition while maintaining high recognition accuracy.

  10. Genetic Advances in the Study of Speech and Language Disorders

    Newbury, D.F.; Monaco, A P

    2010-01-01

    Summary Developmental speech and language disorders cover a wide range of childhood conditions with overlapping but heterogeneous phenotypes and underlying etiologies. This characteristic heterogeneity hinders accurate diagnosis, can complicate treatment strategies, and causes difficulties in the identification of causal factors. Nonetheless, over the last decade, genetic variants have been identified that may predispose certain individuals to different aspects of speech and language difficul...

  11. Detection and Separation of Speech Events in Meeting Recordings Using a Microphone Array

    Yamada Miichi

    2007-01-01

    Full Text Available When applying automatic speech recognition (ASR) to meeting recordings including spontaneous speech, the performance of ASR is greatly reduced by the overlap of speech events. In this paper, a method of separating the overlapping speech events by using an adaptive beamforming (ABF) framework is proposed. The main feature of this method is that all the information necessary for the adaptation of ABF, including microphone calibration, is obtained from meeting recordings based on the results of speech-event detection. The performance of the separation is evaluated via ASR using real meeting recordings.

  12. Detection and Separation of Speech Events in Meeting Recordings Using a Microphone Array

    Futoshi Asano

    2007-07-01

    Full Text Available When applying automatic speech recognition (ASR) to meeting recordings including spontaneous speech, the performance of ASR is greatly reduced by the overlap of speech events. In this paper, a method of separating the overlapping speech events by using an adaptive beamforming (ABF) framework is proposed. The main feature of this method is that all the information necessary for the adaptation of ABF, including microphone calibration, is obtained from meeting recordings based on the results of speech-event detection. The performance of the separation is evaluated via ASR using real meeting recordings.

  13. Handling of errors for increasing automatic feedback reliability in foreign language prosody learning Gestion d’erreurs pour la fiabilisation des retours automatiques en apprentissage de la prosodie d’une langue seconde

    Denis Jouvet

    2013-06-01

    Full Text Available The success of future systems for computer-assisted foreign language learning relies on providing the learner with personalized diagnosis and relevant corrections of their pronunciation. After presenting the problem of reliable automatic prosodic feedback in language learning, we present our work on handling errors stemming from the learner and from the system itself. The first part deals with the reliable rejection of incorrect entries (for example, due to learner errors) while remaining tolerant to non-native speech deviations. The second part focuses on the automatic phonetic segmentation of non-native speech. A detailed analysis showed the benefit of taking non-native variants into account, and led to identifying the classes of phonemes whose temporal boundaries are the most accurate and which should be favored in the design of exercises for language learning.

  14. An Agent-based Framework for Speech Investigation

    Walsh, Michael; O'Hare, G.M.P.; Carson-Berndsen, Julie

    2005-01-01

    This paper presents a novel agent-based framework for investigating speech recognition which combines statistical data and explicit phonological knowledge in order to explore strategies aimed at augmenting the performance of automatic speech recognition (ASR) systems. This line of research is motivated by a desire to provide solutions to some of the more notable problems encountered, including in particular the problematic phenomena of coarticulation, underspecified input...

  15. Deep Denoising Auto-encoder for Statistical Speech Synthesis

    Wu, Zhenzhou; Takaki, Shinji; Yamagishi, Junichi

    2015-01-01

    This paper proposes a deep denoising auto-encoder technique to extract better acoustic features for speech synthesis. The technique allows us to automatically extract low-dimensional features from high-dimensional spectral features in a non-linear, data-driven, unsupervised way. We compared the new stochastic feature extractor with conventional mel-cepstral analysis in analysis-by-synthesis and text-to-speech experiments. Our results confirm that the proposed method increases the quality of s...

  16. Minimal Pair Distinctions and Intelligibility in Preschool Children with and without Speech Sound Disorders

    Hodge, Megan M.; Gotzke, Carrie L.

    2011-01-01

    Listeners' identification of young children's productions of minimally contrastive words and predictive relationships between accurately identified words and intelligibility scores obtained from a 100-word spontaneous speech sample were determined for 36 children with typically developing speech (TDS) and 36 children with speech sound disorders…

  17. Speech and Language Impairments

    ... easily be mistaken for other disabilities such as autism or learning disabilities, so it’s very important to ensure that the child receives a thorough evaluation by a certified speech-language pathologist. Back to top What Causes Speech ...

  18. Speech impairment (adult)

    ... impairment; Impairment of speech; Inability to speak; Aphasia; Dysarthria; Slurred speech; Dysphonia voice disorders ... in others the condition does not get better. DYSARTHRIA With dysarthria, the person has ongoing difficulty expressing ...

  19. Speech perception as categorization

    Holt, Lori L.; Lotto, Andrew J.

    2010-01-01

    Speech perception (SP) most commonly refers to the perceptual mapping from the highly variable acoustic speech signal to a linguistic representation, whether it be phonemes, diphones, syllables, or words. This is an example of categorization, in that potentially discriminable speech sounds are assigned to functionally equivalent classes. In this tutorial, we present some of the main challenges to our understanding of the categorization of speech sounds and the conceptualization of SP that has...

  20. Accurate Determination of Total Iron in Ores by Mercury-Free Potassium Dichromate Automatic Potentiometric Titration

    赵怀颖; 温宏利; 夏月莲; 巩爱华; 马生凤

    2012-01-01

    For iron ores, sample pretreatment uses alkali fusion with Na2O2, and the total iron content is then determined accurately by automatic potentiometric titration. Four ways of reducing Fe3+ in the sample solution were examined: SnCl2-HgCl2, SnCl2, TiCl3, and SnCl2-TiCl3. Combined SnCl2-TiCl3 reduction was selected; it not only avoids the use of toxic mercury reagents but also gives a clear potential jump at the titration end point. The relative error (RE) of the automatic potentiometric titration is 0.13% and the relative standard deviation (RSD) is 0.22%, better than manual titration, which is affected by errors in judging the end-point colour and by the skill of the analyst. The established SnCl2-TiCl3-K2Cr2O7 automatic potentiometric titration method was applied to six iron ore certified reference materials with iron contents above 30%, giving RE < 0.2% and RSD < 0.3% (n = 10). The vanadium-titanium magnetite samples GBW07226a and GBW07224 can be determined directly without separation. The sample decomposition method is simple, fast, and widely applicable, with no splashing and complete decomposition, making the method suitable for iron ore analyses requiring high accuracy, especially for samples with high iron content.

  1. Speech Recognition for Dental Electronic Health Record

    Nagy, Miroslav; Hanzlíček, Petr; Zvárová, Jana; Dostálová, T.; Seydlová, M.; Hippmann, R.; Smidl, L.; Trmal, J.; Psutka, J.

    Brno: VUTIUM Press, 2008 - (Jan, J.; Kozumplík, J.; Provazník, I.). s. 47-47 ISBN 978-80-214-3612-1. [Biosignal 2008. International EURASIP Conference /19./. 29.06.2008-01.07.2008, Brno] Institutional research plan: CEZ:AV0Z10300504 Keywords : automatic speech recognition * electronic health record * dental medicine Subject RIV: IN - Informatics, Computer Science

  2. The role of visual spatial attention in audiovisual speech perception

    Andersen, Tobias; Tiippana, K.; Laarni, J.;

    2009-01-01

    Auditory and visual information is integrated when perceiving speech, as evidenced by the McGurk effect, in which viewing an incongruent talking face categorically alters auditory speech perception. Audiovisual integration in speech perception has long been considered automatic and pre-attentive, but recent reports have challenged this view. Here we study the effect of visual spatial attention on the McGurk effect. By presenting a movie of two faces symmetrically displaced to each side of a central fixation point and dubbed with a single auditory speech track, we were able to discern the influences from each of the faces and from the voice on the auditory speech percept. We found that directing visual spatial attention towards a face increased the influence of that face on auditory perception. However, the influence of the voice on auditory perception did not change, suggesting that audiovisual...

  3. Auto Spell Suggestion for High Quality Speech Synthesis in Hindi

    Kabra, Shikha; Agarwal, Ritika

    2014-02-01

    The goal of Text-to-Speech (TTS) synthesis in a particular language is to convert arbitrary input text to intelligible and natural-sounding speech. For a language like Hindi, where spellings can be very close and easily confused, it is not easy to identify errors in input text, and incorrect text degrades the quality of the output speech. This paper therefore contributes to the development of high-quality speech synthesis by incorporating a spellchecker that automatically generates spelling suggestions for misspelled words. Involving a spellchecker increases the efficiency of speech synthesis by providing suggestions for incorrect input text. Furthermore, we provide a comparative study evaluating the resulting effect on the phonetic text of adding the spellchecker to the input text.
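    A minimal spell-suggestion mechanism of the kind described can be built on edit distance. The sketch below is our own toy, with a Latin-script example standing in for Devanagari; it ranks lexicon words by Levenshtein distance to the input word.

```python
def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,      # insertion
                                     prev + (ca != cb))  # substitution
    return dp[-1]

def suggest(word, lexicon, max_dist=2):
    """Rank in-lexicon words by edit distance to the misspelled input."""
    scored = [(edit_distance(word, w), w) for w in lexicon]
    return [w for d, w in sorted(scored) if d <= max_dist]

# toy example; a Hindi system would compare Devanagari grapheme
# sequences in the same way
print(suggest("speach", ["speech", "peach", "reach", "spinach"]))
```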

  4. Talking Speech Input.

    Berliss-Vincent, Jane; Whitford, Gigi

    2002-01-01

    This article presents both the factors involved in successful speech input use and the potential barriers that may suggest that other access technologies could be more appropriate for a given individual. Speech input options that are available are reviewed and strategies for optimizing use of speech recognition technology are discussed. (Contains…

  5. Speech-Language Pathologists

    ... Healthcare > Speech-Language Pathologists ... What Speech-Language Pathologists Do: Speech-language pathologists ...

  6. THE BASIS FOR SPEECH PREVENTION

    Jordan JORDANOVSKI

    1997-06-01

    Full Text Available Speech is a tool for the accurate communication of ideas. When we talk about speech prevention as a practical realization of language, we refer to the fact that it should comprise the elements of a criterion viewed from the perspective of standards. This criterion, in the broad sense of the word, presupposes an exact realization of the thought expressed between the speaker and the recipient. The absence of this criterion is evident in the practical realization of language and brings consequences, often hidden very deeply in the human psyche. Their outward manifestation already represents a delayed reaction of the social environment. The foundation for overcoming and standardizing this phenomenon must be the anatomical-physiological patterns of the body, addressed through methods in concordance with the nature of the body.

  7. Automatic readout micrometer

    A measuring system is disclosed for surveying and very accurately positioning objects with respect to a reference line. A principal use of this surveying system is accurately aligning the electromagnets which direct a particle beam emitted from a particle accelerator. Prior art surveying systems require highly skilled surveyors. Prior art systems include, for example, optical surveying systems, which are susceptible to operator reading errors, and celestial navigation-type surveying systems, with their inherent complexities. The present invention provides an automatic readout micrometer which can measure distances very accurately. The invention has a simplicity of operation which practically eliminates the possibility of operator optical reading error, owing to the elimination of traditional optical alignments for making measurements. The invention has an extendable arm which carries a laser surveying target. The extendable arm can be continuously positioned over its entire length of travel by either a coarse or fine adjustment, without having the fine adjustment outrun the coarse adjustment, until a reference laser beam is centered on the target as indicated by a digital readout. The length of the micrometer can then be accurately and automatically read by a computer and compared with a standardized set of alignment measurements. Due to its construction, the micrometer eliminates any errors due to temperature changes when the system is operated within a standard operating temperature range.

  8. Automated Gesturing for Virtual Characters: Speech-driven and Text-driven Approaches

    Goranka Zoric

    2006-04-01

    Full Text Available We present two methods for automatic facial gesturing of graphically embodied animated agents. In one case, the conversational agent is driven by speech in an automatic lip-sync process: by analyzing the speech input, lip movements are determined from the speech signal. The other method provides a virtual speaker capable of reading plain English text and rendering it as speech accompanied by appropriate facial gestures. The proposed statistical model for generating the virtual speaker's facial gestures can also be applied as an addition to the lip synchronization process in order to obtain speech-driven facial gesturing. In this case the statistical model is triggered by the prosody of the input speech instead of a lexical analysis of the input text.

  9. Emotion Recognition from Persian Speech with Neural Network

    Mina Hamidi

    2012-10-01

    Full Text Available In this paper, we report an effort towards automatic recognition of emotional states from continuous Persian speech. Due to the unavailability of an appropriate database in the Persian language for emotion recognition, we first built a database of emotional speech in Persian. This database consists of 2400 wave clips modulated with anger, disgust, fear, sadness, happiness and normal emotions. We then extract prosodic features, including features related to the pitch, intensity and global characteristics of the speech signal. Finally, we apply neural networks for automatic recognition of emotion. The resulting average accuracy was about 78%.

  10. Adaptive Recognition of Phonemes from Speaker-Connected Speech Using ALISA.

    Osella, Stephen Albert

    The purpose of this dissertation research is to investigate a novel approach to automatic speech recognition (ASR). The successes that have been achieved in ASR have relied heavily on the use of a language grammar, which significantly constrains the ASR process. By using grammar to provide most of the recognition ability, the ASR system does not have to be as accurate at the low-level recognition stage. The ALISA Phonetic Transcriber (APT) algorithm is proposed as a way to improve ASR by enhancing the lowest -level recognition stage. The objective of the APT algorithm is to classify speech frames (a short sequence of speech signal samples) into a small set of phoneme classes. The APT algorithm constructs the mapping from speech frames to phoneme labels through a multi-layer feedforward process. A design principle of APT is that final decisions are delayed as long as possible. Instead of attempting to optimize the decision making at each processing level individually, each level generates a list of candidate solutions that are passed on to the next level of processing. The later processing levels use these candidate solutions to resolve ambiguities. The scope of this dissertation is the design of the APT algorithm up to the speech-frame classification stage. In future research, the APT algorithm will be extended to the word recognition stage. In particular, the APT algorithm could serve as the front-end stage to a Hidden Markov Model (HMM) based word recognition system. In such a configuration, the APT algorithm would provide the HMM with the requisite phoneme state-probability estimates. To date, the APT algorithm has been tested with the TIMIT and NTIMIT speech databases. The APT algorithm has been trained and tested on the SX and SI sentence texts using both male and female speakers. Results indicate better performance than those results obtained using a neural network based speech-frame classifier. The performance of the APT algorithm has been evaluated for

  11. Unvoiced Speech Recognition Using Tissue-Conductive Acoustic Sensor

    Hiroshi Saruwatari

    2007-01-01

    Full Text Available We present the use of stethoscope and silicon NAM (nonaudible murmur) microphones in automatic speech recognition. NAM microphones are special acoustic sensors, which are attached behind the talker's ear and can capture not only normal (audible) speech, but also very quietly uttered speech (nonaudible murmur). As a result, NAM microphones can be applied in automatic speech recognition systems when privacy is desired in human-machine communication. Moreover, NAM microphones show robustness against noise and they might be used in special systems (speech recognition, speech transform, etc.) for sound-impaired people. Using adaptation techniques and a small amount of training data, we achieved a 93.9% word accuracy for nonaudible murmur recognition on a 20 k dictation task in a clean environment. In this paper, we also investigate nonaudible murmur recognition in noisy environments and the effect of the Lombard reflex on nonaudible murmur recognition. We also propose three methods to integrate audible speech and nonaudible murmur recognition using a stethoscope NAM microphone, with very promising results.

  12. Unvoiced Speech Recognition Using Tissue-Conductive Acoustic Sensor

    Heracleous Panikos

    2007-01-01

    Full Text Available We present the use of stethoscope and silicon NAM (nonaudible murmur) microphones in automatic speech recognition. NAM microphones are special acoustic sensors, which are attached behind the talker's ear and can capture not only normal (audible) speech, but also very quietly uttered speech (nonaudible murmur). As a result, NAM microphones can be applied in automatic speech recognition systems when privacy is desired in human-machine communication. Moreover, NAM microphones show robustness against noise and they might be used in special systems (speech recognition, speech transform, etc.) for sound-impaired people. Using adaptation techniques and a small amount of training data, we achieved a 93.9% word accuracy for nonaudible murmur recognition on a 20 k dictation task in a clean environment. In this paper, we also investigate nonaudible murmur recognition in noisy environments and the effect of the Lombard reflex on nonaudible murmur recognition. We also propose three methods to integrate audible speech and nonaudible murmur recognition using a stethoscope NAM microphone, with very promising results.

  13. A new method for extraction of speech features using spectral delta characteristics and invariant integration

    FARSI, Hassan; KUHIMOGHADAM, Samana

    2014-01-01

    We propose a new feature extraction algorithm that is robust against noise. Nonlinear filtering and temporal masking are used in the proposed algorithm. Since current automatic speech recognition systems use invariant-integration and delta-delta techniques for speech feature extraction, the proposed algorithm improves speech recognition accuracy by using a delta-spectral feature instead of invariant integration. One of the nonenvironmental factors that reduce recognitio...
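    For reference, delta features of the kind mentioned are conventionally computed by linear regression over neighbouring frames; below is a hedged numpy sketch, where the window size and edge padding are our own assumptions.

```python
import numpy as np

def delta(feats, N=2):
    """Regression-based delta coefficients over a feature matrix
    (frames x dims), as commonly used in ASR front-ends:
    d_t = sum_n n * (c_{t+n} - c_{t-n}) / (2 * sum_n n^2)."""
    padded = np.pad(feats, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    return sum(n * (padded[N + n:len(feats) + N + n]
                    - padded[N - n:len(feats) + N - n])
               for n in range(1, N + 1)) / denom

spec = np.random.rand(100, 40)   # e.g. 100 frames of log-spectral features
dspec = delta(spec)              # delta-spectral features
ddspec = delta(dspec)            # delta-delta
```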

  14. Employment of Spectral Voicing Information for Speech and Speaker Recognition in Noisy Conditions

    Jančovič, Peter; Köküer, Münevver

    2008-01-01

    This chapter describes our recent research on the representation and modelling of speech signals for automatic speech and speaker recognition in noisy conditions. The chapter consists of three parts. In the first part, we present a novel method for estimating the voicing information of speech spectra in the presence of noise. The presented method is based on calculating the similarity between the shape of the signal short-term spectrum and the spectrum of the frame-analysis window. It does not re...

  15. Unobtrusive multimodal emotion detection in adaptive interfaces: speech and facial expressions

    Truong, K.P.; Leeuwen, D.A. van; Neerincx, M.A.

    2007-01-01

    Two unobtrusive modalities for automatic emotion recognition are discussed: speech and facial expressions. First, an overview is given of emotion recognition studies based on a combination of speech and facial expressions. We will identify difficulties concerning data collection, data fusion, system

  16. A Computer-Aided Evaluation of Error Patterns in Aphasic Speech

    Chan, Sharon; Tsigka, Styliani; Boschetti, Federico; Capasso, Rita

    2010-01-01

    The objective of this research is to provide an improved automated computational tool to study aphasic production. Using the speech production of Italian aphasic patients, the present study demonstrates the possibility of applying an integrated algorithm to automatically assess and generate error patterns typical of aphasic speech. Philological…

  17. Helium Speech: An Application of Standing Waves

    Wentworth, Christopher D.

    2011-01-01

    Taking a breath of helium gas and then speaking or singing to the class is a favorite demonstration for an introductory physics course, as it usually elicits appreciative laughter, which serves to energize the class session. Students will usually report that the helium speech "raises the frequency" of the voice. A more accurate description of the…
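    A quarter-wave resonator model makes the more accurate description quantitative: the vocal-fold vibration rate is nearly unchanged by helium, but the vocal tract's resonant (formant) frequencies scale with the speed of sound. The numbers below are textbook approximations, not measurements from the article.

```latex
% Closed-tube (quarter-wave) resonance model of the vocal tract:
f_n = \frac{(2n-1)\,v}{4L}, \qquad n = 1, 2, 3, \dots
% With a tract length L \approx 0.17 m, the first resonance rises from
% f_1 \approx 350/0.68 \approx 515 Hz in air (v \approx 350 m/s) to
% f_1 \approx 970/0.68 \approx 1430 Hz in helium (v \approx 970 m/s),
% roughly a factor of 2.8, while the voice pitch stays nearly the same.
```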

  18. Automatic analysis of multiparty meetings

    Steve Renals

    2011-10-01

    This paper is about the recognition and interpretation of multiparty meetings captured as audio, video and other signals. This is a challenging task since the meetings consist of spontaneous and conversational interactions between a number of participants: it is a multimodal, multiparty, multistream problem. We discuss the capture and annotation of the Augmented Multiparty Interaction (AMI) meeting corpus, the development of a meeting speech recognition system, and systems for the automatic segmentation, summarization and social processing of meetings, together with some example applications based on these systems.

  19. Fifty years of progress in speech waveform coding

    Atal, Bishnu S.

    2004-10-01

    Over the past 50 years, sustained research in speech coding has made it possible to encode speech with high speech quality at rates as low as 4 kb/s. The technology is now used in many applications, such as digital cellular phones, personal computers, and packet telephony. The early research in speech coding was aimed at reproducing speech spectra using a small number of slowly varying parameters. The focus of research shifted later to accurate reproduction of speech waveforms at low bit rates. The introduction of linear predictive coding (LPC) led to the development of new algorithms, such as adaptive predictive coding, multipulse and code-excited LPC. Code-excited LPC has become the method of choice for low bit rate speech coding and is used in most voice transmission standards. Digital speech communication is rapidly moving away from traditional circuit-switched to packet-switched networks based on IP protocols (VoIP). The focus of speech coding research is now on providing low-cost, reliable, and secure transmission of high-quality speech on IP networks.

  20. Digital speech processing using Matlab

    Gopi, E S

    2014-01-01

    Digital Speech Processing Using Matlab deals with digital speech pattern recognition, speech production model, speech feature extraction, and speech compression. The book is written in a manner that is suitable for beginners pursuing basic research in digital speech processing. Matlab illustrations are provided for most topics to enable better understanding of concepts. This book also deals with the basic pattern recognition techniques (illustrated with speech signals using Matlab) such as PCA, LDA, ICA, SVM, HMM, GMM, BPN, and KSOM.

  1. Towards A Clinical Tool For Automatic Intelligibility Assessment.

    Berisha, Visar; Utianski, Rene; Liss, Julie

    2013-01-01

    An important, yet under-explored, problem in speech processing is the automatic assessment of intelligibility for pathological speech. In practice, intelligibility assessment is often done through subjective tests administered by speech pathologists; however, research has shown that these tests are inconsistent, costly, and exhibit poor reliability. Although some automatic methods for intelligibility assessment exist for telecommunications, research specific to pathological speech has been limited. Here, we propose an algorithm that captures important multi-scale perceptual cues shown to correlate well with intelligibility. Nonlinear classifiers are trained at each time scale and a final intelligibility decision is made using ensemble learning methods from machine learning. Preliminary results indicate a marked improvement in intelligibility assessment over published baseline results. PMID:25004985

  2. Automatic Differentiation of Algorithms for Machine Learning

    Baydin, Atilim Gunes; Pearlmutter, Barak A.

    2014-01-01

    Automatic differentiation --- the mechanical transformation of numeric computer programs to calculate derivatives efficiently and accurately --- dates to the origin of the computer age. Reverse mode automatic differentiation both antedates and generalizes the method of backwards propagation of errors used in machine learning. Despite this, practitioners in a variety of fields, including machine learning, have been little influenced by automatic differentiation, and make scant use of available...
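    Forward-mode automatic differentiation is easy to illustrate with dual numbers. The toy class below is our own example, unrelated to any specific tool discussed in the paper; it propagates exact derivatives through arithmetic, unlike finite differences.

```python
import math

class Dual:
    """Minimal forward-mode automatic differentiation via dual numbers:
    each value carries (value, derivative) through arithmetic."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.der + o.der)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val,
                    self.der * o.val + self.val * o.der)  # product rule
    __rmul__ = __mul__

def sin(x):
    return Dual(math.sin(x.val), math.cos(x.val) * x.der)  # chain rule

# d/dx [x * sin(x) + 3x] at x = 2.0
x = Dual(2.0, 1.0)                 # seed derivative dx/dx = 1
y = x * sin(x) + 3 * x
print(y.val, y.der)                # derivative: sin(2) + 2*cos(2) + 3
```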

  3. Semantic and Phonetic Automatic Reconstruction of Medical Dictations

    Petrik, Stefan; Drexel, Christina; Fessler, Leo; Jancsary, Jeremy; Klein, Alexandra; Kubin, Gernot; Matiasek, Johannes; Pernkopf, Franz; Trost, Harald

    2010-01-01

    Automatic speech recognition (ASR) has become a valuable tool in large document production environments like medical dictation. While manual post-processing is still needed for correcting speech-recognition errors and for creating documents which adhere to various stylistic and formatting conventions, a large part of the document production process is carried out by the ASR system. For improving the quality of the system output, knowledge about the multi-layered relationsh...

  4. Likelihood-Maximizing-Based Multiband Spectral Subtraction for Robust Speech Recognition

    Bagher BabaAli

    2009-01-01

    Full Text Available Automatic speech recognition performance degrades significantly when speech is affected by environmental noise. Nowadays, the major challenge is to achieve good robustness in adverse noisy conditions so that automatic speech recognizers can be used in real situations. Spectral subtraction (SS) is a well-known and effective approach; it was originally designed for improving the quality of speech signals as judged by human listeners. SS techniques usually improve the quality and intelligibility of the speech signal, while speech recognition systems need compensation techniques to reduce the mismatch between noisy speech features and clean trained acoustic models. Nevertheless, a correlation can be expected between speech quality improvement and the increase in recognition accuracy. This paper proposes a novel approach to this problem by considering SS and the speech recognizer not as two independent entities cascaded together, but rather as two interconnected components of a single system, sharing the common goal of improved speech recognition accuracy. This incorporates important information from the statistical models of the recognition engine as feedback for tuning SS parameters. With this architecture, we overcome the drawbacks of previously proposed methods and achieve better recognition accuracy. Experimental evaluations show that the proposed method achieves significant improvements in recognition rates across a wide range of signal-to-noise ratios.
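    The classical single-band spectral subtraction that this work generalizes can be sketched as follows. The over-subtraction factor, spectral floor, and noise-estimation scheme below are illustrative defaults, not the likelihood-maximizing values the paper tunes via recognizer feedback.

```python
import numpy as np

def spectral_subtraction(noisy, frame=512, hop=256,
                         noise_frames=10, alpha=2.0, beta=0.01):
    """Basic magnitude spectral subtraction, a sketch of the classical
    front-end. With a Hann window at 50% overlap, overlap-add gives an
    approximate reconstruction (constant scaling is ignored here)."""
    win = np.hanning(frame)
    starts = range(0, len(noisy) - frame, hop)
    specs = [np.fft.rfft(noisy[i:i + frame] * win) for i in starts]
    # estimate the noise magnitude spectrum from the first few frames,
    # assumed to contain no speech
    noise_mag = np.mean([np.abs(s) for s in specs[:noise_frames]], axis=0)
    out = np.zeros(len(noisy))
    for i, s in zip(starts, specs):
        mag, phase = np.abs(s), np.angle(s)
        clean = np.maximum(mag - alpha * noise_mag, beta * mag)  # floor
        out[i:i + frame] += np.fft.irfft(clean * np.exp(1j * phase), frame)
    return out
```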

  5. Automatic Recognition of Element Classes and Boundaries in the Birdsong with Variable Sequences.

    Koumura, Takuya; Okanoya, Kazuo

    2016-01-01

    Research on sequential vocalizations often requires the analysis of vocalizations in long continuous sounds. In studies such as developmental ones, or studies across generations in which days or months of vocalizations must be analyzed, methods for automatic recognition are strongly desired. Although methods for automatic speech recognition for application purposes have been intensively studied, blindly applying them for biological purposes may not be an optimal solution. This is because, unlike human speech recognition, the analysis of sequential vocalizations often requires accurate extraction of timing information. In the present study we propose automated systems suitable for recognizing birdsong, one of the most intensively investigated sequential vocalizations, focusing on three properties of birdsong. First, a song is a sequence of vocal elements, called notes, which can be grouped into categories. Second, the temporal structure of birdsong is precisely controlled, meaning that temporal information is important in song analysis. Finally, notes are produced according to certain probabilistic rules, which may facilitate accurate song recognition. We divided the procedure of song recognition into three sub-steps: local classification, boundary detection, and global sequencing, each of which corresponds to one of the three properties of birdsong. We compared the performance of several different arrangements of these three steps. As a result, we demonstrated that a hybrid model of a deep convolutional neural network and a hidden Markov model was effective. We propose suitable arrangements of methods according to whether accurate boundary detection is needed. We also designed a new measure to jointly evaluate the accuracy of note classification and boundary detection. Our methods should be applicable, with small modification and tuning, to the songs of other species that share the three properties of sequential vocalization. PMID:27442240

  6. Emotional speech acoustic model for Malay: iterative versus isolated unit training.

    Mustafa, Mumtaz Begum; Ainon, Raja Noor

    2013-10-01

    The ability of a speech synthesis system to synthesize emotional speech enhances the user's experience with this kind of system and its related applications. However, the development of an emotional speech synthesis system is a daunting task in view of the complexity of human emotional speech. The more recent state-of-the-art speech synthesis systems, such as those based on hidden Markov models, can synthesize emotional speech with acceptable naturalness with the use of a good emotional speech acoustic model. However, building an emotional speech acoustic model requires adequate resources, including segment-phonetic labels of emotional speech, which is a problem for many under-resourced languages, including Malay. This research shows how it is possible to build an emotional speech acoustic model for Malay with minimal resources. To achieve this objective, two forms of initialization methods were considered: iterative training using the deterministic annealing expectation maximization algorithm, and isolated unit training. The seed model for the automatic segmentation is a neutral speech acoustic model, which was transformed to the target emotion using two transformation techniques: model adaptation and context-dependent boundary refinement. Two forms of evaluation were performed: an objective evaluation measuring the prosody error and a listening evaluation measuring the naturalness of the synthesized emotional speech. PMID:24116440

  7. Indirect Speech Acts

    李威

    2001-01-01

    Indirect speech acts are frequently used in verbal communication, and their interpretation is of great importance for developing students' communicative competence. This paper therefore presents Searle's account of indirect speech acts and explores how indirect speech acts are interpreted in accordance with two influential theories. It consists of four parts. Part one gives a general introduction to the notion of speech act theory. Part two elaborates on the conception of indirect speech act theory proposed by Searle and his supplement to and development of illocutionary acts. Part three deals with the interpretation of indirect speech acts. Part four draws implications from the previous study and serves as the conclusion of the dissertation.

  8. Esophageal speeches modified by the Speech Enhancer Program®

    Manochiopinig, Sriwimon; Boonpramuk, Panuthat

    2014-01-01

    Esophageal speech appears to be the first choice of speech treatment after a laryngectomy. However, many laryngectomized people are unable to speak well. The aim of this study was to evaluate the post-modification speech quality of Thai esophageal speakers using the Speech Enhancer Program®. The method adopted was to approach five speech-language pathologists to assess the speech accuracy and intelligibility of the words and continuous speech of the seven laryngectomized people. A comparison study was conduc...

  9. Speech Alarms Pilot Study

    Sandor, Aniko; Moses, Haifa

    2016-01-01

    Speech alarms have been used extensively in aviation and included in International Building Codes (IBC) and National Fire Protection Association's (NFPA) Life Safety Code. However, they have not been implemented on space vehicles. Previous studies conducted at NASA JSC showed that speech alarms lead to faster identification and higher accuracy. This research evaluated updated speech and tone alerts in a laboratory environment and in the Human Exploration Research Analog (HERA) in a realistic setup.

  10. Context dependent speech recognition

    Andersson, Sebastian

    2006-01-01

    Poor speech recognition is a problem when developing spoken dialogue systems, but several studies have shown that speech recognition can be improved by post-processing of recognition output that uses the dialogue context, acoustic properties of a user utterance, and other available resources to train a statistical model to act as a filter between the speech recogniser and the dialogue manager. In this thesis a corpus of logged interactions between users and a dialogue system was used...

  11. Speech input and output

    Class, F.; Mangold, H.; Stall, D.; Zelinski, R.

    1981-12-01

    Possibilities for acoustical dialogs with electronic data processing equipment were investigated. Speech recognition is posed as recognizing word groups. An economical, multistage classifier for word string segmentation is presented and its reliability in dealing with continuous speech (problems of temporal normalization and context) is discussed. Speech synthesis is considered in terms of German linguistics and phonetics. Preprocessing algorithms for total synthesis of written texts were developed. A macrolanguage, MUSTER, is used to implement this processing in an acoustic data information system (ADES).

  12. Principles of speech coding

    Ogunfunmi, Tokunbo

    2010-01-01

    It is becoming increasingly apparent that all forms of communication, including voice, will be transmitted through packet-switched networks based on the Internet Protocol (IP). Therefore, the design of modern devices that rely on speech interfaces, such as cell phones and PDAs, requires a complete and up-to-date understanding of the basics of speech coding. Outlining key signal processing algorithms used to mitigate impairments to speech quality in VoIP networks, and offering a detailed yet easily accessible introduction to the field, Principles of Speech Coding provides an in-depth examination of the

  13. Advances in Speech Recognition

    Neustein, Amy

    2010-01-01

    This volume is comprised of contributions from eminent leaders in the speech industry, and presents a comprehensive and in-depth analysis of the progress of speech technology in the topical areas of mobile settings, healthcare and call centers. The material addresses the technical aspects of voice technology within the framework of societal needs, such as the use of speech recognition software to produce up-to-date electronic health records, notwithstanding patients making changes to health plans and physicians. Included will be discussion of speech engineering, linguistics, human factors ana

  14. Ear, Hearing and Speech

    Poulsen, Torben

    2000-01-01

    An introduction is given to the anatomy and function of the ear, basic psychoacoustic matters (hearing threshold, loudness, masking), the speech signal and speech intelligibility. The lecture note is written for the course: Fundamentals of Acoustics and Noise Control (51001).

  15. Advances in speech processing

    Ince, A. Nejat

    1992-10-01

    The field of speech processing is undergoing a rapid growth in terms of both performance and applications and this is fueled by the advances being made in the areas of microelectronics, computation, and algorithm design. The use of voice for civil and military communications is discussed considering advantages and disadvantages including the effects of environmental factors such as acoustic and electrical noise and interference and propagation. The structure of the existing NATO communications network and the evolving Integrated Services Digital Network (ISDN) concept are briefly reviewed to show how they meet the present and future requirements. The paper then deals with the fundamental subject of speech coding and compression. Recent advances in techniques and algorithms for speech coding now permit high quality voice reproduction at remarkably low bit rates. The subject of speech synthesis is next treated where the principle objective is to produce natural quality synthetic speech from unrestricted text input. Speech recognition where the ultimate objective is to produce a machine which would understand conversational speech with unrestricted vocabulary, from essentially any talker, is discussed. Algorithms for speech recognition can be characterized broadly as pattern recognition approaches and acoustic phonetic approaches. To date, the greatest degree of success in speech recognition has been obtained using pattern recognition paradigms. It is for this reason that the paper is concerned primarily with this technique.

  16. [The voice and speech].

    Pesák, J; Honová, J; Majtner, J; Vojtĕchovský, K

    1998-01-01

    Biophysics is the science comprising the sum of biophysical disciplines describing living systems, including the biophysics of voice and speech. The latter deals with physiological acoustics, phonetics, phoniatry and logopaedics. In connection with the problems of voice and speech, including their teaching problems, a common language appropriate to all the interested scientific branches is often sought. As a result of our efforts to remove the existing barriers, we have tried to set up a University Society for the Study of Voice and Speech. One of its first activities, among other events, was the production of a video film, On Voice and Speech. PMID:10803289

  17. Suppression of the µ Rhythm during Speech and Non-Speech Discrimination Revealed by Independent Component Analysis: Implications for Sensorimotor Integration in Speech Processing

    Bowers, Andrew; Saltuklaroglu, Tim; Harkrider, Ashley; Cuellar, Megan

    2013-01-01

    Background Constructivist theories propose that articulatory hypotheses about incoming phonetic targets may function to enhance perception by limiting the possibilities for sensory analysis. To provide evidence for this proposal, it is necessary to map ongoing, high-temporal resolution changes in sensorimotor activity (i.e., the sensorimotor μ rhythm) to accurate speech and non-speech discrimination performance (i.e., correct trials.) Methods Sixteen participants (15 female and 1 male) were a...

  18. 'What is it?' A functional MRI and SPECT study of ictal speech in a second language

    Neuronal networks involved in second language (L2) processing vary between normal subjects. Patients with epilepsy may have ictal speech automatisms in their second language. To delineate the brain systems involved in L2 ictal speech, we combined functional MRI during bilingual tasks with ictal and inter-ictal single-photon emission computed tomography in a patient who presented with L2 ictal speech productions. These analyses showed that the networks activated by the seizure and those activated by L2 processing intersected in the right hippocampus. These results may provide insights both into the pathophysiology of ictal speech and into the brain organization for L2. (authors)

  19. Analysis of vocal signal in its amplitude - time representation. speech synthesis-by-rules

    In the first part of this dissertation, natural speech production and the resulting acoustic waveform are examined under various aspects: communication, phonetics, frequency and temporal analysis. Our own study of the direct signal is compared to other research in these different fields, and fundamental features of vocal signals are described. The second part deals with the numerous methods already used for automatic text-to-speech synthesis. In the last part, we present the new speech synthesis-by-rule methods that we have developed, and we describe in detail the structure of the real-time speech synthesizer that we have implemented on a minicomputer. (author)

  20. Speech-Language Therapy (For Parents)

    ... Speech-Language Therapy ... with speech and/or language disorders. Speech Disorders, Language Disorders, and Feeding Disorders: A speech disorder refers ...

  1. Time-expanded speech and speech recognition in older adults.

    Vaughan, Nancy E; Furukawa, Izumi; Balasingam, Nirmala; Mortz, Margaret; Fausti, Stephen A

    2002-01-01

    Speech understanding deficits are common in older adults. In addition to hearing sensitivity, changes in certain cognitive functions may affect speech recognition. One such change that may impact the ability to follow a rapidly changing speech signal is processing speed. When speakers slow the rate of their speech naturally in order to speak clearly, speech recognition is improved. The acoustic characteristics of naturally slowed speech are of interest in developing time-expansion algorithms to improve speech recognition for older listeners. In this study, we tested younger normally hearing, older normally hearing, and older hearing-impaired listeners on time-expanded speech using increased duration and increased intensity of unvoiced consonants. Although all groups performed best on unprocessed speech, performance with processed speech was better with the consonant gain feature without time expansion in the noise condition and better at the slowest time-expanded rate in the quiet condition. The effects of signal processing on speech recognition are discussed. PMID:17642020
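
    The specific expansion algorithm used in the study is not given in this record; as a rough illustration of the basic operation only, the sketch below applies a uniform phase-vocoder time stretch with librosa (the file name and stretch rate are placeholder choices, and the study's consonant-selective duration and gain processing is not reproduced).

        # Minimal sketch of uniform time expansion of a speech recording.
        # "speech.wav" and rate=0.7 are illustrative placeholders.
        import librosa
        import soundfile as sf

        y, sr = librosa.load("speech.wav", sr=None)
        expanded = librosa.effects.time_stretch(y, rate=0.7)  # rate < 1 slows speech down
        sf.write("speech_expanded.wav", expanded, sr)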

  2. Emotional State Categorization from Speech: Machine vs. Human

    Shaukat, Arslan

    2010-01-01

    This paper presents our investigations on emotional state categorization from speech signals with a psychologically inspired computational model against human performance under the same experimental setup. Based on psychological studies, we propose a multistage categorization strategy which allows establishing an automatic categorization model flexibly for a given emotional speech categorization task. We apply the strategy to the Serbian Emotional Speech Corpus (GEES) and the Danish Emotional Speech Corpus (DES), where human performance was reported in previous psychological studies. Our work is the first attempt to apply machine learning to the GEES corpus where the human recognition rates were only available prior to our study. Unlike the previous work on the DES corpus, our work focuses on a comparison to human performance under the same experimental settings. Our studies suggest that psychology-inspired systems yield behaviours that, to a great extent, resemble what humans perceived and their performance ...

  3. Speech Compression for Noise-Corrupted Thai Expressive Speech

    Suphattharachai Chomphan

    2011-01-01

    Full Text Available Problem statement: In speech communication, speech coding aims at preserving the speech quality at a lower coding bitrate. When considering the communication environment, various types of noise deteriorate the speech quality. Expressive speech with different speaking styles may yield different speech quality with the same coding method. Approach: This research proposed a study of speech compression for noise-corrupted Thai expressive speech by using two coding methods, CS-ACELP and MP-CELP. The speech material included a hundred male speech utterances and a hundred female speech utterances. Four speaking styles were included: enjoyable, sad, angry, and reading styles. Five sentences of Thai speech were chosen. Three types of noise were included (train, car, and air conditioner). Five levels of each type of noise were varied from 0-20 dB. The subjective test of mean opinion score was exploited in the evaluation process. Results: The experimental results showed that CS-ACELP gave better speech quality than MP-CELP at all three bitrates of 6000, 8600, and 12600 bps. When considering the levels of noise, the 20-dB noise gave the best speech quality, while 0-dB noise gave the worst speech quality. When considering speech gender, female speech gave better results than male speech. When considering the types of noise, the air-conditioner noise gave the best speech quality, while the train noise gave the worst speech quality. Conclusion: From the study, it can be seen that coding methods, types of noise, levels of noise, and speech gender influence the coded speech quality.
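
    The record does not specify how the noise levels were set; a common way to prepare such material is to scale the noise to a target signal-to-noise ratio before mixing, as in this minimal sketch (array names are placeholders).

        # Mix noise into clean speech at a target SNR
        # (assumes noise is at least as long as the speech signal).
        import numpy as np

        def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
            noise = noise[: len(speech)]
            p_speech = np.mean(speech ** 2)
            p_noise = np.mean(noise ** 2)
            # Scale noise so that 10*log10(p_speech / p_scaled_noise) == snr_db
            scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
            return speech + scale * noise

        # Five levels from 0-20 dB, as in the study:
        # noisy = {snr: mix_at_snr(speech, noise, snr) for snr in (0, 5, 10, 15, 20)}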

  4. Improving Alaryngeal Speech Intelligibility.

    Christensen, John M.; Dwyer, Patricia E.

    1990-01-01

    Laryngectomized patients using esophageal speech or an electronic artificial larynx have difficulty producing correct voicing contrasts between homorganic consonants. This paper describes a therapy technique that emphasizes "pushing harder" on voiceless consonants to improve alaryngeal speech intelligibility and proposes focusing on the production…

  5. Speech Situations and TEFL

    吴树奇; 高建国

    2008-01-01

    This paper deals with how speech situations, or rather speech implicatures, affect TEFL. As far as the writer is concerned, they have much influence on many aspects of language teaching. To illustrate this point explicitly, the writer focuses on the influence of speech situations upon pronunciation, intonation, lexical meanings, sentence comprehension, and the grammatical study of the English language.

  6. Speech and Language Delay

    ... child depends on the cause of the speech delay. Your doctor will tell you the cause of your child's problem and explain any treatments that might fix the problem or make it better. A speech and language pathologist might be helpful in making treatment plans. This ...

  7. Private Speech in Ballet

    Johnston, Dale

    2006-01-01

    Authoritarian teaching practices in ballet inhibit the use of private speech. This paper highlights the critical importance of private speech in the cognitive development of young ballet students, within what is largely a non-verbal art form. It draws upon research by Russian psychologist Lev Vygotsky and contemporary socioculturalists, to…

  8. Speech processing standards

    Ince, A. Nejat

    1990-05-01

    Speech processing standards are given for 64, 32, and 16 kb/s and lower-rate speech and, more generally, speech-band signals, which are or will be promulgated by CCITT and NATO. The International Telegraph and Telephone Consultative Committee (CCITT) is the international body which deals, among other things, with speech processing within the context of ISDN. Within NATO there are also bodies promulgating standards which make interoperability possible without complex and expensive interfaces. Some of the applications for low-bit-rate voice, and the related work undertaken by the CCITT Study Groups responsible for developing standards in terms of encoding algorithms, codec design objectives, and the assessment of speech quality, are highlighted.

  9. Speech recognition systems on the Cell Broadband Engine

    Liu, Y; Jones, H; Vaidya, S; Perrone, M; Tydlitat, B; Nanda, A

    2007-04-20

    In this paper we describe our design, implementation, and first results of a prototype connected-phoneme-based speech recognition system on the Cell Broadband Engine™ (Cell/B.E.). Automatic speech recognition decodes speech samples into plain text (other representations are possible) and must process samples at real-time rates. Fortunately, the computational tasks involved in this pipeline are highly data-parallel and can receive significant hardware acceleration from vector-streaming architectures such as the Cell/B.E. Identifying and exploiting these parallelism opportunities is challenging, but also critical to improving system performance. We observed, from our initial performance timings, that a single Cell/B.E. processor can recognize speech from thousands of simultaneous voice channels in real time--a channel density that is orders-of-magnitude greater than the capacity of existing software speech recognizers based on CPUs (central processing units). This result emphasizes the potential for Cell/B.E.-based speech recognition and will likely lead to the future development of production speech systems using Cell/B.E. clusters.

  10. Recognizing intentions in infant-directed speech: Evidence for universals

    Bryant, GA; Barrett, HC

    2007-01-01

    In all languages studied to date, distinct prosodic contours characterize different intention categories of infant-directed (ID) speech. This vocal behavior likely exists universally as a species-typical trait, but little research has examined whether listeners can accurately recognize intentions in ID speech using only vocal cues, without access to semantic information. We recorded native-English-speaking mothers producing four intention categories of utterances (prohibition, approval, comfo...

  11. Speech Acts In President Barack Obama Victory Speech 2012

    Januarini, Erna

    2016-01-01

    In this thesis, entitled Speech Acts in President Barack Obama's Victory Speech 2012, the author analyzes the illocutionary acts and the direct and indirect speech acts used by Barack Obama as a speaker, classified as representative, directive, expressive, commissive, and declarative. The purpose of this thesis is to find the types of illocutionary acts and direct and indirect speech acts in Barack Obama's 2012 victory speech. In writing this thesis, the author uses a qualitative method from Huberman...

  12. 78 FR 49693 - Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services...

    2013-08-15

    ... Rulemaking, published at 73 FR 47120, August 13, 2008 (2008 STS NPRM). The Commission sought comment on... Abbreviated Dialing Arrangements, CC Docket No. 92-105, Report and Order, published at 65 FR 54799, September... COMMISSION 47 CFR Part 64 Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech...

  13. 78 FR 49717 - Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services...

    2013-08-15

    ..., Report and Order and Further Notice of Proposed Rulemaking, published at 77 FR 25609, May 1, 2012 (VRS... Nos. 03-123 and 08-15, Notice of Proposed Rulemaking, published at 73 FR 47120, August 13, 2008 (2008... COMMISSION 47 CFR Part 64 Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech...

  14. Going to a Speech Therapist

    ... Some kids have trouble saying certain sounds or words. This can be frustrating ... speech therapists (also called speech-language pathologists). What ...

  15. Automated Intelligibility Assessment of Pathological Speech Using Phonological Features

    Catherine Middag

    2009-01-01

    Full Text Available It is commonly acknowledged that word or phoneme intelligibility is an important criterion in the assessment of the communication efficiency of a pathological speaker. People have therefore put a lot of effort into the design of perceptual intelligibility rating tests. These tests usually have the drawback that they employ unnatural speech material (e.g., nonsense words) and that they cannot fully exclude errors due to listener bias. Therefore, there is a growing interest in the application of objective automatic speech recognition technology to automate the intelligibility assessment. Current research is headed towards the design of automated methods which can be shown to produce ratings that correspond well with those emerging from a well-designed and well-performed perceptual test. In this paper, a novel methodology that builds on previous work (Middag et al., 2008) is presented. It utilizes phonological features, automatic speech alignment based on acoustic models that were trained on normal speech, context-dependent speaker feature extraction, and intelligibility prediction based on a small model that can be trained on pathological speech samples. The experimental evaluation of the new system reveals that the root mean squared error of the discrepancies between perceived and computed intelligibilities can be as low as 8 on a scale of 0 to 100.
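
    The final prediction stage described above can be pictured as a small regression model mapping speaker features to a perceived intelligibility score; the sketch below uses ridge regression and a cross-validated RMSE as stand-ins (the paper's actual features and model are not specified in this record, and the data here are placeholders).

        import numpy as np
        from sklearn.linear_model import Ridge
        from sklearn.model_selection import cross_val_predict

        X = np.random.rand(50, 12)           # 50 speakers x 12 speaker features (placeholder)
        y = np.random.uniform(40, 100, 50)   # perceptual intelligibility ratings (placeholder)

        pred = cross_val_predict(Ridge(alpha=1.0), X, y, cv=5)
        rmse = np.sqrt(np.mean((pred - y) ** 2))  # the paper reports RMSE as low as 8
        print(f"RMSE: {rmse:.1f}")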

  16. Adverse Conditions and ASR Techniques for Robust Speech User Interface

    Urmila Shrawankar

    2011-09-01

    Full Text Available The main motivation for Automatic Speech Recognition (ASR) is efficient interfaces to computers, and for the interfaces to be natural and truly useful they should provide coverage for a large group of users. The purpose of this work is to further improve man-machine communication. ASR systems exhibit unacceptable degradations in performance when the acoustical environments used for training and testing the system are not the same. The goal of this research is to increase the robustness of speech recognition systems with respect to changes in the environment. A system can be labeled as environment-independent if the recognition accuracy for a new environment is the same or higher than that obtained when the system is retrained for that environment. Attaining such performance is the dream of the researchers. This paper elaborates some of the difficulties with Automatic Speech Recognition (ASR), classifies them into speaker characteristics and environmental conditions, and suggests techniques to compensate for variations in the speech signal. The paper focuses on robustness with respect to speaker variations and changes in the acoustical environment. We discuss several external factors that change the environment and physiological differences that affect the performance of a speech recognition system, followed by techniques that are helpful in designing a robust ASR system.

  17. Global Freedom of Speech

    Binderup, Lars Grassme

    2007-01-01

    opposed to a legal norm, that curbs exercises of the right to free speech that offend the feelings or beliefs of members from other cultural groups. The paper rejects the suggestion that acceptance of such a norm is in line with liberal egalitarian thinking. Following a review of the classical liberal egalitarian reasons for free speech - reasons from overall welfare, from autonomy and from respect for the equality of citizens - it is argued that these reasons outweigh the proposed reasons for curbing culturally offensive speech. Currently controversial cases such as that of the Danish Cartoon Controversy...

  18. Fishing for meaningful units in connected speech

    Henrichsen, Peter Juel; Christiansen, Thomas Ulrich

    2009-01-01

    In many branches of spoken language analysis including ASR, the set of smallest meaningful units of speech is taken to coincide with the set of phones or phonemes. However, fishing for phones is difficult, error-prone, and computationally expensive. We present an experiment, based on machine ... far lower than for phonemic recognition. Our findings show that it is possible to automatically characterize a linguistic message, without detailed spectral information or presumptions about the target units. Further, fishing for simple meaningful cues and enhancing these selectively would potentially...

  19. Audio-Visual Speech Recognition Using MPEG-4 Compliant Visual Features

    Aleksic Petar S

    2002-01-01

    Full Text Available We describe an audio-visual automatic continuous speech recognition system, which significantly improves speech recognition performance over a wide range of acoustic noise levels, as well as under clean audio conditions. The system utilizes facial animation parameters (FAPs) supported by the MPEG-4 standard for the visual representation of speech. We also describe a robust and automatic algorithm we have developed to extract FAPs from visual data, which does not require hand labeling or extensive training procedures. Principal component analysis (PCA) was performed on the FAPs in order to decrease the dimensionality of the visual feature vectors, and the derived projection weights were used as visual features in the audio-visual automatic speech recognition (ASR) experiments. Both single-stream and multistream hidden Markov models (HMMs) were used to model the ASR system, integrate audio and visual information, and perform relatively large vocabulary (approximately 1000 words) speech recognition experiments. The experiments performed use clean audio data and audio data corrupted by stationary white Gaussian noise at various SNRs. The proposed system reduces the word error rate (WER) by 20% to 23% relative to audio-only speech recognition WERs, at various SNRs (0–30 dB) with additive white Gaussian noise, and by 19% relative to the audio-only speech recognition WER under clean audio conditions.
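
    The PCA step described above reduces each frame's FAP vector to a handful of projection weights; a minimal sketch follows (matrix shapes and component count are illustrative, not the paper's configuration).

        import numpy as np
        from sklearn.decomposition import PCA

        faps = np.random.rand(1000, 68)              # frames x FAP dimensions (placeholder)
        pca = PCA(n_components=10)
        visual_features = pca.fit_transform(faps)    # projection weights per frame
        print(pca.explained_variance_ratio_.sum())   # variance retained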

  20. Automatic assessment of vowel space area.

    Sandoval, Steven; Berisha, Visar; Utianski, Rene L; Liss, Julie M; Spanias, Andreas

    2013-11-01

    Vowel space area (VSA) is an attractive metric for the study of speech production deficits and reductions in intelligibility, in addition to the traditional study of vowel distinctiveness. Traditional VSA estimates are not currently sufficiently sensitive to map to production deficits. The present report describes an automated algorithm using healthy, connected speech rather than single syllables and estimates the entire vowel working space rather than corner vowels. Analyses reveal a strong correlation between the traditional VSA and automated estimates. When the two methods diverge, the automated method seems to provide a more accurate area since it accounts for all vowels. PMID:24181994
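
    One plausible reading of the automated estimate described above is the area of the convex hull of all (F1, F2) measurements from connected speech, rather than a corner-vowel quadrilateral; a sketch under that assumption (formant tracking is not shown, and the arrays are placeholders in Hz):

        import numpy as np
        from scipy.spatial import ConvexHull

        f1 = np.random.uniform(300, 900, 500)    # placeholder F1 track
        f2 = np.random.uniform(800, 2500, 500)   # placeholder F2 track

        points = np.column_stack([f1, f2])
        vsa = ConvexHull(points).volume   # for 2-D points, .volume is the hull area
        print(f"Estimated vowel space area: {vsa:.0f} Hz^2")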

  1. Speech dereverberation for enhancement and recognition using dynamic features constrained deep neural networks and feature adaptation

    Xiao, Xiong; Zhao, Shengkui; Ha Nguyen, Duc Hoang; Zhong, Xionghu; Jones, Douglas L.; Chng, Eng Siong; Li, Haizhou

    2016-01-01

    This paper investigates deep neural network (DNN) based nonlinear feature mapping and statistical linear feature adaptation approaches for reducing reverberation in speech signals. In the nonlinear feature mapping approach, a DNN is trained from a parallel clean/distorted speech corpus to map reverberant and noisy speech coefficients (such as the log magnitude spectrum) to the underlying clean speech coefficients. The constraint imposed by dynamic features (i.e., the time derivatives of the speech coefficients) is used to enhance the smoothness of predicted coefficient trajectories in two ways. One is to obtain the enhanced speech coefficients with a least squares estimation from the coefficients and dynamic features predicted by the DNN. The other is to incorporate the constraint of dynamic features directly into the DNN training process using a sequential cost function. In the linear feature adaptation approach, a sparse linear transform, called the cross transform, is used to transform multiple frames of speech coefficients to a new feature space. The transform is estimated to maximize the likelihood of the transformed coefficients given a model of clean speech coefficients. Unlike the DNN approach, no parallel corpus is used and no assumption on distortion types is made. The two approaches are evaluated on the REVERB Challenge 2014 tasks. Both speech enhancement and automatic speech recognition (ASR) results show that the DNN-based mappings significantly reduce the reverberation in speech and improve both speech quality and ASR performance. For the speech enhancement task, the proposed dynamic feature constraint helps to improve cepstral distance, frequency-weighted segmental signal-to-noise ratio (SNR), and log likelihood ratio metrics, while moderately degrading the speech-to-reverberation modulation energy ratio. In addition, the cross transform feature adaptation improves ASR performance significantly for clean-condition trained acoustic models.
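
    The dynamic features referred to above are the time derivatives of the static coefficients; a sketch of the standard regression-based delta computation follows (the window width is a typical choice, not necessarily the paper's).

        import numpy as np

        def delta(coeffs: np.ndarray, width: int = 2) -> np.ndarray:
            """coeffs: (frames, dims) float static features; returns same-shape deltas."""
            denom = 2 * sum(k * k for k in range(1, width + 1))
            padded = np.pad(coeffs, ((width, width), (0, 0)), mode="edge")
            out = np.zeros_like(coeffs)
            for k in range(1, width + 1):
                # coeffs[t+k] - coeffs[t-k], vectorized over all frames t
                out += k * (padded[width + k : padded.shape[0] - width + k]
                            - padded[width - k : padded.shape[0] - width - k])
            return out / denom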

  2. Speech and Swallowing

    ... Speech and Swallowing Problems ... How do I know if I have a swallowing problem? I have recently lost weight without trying. ...

  3. Speech disorders - children

    ... this page: //medlineplus.gov/ency/article/001430.htm Speech disorders - children ... PA: Elsevier Saunders; 2011: chap 32. ... Autism spectrum disorder, Cerebral palsy, Hearing loss, Intellectual disability ...

  4. Speech impairment (adult)

    ... ALS or Lou Gehrig disease), cerebral palsy, myasthenia gravis, or multiple sclerosis (MS) Facial trauma Facial weakness, ... provider will likely ask about the speech impairment. Questions may include when the problem developed, whether there ...

  5. Deep Multimodal Learning for Audio-Visual Speech Recognition

    Mroueh, Youssef; Marcheret, Etienne; Goel, Vaibhava

    2015-01-01

    In this paper, we present methods in deep multimodal learning for fusing speech and visual modalities for Audio-Visual Automatic Speech Recognition (AV-ASR). First, we study an approach where uni-modal deep networks are trained separately and their final hidden layers fused to obtain a joint feature space in which another deep network is built. While the audio network alone achieves a phone error rate (PER) of 41% under clean condition on the IBM large vocabulary audio-visual studio datase...

  6. One-against-all weighted dynamic time warping for language-independent and speaker-dependent speech recognition in adverse conditions.

    Xianglilan Zhang

    Full Text Available Considering personal privacy and the difficulty of obtaining training material for many seldom-used English words and (often non-English) names, language-independent (LI) with lightweight speaker-dependent (SD) automatic speech recognition (ASR) is a promising option to solve the problem. The dynamic time warping (DTW) algorithm is the state-of-the-art algorithm for small-footprint SD ASR applications with limited storage space and small vocabulary, such as voice dialing on mobile devices, menu-driven recognition, and voice control on vehicles and robotics. Even though we have successfully developed two fast and accurate DTW variations for clean speech data, speech recognition for adverse conditions is still a big challenge. In order to improve recognition accuracy in noisy environments and bad recording conditions such as too high or low volume, we introduce a novel one-against-all weighted DTW (OAWDTW). This method defines a one-against-all index (OAI) for each time frame of training data and applies the OAIs to the core DTW process. Given two speech signals, OAWDTW tunes their final alignment score by using OAI in the DTW process. Our method achieves better accuracies than DTW and merge-weighted DTW (MWDTW), as 6.97% relative reduction of error rate (RRER) compared with DTW and 15.91% RRER compared with MWDTW are observed in our extensive experiments on one representative SD dataset of four speakers' recordings. To the best of our knowledge, the OAWDTW approach is the first weighted DTW specially designed for speech data in adverse conditions.
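
    For reference, the core DTW alignment that OAWDTW modifies can be sketched as below; the paper's one-against-all frame weighting is not reproduced here.

        import numpy as np

        def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
            """Plain DTW between two (frames x dims) feature sequences."""
            n, m = len(a), len(b)
            D = np.full((n + 1, m + 1), np.inf)
            D[0, 0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    cost = np.linalg.norm(a[i - 1] - b[j - 1])  # local frame distance
                    D[i, j] = cost + min(D[i - 1, j],       # insertion
                                         D[i, j - 1],       # deletion
                                         D[i - 1, j - 1])   # match
            return float(D[n, m])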

  7. Computer-generated speech

    Aimthikul, Y.

    1981-12-01

    This thesis reviews the essential aspects of speech synthesis and distinguishes between the two prevailing techniques: compressed digital speech and phonemic synthesis. It then presents the hardware details of the five speech modules evaluated. FORTRAN programs were written to facilitate message creation and retrieval with four of the modules driven by a PDP-11 minicomputer. The fifth module was driven directly by a computer terminal. The compressed digital speech modules (T.I. 990/306, T.S.I. Series 3D and N.S. Digitalker) each contain a limited vocabulary produced by the manufacturers while both the phonemic synthesizers made by Votrax permit an almost unlimited set of sounds and words. A text-to-phoneme rules program was adapted for the PDP-11 (running under the RSX-11M operating system) to drive the Votrax Speech Pac module. However, the Votrax Type'N Talk unit has its own built-in translator. Comparison of these modules revealed that the compressed digital speech modules were superior in pronouncing words on an individual basis but lacked the inflection capability that permitted the phonemic synthesizers to generate more coherent phrases. These findings were necessarily highly subjective and dependent on the specific words and phrases studied. In addition, the rapid introduction of new modules by manufacturers will necessitate new comparisons. However, the results of this research verified that all of the modules studied do possess reasonable quality of speech that is suitable for man-machine applications. Furthermore, the development tools are now in place to permit the addition of computer speech output in such applications.

  8. SPEECH DISORDERS ENCOUNTERED DURING SPEECH THERAPY AND THERAPY TECHNIQUES

    İlhan ERDEM

    2013-06-01

    Full Text Available Speech, a physical and mental process, uses agreed signs and sounds to convey a message from one mind to another. To identify the sounds of speech, it is essential to know the structure and function of the various organs that make conversation possible. Because speech is both a physical and a mental process, many factors can lead to speech disorders. A speech disorder can stem from language acquisition as well as from many medical and psychological factors. Speaking is the collective work of many organs, like an orchestra. Since speech is a very complex skill with a mental dimension, it must be determined which of these obstacles inhibit conversation. A speech disorder is a defect in speech flow, rhythm, pitch, stress, composition, or vocalization. In this study, speech disorders such as articulation disorders, stuttering, aphasia, dysarthria, local dialect speech, tongue and lip laziness, and rapid speech are examined in terms of language skills. The causes of these speech disorders were investigated, and suggestions for their remedy are discussed.

  9. Practical speech user interface design

    Lewis, James R

    2010-01-01

    Although speech is the most natural form of communication between humans, most people find using speech to communicate with machines anything but natural. Drawing from psychology, human-computer interaction, linguistics, and communication theory, Practical Speech User Interface Design provides a comprehensive yet concise survey of practical speech user interface (SUI) design. It offers practice-based and research-based guidance on how to design effective, efficient, and pleasant speech applications that people can really use. Focusing on the design of speech user interfaces for IVR application

  10. Social Expectation Improves Speech Perception in Noise.

    McGowan, Kevin B

    2015-12-01

    Listeners' use of social information during speech perception was investigated by measuring transcription accuracy of Chinese-accented speech in noise while listeners were presented with a congruent Chinese face, an incongruent Caucasian face, or an uninformative silhouette. When listeners were presented with a Chinese face they transcribed more accurately than when presented with the Caucasian face. This difference existed both for listeners with a relatively high level of experience and for listeners with a relatively low level of experience with Chinese-accented English. Overall, these results are inconsistent with a model of social speech perception in which listener bias reduces attendance to the acoustic signal. These results are generally consistent with exemplar models of socially indexed speech perception predicting that activation of a social category will raise base activation levels of socially appropriate episodic traces, but the similar performance of more and less experienced listeners suggests the need for a more nuanced view with a role for both detailed experience and listener stereotypes. PMID:27483742

  11. Hybrid model decomposition of speech and noise in a radial basis function neural model framework

    Sørensen, Helge Bjarup Dissing; Hartmann, Uwe

    1994-01-01

    The aim of the paper is a new approach to automatic speech recognition in noisy environments, where the noise has either stationary or non-stationary statistical characteristics; specifically, to perform automatic recognition of speech in the presence of additive car noise. The technique applied is based on a combination of the hidden Markov model (HMM) decomposition method for speech recognition in noise, developed by Varga and Moore (1990) from DRA, and the hybrid (HMM/RBF) recognizer containing hidden Markov models and radial basis function (RBF) neural networks, developed by Singer and Lippmann (1992) from MIT Lincoln Lab. The present authors modified the hybrid recognizer to fit into the decomposition method to achieve high-performance speech recognition in noisy environments. The approach has been denoted the hybrid model decomposition method and it provides an optimal method...

  12. Improving the speech intelligibility in classrooms

    Lam, Choi Ling Coriolanus

    One of the major acoustical concerns in classrooms is the establishment of effective verbal communication between teachers and students. Non-optimal acoustical conditions, resulting in reduced verbal communication, cause two main problems. First, they can lead to reduced learning efficiency. Second, they can cause fatigue, stress, vocal strain and health problems, such as headaches and sore throats, among teachers who are forced to compensate for poor acoustical conditions by raising their voices. Inadequate acoustical conditions can also induce the use of public address systems, and improper use of such amplifiers or loudspeakers can lead to impairment of students' hearing. The social costs of poor classroom acoustics are large enough to impair the learning of children. This invisible problem has far-reaching implications for learning, but is easily solved. Many studies have accurately and concisely summarized the research findings on classroom acoustics, though a number of challenging questions remain unanswered. Most objective indices for speech intelligibility are essentially based on studies of Western languages. Although several studies of tonal languages such as Mandarin have been conducted, there is much less work on Cantonese. In this research, measurements were made in unoccupied rooms to investigate the acoustical parameters and characteristics of the classrooms. Speech intelligibility tests based on English, Mandarin and Cantonese, together with a survey, were carried out on students aged from 5 to 22 years old. The aim is to investigate the differences in intelligibility between English, Mandarin and Cantonese in Hong Kong classrooms. The relationship between the speech transmission index (STI) and Phonetically Balanced (PB) word scores will be developed further, together with an empirical relationship between speech intelligibility in classrooms and the variations

  13. Accurate guitar tuning by cochlear implant musicians.

    Thomas Lu

    Full Text Available Modern cochlear implant (CI) users understand speech but find difficulty in music appreciation due to poor pitch perception. Still, some deaf musicians continue to perform with their CI. Here we show unexpected results that CI musicians can reliably tune a guitar by CI alone and, under controlled conditions, match simultaneously presented tones to <0.5 Hz. One subject had normal contralateral hearing and produced more accurate tuning with CI than his normal ear. To understand these counterintuitive findings, we presented tones sequentially and found that tuning error was larger at ∼30 Hz for both subjects. A third subject, a non-musician CI user with normal contralateral hearing, showed similar trends in performance between CI and normal hearing ears but with less precision. This difference, along with electric analysis, showed that accurate tuning was achieved by listening to beats rather than discriminating pitch, effectively turning a spectral task into a temporal discrimination task.

  14. Accurate guitar tuning by cochlear implant musicians.

    Lu, Thomas; Huang, Juan; Zeng, Fan-Gang

    2014-01-01

    Modern cochlear implant (CI) users understand speech but find difficulty in music appreciation due to poor pitch perception. Still, some deaf musicians continue to perform with their CI. Here we show unexpected results that CI musicians can reliably tune a guitar by CI alone and, under controlled conditions, match simultaneously presented tones to <0.5 Hz. One subject had normal contralateral hearing and produced more accurate tuning with CI than his normal ear. To understand these counterintuitive findings, we presented tones sequentially and found that tuning error was larger at ∼ 30 Hz for both subjects. A third subject, a non-musician CI user with normal contralateral hearing, showed similar trends in performance between CI and normal hearing ears but with less precision. This difference, along with electric analysis, showed that accurate tuning was achieved by listening to beats rather than discriminating pitch, effectively turning a spectral task into a temporal discrimination task. PMID:24651081

  15. Speech recognition: Acoustic, phonetic and lexical knowledge

    Zue, V. W.

    1985-08-01

    During this reporting period we continued to make progress on the acquisition of acoustic-phonetic and lexical knowledge. We completed development of a continuous digit recognition system. The system was constructed to investigate the use of acoustic-phonetic knowledge in a speech recognition system. The significant achievements of this study include the development of a soft-failure procedure for lexical access and the discovery of a set of acoustic-phonetic features for verification. We completed a study of the constraints that lexical stress imposes on word recognition. We found that lexical stress information alone can, on the average, reduce the number of word candidates from a large dictionary by more than 80 percent. In conjunction with this study, we successfully developed a system that automatically determines the stress pattern of a word from the acoustic signal. We performed an acoustic study on the characteristics of nasal consonants and nasalized vowels. We have also developed recognition algorithms for nasal murmurs and nasalized vowels in continuous speech. We finished the preliminary development of a system that aligns a speech waveform with the corresponding phonetic transcription.
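
    The reported pruning by lexical stress can be pictured as filtering a stress-annotated dictionary, as in this toy sketch (the lexicon and stress notation are invented for illustration; 1 marks a stressed syllable, 0 an unstressed one).

        lexicon = {
            "contest_noun": "10",   # CONtest
            "contest_verb": "01",   # conTEST
            "computer": "010",
        }

        def candidates(stress_pattern: str):
            # Keep only words whose stress pattern matches the detected one
            return [w for w, p in lexicon.items() if p == stress_pattern]

        print(candidates("10"))   # -> ['contest_noun']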

  16. Comparative wavelet, PLP, and LPC speech recognition techniques on the Hindi speech digits database

    Mishra, A. N.; Shrotriya, M. C.; Sharan, S. N.

    2010-02-01

    In view of the growing use of automatic speech recognition in modern society, we study various alternative representations of the speech signal that have the potential to contribute to the improvement of recognition performance. In this paper, wavelet-based features using different wavelets are used for Hindi digit recognition. The recognition performance of these features has been compared with Linear Prediction Coefficient (LPC) and Perceptual Linear Prediction (PLP) features. All features have been tested using a Hidden Markov Model (HMM) based classifier for speaker-independent Hindi digit recognition. The recognition performance of PLP features is 11.3% better than LPC features. The recognition performance with db10 features has shown a further improvement of 12.55% over PLP features. The recognition performance with db10 is the best among all wavelet-based features.
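
    A sketch of frame-level wavelet features of the kind compared above, using a db10 decomposition with PyWavelets; the sub-band log-energy feature and decomposition depth are illustrative assumptions, since the record does not give the exact feature definition.

        import numpy as np
        import pywt

        def wavelet_features(frame: np.ndarray, wavelet: str = "db10", level: int = 4):
            coeffs = pywt.wavedec(frame, wavelet, level=level)
            # One common choice: log energy of each sub-band as the feature vector
            return np.array([np.log(np.sum(c ** 2) + 1e-10) for c in coeffs])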

  17. Constructing a Deep Neural Network based Spectral Model for Statistical Speech Synthesis

    Takaki, Shinji; Yamagishi, Junichi

    2015-01-01

    This paper proposes a deep denoising auto-encoder technique to extract better acoustic features for speech synthesis. The technique allows us to automatically extract low-dimensional features from high dimensional spectral features in a non-linear, data-driven, unsupervised way. We compared the new stochastic feature extractor with conventional mel-cepstral analysis in analysis-by-synthesis and text-to-speech experiments. Our results confirm that the proposed method increases the quality of s...
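
    In the spirit of the method above, a minimal denoising auto-encoder over spectral frames might look as follows (layer sizes, corruption, and training details are illustrative, not the paper's configuration).

        import torch
        import torch.nn as nn

        class DenoisingAE(nn.Module):
            def __init__(self, spec_dim: int = 513, bottleneck: int = 50):
                super().__init__()
                self.encoder = nn.Sequential(
                    nn.Linear(spec_dim, 256), nn.Tanh(),
                    nn.Linear(256, bottleneck))   # low-dimensional acoustic features
                self.decoder = nn.Sequential(
                    nn.Linear(bottleneck, 256), nn.Tanh(),
                    nn.Linear(256, spec_dim))

            def forward(self, x):
                return self.decoder(self.encoder(x))

        model = DenoisingAE()
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        x = torch.rand(32, 513)                # batch of spectral frames (placeholder)
        noisy = x + 0.1 * torch.randn_like(x)  # corrupted input for denoising training
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(noisy), x)
        loss.backward()
        opt.step()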

  18. Mandarin Digits Speech Recognition Using Support Vector Machines

    XIE Xiang; KUANG Jing-ming

    2005-01-01

    A method of applying support vector machines (SVMs) in speech recognition was proposed, and a speech recognition system for Mandarin digits was built with SVMs. In the system, vectors were linearly extracted from the speech feature sequence to make up time-aligned input patterns for the SVM, and the decisions of several 2-class SVM classifiers were combined to construct an N-class classifier. Four kinds of SVM kernel functions were compared in experiments on speaker-independent speech recognition of Mandarin digits. The radial basis function kernel achieved the highest accuracy rate of 99.33%, which is better than that of the baseline system based on hidden Markov models (HMMs) (97.08%). The experiments also show that SVM can outperform HMM, especially when the samples available for learning are very limited.
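
    The N-class construction from pairwise 2-class SVMs matches the one-vs-one scheme used internally by sklearn's SVC, so a comparable classifier can be sketched as follows (the feature vectors are placeholders for the time-aligned patterns).

        import numpy as np
        from sklearn.svm import SVC

        X_train = np.random.rand(200, 120)        # fixed-length patterns (placeholder)
        y_train = np.random.randint(0, 10, 200)   # digit labels 0-9
        clf = SVC(kernel="rbf", decision_function_shape="ovo")
        clf.fit(X_train, y_train)
        print(clf.predict(X_train[:5]))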

  19. Assessment of speech intelligibility in background noise and reverberation

    Nielsen, Jens Bo

    Reliable methods for assessing speech intelligibility are essential within hearing research, audiology, and related areas. Such methods can be used for obtaining a better understanding of how speech intelligibility is affected by, e.g., various environmental factors or different types of hearing impairment. In this thesis, two sentence-based tests for speech intelligibility in Danish were developed. The first test is the Conversational Language Understanding Evaluation (CLUE), which is based on the principles of the original American-English Hearing in Noise Test (HINT). The second test is a ... The most important deviation between the two new tests and the original HINT is a new procedure used for equalizing the intelligibility of the speech material during the development process. This procedure produces more accurately equalized sentences than achieved with the original HINT procedure. This study also...

  20. Analysis of Phonetic Transcriptions for Danish Automatic Speech Recognition

    Kirkedal, Andreas Søeborg

    The performance of an automatic speech recognition system depends heavily on the dictionary and the transcriptions therein. This paper presents an analysis of phonetic/phonemic features that are salient for current Danish ASR systems. This preliminary study consists of a series of experiments using an ASR system trained on the DK-PAROLE corpus. The analysis indicates that transcribing e.g. stress or vowel duration has a negative impact on performance. The best performance is obtained with coarse phonetic annotation, improving performance by 1% word error rate and 3.8% sentence error rate.

  1. Automatic Error Detection in Part of Speech Tagging

    Elworthy, D

    1994-01-01

    A technique for detecting errors made by Hidden Markov Model taggers is described, based on comparing observable values of the tagging process with a threshold. The resulting approach allows the accuracy of the tagger to be improved by accepting a lower efficiency, defined as the proportion of words which are tagged. Empirical observations are presented which demonstrate the validity of the technique and suggest how to choose an appropriate threshold.
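
    The accuracy/efficiency trade-off described above can be sketched as leaving a token untagged whenever the tagger's confidence falls below a threshold; the posterior values here stand in for whatever observable the HMM tagger exposes.

        def tag_with_rejection(tokens, posteriors, threshold=0.9):
            # posteriors: one (best_tag, probability) pair per token
            return [(tok, tag if p >= threshold else None)
                    for tok, (tag, p) in zip(tokens, posteriors)]

        # Tokens left as None count against efficiency (the proportion tagged),
        # but the tags that remain are more likely to be correct.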

  2. Statistical Language Modeling for Automatic Speech Recognition of Agglutinative Languages

    Arısoy, Ebru; Kurimo, Mikko; Saraçlar, Murat; Hirsimäki, Teemu; Pylkkönen, Janne; Alumäe, Tanel; Sak, Haşim

    2008-01-01

    This work presents statistical language models trained on different agglutinative languages utilizing a lexicon based on the recently proposed unsupervised statistical morphs. The significance of this work is that similarly generated sub-word unit lexica are developed and successfully evaluated in three different LVCSR systems in different languages. In each case the morph-based approach is at least as good as or better than a very large vocabulary word-based LVCSR language model. Even though usi...

  3. Robust Speech/Non-Speech Classification in Heterogeneous Multimedia Content

    Huijbregts, Marijn; Jong, de Franciska

    2011-01-01

    In this paper we present a speech/non-speech classification method that allows high quality classification without the need to know in advance what kinds of audible non-speech events are present in an audio recording and that does not require a single parameter to be tuned on in-domain data. Because

  4. Use of digitized speech materials in audiological research.

    Kamm, C; Carterette, E C; Morgan, D E; Dirks, D D

    1980-12-01

    Computer methods, based on the theory and application of signal processing, combined with numerical methods for simulating mathematical processes, facilitate greater objectivity in most aspects of speech intelligibility testing, including specification of the stimuli, control of the tests, evaluation of the responses, corrective feedback, and automatic interpretation. This paper discusses several basic issues in digital signal processing, and also describes the application of computer-aided procedures for recording and delivery of speech materials for audiologic research. Examples of the use of computer procedures for manipulation of digitized stimuli demonstrate the increased efficiency and versatility of these procedures compared to more conventional tape recording methods. In addition, the use of digitized recordings allows more reliable specification of speech levels than conventional calibration methods involving observations of signal peaks on a VU-meter. PMID:7442206

  5. Server-based Speech Technologies for Mobile Robotic Applications

    ONDAS Stanislav

    2013-05-01

    Full Text Available This paper proposes server-based technologies and the overall solution of a multimodal interface (speech and touchscreen) usable for mobile applications in robotics as well as in other domains. A server-based automatic speech recognition service, able to handle several audio input streams, has been designed, developed, and connected to an Android application. It receives an input data stream and sends back the recognition result. The second important technology was designed and implemented to synthesize artificial speech: a server-based TTS solution was prepared and connected, applying an HMM-based approach that included the recording and training of new voices. Finally, a simple client application for Android devices was developed and tested. A discussion of related problems is also offered in the paper.

  6. Exploring expressivity and emotion with artificial voice and speech technologies.

    Pauletto, Sandra; Balentine, Bruce; Pidcock, Chris; Jones, Kevin; Bottaci, Leonardo; Aretoulaki, Maria; Wells, Jez; Mundy, Darren P; Balentine, James

    2013-10-01

    Emotion in audio-voice signals, as synthesized by text-to-speech (TTS) technologies, was investigated to formulate a theory of expression for user interface design. Emotional parameters were specified with markup tags, and the resulting audio was further modulated with post-processing techniques. Software was then developed to link a selected TTS synthesizer with an automatic speech recognition (ASR) engine, producing a chatbot that could speak and listen. Using these two artificial voice subsystems, investigators explored both artistic and psychological implications of artificial speech emotion. Goals of the investigation were interdisciplinary, with interest in musical composition, augmentative and alternative communication (AAC), commercial voice announcement applications, human-computer interaction (HCI), and artificial intelligence (AI). The work-in-progress points towards an emerging interdisciplinary ontology for artificial voices. As one study output, HCI tools are proposed for future collaboration. PMID:24024543

  7. Denial Denied: Freedom of Speech

    Glen Newey

    2009-12-01

    Full Text Available Free speech is a widely held principle. This is in some ways surprising, since formal and informal censorship of speech is widespread, and rather different issues seem to arise depending on whether the censorship concerns who speaks, what content is spoken, or how it is spoken. I argue that despite these facts, free speech can indeed be seen as a unitary principle. On my analysis, the core of the free speech principle is the denial of the denial of speech, whether to a speaker, to a proposition, or to a mode of expression. Underlying free speech is the principle of freedom of association, according to which speech is both a precondition of future association (e.g., as a medium for negotiation) and a mode of association in its own right. I conclude by applying this account briefly to two contentious issues: hate speech and pornography.

  8. Speech spectrogram expert

    Johannsen, J.; Macallister, J.; Michalek, T.; Ross, S.

    1983-01-01

    Various authors have pointed out that humans can become quite adept at deriving phonetic transcriptions from speech spectrograms (as good as 90 percent accuracy at the phoneme level). The authors describe an expert system which attempts to simulate this performance. The speech spectrogram expert (spex) is actually a society made up of three experts: a 2-dimensional vision expert, an acoustic-phonetic expert, and a phonetics expert. The visual reasoning expert finds important visual features of the spectrogram. The acoustic-phonetic expert reasons about how visual features relate to phonemes, and about how phonemes change visually in different contexts. The phonetics expert reasons about allowable phoneme sequences and transformations, and deduces an English spelling for phoneme strings. The speech spectrogram expert is highly interactive, allowing users to investigate hypotheses and edit rules. 10 references.

  9. Punctuation in Quoted Speech

    Doran, C F

    1996-01-01

    Quoted speech is often set off by punctuation marks, in particular quotation marks. Thus, it might seem that the quotation marks would be extremely useful in identifying these structures in texts. Unfortunately, the situation is not quite so clear. In this work, I will argue that quotation marks are not adequate for either identifying or constraining the syntax of quoted speech. More useful information comes from the presence of a quoting verb, which is either a verb of saying or a punctual verb, and the presence of other punctuation marks, usually commas. Using a lexicalized grammar, we can license most quoting clauses as text adjuncts. A distinction will be made not between direct and indirect quoted speech, but rather between adjunct and non-adjunct quoting clauses.

  10. Protection limits on free speech

    李敏

    2014-01-01

    Freedom of speech is one of the basic rights of citizens and should receive broad protection. But in the real context of China, what kinds of speech can be protected and which restricted, and how to draw the limit between state power and free speech, are questions worth considering. People tend to ignore freedom of speech and its function, so that some arguments cannot be aired in open debate.

  11. An attention-gating recurrent working memory architecture for emergent speech representation

    Elshaw, Mark; Moore, Roger K.; Klein, Michael

    2010-06-01

    This paper describes an attention-gating recurrent self-organising map approach for emergent speech representation. Inspired by evidence from human cognitive processing, the architecture combines two main neural components. The first component, the attention-gating mechanism, uses actor-critic learning to perform selective attention towards speech. Through this selective attention approach, the attention-gating mechanism controls access to working memory processing. The second component, the recurrent self-organising map memory, develops a temporal-distributed representation of speech using phone-like structures. Representing speech in terms of phonetic features in an emergent self-organised fashion, according to research on child cognitive development, recreates the approach found in infants. Using this representational approach, in a fashion similar to infants, should improve the performance of automatic recognition systems through aiding speech segmentation and fast word learning.

  12. The University and Free Speech

    Grcic, Joseph

    2014-01-01

    Free speech is a necessary condition for the growth of knowledge and the implementation of real and rational democracy. Educational institutions play a central role in socializing individuals to function within their society. Academic freedom is the right to free speech in the context of the university and tenure, properly interpreted, is a necessary component of protecting academic freedom and free speech.

  13. Speech in Parkinson's disease

    Širca, Patricija

    2012-01-01

    The thesis presents an analysis of the speech of four male subjects with a diagnosis of Parkinson's disease associated with dementia. The analysis was performed on a recording of a description produced by each one. All subjects were asked to describe the scene in a picture taken from the Boston test for aphasia, entitled The Cookie Theft. The descriptions were captured with a recorder and then transcribed into written words. With the help of a pre-prepared checklist, the speech was then evaluated. Each w...

  14. SEMI-AUTOMATIC SPEAKER VERIFICATION SYSTEM

    E. V. Bulgakova

    2016-03-01

    Full Text Available Subject of Research. The paper presents a semi-automatic speaker verification system based on comparing formant values, statistics of phone lengths, and melodic characteristics. Due to the development of speech technology, there is now increased interest in expert speaker verification systems which have high reliability and low labour intensiveness thanks to the automation of data processing for the expert analysis. System Description. We present a description of a novel system analyzing the similarity or distinction of speaker voices based on comparing statistics of phone lengths, formant features, and melodic characteristics. The characteristic feature of the proposed system, based on a fusion of methods, is a weak correlation between the analyzed features, which leads to a decrease in the error rate of speaker recognition. The system's advantage is the possibility of rapid analysis of recordings, since the processes of data preprocessing and decision making are automated. We describe the functioning of the methods as well as the fusion used to combine their decisions. Main Results. We have tested the system on a speech database of 1190 target trials and 10450 non-target trials, including Russian speech of male and female speakers. The recognition accuracy of the system is 98.59% on the database containing records of male speech, and 96.17% on the database containing records of female speech. It was also experimentally established that the formant method is the most reliable of all the methods used. Practical Significance. Experimental results have shown that the proposed system is applicable for the speaker recognition task in the course of phonoscopic examination.
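
    The record does not give the fusion rule; a simple possibility consistent with the description is a weighted sum of the three per-method scores, sketched below with invented weights and threshold.

        import numpy as np

        def fuse_scores(formant_s: float, duration_s: float, melodic_s: float,
                        weights=(0.5, 0.25, 0.25)) -> float:
            # Weighted-sum fusion of formant, phone-length, and melodic scores
            return float(np.dot(weights, [formant_s, duration_s, melodic_s]))

        # Decide "same speaker" if the fused score exceeds a calibrated threshold
        same_speaker = fuse_scores(0.82, 0.67, 0.74) > 0.7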

  15. Automatic measurement and representation of prosodic features

    Ying, Goangshiuan Shawn

    Effective measurement and representation of prosodic features of the acoustic signal for use in automatic speech recognition and understanding systems is the goal of this work. Prosodic features (stress, duration, and intonation) are variations of the acoustic signal whose domains extend beyond the boundaries of each individual phonetic segment. Listeners perceive prosodic features through a complex combination of acoustic correlates such as intensity, duration, and fundamental frequency (F0). We have developed new tools to measure F0 and intensity features. We apply a probabilistic global error correction routine to an Average Magnitude Difference Function (AMDF) pitch detector. A new short-term frequency-domain Teager energy algorithm is used to measure the energy of a speech signal. We have conducted a series of experiments performing lexical stress detection on words in continuous English speech from two speech corpora. We have experimented with two different approaches, a segment-based approach and a rhythm unit-based approach, in lexical stress detection. The first approach uses pattern recognition with energy- and duration-based measurements as features to build Bayesian classifiers to detect the stress level of a vowel segment. In the second approach we define the rhythm unit and use only the F0-based measurement and a scoring system to determine the stressed segment in the rhythm unit. A duration-based segmentation routine was developed to break polysyllabic words into rhythm units. The long-term goal of this work is to develop a system that can effectively detect the stress pattern for each word in continuous speech utterances. Stress information will be integrated as a constraint for pruning the word hypotheses in a word recognition system based on hidden Markov models.
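
    Two of the measurements named above are standard and easy to state: the sketch below gives the discrete time-domain Teager operator that the frequency-domain algorithm builds on, and a basic AMDF pitch estimate (frame length, sampling rate, and the F0 search range are placeholders, and the work's probabilistic error correction is not reproduced).

        import numpy as np

        def teager_energy(x: np.ndarray) -> np.ndarray:
            # Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1]*x[n+1]
            return x[1:-1] ** 2 - x[:-2] * x[2:]

        def amdf_pitch(frame: np.ndarray, sr: int, f0_min=60, f0_max=400) -> float:
            # Average Magnitude Difference Function: the deepest valley marks the period
            lags = np.arange(int(sr / f0_max), int(sr / f0_min))
            amdf = np.array([np.mean(np.abs(frame[:-lag] - frame[lag:]))
                             for lag in lags])
            return sr / lags[np.argmin(amdf)]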

  16. Evaluation of cleft palate speech.

    Smith, Bonnie; Guyette, Thomas W

    2004-04-01

    Children born with palatal clefts are at risk for speech/language delay and speech problems related to palatal insufficiency. These individuals require regular speech evaluations, starting in the first year of life and often continuing into adulthood. The primary role of the speech pathologist on the cleft palate/craniofacial team is to evaluate whether deviations in oral cavity structures, such as the velopharynx, negatively impact speech production. This article focuses on the assessment of velopharyngeal function before and after palatal surgery. PMID:15145667

  17. Denial Denied: Freedom of Speech

    Glen Newey

    2009-01-01

    Free speech is a widely held principle. This is in some ways surprising, since formal and informal censorship of speech is widespread, and rather different issues seem to arise depending on whether the censorship concerns who speaks, what content is spoken or how it is spoken. I argue that despite these facts, free speech can indeed be seen as a unitary principle. On my analysis, the core of the free speech principle is the denial of the denial of speech, whether to a speaker, to a propositio...

  18. Packet speech systems technology

    Weinstein, C. J.; Blankenship, P. E.

    1982-09-01

    The long-range objectives of the Packet Speech Systems Technology Program are to develop and demonstrate techniques for efficient digital speech communications on networks suitable for both voice and data, and to investigate and develop techniques for integrated voice and data communication in packetized networks, including wideband common-user satellite links. Specific areas of concern are: the concentration of statistically fluctuating volumes of voice traffic, the adaptation of communication strategies to varying conditions of network links and traffic volume, and the interconnection of wideband satellite networks to terrestrial systems. Previous efforts in this area have led to new vocoder structures for improved narrowband voice performance and multiple-rate transmission, and to demonstrations of conversational speech and conferencing on the ARPANET and the Atlantic Packet Satellite Network. The current program has two major thrusts: the development and refinement of practical low-cost, robust, narrowband, and variable-rate speech algorithms and voice terminal structures; and the establishment of an experimental wideband satellite network to serve as a unique facility for the realistic investigation of voice/data networking strategies.

  19. Black History Speech

    Noldon, Carl

    2007-01-01

    The author argues in this speech that one cannot expect students in the school system to know and understand the genius of Black history if the curriculum is Eurocentric, which is a residue of racism. He states that his comments are designed for the enlightenment of those who suffer from a school system that "hypocritically manipulates Black…

  20. Hearing speech in music

    Seth-Reino Ekström

    2011-01-01

    Full Text Available The masking effect of a piano composition, played at different speeds and in different octaves, on speech-perception thresholds was investigated in 15 normal-hearing and 14 moderately-hearing-impaired subjects. Running speech (just follow conversation, JFC) testing and use of hearing aids increased the everyday validity of the findings. A comparison was made with standard audiometric noises [International Collegium of Rehabilitative Audiology (ICRA) noise and speech spectrum-filtered noise (SPN)]. All masking sounds, music or noise, were presented at the same equivalent sound level (50 dBA). The results showed a significant effect of piano performance speed and octave (P<.01). Low octave and fast tempo had the largest effect; and high octave and slow tempo, the smallest. Music had a lower masking effect than did ICRA noise with two or six speakers at normal vocal effort (P<.01) and SPN (P<.05). Subjects with hearing loss had higher masked thresholds than the normal-hearing subjects (P<.01), but there were smaller differences between masking conditions (P<.01). It is pointed out that music offers an interesting opportunity for studying masking under realistic conditions, where spectral and temporal features can be varied independently. The results have implications for composing music with vocal parts, designing acoustic environments and creating a balance between speech perception and privacy in social settings.

  1. Free Speech Yearbook 1979.

    Kane, Peter E., Ed.

    The seven articles in this collection deal with theoretical and practical freedom of speech issues. Topics covered are: the United States Supreme Court, motion picture censorship, and the color line; judicial decision making; the established scientific community's suppression of the ideas of Immanuel Velikovsky; the problems of avant-garde jazz,…

  2. Charisma in business speeches

    Niebuhr, Oliver; Brem, Alexander; Novák-Tót, Eszter;

    2016-01-01

    Charisma is a key component of spoken language interaction; and it is probably for this reason that charismatic speech has been the subject of intensive research for centuries. However, what is still largely missing is a quantitative and objective line of research that, firstly, involves analyses...

  3. Hearing speech in music.

    Ekström, Seth-Reino; Borg, Erik

    2011-01-01

    The masking effect of a piano composition, played at different speeds and in different octaves, on speech-perception thresholds was investigated in 15 normal-hearing and 14 moderately-hearing-impaired subjects. Running speech (just follow conversation, JFC) testing and use of hearing aids increased the everyday validity of the findings. A comparison was made with standard audiometric noises [International Collegium of Rehabilitative Audiology (ICRA) noise and speech spectrum-filtered noise (SPN)]. All masking sounds, music or noise, were presented at the same equivalent sound level (50 dBA). The results showed a significant effect of piano performance speed and octave (P<.01). Low octave and fast tempo had the largest effect; and high octave and slow tempo, the smallest. Music had a lower masking effect than did ICRA noise with two or six speakers at normal vocal effort (P<.01) and SPN (P<.05). Subjects with hearing loss had higher masked thresholds than the normal-hearing subjects (P<.01), but there were smaller differences between masking conditions (P<.01). It is pointed out that music offers an interesting opportunity for studying masking under realistic conditions, where spectral and temporal features can be varied independently. The results have implications for composing music with vocal parts, designing acoustic environments and creating a balance between speech perception and privacy in social settings. PMID:21768731

  4. 1984 Newbery Acceptance Speech.

    Cleary, Beverly

    1984-01-01

    This acceptance speech for an award honoring "Dear Mr. Henshaw," a book about the feelings of a lonely child of divorce intended for eight-, nine-, and ten-year-olds, discusses children's letters to the author. Changes in society that affect children, the inception of "Dear Mr. Henshaw," and children's reactions to books are highlighted. (EJS)

  5. Speech intelligibility measure for vocal control of an automaton

    Naranjo, Michel; Tsirigotis, Georgios

    1998-07-01

    The acceleration of research in speech recognition allows us to anticipate, in the near future, the widespread deployment of vocal control systems in production units. Communication between a human and a machine requires technical devices that emit, or are subjected to, significant noise perturbations. The vocal interface introduces a new control problem: that of a deterministic automaton using uncertain information. The purpose is to place the automaton exactly in a final state, ordered by voice, from an unknown initial state. The whole speech processing procedure presented in this paper takes as input the temporal speech signal of a word and produces as output a recognised word labelled with an intelligibility index given by the recognition quality. In the first part, we present the essential psychoacoustic concepts for the automatic calculation of the loudness of a speech signal. The architecture of a Time Delay Neural Network is presented in the second part, where we also give the recognition results. The theory of fuzzy subsets, in the third part, allows the extraction of both a recognised word and its intelligibility index. In the fourth part, an anticipatory system models the control of a sequential machine. A prediction phase and an updating phase appear, which involve data coming from the information system. A Bayesian decision strategy is used, and the criterion is a weighted sum of criteria defined from information, minimum path functions and the speech intelligibility measure.

  6. Coding pitch differences in voiceless fricatives: Whispered relative to normal speech.

    Heeren, Willemijn F L

    2015-12-01

    Intonation can be perceived in whispered speech despite the absence of the fundamental frequency. In the past, acoustic correlates of pitch in whisper have been sought in vowel content, but, recently, studies of normal speech demonstrated correlates of intonation in consonants as well. This study examined how consonants may contribute to the coding of intonation in whispered relative to normal speech. The acoustic characteristics of whispered, voiceless fricatives /s/ and /f/, produced at different pitch targets (low, mid, high), were investigated and compared to corresponding normal speech productions to assess if whisper contained secondary or compensatory pitch correlates. Furthermore, listener sensitivity to fricative cues to pitch in whisper was established, also relative to normal speech. Consistent with recent studies, acoustic correlates of whispered and normal speech fricatives systematically varied with pitch target. Comparable findings across speech modes showed that acoustic correlates were secondary. Discrimination of vowel-fricative-vowel stimuli was less accurate and slower in whispered than normal speech, which is attributed to differences in acoustic cues available. Perception of fricatives presented without their vowel contexts, however, revealed comparable processing speeds and response accuracies between speech modes, supporting the finding that within fricatives, acoustic correlates of pitch are similar across speech modes. PMID:26723300

  7. Metaheuristic applications to speech enhancement

    Kunche, Prajna

    2016-01-01

    This book serves as a basic reference for those interested in the application of metaheuristics to speech enhancement. The major goal of the book is to explain the basic concepts of optimization methods and their use in heuristic optimization in speech enhancement to scientists, practicing engineers, and academic researchers in speech processing. The authors discuss why it has been a challenging problem for researchers to develop new enhancement algorithms that improve the quality and intelligibility of degraded speech. They present powerful optimization methods for speech enhancement that can help to solve noise reduction problems. Readers will be able to understand the fundamentals of speech processing as well as the optimization techniques, how speech enhancement algorithms are implemented by utilizing optimization methods, and will be given the tools to develop new algorithms. The authors also provide a comprehensive literature survey regarding the topic.
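
    As a minimal, runnable illustration of the book's theme (not an algorithm from the book itself), the sketch below tunes the over-subtraction factor of a basic spectral-subtraction enhancer with random search, one of the simplest metaheuristics. The fitness function uses output SNR against a known clean reference, an assumption that only holds for development data; all signals and parameter ranges are invented for the sketch.

      # Sketch: random-search metaheuristic tuning a spectral-subtraction
      # over-subtraction factor. Fitness = output SNR vs. a clean reference.
      import numpy as np

      def spectral_subtract(noisy, noise_mag, alpha, beta=0.02, nfft=256):
          """Frame-wise magnitude spectral subtraction with factor alpha."""
          out = np.zeros_like(noisy)
          win = np.hanning(nfft)
          for start in range(0, len(noisy) - nfft, nfft // 2):
              frame = noisy[start:start + nfft] * win
              spec = np.fft.rfft(frame)
              mag = np.abs(spec)
              # over-subtract the noise estimate, keep a spectral floor
              cleaned = np.maximum(mag - alpha * noise_mag, beta * mag)
              out[start:start + nfft] += np.fft.irfft(cleaned * np.exp(1j * np.angle(spec)))
          return out

      def snr_db(clean, estimate):
          err = clean - estimate
          return 10 * np.log10(np.sum(clean ** 2) / (np.sum(err ** 2) + 1e-12))

      def random_search(noisy, clean, noise_mag, iters=50, seed=0):
          rng = np.random.default_rng(seed)
          best_alpha, best_fit = 1.0, -np.inf
          for _ in range(iters):
              alpha = rng.uniform(0.5, 6.0)   # candidate over-subtraction factor
              fit = snr_db(clean, spectral_subtract(noisy, noise_mag, alpha))
              if fit > best_fit:
                  best_alpha, best_fit = alpha, fit
          return best_alpha, best_fit

      fs = 8000
      rng = np.random.default_rng(1)
      t = np.arange(2 * fs) / fs
      clean = np.sin(2 * np.pi * 220 * t) * (1 + 0.5 * np.sin(2 * np.pi * 3 * t))
      noise = 0.3 * rng.standard_normal(clean.size)
      noise_mag = np.abs(np.fft.rfft(noise[:256] * np.hanning(256)))  # noise estimate
      alpha, fit = random_search(clean + noise, clean, noise_mag)
      print("best alpha %.2f -> output SNR %.1f dB" % (alpha, fit))

    A population-based metaheuristic of the kind the book covers (e.g., particle swarm or differential evolution) would replace the random draw with guided updates, but the fitness-driven search loop stays the same.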

  8. Discrete Model Reference Adaptive Control System for Automatic Profiling Machine

    Peng Song; Guo-kai Xu; Xiu-chun Zhao

    2012-01-01

    The automatic profiling machine is a motion system with a high degree of parameter variation and frequent transient processes, and it requires accurate real-time control. In this paper, the discrete model reference adaptive control system of the automatic profiling machine is discussed. Firstly, the model of the automatic profiling machine is presented according to the parameters of the DC motor. Then the design of the discrete model reference adaptive control is proposed, and the control rules...
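
    As a hedged illustration of the control idea (not the paper's DC-motor design), here is a minimal discrete model reference adaptive controller for a first-order plant, using the classic MIT-rule gradient update; all numeric values are invented.

      # Sketch: discrete MRAC for y[k+1] = a*y[k] + b*u[k] with reference model
      # ym[k+1] = am*ym[k] + bm*r[k]; a feedforward gain is adapted (MIT rule).
      a, b = 0.9, 0.5        # "unknown" plant, used here only for simulation
      am, bm = 0.8, 0.2      # stable reference model
      gamma = 0.05           # adaptation gain
      y = ym = e = 0.0
      theta = 0.0            # adaptive feedforward gain
      for k in range(400):
          r = 1.0 if (k // 100) % 2 == 0 else -1.0  # square-wave reference
          u = theta * r                  # control law
          y = a * y + b * u              # plant update
          ym = am * ym + bm * r          # reference model update
          e = y - ym                     # tracking error
          theta -= gamma * e * r         # gradient step to reduce e**2
      print("adapted gain: %.3f, final error: %.4f" % (theta, e))

    At DC the plant gain from r to y is b*theta/(1 - a), so the update should settle near theta = bm*(1 - a)/(b*(1 - am)) = 0.2 for these values.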

  9. Classification of cooperative and competitive overlaps in speech using cues from the context, overlapper, and overlappee

    Truong, Khiet P.

    2013-01-01

    One of the major properties of overlapping speech is that it can be perceived as competitive or cooperative. For the development of real-time spoken dialog systems and the analysis of affective and social human behavior in conversations, it is important to (automatically) distinguish between these two types of overlap. We investigate acoustic characteristics of cooperative and competitive overlaps with the aim to develop automatic classifiers for the classification of overlaps. In addition to...

  10. The cortical representation of the speech envelope is earlier for audiovisual speech than audio speech.

    Crosse, Michael J; Lalor, Edmund C

    2014-04-01

    Visual speech can greatly enhance a listener's comprehension of auditory speech when they are presented simultaneously. Efforts to determine the neural underpinnings of this phenomenon have been hampered by the limited temporal resolution of hemodynamic imaging and the fact that EEG and magnetoencephalographic data are usually analyzed in response to simple, discrete stimuli. Recent research has shown that neuronal activity in human auditory cortex tracks the envelope of natural speech. Here, we exploit this finding by estimating a linear forward-mapping between the speech envelope and EEG data and show that the latency at which the envelope of natural speech is represented in cortex is shortened by >10 ms when continuous audiovisual speech is presented compared with audio-only speech. In addition, we use a reverse-mapping approach to reconstruct an estimate of the speech stimulus from the EEG data and, by comparing the bimodal estimate with the sum of the unimodal estimates, find no evidence of any nonlinear additive effects in the audiovisual speech condition. These findings point to an underlying mechanism that could account for enhanced comprehension during audiovisual speech. Specifically, we hypothesize that low-level acoustic features that are temporally coherent with the preceding visual stream may be synthesized into a speech object at an earlier latency, which may provide an extended period of low-level processing before extraction of semantic information. PMID:24401714
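
    The forward-mapping approach described here can be sketched as time-lagged ridge regression from the envelope to an EEG channel (a temporal response function). The synthetic data, sampling rate, lag range and ridge parameter below are placeholders, not the study's settings.

      # Sketch: estimate a linear envelope->EEG forward mapping over time lags.
      import numpy as np

      fs = 128                                   # assumed sampling rate (Hz)
      lags = np.arange(0, 32)                    # 0 to ~242 ms of lags
      rng = np.random.default_rng(0)
      envelope = rng.random(fs * 60)             # 60 s stand-in speech envelope
      delay = int(0.1 * fs)                      # simulate a ~100 ms response
      eeg = np.roll(np.convolve(envelope, np.hanning(16), mode="same"), delay)
      eeg += 0.5 * rng.standard_normal(eeg.size)

      # Design matrix X[t, j] = envelope[t - lags[j]]
      X = np.zeros((envelope.size, lags.size))
      for j, lag in enumerate(lags):
          X[lag:, j] = envelope[:envelope.size - lag]

      lam = 1.0                                  # ridge parameter (cross-validate)
      w = np.linalg.solve(X.T @ X + lam * np.eye(lags.size), X.T @ eeg)
      peak = lags[np.argmax(np.abs(w))]
      print("peak response latency: %.0f ms" % (1000 * peak / fs))

    Comparing the latency of such a mapping between audio-only and audiovisual conditions is the kind of analysis the reported >10 ms shift rests on.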

  11. Superior Speech Acquisition and Robust Automatic Speech Recognition for Integrated Spacesuit Audio Systems Project

    National Aeronautics and Space Administration — Astronauts suffer from poor dexterity of their hands due to the clumsy spacesuit gloves during Extravehicular Activity (EVA) operations and NASA has had a widely...

  12. Accurate Finite Difference Algorithms

    Goodrich, John W.

    1996-01-01

    Two families of finite difference algorithms for computational aeroacoustics are presented and compared. All of the algorithms are single step explicit methods, they have the same order of accuracy in both space and time, with examples up to eleventh order, and they have multidimensional extensions. One of the algorithm families has spectral like high resolution. Propagation with high order and high resolution algorithms can produce accurate results after O(10(exp 6)) periods of propagation with eight grid points per wavelength.
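
    The paper's specific single-step schemes are not reproduced here, but a standard fourth-order central difference conveys what "high order of accuracy in space" means in practice; the test function and resolution are arbitrary.

      # Sketch: fourth-order central difference for du/dx,
      # stencil (u[i-2] - 8u[i-1] + 8u[i+1] - u[i+2]) / (12 dx).
      import numpy as np

      def d1_fourth_order(u, dx):
          du = np.empty_like(u)
          du[2:-2] = (u[:-4] - 8 * u[1:-3] + 8 * u[3:-1] - u[4:]) / (12 * dx)
          du[:2] = du[2]      # crude boundary fill, for the sketch only
          du[-2:] = du[-3]
          return du

      x = np.linspace(0, 2 * np.pi, 201)
      approx = d1_fourth_order(np.sin(x), x[1] - x[0])
      err = np.max(np.abs(approx[2:-2] - np.cos(x)[2:-2]))
      print("max interior error:", err)   # decays as dx**4 under refinement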

  13. Analytical Study on Fundamental Frequency Contours of Thai Expressive Speech Using Fujisaki's Model

    Suphattharachai Chomphan

    2010-01-01

    Full Text Available Problem statement: In spontaneous speech communication, prosody is an important factor that must be taken into account, since prosody affects not only the naturalness but also the intelligibility of speech. Focusing on the synthesis of Thai expressive speech, a number of systems have been developed over the years. However, expressive speech with various speaking styles has not been accomplished. To achieve the generation of expressive speech, we need to model the fundamental frequency (F0) contours accurately to preserve the speech prosody. Approach: Therefore this study proposes an analysis of model parameters for Thai speech prosody with three speaking styles and two genders, which is a preliminary work for speech synthesis. The Fujisaki model, a powerful tool for modeling the F0 contour, has been adopted, while the speaking styles of happiness, sadness and reading have been considered. Seven parameters derived from the Fujisaki model are as follows. The first parameter is the baseline frequency, which is the lowest level of the F0 contour. The second and third parameters are the numbers of phrase commands and tone commands, which reflect the frequencies of surges of the utterance at the global and local levels, respectively. The fourth and fifth parameters are the phrase command and tone command durations, which reflect the speed of speaking and the length of a syllable, respectively. The sixth and seventh parameters are the amplitudes of the phrase command and tone command, which reflect the energy of the global speech and the energy of the local syllable. Results: In the experiments, each speaking style includes 200 samples of one sentence with male and female speech. Therefore our speech database contains 1200 utterances in total. The results show that most of the proposed parameters can distinguish the three kinds of speaking styles explicitly. Conclusion: From the findings, there is strong evidence to further apply the successful parameters in speech synthesis systems or
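
    To make the seven parameters concrete, the sketch below generates an F0 contour from phrase and tone (accent) commands with the standard Fujisaki formulation; the command times, amplitudes and baseline frequency are invented, and alpha/beta take their conventional default values.

      # Sketch: Fujisaki-model F0 contour from phrase and tone commands.
      import numpy as np

      def phrase_resp(t, alpha=3.0):
          return np.where(t >= 0, alpha ** 2 * t * np.exp(-alpha * t), 0.0)

      def tone_resp(t, beta=20.0, gamma=0.9):
          g = 1.0 - (1.0 + beta * t) * np.exp(-beta * t)
          return np.where(t >= 0, np.minimum(g, gamma), 0.0)

      def fujisaki_f0(t, fb, phrases, tones):
          """phrases: (onset T0, amplitude Ap); tones: (T1, T2, amplitude At)."""
          lnf0 = np.full_like(t, np.log(fb))        # baseline frequency
          for t0, ap in phrases:
              lnf0 += ap * phrase_resp(t - t0)
          for t1, t2, at in tones:
              lnf0 += at * (tone_resp(t - t1) - tone_resp(t - t2))
          return np.exp(lnf0)

      t = np.linspace(0.0, 2.0, 400)
      f0 = fujisaki_f0(t, fb=110.0,
                       phrases=[(0.0, 0.5)],
                       tones=[(0.3, 0.55, 0.4), (0.9, 1.2, 0.3)])
      print("F0 range: %.1f-%.1f Hz" % (f0.min(), f0.max()))

    Fitting the model reverses this: command parameters are sought so that the generated contour matches a measured one, and it is those fitted parameters that are compared across speaking styles.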

  14. Predicting speech intelligibility in conditions with nonlinearly processed noisy speech

    Jørgensen, Søren; Dau, Torsten

    2013-01-01

    The speech-based envelope power spectrum model (sEPSM; [1]) was proposed in order to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII). The sEPSM applies the signal-to-noise ratio in the envelope domain (SNRenv), which was demonstrated to successfully predict speech intelligibility in conditions with nonlinearly processed noisy speech, such as processing with spectral subtraction. Moreover, a multiresolution version (mr-sEPSM) was demonstrated to account for speech intelligibility in various conditions with stationary and ... that a measure of the across audio-frequency variance at the output of the modulation-frequency selective process in the model is sufficient to account for the phase jitter distortion. Thus, a joint spectro-temporal modulation analysis, as proposed in [3], does not seem to be required. The results are...
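
    A toy reduction of the SNRenv idea (not the published sEPSM) can be sketched by comparing envelope power of the noisy mixture and the noise alone within one modulation band; the signals, the 2-8 Hz band and the envelope rate below are placeholders.

      # Sketch: envelope-domain SNR in one modulation band.
      import numpy as np
      from scipy.signal import butter, hilbert, sosfiltfilt

      def env_power(x, fs, lo=2.0, hi=8.0, fs_env=100):
          env = np.abs(hilbert(x))                   # temporal envelope
          env = env[::fs // fs_env]                  # downsample to 100 Hz
          sos = butter(2, [lo, hi], btype="band", fs=fs_env, output="sos")
          ac = sosfiltfilt(sos, env - env.mean())
          return np.mean(ac ** 2) / env.mean() ** 2  # DC-normalized env. power

      fs = 16000
      rng = np.random.default_rng(1)
      carrier = rng.standard_normal(fs)
      speech_like = np.sin(2 * np.pi * 4 * np.arange(fs) / fs) * carrier  # 4 Hz AM
      noise = rng.standard_normal(fs)
      p_mix, p_noise = env_power(speech_like + noise, fs), env_power(noise, fs)
      snr_env = 10 * np.log10(max(p_mix - p_noise, 1e-12) / p_noise)
      print("SNRenv (2-8 Hz band): %.1f dB" % snr_env)

    The actual model applies this within many audio and modulation channels and maps the combined SNRenv to intelligibility.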

  15. Direct and indirect measures of speech articulator motions using low power EM sensors

    Barnes, T; Burnett, G; Gable, T; Holzrichter, J F; Ng, L

    1999-05-12

    Low power electromagnetic (EM) wave sensors can measure general properties of human speech articulator motions as speech is produced. See Holzrichter, Burnett, Ng, and Lea, J. Acoust. Soc. Am. 103 (1) 622 (1998). Experiments have demonstrated extremely accurate pitch measurements (< 1 Hz per pitch cycle) and accurate onset of voiced speech. Recent measurements of pressure-induced tracheal motions enable very good spectral and amplitude estimates of a voiced excitation function. The use of the measured excitation functions and pitch synchronous processing enables the determination, for each pitch cycle, of an accurate transfer function and, indirectly, of the corresponding articulator motions. In addition, direct measurements have been made of EM wave reflections from articulator interfaces, including jaw, tongue, and palate, simultaneously with acoustic and glottal open/close signals. While several types of EM sensors are suitable for speech articulator measurements, the homodyne sensor has been found to provide good spatial and temporal resolution for several applications.

  16. Hate Speech: Power in the Marketplace.

    Harrison, Jack B.

    1994-01-01

    A discussion of hate speech and freedom of speech on college campuses examines what distinguishes hate speech from normal, objectionable interpersonal comments and looks at Supreme Court decisions on the limits of student free speech. Two cases specifically concerning regulation of hate speech on campus are considered: Chaplinsky v. New…

  17. A deep auto-encoder based low-dimensional feature extraction from FFT spectral envelopes for statistical parametric speech synthesis

    Takaki, Shinji; Yamagishi, Junichi

    2016-01-01

    In the state-of-the-art statistical parametric speech synthesis system, a speech analysis module, e.g. STRAIGHT spectral analysis, is generally used for obtaining accurate and stable spectral envelopes, and then low-dimensional acoustic features extracted from obtained spectral envelopes are used for training acoustic models. However, a spectral envelope estimation algorithm used in such a speech analysis module includes various processing derived from human knowledge. In this paper, we prese...
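
    In the spirit of the approach (with the layer sizes, the 513-bin input and the training data all being placeholders), a small fully connected auto-encoder over log spectral envelopes looks like this:

      # Sketch: auto-encoder mapping FFT spectral envelopes to a low-dim code.
      import torch
      import torch.nn as nn

      class EnvelopeAE(nn.Module):
          def __init__(self, n_bins=513, n_code=60):
              super().__init__()
              self.enc = nn.Sequential(nn.Linear(n_bins, 256), nn.Tanh(),
                                       nn.Linear(256, n_code))
              self.dec = nn.Sequential(nn.Linear(n_code, 256), nn.Tanh(),
                                       nn.Linear(256, n_bins))

          def forward(self, x):
              code = self.enc(x)
              return self.dec(code), code

      model = EnvelopeAE()
      opt = torch.optim.Adam(model.parameters(), lr=1e-3)
      log_env = torch.randn(1024, 513)   # stand-in for log spectral envelopes
      for epoch in range(10):
          recon, _ = model(log_env)
          loss = nn.functional.mse_loss(recon, log_env)
          opt.zero_grad()
          loss.backward()
          opt.step()
      print("reconstruction MSE: %.4f" % loss.item())

    The bottleneck activations then stand in for hand-engineered low-dimensional features when training the acoustic models.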

  18. CrowdInside: Automatic Construction of Indoor Floorplans

    Alzantot, Moustafa; Youssef, Moustafa

    2012-01-01

    The existence of a worldwide indoor floorplans database can lead to significant growth in location-based applications, especially for indoor environments. In this paper, we present CrowdInside: a crowdsourcing-based system for the automatic construction of building floorplans. CrowdInside leverages the smartphone sensors that are ubiquitously available with the humans who use a building to automatically and transparently construct accurate motion traces. These accurate traces are generated bas...

  19. An acoustic feature-based similarity scoring system for speech rehabilitation assistance.

    Syauqy, Dahnial; Wu, Chao-Min; Setyawati, Onny

    2016-08-01

    The purpose of this study is to develop a tool to assist speech therapy and rehabilitation, focused on automatic scoring based on the comparison of the patient's speech with another normal speech on several aspects, including pitch, vowels, voiced-unvoiced segments, strident fricatives and sound intensity. The pitch estimation employed a cepstrum-based algorithm for its robustness; the vowel classification used a multilayer perceptron (MLP) to classify vowels from pitch and formants; and the strident fricative detection was based on the major peak spectral intensity, its location and the existence of pitch in the segment. In order to evaluate the performance of the system, this study analyzed eight patients' speech recordings (four males, four females; aged 4-58 years), which had been recorded in a previous study in cooperation with Taipei Veterans General Hospital and Taoyuan General Hospital. The experimental result on the pitch algorithm showed that the cepstrum method had 5.3% gross pitch error over a total of 2086 frames. On the vowel classification algorithm, the MLP method provided 93% accuracy (men), 87% (women) and 84% (children). In total, the overall results showed that 156 of the tool's grading results (81%) were consistent compared to 192 audio and visual observations made by four experienced respondents. Implication for Rehabilitation Difficulties in communication may limit the ability of a person to transfer and exchange information. The fact that speech is one of the primary means of communication has encouraged the need for speech diagnosis and rehabilitation. The advances of technology in computer-assisted speech therapy (CAST) improve the quality and time efficiency of the diagnosis and treatment of the disorders. The present study attempted to develop a tool to assist speech therapy and rehabilitation, which provides a simple interface to let the assessment be done even by the patient himself without the need of particular knowledge of speech processing while at the
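
    The cepstrum-based pitch estimator the tool relies on is easy to sketch; the frame length, the 50-400 Hz search range and the impulse-train test signal below are assumptions, and no voicing decision is made.

      # Sketch: cepstral pitch estimation - peak of the real cepstrum
      # within a plausible quefrency band.
      import numpy as np

      def cepstral_pitch(frame, fs, fmin=50.0, fmax=400.0):
          spec = np.fft.rfft(frame * np.hanning(len(frame)))
          cep = np.fft.irfft(np.log(np.abs(spec) + 1e-10))   # real cepstrum
          qmin, qmax = int(fs / fmax), int(fs / fmin)        # quefrency band
          return fs / (qmin + np.argmax(cep[qmin:qmax]))

      fs = 16000
      period = int(fs / 120)                  # ~133 samples -> 120.3 Hz
      frame = np.zeros(1024)
      frame[::period] = 1.0                   # impulse train as a voiced stand-in
      print("estimated F0: %.1f Hz" % cepstral_pitch(frame, fs))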

  20. Variation and Synthetic Speech

    Miller, C; Massey, N; Miller, Corey; Karaali, Orhan; Massey, Noel

    1997-01-01

    We describe the approach to linguistic variation taken by the Motorola speech synthesizer. A pan-dialectal pronunciation dictionary is described, which serves as the training data for a neural network based letter-to-sound converter. Subsequent to dictionary retrieval or letter-to-sound generation, pronunciations are submitted to a neural network based postlexical module. The postlexical module has been trained on aligned dictionary pronunciations and hand-labeled narrow phonetic transcriptions. This architecture permits the learning of individual postlexical variation, and can be retrained for each speaker whose voice is being modeled for synthesis. Learning variation in this way can result in greater naturalness for the synthetic speech that is produced by the system.

  1. [Improving speech comprehension using a new cochlear implant speech processor].

    Müller-Deile, J; Kortmann, T; Hoppe, U; Hessel, H; Morsnowski, A

    2009-06-01

    The aim of this multicenter clinical field study was to assess the benefits of the new Freedom 24 sound processor for cochlear implant (CI) users implanted with the Nucleus 24 cochlear implant system. The study included 48 postlingually profoundly deaf experienced CI users who demonstrated speech comprehension performance with their current speech processor on the Oldenburg sentence test (OLSA) in quiet conditions of at least 80% correct scores and who were able to perform adaptive speech threshold testing using the OLSA in noisy conditions. Following baseline measures of speech comprehension performance with their current speech processor, subjects were upgraded to the Freedom 24 speech processor. After a take-home trial period of at least 2 weeks, subject performance was evaluated by measuring the speech reception threshold with the Freiburg multisyllabic word test and speech intelligibility with the Freiburg monosyllabic word test at 50 dB and 70 dB in the sound field. The results demonstrated highly significant benefits for speech comprehension with the new speech processor. Significant benefits for speech comprehension were also demonstrated with the new speech processor when tested in competing background noise. In contrast, use of the Abbreviated Profile of Hearing Aid Benefit (APHAB) did not prove to be a suitably sensitive assessment tool for comparative subjective self-assessment of hearing benefits with each processor. Use of the preprocessing algorithm known as adaptive dynamic range optimization (ADRO) in the Freedom 24 led to additional improvements over the standard upgrade map for speech comprehension in quiet and showed equivalent performance in noise. Through use of the preprocessing beam-forming algorithm BEAM, subjects demonstrated a highly significant improvement in signal-to-noise ratio for speech comprehension thresholds (i.e., signal-to-noise ratio for 50% speech comprehension scores) when tested with an adaptive procedure using the Oldenburg

  2. On Speech Act Theory

    邓仁毅

    2009-01-01

    Speech act theory has developed from the work of linguistic philosophers and originates in Austin's observation and study. It was the particular search for the constative, utterances which describe something outside the text and can therefore be judged true or false, that prompted John L. Austin to direct his attention to the distinction with so-called performatives. The two representative linguists are Austin and Searle.

  3. HATE SPEECH AS COMMUNICATION

    Gladilin Aleksey Vladimirovich

    2012-01-01

    The purpose of the paper is a theoretical comprehension of hate speech from the communication point of view, on the one hand, and from the point of view of prejudice, stereotypes and discrimination on the other. Such a comprehension is motivated by the need to develop an objective forensic linguistics methodology to analyze texts that are supposedly extremist. The method of analysis and synthesis is basic to the investigation. The approach to functions and other elements of communication theory is based o...

  4. Regulating hate speech online

    Banks, James

    2010-01-01

    The exponential growth in the Internet as a means of communication has been emulated by an increase in far-right and extremist web sites and hate based activity in cyberspace. The anonymity and mobility afforded by the Internet has made harassment and expressions of hate effortless in a landscape that is abstract and beyond the realms of traditional law enforcement. This paper examines the complexities of regulating hate speech on the Internet through legal and technological frameworks. It ex...

  5. Speech rhythm: a metaphor?

    Nolan, Francis; Jeon, Hae-Sung

    2014-12-19

    Is speech rhythmic? In the absence of evidence for a traditional view that languages strive to coordinate either syllables or stress-feet with regular time intervals, we consider the alternative that languages exhibit contrastive rhythm subsisting merely in the alternation of stronger and weaker elements. This is initially plausible, particularly for languages with a steep 'prominence gradient', i.e. a large disparity between stronger and weaker elements; but we point out that alternation is poorly achieved even by a 'stress-timed' language such as English, and, historically, languages have conspicuously failed to adopt simple phonological remedies that would ensure alternation. Languages seem more concerned to allow 'syntagmatic contrast' between successive units and to use durational effects to support linguistic functions than to facilitate rhythm. Furthermore, some languages (e.g. Tamil, Korean) lack the lexical prominence which would most straightforwardly underpin prominence of alternation. We conclude that speech is not incontestably rhythmic, and may even be antirhythmic. However, its linguistic structure and patterning allow the metaphorical extension of rhythm in varying degrees and in different ways depending on the language, and it is this analogical process which allows speech to be matched to external rhythms. PMID:25385774

  6. Speech and the Right Hemisphere

    E. M. R. Critchley

    1991-01-01

    Full Text Available Two facts are well recognized: the location of the speech centre with respect to handedness and early brain damage, and the involvement of the right hemisphere in certain cognitive functions including verbal humour, metaphor interpretation, spatial reasoning and abstract concepts. The importance of the right hemisphere in speech is suggested by pathological studies, blood flow parameters and analysis of learning strategies. An insult to the right hemisphere following left hemisphere damage can affect residual language abilities and may activate non-propositional inner speech. The prosody of speech comprehension even more so than of speech production—identifying the voice, its affective components, gestural interpretation and monitoring one's own speech—may be an essentially right hemisphere task. Errors of a visuospatial type may occur in the learning process. Ease of learning by actors and when learning foreign languages is achieved by marrying speech with gesture and intonation, thereby adopting a right hemisphere strategy.

  7. Automatic speech recognition (zero crossing method). Automatic recognition of isolated vowels

    This note describes a recognition method for isolated vowels, using a preprocessing of the vocal signal. The processing extracts the extrema of the vocal signal and the time intervals separating them (zero-crossing distances of the first derivative of the signal). The recognition of vowels uses normalized histograms of the values of these intervals. The program determines a distance between the histogram of the sound to be recognized and model histograms built during a learning phase. The results, processed in real time by a minicomputer, are relatively independent of the speaker, provided the fundamental frequency is not allowed to vary too much (i.e., speakers of the same sex). (author)
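
    A minimal sketch of the described pipeline follows; the bin count, interval range and L1 histogram distance are choices made for the sketch, and the synthetic two-tone "vowels" merely stand in for real recordings.

      # Sketch: vowel recognition from normalized histograms of the intervals
      # between signal extrema (zero crossings of the first derivative).
      import numpy as np

      def extrema_interval_hist(x, bins=30, max_gap=200):
          d = np.diff(x)
          idx = np.where(np.diff(np.sign(d)) != 0)[0] + 1   # extrema positions
          gaps = np.diff(idx)                               # inter-extrema intervals
          h, _ = np.histogram(gaps, bins=bins, range=(1, max_gap))
          return h / max(h.sum(), 1)                        # normalized histogram

      def recognize(x, models):
          """models: vowel label -> histogram learned in a training phase."""
          h = extrema_interval_hist(x)
          return min(models, key=lambda v: np.abs(h - models[v]).sum())

      fs = 8000
      t = np.arange(fs // 4) / fs
      fake_a = np.sin(2*np.pi*700*t) + 0.5*np.sin(2*np.pi*1100*t)  # toy "/a/"
      fake_i = np.sin(2*np.pi*300*t) + 0.5*np.sin(2*np.pi*2300*t)  # toy "/i/"
      models = {"a": extrema_interval_hist(fake_a),
                "i": extrema_interval_hist(fake_i)}
      noisy = fake_i + 0.05 * np.random.default_rng(0).standard_normal(t.size)
      print(recognize(noisy, models))   # -> "i"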

  8. Speech recognition in university classrooms

    Wald, Mike; Bain, Keith; Basson, Sara H

    2002-01-01

    The LIBERATED LEARNING PROJECT (LLP) is an applied research project studying two core questions: 1) Can speech recognition (SR) technology successfully digitize lectures to display spoken words as text in university classrooms? 2) Can speech recognition technology be used successfully as an alternative to traditional classroom notetaking for persons with disabilities? This paper addresses these intriguing questions and explores the underlying complex relationship between speech recognition te...

  9. Visualizing structures of speech expressiveness

    Herbelin, Bruno; Jensen, Karl Kristoffer; Graugaard, Lars

    2008-01-01

    Speech is both beautiful and informative. In this work, a conceptual study of speech, through investigation of the tower of Babel, the archetypal phonemes, and a study of the reasons for the use of language, is undertaken in order to create an artistic work investigating the nature of speech. The Babel myth speaks about the distance created when aspiring to the heavens as the reason for language division. Meanwhile, Locquin states through thorough investigations that only a few phonemes are present thro...

  10. Lecturer’s Speech Competence

    Svetlana Viktorovna Panina; Svetlana Yurievna Zalutskaya; Galina Egorovna Zhondorova

    2014-01-01

    The analysis of the issue of a lecturer's speech competence is presented. A lecturer's speech competence is the main component of professional image and an indicator of communicative culture, having a great impact on the quality of pedagogical activity. Research objective: to define the main drawbacks in the speech competence of lecturers of North-Eastern Federal University named after M. K. Ammosov (NEFU) (Russia, Yakutsk) and suggest ways of correcting these drawbacks in terms of multilingual education...

  11. Speech Recognition Technology: Applications & Future

    Pankaj Pathak

    2010-01-01

    Voice or speech recognition is "the technology by which sounds, words or phrases spoken by humans are converted into electrical signals, and these signals are transformed into coding patterns to which meaning has been assigned". It is a technology that needs a combination of improved artificial intelligence technology and a more sophisticated speech-recognition engine. Initially, a primitive device that could recognize speech was developed by AT&T Bell Laboratories in the 1940s. According to...

  12. Motor Equivalence in Speech Production

    Perrier, Pascal; Fuchs, Susanne

    2015-01-01

    The first section provides a description of the concepts of "motor equivalence" and "degrees of freedom". It is illustrated with a few examples of motor tasks in general and of speech production tasks in particular. In the second section, the methodology used to investigate experimentally motor equivalence phenomena in speech production is presented. It is mainly based on paradigms that perturb the perception-action loop during on-going speech, either by limiting the...

  13. Speech therapy for Parkinson's disease.

    Scott, S; Caird, F I

    1983-01-01

    Twenty-six patients with the speech disorder of Parkinson's disease received daily speech therapy (prosodic exercises) at home for 2 to 3 weeks. There were significant improvements in speech as assessed by scores for prosodic abnormality and intelligibility, and these were maintained in part for up to 3 months. The degree of improvement was clinically and psychologically important, and relatives commented on the social benefits. The use of a visual reinforcement device produced limited benefi...

  14. The role of temporal resolution in modulation-based speech segregation

    May, Tobias; Bentsen, Thomas; Dau, Torsten

    2015-01-01

    This study is concerned with the challenge of automatically segregating a target speech signal from interfering background noise. A computational speech segregation system is presented which exploits logarithmically-scaled amplitude modulation spectrogram (AMS) features to distinguish between speech and noise activity on the basis of individual time-frequency (T-F) units. One important parameter of the segregation system is the window duration of the analysis-synthesis stage, which determines the lower limit of modulation frequencies that can be represented but also the temporal acuity with which the segregation system can manipulate individual T-F units. To clarify the consequences of this trade-off on modulation-based speech segregation performance, the influence of the window duration was systematically investigated.
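
    The feature extraction at the heart of such a system can be sketched for a single audio band as follows; the 250 ms window, hop size and modulation-frequency grid are placeholders standing in for the parameters the study varies.

      # Sketch: log-scaled amplitude modulation spectrogram (AMS) features.
      import numpy as np
      from scipy.signal import hilbert

      def ams_features(x, fs, win_dur=0.25, n_mod=15):
          env = np.abs(hilbert(x))                    # temporal envelope
          wlen = int(win_dur * fs)
          frames = [env[i:i + wlen] * np.hanning(wlen)
                    for i in range(0, len(env) - wlen, wlen // 2)]
          mod_spec = np.abs(np.fft.rfft(frames, axis=1))  # modulation spectrum
          freqs = np.fft.rfftfreq(wlen, 1.0 / fs)
          centers = np.geomspace(1.0 / win_dur, 256.0, n_mod)  # log-spaced bins
          cols = [int(np.argmin(np.abs(freqs - c))) for c in centers]
          return np.log(mod_spec[:, cols] + 1e-10)    # (n_frames, n_mod)

      fs = 16000
      x = np.random.default_rng(0).standard_normal(fs)  # 1 s placeholder signal
      print(ams_features(x, fs).shape)

    Shortening win_dur sharpens temporal acuity but removes the lowest modulation frequencies, which is exactly the trade-off the study quantifies.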

  15. The Promise of NLP and Speech Processing Technologies in Language Assessment

    Chapelle, Carol A.; Chung, Yoo-Ree

    2010-01-01

    Advances in natural language processing (NLP) and automatic speech recognition and processing technologies offer new opportunities for language testing. Despite their potential uses on a range of language test item types, relatively little work has been done in this area, and it is therefore not well understood by test developers, researchers or…

  16. A Joint Approach for Single-Channel Speaker Identification and Speech Separation

    Mowlaee, Pejman; Saeidi, Rahim; Christensen, Mads Græsbøll;

    2012-01-01

    situation where we have prior information of codebook indices, speaker identities and SSR-level, and then, by relaxing these assumptions one by one, we demonstrate the efficiency of the proposed fully blind system. In contrast to previous studies that mostly focus on automatic speech recognition (ASR...

  17. Speech-based recognition of self-reported and observed emotion in a dimensional space

    Truong, Khiet P.; van Leeuwen, David A.; de Jong, Franciska M.G.

    2012-01-01

    The differences between self-reported and observed emotion have only marginally been investigated in the context of speech-based automatic emotion recognition. We address this issue by comparing self-reported emotion ratings to observed emotion ratings and look at how differences between these two t

  18. Musical expertise and foreign speech perception

    Martínez-Montes, Eduardo; Hernández-Pérez, Heivet; Chobert, Julie; Morgado-Rodríguez, Lisbet; Suárez-Murias, Carlos; Valdés-Sosa, Pedro A.; Besson, Mireille

    2013-01-01

    The aim of this experiment was to investigate the influence of musical expertise on the automatic perception of foreign syllables and harmonic sounds. Participants were Cuban students with high level of expertise in music or in visual arts and with the same level of general education and socio-economic background. We used a multi-feature Mismatch Negativity (MMN) design with sequences of either syllables in Mandarin Chinese or harmonic sounds, both comprising deviants in pitch contour, duration and Voice Onset Time (VOT) or equivalent that were either far from (Large deviants) or close to (Small deviants) the standard. For both Mandarin syllables and harmonic sounds, results were clear-cut in showing larger MMNs to pitch contour deviants in musicians than in visual artists. Results were less clear for duration and VOT deviants, possibly because of the specific characteristics of the stimuli. Results are interpreted as reflecting similar processing of pitch contour in speech and non-speech sounds. The implications of these results for understanding the influence of intense musical training from childhood to adulthood and of genetic predispositions for music on foreign language perception are discussed. PMID:24294193

  19. Musical expertise and foreign speech perception

    Eduardo eMartínez-Montes

    2013-11-01

    Full Text Available The aim of this experiment was to investigate the influence of musical expertise on the automatic perception of foreign syllables and harmonic sounds. Participants were Cuban students with a high level of expertise in music or in visual arts and with the same level of general education and socio-economic background. We used a multi-feature Mismatch Negativity (MMN) design with sequences of either syllables in Mandarin Chinese or harmonic sounds, both comprising deviants in pitch contour, duration and Voice Onset Time (VOT) or equivalent that were either far from (Large deviants) or close to (Small deviants) the standard. For both Mandarin syllables and harmonic sounds, results were clear-cut in showing larger MMNs to pitch contour deviants in musicians than in visual artists. Results were less clear for duration and VOT deviants, possibly because of the specific characteristics of the stimuli. Results are interpreted as reflecting similar processing of pitch contour in speech and non-speech sounds. The implications of these results for understanding the influence of intense musical training from childhood to adulthood and of genetic predispositions for music on foreign language perception are discussed.

  20. Somatosensory basis of speech production.

    Tremblay, Stéphanie; Shiller, Douglas M; Ostry, David J

    2003-06-19

    The hypothesis that speech goals are defined acoustically and maintained by auditory feedback is a central idea in speech production research. An alternative proposal is that speech production is organized in terms of control signals that subserve movements and associated vocal-tract configurations. Indeed, the capacity for intelligible speech by deaf speakers suggests that somatosensory inputs related to movement play a role in speech production, but studies that might have documented a somatosensory component have been equivocal. For example, mechanical perturbations that have altered somatosensory feedback have simultaneously altered acoustics. Hence, any adaptation observed under these conditions may have been a consequence of acoustic change. Here we show that somatosensory information on its own is fundamental to the achievement of speech movements. This demonstration involves a dissociation of somatosensory and auditory feedback during speech production. Over time, subjects correct for the effects of a complex mechanical load that alters jaw movements (and hence somatosensory feedback), but which has no measurable or perceptible effect on acoustic output. The findings indicate that the positions of speech articulators and associated somatosensory inputs constitute a goal of speech movements that is wholly separate from the sounds produced. PMID:12815431

  1. What Is Language? What Is Speech?

    Kelly's 4-year-old son, Tommy, has speech and language problems. Friends and family have a hard time ...

  2. Fully Automatic Expression-Invariant Face Correspondence

    Salazar, Augusto; Shu, Chang; Prieto, Flavio

    2012-01-01

    We consider the problem of computing accurate point-to-point correspondences among a set of human face scans with varying expressions. Our fully automatic approach does not require any manually placed markers on the scan. Instead, the approach learns the locations of a set of landmarks present in a database and uses this knowledge to automatically predict the locations of these landmarks on a newly available scan. The predicted landmarks are then used to compute point-to-point correspondences between a template model and the newly available scan. To accurately fit the expression of the template to the expression of the scan, we use as template a blendshape model. Our algorithm was tested on a database of human faces of different ethnic groups with strongly varying expressions. Experimental results show that the obtained point-to-point correspondence is both highly accurate and consistent for most of the tested 3D face models.

  3. Neurocognitive mechanisms of audiovisual speech perception

    Ojanen, Ville

    2005-01-01

    Face-to-face communication involves both hearing and seeing speech. Heard and seen speech inputs interact during audiovisual speech perception. Specifically, seeing the speaker's mouth and lip movements improves identification of acoustic speech stimuli, especially in noisy conditions. In addition, visual speech may even change the auditory percept. This occurs when mismatching auditory speech is dubbed onto visual articulation. Research on the brain mechanisms of audiovisual perception a...

  4. Visibility of speech articulation enhances auditory phonetic convergence.

    Dias, James W; Rosenblum, Lawrence D

    2016-01-01

    Talkers automatically imitate aspects of perceived speech, a phenomenon known as phonetic convergence. Talkers have previously been found to converge to auditory and visual speech information. Furthermore, talkers converge more to the speech of a conversational partner who is seen and heard, relative to one who is just heard (Dias & Rosenblum Perception, 40, 1457-1466, 2011). A question raised by this finding is what visual information facilitates the enhancement effect. In the following experiments, we investigated the possible contributions of visible speech articulation to visual enhancement of phonetic convergence within the noninteractive context of a shadowing task. In Experiment 1, we examined the influence of the visibility of a talker on phonetic convergence when shadowing auditory speech either in the clear or in low-level auditory noise. The results suggest that visual speech can compensate for convergence that is reduced by auditory noise masking. Experiment 2 further established the visibility of articulatory mouth movements as being important to the visual enhancement of phonetic convergence. Furthermore, the word frequency and phonological neighborhood density characteristics of the words shadowed were found to significantly predict phonetic convergence in both experiments. Consistent with previous findings (e.g., Goldinger Psychological Review, 105, 251-279, 1998), phonetic convergence was greater when shadowing low-frequency words. Convergence was also found to be greater for low-density words, contrasting with previous predictions of the effect of phonological neighborhood density on auditory phonetic convergence (e.g., Pardo, Jordan, Mallari, Scanlon, & Lewandowski Journal of Memory and Language, 69, 183-195, 2013). Implications of the results for a gestural account of phonetic convergence are discussed. PMID:26358471

  5. Audio-video feature correlation: faces and speech

    Durand, Gwenael; Montacie, Claude; Caraty, Marie-Jose; Faudemay, Pascal

    1999-08-01

    This paper presents a study of the correlation of features automatically extracted from the audio stream and the video stream of audiovisual documents. In particular, we were interested in finding out whether speech analysis tools could be combined with face detection methods, and to what extent they should be combined. A generic audio signal partitioning algorithm was first used to detect Silence/Noise/Music/Speech segments in a full-length movie. A generic object detection method was applied to the keyframes extracted from the movie in order to detect the presence or absence of faces. The correlation between the presence of a face in the keyframes and of the corresponding voice in the audio stream was studied. A third stream, which is the script of the movie, is warped onto the speech channel in order to automatically label faces appearing in the keyframes with the name of the corresponding character. We naturally found that extracted audio and video features were related in many cases, and that significant benefits can be obtained from the joint use of audio and video analysis methods.

  6. Automatic input rectification

    Long, Fan; Ganesh, Vijay; Carbin, Michael James; Sidiroglou, Stelios; Rinard, Martin

    2012-01-01

    We present a novel technique, automatic input rectification, and a prototype implementation, SOAP. SOAP learns a set of constraints characterizing typical inputs that an application is highly likely to process correctly. When given an atypical input that does not satisfy these constraints, SOAP automatically rectifies the input (i.e., changes the input so that it satisfies the learned constraints). The goal is to automatically convert potentially dangerous inputs into typical inputs that the ...

  7. Automatic Fiscal Stabilizers

    Narcis Eduard Mitu

    2013-11-01

    Full Text Available Policies or institutions (built into an economic system) that automatically tend to dampen economic cycle fluctuations in income, employment, etc., without direct government intervention. For example, in boom times, progressive income tax automatically reduces money supply as incomes and spending rise. Similarly, in recessionary times, payment of unemployment benefits injects more money into the system and stimulates demand. Also called automatic stabilizers or built-in stabilizers.

  8. Enhancing Peer Feedback and Speech Preparation: The Speech Video Activity

    Opt, Susan

    2012-01-01

    In the typical public speaking course, instructors or assistants videotape or digitally record at least one of the term's speeches in class or lab to offer students additional presentation feedback. Students often watch and self-critique their speeches on their own. Peers often give only written feedback on classroom presentations or completed…

  9. Improving speech recognition on a mobile robot platform through the use of top-down visual queues

    Ross, Robert; O'Donoghue, R. P. S.; O'Hare, G. M. P.

    2003-01-01

    In many real-world environments, Automatic Speech Recognition (ASR) technologies fail to provide adequate performance for applications such as human robot dialog. Despite substantial evidence that speech recognition in humans is performed in a top-down as well as bottom-up manner, ASR systems typically fail to capitalize on this, instead relying on a purely statistical, bottom up methodology. In this paper we advocate the use of a knowledge based approach to improving ASR in domains such as m...

  10. Modeling Co-evolution of Speech and Biology.

    de Boer, Bart

    2016-04-01

    Two computer simulations are investigated that model interaction of cultural evolution of language and biological evolution of adaptations to language. Both are agent-based models in which a population of agents imitates each other using realistic vowels. The agents evolve under selective pressure for good imitation. In one model, the evolution of the vocal tract is modeled; in the other, a cognitive mechanism for perceiving speech accurately is modeled. In both cases, biological adaptations to using and learning speech evolve, even though the system of speech sounds itself changes at a more rapid time scale than biological evolution. However, the fact that the available acoustic space is used maximally (a self-organized result of cultural evolution) is constant, and therefore biological evolution does have a stable target. This work shows that when cultural and biological traits are continuous, their co-evolution may lead to cognitive adaptations that are strong enough to detect empirically. PMID:26936622

  11. Control of vocal-tract length in speech

    Riordan, C.J.

    1977-10-01

    Essential for the correct production of vowels is the accurate control of vocal-tract length. Perkell (Psychology of Speech Production (MIT, Cambridge, MA, 1969)) has suggested that two important determinants of vocal-tract length are vertical larynx position and lip spreading/protrusion, often acting together. The present study was designed to determine whether constraining lip spreading/protrusion induces compensatory vertical larynx displacements, particularly on rounded vowels. Upper lip and larynx movement were monitored photoelectrically while French and Mandarin native speakers produced the vowels /i,y,u/ first under normal-speech conditions and then with lip activity constrained. Significant differences were found in upper-lip protrusion and larynx position depending on the vowel uttered. Moreover, the generally low-larynx position of rounded vowels became even lower when lip protrusion was constrained. These results imply that compensatory articulations contribute to a contrast-preserving strategy in speech production.

  12. Comparative Evaluation of Phone Duration Models for Greek Emotional Speech

    Alexandros Lazaridis

    2010-01-01

    Full Text Available Problem statement: In this study we cope with the task of phone duration modeling for Greek emotional speech synthesis. Approach: Various well established machine learning techniques are applied for this purpose to an emotional speech database consisting of five archetypal emotions. The constructed phone duration prediction models are built on phonetic, morphosyntactic and prosodic features that can be extracted only from text. We employ model and regression trees, linear regression, lazy learning algorithms and meta-learning algorithms using regression trees as base classifiers, trained on a Modern Greek emotional database consisting of five emotional categories: anger, fear, joy, neutral and sadness. Results: Model trees based on the M5’ algorithm and meta-learning algorithms using as base classifier regression trees based on the M5’ algorithm proved to perform better. Conclusion: It was observed that the emotional categories of the speech database with the most uniform distribution of phone durations built the most accurate models.
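
    As a hedged stand-in for the study's M5' model trees (scikit-learn ships CART rather than M5'), a regression tree over fabricated text-derived features shows the modeling setup:

      # Sketch: phone duration prediction with a regression tree.
      import numpy as np
      from sklearn.tree import DecisionTreeRegressor

      rng = np.random.default_rng(0)
      n = 500
      X = np.column_stack([
          rng.integers(0, 40, n),   # phone identity (coded)
          rng.integers(0, 2, n),    # lexical stress flag
          rng.integers(1, 6, n),    # syllables in word
          rng.integers(0, 5, n),    # emotion category (e.g., anger..sadness)
      ])
      # fabricated durations: stress and emotion shift the mean
      y = 40 + 15 * X[:, 1] + 5 * X[:, 3] + rng.normal(0, 8, n)

      tree = DecisionTreeRegressor(max_depth=6, min_samples_leaf=10)
      tree.fit(X[:400], y[:400])
      pred = tree.predict(X[400:])
      rmse = float(np.sqrt(np.mean((pred - y[400:]) ** 2)))
      print("held-out RMSE: %.1f ms" % rmse)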

  13. Auditory detection of non-speech and speech stimuli in noise: Native speech advantage.

    Huo, Shuting; Tao, Sha; Wang, Wenjing; Li, Mingshuang; Dong, Qi; Liu, Chang

    2016-05-01

    Detection thresholds of Chinese vowels, Korean vowels, and a complex tone, with harmonic and noise carriers were measured in noise for Mandarin Chinese-native listeners. The harmonic index was calculated as the difference between detection thresholds of the stimuli with harmonic carriers and those with noise carriers. The harmonic index for Chinese vowels was significantly greater than that for Korean vowels and the complex tone. Moreover, native speech sounds were rated significantly more native-like than non-native speech and non-speech sounds. The results indicate that native speech has an advantage over other sounds in simple auditory tasks like sound detection. PMID:27250202

  14. Automatic recognition of element classes and boundaries in the birdsong with variable sequences

    Koumura, Takuya; Okanoya, Kazuo

    2016-01-01

    Research on sequential vocalization often requires analysis of vocalizations in long continuous sounds. In such studies as developmental ones or studies across generations, in which days or months of vocalizations must be analyzed, methods for automatic recognition would be strongly desired. Although methods for automatic speech recognition for application purposes have been intensively studied, blindly applying them for biological purposes may not be an optimal solution. This is because, unl...

  15. Mutual Disambiguation of Eye Gaze and Speech for Sight Translation and Reading

    Kulkarni, Rucha; Jain, Kritika; Bansal, Himanshu;

    2013-01-01

    Researchers are proposing interactive machine translation as a potential method to make the language translation process more efficient and usable. Introduction of different modalities like eye gaze and speech are being explored to add to the interactivity of the language translation system. Unfortunately, the raw data provided by Automatic Speech Recognition (ASR) and Eye-Tracking is very noisy and erroneous. This paper describes a technique for reducing the errors of the two modalities, speech and eye-gaze, with the help of each other in the context of sight translation and reading. Lattice representation and composition of the two modalities was used for integration. F-measure for Eye-Gaze and Word Accuracy for ASR were used as metrics to evaluate our results. In the reading task, we demonstrated a significant improvement in both Eye-Gaze f-measure and speech Word Accuracy. In the sight translation task...

  17. ACTION OF UNIFORM SEARCH ALGORITHM WHEN SELECTING LANGUAGE UNITS IN THE PROCESS OF SPEECH

    Nekipelova Irina Mikhaylovna

    2013-04-01

    Full Text Available The article is devoted to research on the action of a uniform search algorithm in a human's selection of language units for speech production. The process is connected with a speech optimization phenomenon. This makes it possible to shorten the time needed to formulate what one wants to say, and to achieve maximum precision in the expression of thoughts. The algorithm of uniform search works at the consciousness and subconsciousness levels. It favours the formation of automatism in the production and perception of speech. Realization of a human's cognitive potential in the process of communication sets in motion a complicated mechanism of self-organization and self-regulation of language. In turn, this results in optimization of the language system, serving not only the human need for self-actualization but also the realization of communication in society. The method of problem-oriented search is used for researching the optimization mechanisms that are distinctive of speech production and the stabilization of language.

  18. Pattern Recognition Methods and Features Selection for Speech Emotion Recognition System

    Pavol Partila

    2015-01-01

    Full Text Available The impact of the classification method and feature selection on speech emotion recognition accuracy is discussed in this paper. Selecting the correct parameters in combination with the classifier is an important part of reducing the complexity of system computing. This step is necessary especially for systems that will be deployed in real-time applications. The reason for the development and improvement of speech emotion recognition systems is their wide usability in today's automatic voice-controlled systems. The Berlin database of emotional recordings was used in this experiment. The classification accuracy of artificial neural networks, k-nearest neighbours and Gaussian mixture models is measured considering the selection of prosodic, spectral and voice quality features. The purpose was to find an optimal combination of methods and group of features for stress detection in human speech. The research contribution lies in the design of the speech emotion recognition system due to its accuracy and efficiency.

  19. FUSING SPEECH SIGNAL AND PALMPRINT FEATURES FOR A SECURED AUTHENTICATION SYSTEM

    P.K. Mahesh

    2011-11-01

    Full Text Available In the application of biometric authentication, personal identification is regarded as an effective method for automatically recognizing, with high confidence, a person's identity. Using multimodal biometric systems, we typically get better performance compared to a single biometric modality. This paper proposes a multimodal biometric system for identity verification using two traits, i.e., speech signal and palmprint. Integrating the palmprint and speech information increases the robustness of person authentication. The proposed system is designed for applications where the training data contain a speech signal and palmprint. It is well known that the performance of person authentication using only a speech signal or palmprint is deteriorated by feature changes over time. The final decision is made by fusion at the matching score level, in an architecture in which feature vectors are created independently for query measures and are then compared to the enrolment templates, which are stored during database preparation.
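
    Score-level fusion of the kind described reduces to normalizing each matcher's scores to a common range and combining them; the min-max normalization, weights and threshold below are illustrative choices, not the paper's tuned values.

      # Sketch: matching-score-level fusion of speech and palmprint matchers.
      import numpy as np

      def min_max_norm(scores):
          s = np.asarray(scores, dtype=float)
          return (s - s.min()) / (s.max() - s.min() + 1e-12)

      def fuse(speech_scores, palm_scores, w_speech=0.5):
          return (w_speech * min_max_norm(speech_scores)
                  + (1.0 - w_speech) * min_max_norm(palm_scores))

      speech_scores = [0.2, 0.7, 0.4, 0.9]    # similarity to enrolled template
      palm_scores = [10.0, 52.0, 33.0, 60.0]  # different matcher, different scale
      fused = fuse(speech_scores, palm_scores, w_speech=0.6)
      print("accept:", fused > 0.5)           # decision against a tuned threshold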

  20. Single-channel speech enhancement method based on masking properties and minimum statistics

    江小平; 姚天任; 傅华

    2004-01-01

    A single-channel speech enhancement method for noisy speech signals at very low signal-to-noise ratios is presented, which is based on masking properties of the human auditory system and power spectral density estimation of nonstationary noise. It allows for an automatic adaptation in time and frequency of the parametric enhancement system, and finds the best tradeoff among the amount of noise reduction, the speech distortion, and the level of musical residual noise based on a criterion correlated with perception and SNR. This leads to a significant reduction of the unnatural structure of the residual noise. The results with several noise types show that the enhanced speech is more pleasant to a human listener.

  1. Post-Editing Error Correction Algorithm for Speech Recognition using Bing Spelling Suggestion

    Bassil, Youssef

    2012-01-01

    ASR, short for Automatic Speech Recognition, is the process of converting a spoken utterance into text that can be manipulated by a computer. Although ASR has several applications, it is still erroneous and imprecise, especially when used in a harsh environment wherein the input speech is of low quality. This paper proposes a post-editing ASR error correction method and algorithm based on Bing's online spelling suggestion. In this approach, the recognized ASR output text is spell-checked using Bing's spelling suggestion technology to detect and correct misrecognized words. More specifically, the proposed algorithm breaks down the ASR output text into several word tokens that are submitted as search queries to the Bing search engine. A returned spelling suggestion implies that a query is misspelled, and thus it is replaced by the suggested correction; otherwise, no correction is performed and the algorithm continues with the next token until all tokens are validated. Experiments carried out on various speeches in differen...
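
    The post-editing loop itself is simple to sketch. The suggestion service below is a hypothetical stand-in for the paper's use of Bing's online spelling suggestions, not a real API call:

        from typing import Optional

        def spell_suggest(token: str) -> Optional[str]:
            # Hypothetical: submit the token as a search query and return the
            # engine's suggested correction, or None if none is returned.
            raise NotImplementedError

        def post_edit(asr_output: str) -> str:
            corrected = []
            for token in asr_output.split():
                suggestion = spell_suggest(token)
                # A returned suggestion implies a misrecognized word.
                corrected.append(suggestion if suggestion else token)
            return " ".join(corrected)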

  2. Perceptual learning in speech

    Norris, D.; McQueen, J.; Cutler, A.

    2003-01-01

    This study demonstrates that listeners use lexical knowledge in perceptual learning of speech sounds. Dutch listeners first made lexical decisions on Dutch words and nonwords. The final fricative of 20 critical words had been replaced by an ambiguous sound, between [f] and [s]. One group of listeners heard ambiguous [f]-final words (e.g., [witlo?], from witlof, chicory) and unambiguous [s]-final words (e.g., naaldbos, pine forest). Another group heard the reverse (e.g., ambiguous [na:ldbo?],...

  3. Computational validation of the motor contribution to speech perception.

    Badino, Leonardo; D'Ausilio, Alessandro; Fadiga, Luciano; Metta, Giorgio

    2014-07-01

    Action perception and recognition are core abilities fundamental for human social interaction. A parieto-frontal network (the mirror neuron system) matches visually presented biological motion information onto observers' motor representations. This process of matching the actions of others onto our own sensorimotor repertoire is thought to be important for action recognition, providing a non-mediated "motor perception" based on a bidirectional flow of information along the mirror parieto-frontal circuits. State-of-the-art machine learning strategies for hand action identification have shown better performances when sensorimotor data, as opposed to visual information only, are available during learning. As speech is a particular type of action (with acoustic targets), it is expected to activate a mirror neuron mechanism. Indeed, in speech perception, motor centers have been shown to be causally involved in the discrimination of speech sounds. In this paper, we review recent neurophysiological and machine learning-based studies showing (a) the specific contribution of the motor system to speech perception and (b) that automatic phone recognition is significantly improved when motor data are used during training of classifiers (as opposed to learning from purely auditory data). PMID:24935820

  4. Comparison of Classification Methods for Detecting Emotion from Mandarin Speech

    Pao, Tsang-Long; Chen, Yu-Te; Yeh, Jun-Heng

    It is said that technology comes out of humanity. What is humanity? The very definition of humanity is emotion. Emotion is the basis for all human expression and the underlying theme behind everything that is done, said, thought or imagined. If computers can perceive and respond to human emotion, human-computer interaction will become more natural. Several classifiers have been adopted for automatically assigning an emotion category, such as anger, happiness or sadness, to a speech utterance. These classifiers were designed independently and tested on various emotional speech corpora, making it difficult to compare and evaluate their performance. In this paper, we first compared several popular classification methods and evaluated their performance by applying them to a Mandarin speech corpus consisting of five basic emotions: anger, happiness, boredom, sadness and neutral. The extracted feature streams contain MFCC, LPCC, and LPC. The experimental results show that the proposed WD-MKNN classifier achieves an accuracy of 81.4% for 5-class emotion recognition and outperforms other classification techniques, including KNN, MKNN, DW-KNN, LDA, QDA, GMM, HMM, SVM, and BPNN. Then, to verify the advantage of the proposed method, we compared these classifiers by applying them to another Mandarin expressive speech corpus consisting of two emotions. The experimental results again show that the proposed WD-MKNN outperforms the others.
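
    A distance-weighted k-nearest-neighbour rule similar in spirit to WD-MKNN can be sketched as follows; neighbours vote with weights that decay with distance, and fixed-length feature vectors (e.g., derived from the MFCC/LPCC/LPC streams) are assumed:

        import numpy as np

        def weighted_knn_predict(X_train, y_train, x, k=7):
            d = np.linalg.norm(X_train - x, axis=1)
            idx = np.argsort(d)[:k]
            w = 1.0 / (d[idx] + 1e-9)      # closer neighbours weigh more
            votes = {}
            for label, weight in zip(y_train[idx], w):
                votes[label] = votes.get(label, 0.0) + weight
            return max(votes, key=votes.get)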

  5. Taking a Stand for Speech.

    Moore, Wayne D.

    1995-01-01

    Asserts that freedom of speech issues were among the first major confrontations in U.S. constitutional law. Maintains that lessons from the controversies surrounding the Sedition Act of 1798 have continuing practical relevance. Describes and discusses the significance of freedom of speech to the U.S. political system. (CFR)

  6. Speech Prosody in Cerebellar Ataxia

    Casper, Maureen A.; Raphael, Lawrence J.; Harris, Katherine S.; Geibel, Jennifer M.

    2007-01-01

    Persons with cerebellar ataxia exhibit changes in physical coordination and speech and voice production. Previously, these alterations of speech and voice production were described primarily via perceptual coordinates. In this study, the spatial-temporal properties of syllable production were examined in 12 speakers, six of whom were healthy…

  7. Solid state audio/speech processor analysis

    Smith, A. R.; Landell, B. P.; Vensko, G.

    1980-03-01

    Progress in evaluating the feasibility of applying charge-coupled devices (CCDs) as well as microprocessors to reduce the cost and complexity of automatic speech recognition (ASR) systems is discussed. Experiments were conducted to measure the speed-versus-accuracy tradeoffs of four speed-up techniques, which were demonstrated to be worthwhile in an efficient real-time automatic word recognition (AWR) system. Microprocessor architectures were designed to implement the real-time AWR system and then evaluated in terms of cost and complexity using three different microprocessors: the 8-bit Intel 8085A, the 16-bit Motorola MC68000, and a 16-bit configuration of the AMD 2901A. Of the three, the AMD 2901A proved preferable from both a cost and a performance standpoint. Hardware cost projections for an AWR system featuring an AMD 2901A architecture and a bandpass-filter CCD analyzer indicate that the hardware components for such a system should range from $1,500 for a 52-word vocabulary to about $12,700 for a 780-word vocabulary.

  8. Quality Estimation of Alaryngeal Speech

    R.Dhivya

    2014-01-01

    Full Text Available Quality assessment can be done using subjective listening tests or objective quality measures; objective measures quantify quality numerically. The sentence material is chosen from the IEEE corpus. Real-world noise data were taken from the noisy speech corpus NOIZEUS. An alaryngeal speaker's voice (alaryngeal speech) was recorded. To enhance the quality of speech produced by the prosthetic device, four classes of enhancement methods encompassing four algorithms are used: the multi-band spectral subtraction algorithm, the Karhunen–Loève transform (KLT) subspace algorithm, the MASK statistical-model-based algorithm, and the Wavelet-Threshold-Wiener algorithm. The enhanced speech signals obtained from the four classes of algorithms are evaluated using the Perceptual Evaluation of Speech Quality (PESQ). Spectrograms of these enhanced signals are also plotted.
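
    The PESQ evaluation step can be reproduced with the third-party pesq package (PyPI), assuming 16 kHz mono recordings; the file names below are placeholders:

        import soundfile as sf
        from pesq import pesq  # pip install pesq

        ref, fs = sf.read("clean_reference.wav")  # assumed fs = 16000
        for name in ["multiband.wav", "klt.wav", "mask.wav", "wavelet_wiener.wav"]:
            deg, _ = sf.read(name)
            n = min(len(ref), len(deg))
            print(name, pesq(fs, ref[:n], deg[:n], "wb"))  # wide-band PESQ score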

  9. Spatial localization of speech segments

    Karlsen, Brian Lykkegaard

    1999-01-01

    Much is known about human localization of simple stimuli like sinusoids, clicks, broadband noise and narrowband noise in quiet. Less is known about human localization in noise. Even less is known about localization of speech, and very few previous studies have reported data from localization of speech in noise. This study attempts to answer the question: ``Are there certain features of speech which have an impact on the human ability to determine the spatial location of a speaker in the horizontal plane under adverse noise conditions?''. The study consists of an extensive literature survey on ... distribution of which azimuth angle the target is likely to have originated from. The model is trained on the experimental data. On the basis of the experimental results, it is concluded that the human ability to localize speech segments in adverse noise depends on the speech segment as well as its point of ...

  10. Speech Compression Using Multecirculerletet Transform

    Sulaiman Murtadha

    2012-01-01

    Full Text Available Compressing speech reduces data storage requirements, which in turn reduces the time needed to transmit digitized speech over long-haul links like the internet. To obtain the best performance in speech compression, wavelet transforms require filters that combine a number of desirable properties, such as orthogonality and symmetry. The MCT basis functions are derived from the GHM basis functions using 2D linear convolution. The fast computation algorithms introduced here add desirable features to the current transform. We further assess the performance of the MCT in a speech compression application. This paper discusses the effect of using the DWT and the MCT (in one and two dimensions) on speech compression. DWT and MCT performance is assessed in terms of compression ratio (CR), mean square error (MSE) and peak signal-to-noise ratio (PSNR). Computer simulation results indicate that the two-dimensional MCT offers a better compression ratio, MSE and PSNR than the DWT.
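
    Since the MCT itself is not publicly available, only the DWT baseline can be sketched here; the CR/MSE/PSNR figures of merit used in the paper are computed from hard-thresholded coefficients (PyWavelets assumed):

        import numpy as np
        import pywt

        def dwt_compress(x, wavelet="db4", level=4, keep=0.1):
            coeffs = pywt.wavedec(x, wavelet, level=level)
            flat = np.concatenate([np.abs(c) for c in coeffs])
            thresh = np.quantile(flat, 1.0 - keep)  # keep the largest 10%
            kept = [pywt.threshold(c, thresh, mode="hard") for c in coeffs]
            x_hat = pywt.waverec(kept, wavelet)[: len(x)]
            nonzero = sum(int(np.count_nonzero(c)) for c in kept)
            cr = flat.size / max(1, nonzero)             # compression ratio
            mse = float(np.mean((x - x_hat) ** 2))
            psnr = 10 * np.log10(np.max(np.abs(x)) ** 2 / mse)
            return x_hat, cr, mse, psnr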

  11. Hammerstein Model for Speech Coding

    Turunen Jari

    2003-01-01

    Full Text Available A nonlinear Hammerstein model is proposed for coding speech signals. Using Tsay's nonlinearity test, we first show that the great majority of speech frames contain nonlinearities (over 80% in our test data when using 20-millisecond speech frames). Frame length correlates with the level of nonlinearity: the longer the frames, the higher the percentage of nonlinear frames. Motivated by this result, we present a nonlinear structure using frame-by-frame adaptive identification of the Hammerstein model parameters for speech coding. Finally, the proposed structure is compared with the LPC coding scheme for three phonemes /a/, /s/, and /k/ by calculating the Akaike information criterion of the corresponding residual signals. The tests show clearly that the residual of the nonlinear model presented in this paper contains significantly less information than that of the LPC scheme. The presented method is a potential tool for shaping the residual signal into an encoding-efficient form in speech coding.
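
    Frame-by-frame identification of a Hammerstein model (a static polynomial nonlinearity followed by an FIR filter) reduces to a single linear regression if the filter is absorbed into coefficients over polynomial powers of the delayed input; a sketch of that least-squares step, not the authors' code:

        import numpy as np

        def hammerstein_fit(u, y, poly_order=3, fir_taps=10):
            # Model: y[n] ~ sum over k,p of theta[k,p] * u[n-k]**p
            N = len(u)
            cols = []
            for k in range(fir_taps):
                shifted = np.concatenate([np.zeros(k), u[: N - k]])
                for p in range(1, poly_order + 1):
                    cols.append(shifted ** p)
            Phi = np.column_stack(cols)
            theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
            residual = y - Phi @ theta    # compare residuals across coders
            return theta, residual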

  12. Single-Channel Speech Enhancement by NWNS and EMD

    Somlal Das

    2010-12-01

    Full Text Available This paper addresses the problem of reducing noise in observed speech, improving its quality and/or intelligibility, using a single-channel speech enhancement method. We propose two approaches. One is based on the traditional Fourier transform, using a Noise Subtraction (NS) strategy equivalent to Spectral Subtraction (SS); the other is based on the Empirical Mode Decomposition (EMD), using an adaptive thresholding strategy. First, the two methods are implemented individually; we observe that both are noise-dependent and able to enhance the speech signal only to a certain extent, and that traditional NS also generates unwanted residual noise. We introduce a nonlinear weight to eliminate this effect and propose the Nonlinear Weighted Noise Subtraction (NWNS) method. In the first stage, we estimate the noise and then calculate the Degree Of Noise (DON1) from the ratio of the estimated noise power to the observed speech power on a frame basis, for different input signal-to-noise ratios (SNRs) of the given speech signal. The noise is not accurately estimated using the Minima Value Sequence (MVS) alone, so estimation accuracy is improved by adopting DON1 into MVS. The first stage performs well for wideband stationary noises over a wide range of SNRs. However, most real-world noise is narrowband and non-stationary, and EMD is a powerful tool for analyzing non-linear and non-stationary signals like speech. EMD decomposes any signal into a finite number of band-limited components called intrinsic mode functions (IMFs). Since the IMFs have different noise and speech energy distributions, each IMF has a different noise and speech variance, and these variances change from IMF to IMF. Therefore an adaptive threshold function is used, updated with the newly computed variances for each IMF. In the adaptive threshold function, the adaptation factor is the ratio of the
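
    The EMD stage can be sketched with the third-party EMD-signal package (PyEMD). Because the paper's exact adaptation factor is truncated in the abstract above, a standard per-IMF universal threshold stands in for it here:

        import numpy as np
        from PyEMD import EMD  # pip install EMD-signal

        def emd_denoise(x):
            imfs = EMD()(x)  # array of IMFs, shape (n_imfs, len(x))
            out = np.zeros_like(x, dtype=float)
            for imf in imfs:
                sigma = np.median(np.abs(imf)) / 0.6745        # robust noise scale
                t = sigma * np.sqrt(2.0 * np.log(len(imf)))    # universal threshold
                out += np.sign(imf) * np.maximum(np.abs(imf) - t, 0.0)  # soft threshold
            return out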

  13. Changes in speech production in a child with a cochlear implant: acoustic and kinematic evidence.

    Goffman, Lisa; Ertmer, David J; Erdle, Christa

    2002-10-01

    A method is presented for examining change in motor patterns used to produce linguistic contrasts. In this case study, the method is applied to a child receiving new auditory input following cochlear implantation. This child experienced hearing loss at age 3 years and received a multichannel cochlear implant at age 7 years. Data collection points occurred both pre- and postimplant and included acoustic and kinematic analyses. Overall, this child's speech output was transcribed as accurate across the pre- and postimplant periods. Postimplant, with the onset of new auditory experience, acoustic durations showed a predictable maturational change, usually decreasing in duration. Conversely, the spatiotemporal stability of speech movements initially became more variable postimplantation. The auditory perturbations experienced by this child during development led to changes in the physiological underpinnings of speech production, even when speech output was perceived as accurate. PMID:12381047

  14. Connected digit speech recognition system for Malayalam language

    Cini Kurian; Kannan Balakrishnan

    2013-12-01

    A connected digit speech recognizer is important in many applications such as automated banking systems, catalogue dialing, and automatic data entry. This paper presents an optimal speaker-independent connected digit recognizer for the Malayalam language. The system employs Perceptual Linear Predictive (PLP) cepstral coefficients for speech parameterization and continuous-density Hidden Markov Models (HMMs) in the recognition process; the Viterbi algorithm is used for decoding. The training database contains utterances from 21 speakers in the age group of 20 to 40 years, recorded in a normal office environment, with each speaker asked to read 20 sets of continuous digits. The system obtained an accuracy of 99.5% on unseen data.
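
    A sketch of the recognizer structure with hmmlearn, using MFCCs as a stand-in for the paper's PLP features; train_files, mapping each digit to its recordings, is a placeholder, and this is illustrative rather than the authors' code:

        import numpy as np
        import librosa
        from hmmlearn import hmm

        def features(path):
            y, sr = librosa.load(path, sr=16000)
            return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T  # (frames, 13)

        def train_digit_models(train_files, n_states=8):
            models = {}
            for digit, paths in train_files.items():
                frames = [features(p) for p in paths]
                X, lengths = np.vstack(frames), [len(f) for f in frames]
                m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag")
                models[digit] = m.fit(X, lengths)
            return models

        def recognize(models, path):
            X = features(path)
            # Max log-likelihood decision over the per-digit HMMs.
            return max(models, key=lambda d: models[d].score(X))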

  15. PCA-Based Speech Enhancement for Distorted Speech Recognition

    Tetsuya Takiguchi

    2007-09-01

    Full Text Available We investigated a robust speech feature extraction method using kernel PCA (Principal Component Analysis) for distorted speech recognition. Kernel PCA has been suggested for various image processing tasks requiring an image model, such as denoising, where a noise-free image is constructed from a noisy input image. Much research on robust speech feature extraction has been done, but it remains difficult to completely remove additive or convolutional noise (distortion). The most commonly used noise-removal techniques are based on spectral-domain operations, after which the MFCC (Mel Frequency Cepstral Coefficient) is computed for speech recognition by applying the DCT (Discrete Cosine Transform) to the mel-scale filter bank output. This paper describes a new PCA-based speech enhancement algorithm using kernel PCA instead of the DCT, where the main speech element is projected onto low-order features, while the noise or distortion element is projected onto high-order features. Its effectiveness is confirmed by word recognition experiments on distorted speech.
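
    The reconstruction idea can be sketched with scikit-learn's KernelPCA: project feature frames onto the leading nonlinear components and map back, discarding the high-order components that carry mostly distortion. The frame matrix X (n_frames, n_bands) and the kernel parameters are illustrative assumptions:

        from sklearn.decomposition import KernelPCA

        def kpca_denoise(X, n_components=12, gamma=0.05):
            kpca = KernelPCA(n_components=n_components, kernel="rbf",
                             gamma=gamma, fit_inverse_transform=True)
            Z = kpca.fit_transform(X)          # low-order nonlinear features
            return kpca.inverse_transform(Z)   # pre-image reconstruction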

  16. Hate Speech or Free Speech: Can Broad Campus Speech Regulations Survive Current Judicial Reasoning?

    Heiser, Gregory M.; Rossow, Lawrence F.

    1993-01-01

    Federal courts have found speech regulations overbroad in suits against the University of Michigan and the University of Wisconsin System. Attempts to assess the theoretical justification and probable fate of broad speech regulations that have not been explicitly rejected by the courts. Concludes that strong arguments for broader regulation will…

  17. The timing and effort of lexical access in natural and degraded speech

    Anita Eva Wagner

    2016-03-01

    Full Text Available Understanding speech is effortless in ideal situations, and although adverse conditions, such as those caused by hearing impairment, often render it an effortful task, they do not necessarily suspend speech comprehension. A prime example of this is speech perception by cochlear implant users, whose hearing prostheses transmit speech as a significantly degraded signal. It is yet unknown how mechanisms of speech processing deal with such degraded signals, and whether they are affected by effortful processing of speech. This paper compares the automatic process of lexical competition between natural and degraded speech, and combines gaze fixations, which capture the course of lexical disambiguation, with pupillometry, which quantifies the mental effort involved in processing speech. Listeners' ocular responses were recorded during disambiguation of lexical embeddings with matching and mismatching durational cues. Durational cues were selected due to their substantial role in listeners' quick limitation of the number of lexical candidates for lexical access in natural speech. Results showed that lexical competition increased effort in processing natural stimuli, in particular in the presence of mismatching cues. Signal degradation reduced listeners' ability to quickly integrate durational cues in lexical selection, and delayed and prolonged lexical competition. The effort of processing degraded speech was increased overall, and because it had its sources at the pre-lexical level, this effect can be attributed to listening to degraded speech rather than to lexical disambiguation. In sum, the course of lexical competition was largely comparable for natural and degraded speech, but showed crucial shifts in timing, and different sources of increased mental effort. We argue that well-timed progress of information from sensory to pre-lexical and lexical stages of processing, which is the result of perceptual adaptation during speech development, is the reason why

  18. The Timing and Effort of Lexical Access in Natural and Degraded Speech.

    Wagner, Anita E; Toffanin, Paolo; Başkent, Deniz

    2016-01-01

    Understanding speech is effortless in ideal situations, and although adverse conditions, such as those caused by hearing impairment, often render it an effortful task, they do not necessarily suspend speech comprehension. A prime example of this is speech perception by cochlear implant users, whose hearing prostheses transmit speech as a significantly degraded signal. It is yet unknown how mechanisms of speech processing deal with such degraded signals, and whether they are affected by effortful processing of speech. This paper compares the automatic process of lexical competition between natural and degraded speech, and combines gaze fixations, which capture the course of lexical disambiguation, with pupillometry, which quantifies the mental effort involved in processing speech. Listeners' ocular responses were recorded during disambiguation of lexical embeddings with matching and mismatching durational cues. Durational cues were selected due to their substantial role in listeners' quick limitation of the number of lexical candidates for lexical access in natural speech. Results showed that lexical competition increased mental effort in processing natural stimuli, in particular in the presence of mismatching cues. Signal degradation reduced listeners' ability to quickly integrate durational cues in lexical selection, and delayed and prolonged lexical competition. The effort of processing degraded speech was increased overall, and because it had its sources at the pre-lexical level, this effect can be attributed to listening to degraded speech rather than to lexical disambiguation. In sum, the course of lexical competition was largely comparable for natural and degraded speech, but showed crucial shifts in timing, and different sources of increased mental effort. We argue that well-timed progress of information from sensory to pre-lexical and lexical stages of processing, which is the result of perceptual adaptation during speech development, is the reason why in ideal

  19. Hate Speech/Free Speech: Using Feminist Perspectives To Foster On-Campus Dialogue.

    Cornwell, Nancy; Orbe, Mark P.; Warren, Kiesha

    1999-01-01

    Explores the complex issues inherent in the tension between hate speech and free speech, focusing on the phenomenon of hate speech on college campuses. Describes the challenges to hate speech made by critical race theorists and explains how a feminist critique can reorient the parameters of hate speech. (SLD)

  20. Neurophysiological Evidence That Musical Training Influences the Recruitment of Right Hemispheric Homologues for Speech Perception

    McNeel Gordon Jantzen

    2014-03-01

    Full Text Available Musicians have a more accurate temporal and tonal representation of auditory stimuli than their non-musician counterparts (Kraus & Chandrasekaran, 2010; Parbery-Clark, Skoe, & Kraus, 2009; Zendel & Alain, 2008; Musacchia, Sams, Skoe, & Kraus, 2007). Musicians who are adept at the production and perception of music are also more sensitive to key acoustic features of speech such as voice onset timing and pitch. Together, these data suggest that musical training may enhance the processing of acoustic information for speech sounds. In the current study, we sought to provide neural evidence that musicians process speech and music in a similar way. We hypothesized that for musicians, right-hemisphere areas traditionally associated with music are also engaged in the processing of speech sounds. In contrast, we predicted that in non-musicians the processing of speech sounds would be localized to traditional left-hemisphere language areas. Speech stimuli differing in voice onset time were presented using a dichotic listening paradigm. Subjects either indicated the aural location of a specified speech sound or identified a specific speech sound from a directed aural location. Musical training effects and the organization of acoustic features were reflected in activity in source generators of the P50, including greater activation of the right middle temporal gyrus (MTG) and superior temporal gyrus (STG) in musicians. The findings demonstrate recruitment of the right hemisphere in musicians for discriminating speech sounds and a putative broadening of their language network. Musicians appear to have an increased sensitivity to acoustic features and enhanced selective attention to temporal features of speech that is facilitated by musical training and supported, in part, by right-hemisphere homologues of established speech-processing regions of the brain.

  1. A window into the intoxicated mind? Speech as an index of psychoactive drug effects.

    Bedi, Gillinder; Cecchi, Guillermo A; Slezak, Diego F; Carrillo, Facundo; Sigman, Mariano; de Wit, Harriet

    2014-09-01

    Abused drugs can profoundly alter mental states in ways that may motivate drug use. These effects are usually assessed with self-report, an approach that is vulnerable to biases. Analyzing speech during intoxication may present a more direct, objective measure, offering a unique 'window' into the mind. Here, we employed computational analyses of speech semantic and topological structure after ±3,4-methylenedioxymethamphetamine (MDMA; 'ecstasy') and methamphetamine in 13 ecstasy users. In 4 sessions, participants completed a 10-min speech task after MDMA (0.75 and 1.5 mg/kg), methamphetamine (20 mg), or placebo. Latent Semantic Analyses identified the semantic proximity between speech content and concepts relevant to drug effects. Graph-based analyses identified topological speech characteristics. Group-level drug effects on semantic distances and topology were assessed. Machine-learning analyses (with leave-one-out cross-validation) assessed whether speech characteristics could predict drug condition in the individual subject. Speech after MDMA (1.5 mg/kg) had greater semantic proximity than placebo to the concepts friend, support, intimacy, and rapport. Speech on MDMA (0.75 mg/kg) had greater proximity to empathy than placebo. Conversely, speech on methamphetamine was further from compassion than placebo. Classifiers discriminated between MDMA (1.5 mg/kg) and placebo with 88% accuracy, and MDMA (1.5 mg/kg) and methamphetamine with 84% accuracy. For the two MDMA doses, the classifier performed at chance. These data suggest that automated semantic speech analyses can capture subtle alterations in mental state, accurately discriminating between drugs. The findings also illustrate the potential for automated speech-based approaches to characterize clinically relevant alterations to mental state, including those occurring in psychiatric illness. PMID:24694926
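
    The semantic-proximity measure reduces to a cosine similarity in a latent space; a sketch follows, with the trained LSA term-vector table assumed to exist (any distributional embedding could stand in for it):

        import numpy as np

        def text_vector(tokens, term_vectors):
            # Average the latent vectors of in-vocabulary tokens.
            vecs = [term_vectors[t] for t in tokens if t in term_vectors]
            return np.mean(vecs, axis=0)

        def semantic_proximity(transcript_tokens, concept, term_vectors):
            v = text_vector(transcript_tokens, term_vectors)
            c = term_vectors[concept]
            return float(v @ c / (np.linalg.norm(v) * np.linalg.norm(c)))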

  2. The Stylistic Analysis of Public Speech

    李龙

    2011-01-01

    Public speech is a very important part of our daily life. The ability to deliver a good public speech is something we need to learn and to have, especially in the service sector. This paper attempts to analyze the style of public speech, in the hope of providing inspiration for whenever we deliver such a speech.

  3. Phonetic Recalibration Only Occurs in Speech Mode

    Vroomen, Jean; Baart, Martijn

    2009-01-01

    Upon hearing an ambiguous speech sound dubbed onto lipread speech, listeners adjust their phonetic categories in accordance with the lipread information (recalibration) that tells what the phoneme should be. Here we used sine wave speech (SWS) to show that this tuning effect occurs if the SWS sounds are perceived as speech, but not if the sounds…

  4. Infant Perception of Atypical Speech Signals

    Vouloumanos, Athena; Gelfand, Hanna M.

    2013-01-01

    The ability to decode atypical and degraded speech signals as intelligible is a hallmark of speech perception. Human adults can perceive sounds as speech even when they are generated by a variety of nonhuman sources including computers and parrots. We examined how infants perceive the speech-like vocalizations of a parrot. Further, we examined how…

  5. From data to speech: a general approach

    Theune, M.; Klabbers, E.A.M.; Pijper, de J.R.; Krahmer, E.; Odijk, J.; Boguraev, B.; Tait, J.; Jacquemin, C.

    2001-01-01

    We present a data-to-speech system called D2S, which can be used for the creation of data-to-speech systems in different languages and domains. The most important characteristic of a data-to-speech system is that it combines language and speech generation: language generation is used to produce a na

  6. Automated Speech Rate Measurement in Dysarthria

    Martens, Heidi; Dekens, Tomas; Van Nuffelen, Gwen; Latacz, Lukas; Verhelst, Werner; De Bodt, Marc

    2015-01-01

    Purpose: In this study, a new algorithm for automated determination of speech rate (SR) in dysarthric speech is evaluated. We investigated how reliably the algorithm calculates the SR of dysarthric speech samples when compared with calculation performed by speech-language pathologists. Method: The new algorithm was trained and tested using Dutch…

  7. Voice liveness detection algorithms based on pop noise caused by human breath for automatic speaker verification

    Shiota, Sayaka; Villavicencio, Fernando; Yamagishi, Junichi; Ono, Nobutaka; Echizen, Isao; Matsui, Tomoko

    2015-01-01

    This paper proposes a novel countermeasure framework for detecting spoofing attacks to reduce the vulnerability of automatic speaker verification (ASV) systems. Recently, ASV systems have reached performances equivalent to those of other biometric modalities. However, spoofing techniques against these systems have also progressed drastically. Experimentation using advanced speech synthesis and voice conversion techniques has shown unacceptable false acceptance rates and several new co...

  8. Visualizing structures of speech expressiveness

    Herbelin, Bruno; Jensen, Karl Kristoffer; Graugaard, Lars

    2008-01-01

    Speech is both beautiful and informative. In this work, a conceptual study of speech, through investigation of the tower of Babel, the archetypal phonemes, and a study of the reasons for the use of language, is undertaken in order to create an artistic work investigating the nature of speech. The Babel myth speaks of the distance created when aspiring to heaven as the reason for the division of language. Meanwhile, Locquin states, through thorough investigations, that only a few phonemes are present throughout history. Our interpretation is that a system able to recognize archetypal phonemes through...

  9. Speech enhancement theory and practice

    Loizou, Philipos C

    2013-01-01

    With the proliferation of mobile devices and hearing devices, including hearing aids and cochlear implants, there is a growing and pressing need to design algorithms that can improve speech intelligibility without sacrificing quality. Responding to this need, Speech Enhancement: Theory and Practice, Second Edition introduces readers to the basic problems of speech enhancement and the various algorithms proposed to solve these problems. Updated and expanded, this second edition of the bestselling textbook broadens its scope to include evaluation measures and enhancement algorithms aimed at impr

  10. Speech recovery device

    Frankle, Christen M.

    2004-04-20

    There is provided an apparatus and method for assisting speech recovery in people with inability to speak due to aphasia, apraxia or another condition with similar effect. A hollow, rigid, thin-walled tube with semi-circular or semi-elliptical cut out shapes at each open end is positioned such that one end mates with the throat/voice box area of the neck of the assistor and the other end mates with the throat/voice box area of the assisted. The speaking person (assistor) makes sounds that produce standing wave vibrations at the same frequency in the vocal cords of the assisted person. Driving the assisted person's vocal cords with the assisted person being able to hear the correct tone enables the assisted person to speak by simply amplifying the vibration of membranes in their throat.

  11. Speech Enhancement via EMD

    Monia Turki-Hadj Alouane

    2008-06-01

    Full Text Available In this study, two new approaches for speech signal noise reduction based on the empirical mode decomposition (EMD), recently introduced by Huang et al. (1998), are proposed. Based on the EMD, both reduction schemes are fully data-driven approaches. The noisy signal is decomposed adaptively into oscillatory components called intrinsic mode functions (IMFs), using a temporal decomposition called the sifting process. Two strategies for noise reduction are proposed: filtering and thresholding. The basic principle of these two methods is signal reconstruction with IMFs previously filtered, using the minimum mean-squared error (MMSE) filter introduced by I. Y. Soon et al. (1998), or thresholded using a shrinkage function. The performance of these methods is analyzed and compared with that of the MMSE filter and wavelet shrinkage. The study is limited to signals corrupted by additive white Gaussian noise. The results obtained show that the proposed denoising schemes perform better than the MMSE filter and the wavelet approach.

  12. Silog: Speech Input Logon

    Grau, Sergio; Allen, Tony; Sherkat, Nasser

    Silog is a biometric authentication system that extends the conventional PC logon process using voice verification. Users enter their ID and password using a conventional Windows logon procedure, but then the biometric authentication stage makes a Voice over IP (VoIP) call to a VoiceXML (VXML) server. User interaction with this speech-enabled component then allows the user's voice characteristics to be extracted as part of a simple user/system spoken dialogue. If the captured voice characteristics match those of a previously registered voice profile, then network access is granted. If no match is possible, then a potential unauthorised system access has been detected and the logon process is aborted.

  13. Speech recovery device

    Frankle, Christen M.

    2000-10-19

    There is provided an apparatus and method for assisting speech recovery in people with inability to speak due to aphasia, apraxia or another condition with similar effect. A hollow, rigid, thin-walled tube with semi-circular or semi-elliptical cut out shapes at each open end is positioned such that one end mates with the throat/voice box area of the neck of the assistor and the other end mates with the throat/voice box area of the assisted. The speaking person (assistor) makes sounds that produce standing wave vibrations at the same frequency in the vocal cords of the assisted person. Driving the assisted person's vocal cords with the assisted person being able to hear the correct tone enables the assisted person to speak by simply amplifying the vibration of membranes in their throat.

  14. Speech is Golden

    Juel Henrichsen, Peter

    2014-01-01

    ... on the supply side. The present article reports on a new public action strategy which has taken shape in the course of 2013-14. While Denmark is a small language area, our public sector is well organised and has considerable purchasing power. Across this past year, Danish local authorities have organised around the speech technology challenge; they have formulated a number of joint questions and new requirements to be met by suppliers and have deliberately worked towards formulating tendering material which will allow fair competition. Public researchers have contributed to this work, including the author of the present article, in the role of economically neutral advisers. The aim of the initiative is to pave the way for the first profitable contract in the field - which we hope to see in 2014 - an event which would precisely break the present deadlock and open up a billion EUR market for...

  15. Microphone Array Speech Recognition : Experiments on Overlapping Speech in Meetings

    Moore, Darren; McCowan, Iain A.

    2002-01-01

    This paper investigates the use of microphone arrays to acquire and recognise speech in meetings. Meetings pose several interesting problems for speech processing, as they consist of multiple competing speakers within a small space, typically around a table. Due to their ability to provide hands-free acquisition and directional discrimination, microphone arrays present a potential alternative to close-talking microphones in such an application. We first propose an appropriate microphone array...

  16. Development and Evaluation of the Emotional Slovenian Speech Database – EmoLUKS

    Tadej Justin

    2015-12-01

    Full Text Available The paper describes the development of a Slovenian emotional speech database intended primarily for speech synthesis, and explores additional annotation that extends it for use in emotion recognition tasks. The paper focuses on a methodology for annotating paralinguistic speaker information, exemplified by the annotation of speaker emotions in Slovenian radio dramas. The emotional speech database EmoLUKS was built from the speech material of 17 Slovenian radio dramas, obtained from the national radio-and-television station (RTV Slovenia), which placed them at the university's disposal under an academic license for processing and annotating the audio material. The utterances of one male and one female speaker were transcribed, segmented and then annotated with emotional states. The annotation of the emotional states was conducted in two stages with our own web-based application for crowdsourcing. Repeating the annotation with the same volunteers in different time periods allows us to compare the annotators' decisions, so we report on their consistency. Based on a majority vote over the annotations of each utterance, we labelled the speech material and compiled it into the emotional speech database named EmoLUKS. The material currently consists of 1385 recordings from one male (975 recordings) and one female (410 recordings) speaker and contains labelled emotional speech with a total duration of around 1 hour and 15 minutes. The paper presents the two-stage annotation process used to label the data and demonstrates the usefulness of the annotation methodology. We evaluate the consistency of the annotated speech material with a speaker-dependent automatic emotion recognition system. The results are reported as unweighted and weighted average recalls and precisions for 2-class and 7-class recognition experiments, and additionally confirm our presumption that an emotional speech database, despite its

  17. Automatic Sequencing for Experimental Protocols

    Hsieh, Paul F.; Stern, Ivan

    We present a paradigm and implementation of a system for the specification of the experimental protocols to be used for the calibration of AXAF mirrors. For the mirror calibration, several thousand individual measurements need to be defined. For each measurement, over one hundred parameters need to be tabulated for the facility test conductor and several hundred instrument parameters need to be set. We provide a high level protocol language which allows for a tractable representation of the measurement protocol. We present a procedure dispatcher which automatically sequences a protocol more accurately and more rapidly than is possible by an unassisted human operator. We also present back-end tools to generate printed procedure manuals and database tables required for review by the AXAF program. This paradigm has been tested and refined in the calibration of detectors to be used in mirror calibration.

  18. What automaticity deficit? Activation of lexical information by readers with dyslexia in a rapid automatized naming Stroop-switch task.

    Jones, Manon W; Snowling, Margaret J; Moll, Kristina

    2016-03-01

    Reading fluency is often predicted by rapid automatized naming (RAN) speed, which, as the name implies, measures the automaticity with which familiar stimuli (e.g., letters) can be retrieved and named. Readers with dyslexia are considered to have less "automatized" access to lexical information, reflected in longer RAN times compared with nondyslexic readers. We combined the RAN task with a Stroop-switch manipulation to test the automaticity of dyslexic and nondyslexic readers' lexical access directly within a fluency task. Participants named letters in 10 × 4 arrays while eye movements and speech responses were recorded. Upon fixation, specific letter font colors changed from black to a different color, whereupon the participant was required to rapidly switch from naming the letter to naming the letter color. We could therefore measure reading group differences in "automatic" lexical processing, insofar as it was task-irrelevant. Readers with dyslexia showed obligatory lexical processing and a timeline for recognition that was overall similar to typical readers, but a delay emerged in the output (naming) phase. Further delay was caused by visual-orthographic competition between neighboring stimuli. Our findings outline the specific processes involved when researchers speak of "impaired automaticity" in dyslexic readers' fluency, and are discussed in the context of the broader literature in this field. (PsycINFO Database Record) PMID:26414305

  19. Automatic Implantable Cardiac Defibrillator

    Full Text Available Automatic Implantable Cardiac Defibrillator: webcast, February 19, 2009, Halifax Health Medical Center, Daytona Beach, FL.

  20. Automatic Payroll Deposit System.

    Davidson, D. B.

    1979-01-01

    The Automatic Payroll Deposit System in Yakima, Washington's Public School District No. 7, directly transmits each employee's salary amount for each pay period to a bank or other financial institution. (Author/MLF)

  1. Automatic Arabic Text Classification

    Al-Harbi, S.; Almuhareb, A.; Al-Thubaity, A.; Khorsheed, M. S.; Al-Rajeh, A.

    2008-01-01

    Automated document classification is an important text mining task, especially with the rapid growth in the number of online documents available in the Arabic language. Text classification aims to automatically assign a text to a predefined category based on linguistic features. Such a process has different useful applications including, but not restricted to, e-mail spam detection, web page content filtering, and automatic message routing. This paper presents the results of experiments on documen...

  2. Delayed Speech or Language Development

    ... distinction between the two: Speech is the verbal expression of language and includes articulation, which is the ... sounds or words repeatedly and can't use oral language to communicate more than his or her ...

  3. Emotion Recognition using Speech Features

    Rao, K Sreenivasa

    2013-01-01

    “Emotion Recognition Using Speech Features” covers emotion-specific features present in speech and discussion of suitable models for capturing emotion-specific information for distinguishing different emotions.  The content of this book is important for designing and developing  natural and sophisticated speech systems. Drs. Rao and Koolagudi lead a discussion of how emotion-specific information is embedded in speech and how to acquire emotion-specific knowledge using appropriate statistical models. Additionally, the authors provide information about using evidence derived from various features and models. The acquired emotion-specific knowledge is useful for synthesizing emotions. Discussion includes global and local prosodic features at syllable, word and phrase levels, helpful for capturing emotion-discriminative information; use of complementary evidences obtained from excitation sources, vocal tract systems and prosodic features in order to enhance the emotion recognition performance;  and pro...

  4. Speech and Language Developmental Milestones

    ... of “brain plasticity”—the ways in which the brain is influenced by health conditions or life experiences—and how it can be used to develop learning strategies that encourage healthy language and speech development in ...

  5. HarkMan-A Vocabulary-Independent Keyword Spotter for Spontaneous Chinese Speech

    ZHENG Fang; XU Mingxing; MOU Xiaolong; WU Jian; WU Wenhu; FANG Ditang

    1999-01-01

    In this paper, a novel technique adopted in HarkMan is introduced. HarkMan is a keyword-spotter designed to automatically spot the given words of a vocabulary-independent task in unconstrained Chinese telephone speech. The speaking manner and the number of keywords are not limited. This paper focuses on the novel techniques addressing acoustic modeling, the keyword-spotting network, search strategies, robustness, and rejection. The underlying technologies used in HarkMan given in this paper are useful not only for keyword spotting but also for continuous speech recognition. The system has achieved a figure-of-merit value over 90%.

  6. Environment-dependent denoising autoencoder for distant-talking speech recognition

    Ueda, Yuma; Wang, Longbiao; Kai, Atsuhiko; Ren, Bo

    2015-12-01

    In this paper, we propose an environment-dependent denoising autoencoder (DAE) and automatic environment identification based on a deep neural network (DNN) with blind reverberation estimation for robust distant-talking speech recognition. Recently, DAEs have been shown to be effective in many noise reduction and reverberation suppression applications because higher-level representations and increased flexibility of the feature mapping function can be learned. However, a DAE is not adequate in mismatched training and test environments. In a conventional DAE, parameters are trained using pairs of reverberant speech and clean speech under various acoustic conditions (that is, an environment-independent DAE). To address the above problem, we propose two environment-dependent DAEs to reduce the influence of mismatches between training and test environments. In the first approach, we train various DAEs using speech from different acoustic environments, and the DAE for the condition that best matches the test condition is automatically selected (that is, a two-step environment-dependent DAE). To improve environment identification performance, we propose a DNN that uses both reverberant speech and estimated reverberation. In the second approach, we add estimated reverberation features to the input of the DAE (that is, a one-step environment-dependent DAE or a reverberation-aware DAE). The proposed method is evaluated using speech in simulated and real reverberant environments. Experimental results show that the environment-dependent DAE outperforms the environment-independent one in both simulated and real reverberant environments. For two-step environment-dependent DAE, the performance of environment identification based on the proposed DNN approach is also better than that of the conventional DNN approach, in which only reverberant speech is used and reverberation is not blindly estimated. And, the one-step environment-dependent DAE significantly outperforms the two
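
    The core mapping network of such a DAE can be sketched in a few lines of PyTorch: a fully connected network trained to map reverberant log-spectral frames to clean ones, with one such model per training environment in the two-step variant. The dimensions and the training loop are illustrative, not the authors' configuration:

        import torch
        import torch.nn as nn

        class DAE(nn.Module):
            def __init__(self, dim=257, hidden=1024):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Linear(dim, hidden), nn.ReLU(),
                    nn.Linear(hidden, hidden), nn.ReLU(),
                    nn.Linear(hidden, dim))

            def forward(self, x):
                return self.net(x)

        def train_dae(model, reverb, clean, epochs=10, lr=1e-3):
            # reverb/clean: paired frame tensors of shape (n_frames, dim).
            opt = torch.optim.Adam(model.parameters(), lr=lr)
            loss_fn = nn.MSELoss()
            for _ in range(epochs):
                opt.zero_grad()
                loss = loss_fn(model(reverb), clean)
                loss.backward()
                opt.step()
            return model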

  7. Can blind persons accurately assess body size from the voice?

    Pisanski, Katarzyna; Oleszkiewicz, Anna; Sorokowska, Agnieszka

    2016-04-01

    Vocal tract resonances provide reliable information about a speaker's body size that human listeners use for biosocial judgements as well as speech recognition. Although humans can accurately assess men's relative body size from the voice alone, how this ability is acquired remains unknown. In this study, we test the prediction that accurate voice-based size estimation is possible without prior audiovisual experience linking low frequencies to large bodies. Ninety-one healthy congenitally or early blind, late blind and sighted adults (aged 20-65) participated in the study. On the basis of vowel sounds alone, participants assessed the relative body sizes of male pairs of varying heights. Accuracy of voice-based body size assessments significantly exceeded chance and did not differ among participants who were sighted, or congenitally blind or who had lost their sight later in life. Accuracy increased significantly with relative differences in physical height between men, suggesting that both blind and sighted participants used reliable vocal cues to size (i.e. vocal tract resonances). Our findings demonstrate that prior visual experience is not necessary for accurate body size estimation. This capacity, integral to both nonverbal communication and speech perception, may be present at birth or may generalize from broader cross-modal correspondences. PMID:27095264

  8. Lattice Parsing for Speech Recognition

    Chappelier, Jean-Cédric; Rajman, Martin; Aragües, Ramon; Rozenknop, Antoine

    1999-01-01

    A lot of work remains to be done in the domain of better integration of speech recognition and language processing systems. This paper gives an overview of several strategies for integrating linguistic models into speech understanding systems and investigates several ways of producing sets of hypotheses that include more "semantic" variability than usual language models. The main goal is to present and demonstrate by actual experiments that sequential coupling may be efficiently achieved by w...

  9. Implementation of Phonetic Context Variable Length Unit Selection Module for Malay Text to Speech

    Tian-Swee Tan

    2008-01-01

    Full Text Available Problem statement: The main problem with the current Malay Text-To-Speech (MTTS) synthesis system is the poor quality of the generated speech sound, due to the inability of the traditional TTS system to provide multiple choices of unit for generating more accurate synthesized speech. Approach: This study proposes a phonetic-context variable-length unit selection MTTS system that is capable of providing more natural and accurate unit selection for synthesized speech. It implements a phonetic-context algorithm for unit selection for MTTS. The unit selection method (without phonetic context) may encounter the problem of selecting speech units from different sources, which affects the quality of concatenation. This study proposes the design of the speech corpus and the unit selection method according to phonetic context, so that a string of continuous phonemes can be selected from the same source instead of individual phonemes from different sources. This further reduces the number of concatenation points and increases the quality of concatenation. The speech corpus was transcribed according to phonetic context to preserve the phonetic information. The method utilizes word-based concatenation: it first searches the speech corpus for the target word and, if the target is found, uses it for concatenation; if the word does not exist, it constructs the word from a phoneme sequence. Results: The system was tested with 40 participants in a Mean Opinion Score (MOS) listening test, with average ratings for naturalness, pronunciation and intelligibility of 3.9, 4.1 and 3.9. Conclusion/Recommendation: Through this study, a first version of a corpus-based MTTS has been designed; it has improved the naturalness, pronunciation and intelligibility of the synthetic speech, but it still has some shortcomings that need to be addressed, such as a prosody module to support phrasing analysis and intonation of the input text to match the waveform modifier.
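
    The selection rule can be sketched abstractly: prefer a whole-word unit, otherwise choose a phoneme-unit sequence minimizing a target cost plus a join cost that vanishes for units contiguous in the same source recording (the phonetic-context idea). The corpus interface below is a hypothetical placeholder, not the paper's implementation:

        def select_units(word, corpus, target_cost, join_cost):
            if word in corpus.words:            # whole-word hit: no joins needed
                return [corpus.words[word]]
            best, best_cost = None, float("inf")
            for candidate in corpus.phoneme_paths(word):   # candidate unit strings
                cost = sum(target_cost(u) for u in candidate)
                # Zero join cost when the next unit directly follows the
                # previous one in the same source recording.
                cost += sum(0.0 if nxt.follows(prev) else join_cost(prev, nxt)
                            for prev, nxt in zip(candidate, candidate[1:]))
                if cost < best_cost:
                    best, best_cost = candidate, cost
            return best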

  10. Automatic Chessboard Detection for Intrinsic and Extrinsic Camera Parameter Calibration

    Jose María Armingol; Arturo de la Escalera

    2010-01-01

    There are increasing applications that require precise calibration of cameras to perform accurate measurements on objects located within images, and an automatic algorithm would reduce this time-consuming calibration procedure. The method proposed in this article uses a pattern similar to that of a chess board, which is found automatically in each image, even when no information regarding the number of rows or columns is supplied to aid its detection. This is carried out by means of a combined ana...
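
    For comparison, the standard OpenCV calibration pipeline requires the inner-corner grid size as input, which is exactly the manual information the proposed method removes; a sketch follows (the grid and square sizes are assumptions):

        import cv2
        import numpy as np

        def calibrate(image_paths, grid=(9, 6), square=1.0):
            # 3D object points of the inner corners on the planar board.
            objp = np.zeros((grid[0] * grid[1], 3), np.float32)
            objp[:, :2] = np.mgrid[0:grid[0], 0:grid[1]].T.reshape(-1, 2) * square
            obj_pts, img_pts, size = [], [], None
            criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3)
            for p in image_paths:
                gray = cv2.imread(p, cv2.IMREAD_GRAYSCALE)
                size = gray.shape[::-1]
                ok, corners = cv2.findChessboardCorners(gray, grid)
                if ok:
                    obj_pts.append(objp)
                    img_pts.append(cv2.cornerSubPix(gray, corners, (11, 11),
                                                    (-1, -1), criteria))
            rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
                obj_pts, img_pts, size, None, None)
            return K, dist, rms  # intrinsics, distortion, reprojection error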

  11. JAABA: interactive machine learning for automatic annotation of animal behavior

    Kabra, Mayank; Robie, Alice A.; Rivera-Alba, Marta; Branson, Steven; Branson, Kristin

    2013-01-01

    We present a machine learning-based system for automatically computing interpretable, quantitative measures of animal behavior. Through our interactive system, users encode their intuition about behavior by annotating a small set of video frames. These manual labels are converted into classifiers that can automatically annotate behaviors in screen-scale data sets. Our general-purpose system can create a variety of accurate individual and social behavior classifiers for different organisms, in...

  12. Effect of speaking styles on the relevance of the Speech Transmission Index in the presence of reverberation

    van Wijngaarden, Sander J.; Houtgast, Tammo

    2003-04-01

    The Speech Transmission Index (STI) predicts speech intelligibility under various speech degrading conditions, including noise and reverberation. Measurements of sentence intelligibility in noise and reverberation, for three different talkers adopting different speaking styles, indicate that the STI is sometimes inaccurate in the presence of reverberation. For a trained talker speaking clearly, the STI predictions are accurate, but for conversational speech by untrained talkers, the effect of reverberation is underestimated. By using a wider range of modulation frequencies than prescribed for the standardized STI calculation (up to 31.5 Hz instead of 12.5 Hz), more accurate predictions are obtained for conversational speech. This can be understood by comparing the envelope spectrum between talkers and speaking styles: conversational speech tends to spread more of the energy in its envelope spectrum to higher modulation frequencies. Since higher modulation frequencies are more susceptible to reverberation, conversational speech tends to be affected more by reverberation than clear speech. To investigate the range of between-talker variations, envelope spectra were calculated for a population of 134 talkers. Upper boundaries of the STI modulation frequency range of 12.5 and 31.5 Hz seem approximately appropriate for the 5th and 95th percentile of this population.
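
    The envelope spectrum used for the between-talker comparison can be computed directly; energy above roughly 12.5 Hz in this spectrum is what the extended STI range (up to 31.5 Hz) captures. A sketch follows, where the crude decimation stands in for proper envelope low-pass filtering:

        import numpy as np
        from scipy.signal import hilbert

        def envelope_spectrum(x, fs, env_fs=100):
            env = np.abs(hilbert(x))        # temporal envelope via Hilbert transform
            step = int(fs // env_fs)
            env = env[::step] - np.mean(env[::step])  # downsample and remove DC
            spec = np.abs(np.fft.rfft(env)) / len(env)
            freqs = np.fft.rfftfreq(len(env), d=1.0 / env_fs)
            return freqs, spec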

  13. Neural pathways for visual speech perception

    Lynne E Bernstein

    2014-12-01

    Full Text Available This paper examines the questions: what levels of speech can be perceived visually, and how is visual speech represented by the brain? A review of the literature leads to the conclusions that every level of psycholinguistic speech structure (i.e., phonetic features, phonemes, syllables, words, and prosody) can be perceived visually, although individuals differ in their abilities to do so, and that there are visual modality-specific representations of speech qua speech in higher-level vision brain areas. That is, the visual system represents the modal patterns of visual speech. The suggestion that the auditory speech pathway receives and represents visual speech is examined in light of neuroimaging evidence on the auditory speech pathways. We outline the generally agreed-upon organization of the visual ventral and dorsal pathways and examine several types of visual processing that might be related to speech through those pathways, specifically, face and body, orthography, and sign language processing. In this context, we examine the visual speech processing literature, which reveals widespread diverse patterns of activity in posterior temporal cortices in response to visual speech stimuli. We outline a model of the visual and auditory speech pathways and make several suggestions: (1) The visual perception of speech relies on visual pathway representations of speech qua speech. (2) A proposed site of these representations, the temporal visual speech area (TVSA), has been demonstrated in posterior temporal cortex, ventral and posterior to the multisensory posterior superior temporal sulcus (pSTS). (3) Given that visual speech has dynamic and configural features, its representations in feedforward visual pathways are expected to integrate these features, possibly in the TVSA.

  14. Auditory Sensitivity, Speech Perception, L1 Chinese, and L2 English Reading Abilities in Hong Kong Chinese Children

    Zhang, Juan; McBride-Chang, Catherine

    2014-01-01

    A 4-stage developmental model, in which auditory sensitivity is fully mediated by speech perception at both the segmental and suprasegmental levels, which are further related to word reading through their associations with phonological awareness, rapid automatized naming, verbal short-term memory and morphological awareness, was tested with…

  15. Investigation of in-vehicle speech intelligibility metrics for normal hearing and hearing impaired listeners

    Samardzic, Nikolina

    The effectiveness of in-vehicle speech communication can be a good indicator of the perception of overall vehicle quality and of customer satisfaction. Currently available speech intelligibility metrics do not account in their procedures for essential parameters needed for a complete and accurate evaluation of in-vehicle speech intelligibility. These include the directivity and the distance of the talker with respect to the listener, binaural listening, the hearing profile of the listener, vocal effort, and multisensory hearing. In the first part of this research, the effectiveness of in-vehicle application of these metrics is investigated in a series of studies that reveal their shortcomings, including the wide range of scores resulting from each metric for a given measurement configuration and vehicle operating condition. In addition, the nature of any correlation between the scores obtained from the different metrics is unknown, and the metrics have not been compared in the literature against the subjective perception of speech intelligibility using, for example, the same speech material. As a result, in the second part of this research, an alternative method for speech intelligibility evaluation is proposed for use in the automotive industry, utilizing a virtual-reality driving environment for ultimately setting targets, including the associated statistical variability, for future in-vehicle speech intelligibility evaluation. The Speech Intelligibility Index (SII) was evaluated at the sentence Speech Reception Threshold (sSRT) for various listening situations and hearing profiles using acoustic perception jury testing and a variety of talker and listener configurations and background noise. In addition, the effect of individual sources and transfer paths of sound in an operating vehicle on the vehicle interior sound, specifically their effect on speech intelligibility, was quantified in the framework of the newly developed speech intelligibility evaluation method. Lastly

  16. Automatic River Network Extraction from LIDAR Data

    Maderal, E. N.; Valcarcel, N.; Delgado, J.; Sevilla, C.; Ojeda, J. C.

    2016-06-01

    National Geographic Institute of Spain (IGN-ES) has launched a new production system for automatic river network extraction for the Geospatial Reference Information (GRI) within the hydrography theme. The goal is to get an accurate and updated river network, extracted as automatically as possible. For this, IGN-ES has full LiDAR coverage of the whole Spanish territory with a density of 0.5 points per square meter. To implement this work, the technical feasibility was validated and a methodology was developed to automate each production phase: generation of hydrological terrain models with 2 meter grid size and river network extraction combining hydrographic criteria (topographic network) and hydrological criteria (flow accumulation river network); finally, production was launched. The key points of this work have been managing a big data environment of more than 160,000 LiDAR data files, the infrastructure to store (up to 40 TB between results and intermediate files) and process the data using local virtualization and Amazon Web Services (AWS), which allowed this automatic production to be completed within 6 months; software stability (TerraScan-TerraSolid, GlobalMapper-Blue Marble, FME-Safe, ArcGIS-Esri) and human resources management were also important. The result of this production has been an accurate automatic river network extraction for the whole country with a significant improvement in the altimetric component of the 3D linear vector. This article presents the technical feasibility, the production methodology, the automatic river network extraction production and its advantages over traditional vector extraction systems.

  17. Speech vs. singing: infants choose happier sounds

    Corbeil, Marieve; Trehub, Sandra E.; Peretz, Isabelle

    2013-01-01

    Infants prefer speech to non-vocal sounds and to non-human vocalizations, and they prefer happy-sounding speech to neutral speech. They also exhibit an interest in singing, but there is little knowledge of their relative interest in speech and singing. The present study explored infants' attention to unfamiliar audio samples of speech and singing. In Experiment 1, infants 4–13 months of age were exposed to happy-sounding infant-directed speech vs. hummed lullabies by the same woman. They list...

  19. APPLICATION OF WAVELET TRANSFORM FOR SPEECH PROCESSING

    SUBHRA DEBDAS

    2011-08-01

    Full Text Available A distinctive application of wavelet transforms is to speech signals, where problems arise in synthesis, analysis, compression, and classification. A method being evaluated uses wavelets for speech analysis and synthesis, distinguishing between voiced and unvoiced speech and determining pitch; methods for choosing optimum wavelets for speech compression are also discussed. Comparative perception results, obtained by listening to speech synthesized using both scalar and vector quantized wavelet parameters, are reported in this paper.
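
    As a rough illustration of the kind of analysis described above, the sketch below decomposes a speech frame with the PyWavelets package and uses a subband energy ratio as a crude voiced/unvoiced cue; the wavelet choice (db4), decomposition depth, and threshold logic are illustrative assumptions, not the paper's settings.

        # A minimal sketch of wavelet-based voiced/unvoiced analysis, assuming
        # PyWavelets (pip install PyWavelets); parameters are illustrative.
        import numpy as np
        import pywt

        def low_high_energy_ratio(frame, wavelet="db4", level=4):
            """Ratio of low-band to high-band energy in a wavelet decomposition.

            Voiced speech concentrates energy in the coarsest approximation
            band, so a large ratio crudely suggests voicing.
            """
            coeffs = pywt.wavedec(frame, wavelet, level=level)
            low = np.sum(coeffs[0] ** 2)                    # approximation band
            high = np.sum(np.concatenate(coeffs[1:]) ** 2)  # all detail bands
            return low / (high + 1e-12)

        # Toy check: a 120 Hz "voiced" tone vs. noise-like "unvoiced" speech.
        sr = 16000
        t = np.arange(int(0.064 * sr)) / sr
        voiced = np.sin(2 * np.pi * 120 * t)
        unvoiced = np.random.randn(t.size)
        print(low_high_energy_ratio(voiced) > low_high_energy_ratio(unvoiced))  # True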

  20. A novel automated image analysis method for accurate adipocyte quantification

    Osman, Osman S.; Selway, Joanne L; Kępczyńska, Małgorzata A; Stocker, Claire J.; O’Dowd, Jacqueline F; Cawthorne, Michael A.; Arch, Jonathan RS; Jassim, Sabah; Langlands, Kenneth

    2013-01-01

    Increased adipocyte size and number are associated with many of the adverse effects observed in metabolic disease states. While methods to quantify such changes in the adipocyte are of scientific and clinical interest, manual methods to determine adipocyte size are both laborious and intractable to large scale investigations. Moreover, existing computational methods are not fully automated. We, therefore, developed a novel automatic method to provide accurate measurements of the cross-section...

  1. Building with Drones: Accurate 3D Facade Reconstruction using MAVs

    Daftry, Shreyansh; Hoppe, Christof; Bischof, Horst

    2015-01-01

    Automatic reconstruction of 3D models from images using multi-view Structure-from-Motion methods has been one of the most fruitful outcomes of computer vision. These advances combined with the growing popularity of Micro Aerial Vehicles as an autonomous imaging platform, have made 3D vision tools ubiquitous for large number of Architecture, Engineering and Construction applications among audiences, mostly unskilled in computer vision. However, to obtain high-resolution and accurate reconstruc...

  2. Strategy for accurate liver intervention by an optical tracking system

    Lin, Qinyong; Yang, Rongqian; Cai, Ken; Guan, Peifeng; Xiao, Weihu; Wu, Xiaoming

    2015-01-01

    Image-guided navigation for radiofrequency ablation of liver tumors requires the accurate guidance of needle insertion into a tumor target. The main challenge of image-guided navigation for radiofrequency ablation of liver tumors is the occurrence of liver deformations caused by respiratory motion. This study reports a strategy of real-time automatic registration to track custom fiducial markers glued onto the surface of a patient’s abdomen to find the respiratory phase, in which the static p...

  3. Emotional speech processing at the intersection of prosody and semantics.

    Schwartz, Rachel; Pell, Marc D

    2012-01-01

    The ability to accurately perceive emotions is crucial for effective social interaction. Many questions remain regarding how different sources of emotional cues in speech (e.g., prosody, semantic information) are processed during emotional communication. Using a cross-modal emotional priming paradigm (Facial affect decision task), we compared the relative contributions of processing utterances with single-channel (prosody-only) versus multi-channel (prosody and semantic) cues on the perception of happy, sad, and angry emotional expressions. Our data show that emotional speech cues produce robust congruency effects on decisions about an emotionally related face target, although no processing advantage occurred when prime stimuli contained multi-channel as opposed to single-channel speech cues. Our data suggest that utterances with prosodic cues alone and utterances with combined prosody and semantic cues both activate knowledge that leads to emotional congruency (priming) effects, but that the convergence of these two information sources does not always heighten access to this knowledge during emotional speech processing. PMID:23118868

  4. Musical melody and speech intonation: singing a different tune.

    Robert J Zatorre

    Full Text Available Music and speech are often cited as characteristically human forms of communication. Both share the features of hierarchical structure, complex sound systems, and sensorimotor sequencing demands, and both are used to convey and influence emotions, among other functions [1]. Both music and speech also prominently use acoustical frequency modulations, perceived as variations in pitch, as part of their communicative repertoire. Given these similarities, and the fact that pitch perception and production involve the same peripheral transduction system (cochlea) and the same production mechanism (vocal tract), it might be natural to assume that pitch processing in speech and music would also depend on the same underlying cognitive and neural mechanisms. In this essay we argue that the processing of pitch information differs significantly for speech and music; specifically, we suggest that there are two pitch-related processing systems, one for more coarse-grained, approximate analysis and one for more fine-grained accurate representation, and that the latter is unique to music. More broadly, this dissociation offers clues about the interface between sensory and motor systems, and highlights the idea that multiple processing streams are a ubiquitous feature of neuro-cognitive architectures.

  5. Head movements encode emotions during speech and song.

    Livingstone, Steven R; Palmer, Caroline

    2016-04-01

    When speaking or singing, vocalists often move their heads in an expressive fashion, yet the influence of emotion on vocalists' head motion is unknown. Using a comparative speech/song task, we examined whether vocalists' intended emotions influence head movements and whether those movements influence the perceived emotion. In Experiment 1, vocalists were recorded with motion capture while speaking and singing each statement with different emotional intentions (very happy, happy, neutral, sad, very sad). Functional data analyses showed that head movements differed in translational and rotational displacement across emotional intentions, yet were similar across speech and song, transcending differences in F0 (varied freely in speech, fixed in song) and lexical variability. Head motion specific to emotional state occurred before and after vocalizations, as well as during sound production, confirming that some aspects of movement were not simply a by-product of sound production. In Experiment 2, observers accurately identified vocalists' intended emotion on the basis of silent, face-occluded videos of head movements during speech and song. These results provide the first evidence that head movements encode a vocalist's emotional intent and that observers decode emotional information from these movements. We discuss implications for models of head motion during vocalizations and applied outcomes in social robotics and automated emotion recognition. PMID:26501928

  6. Objective Speech Quality Measurement Using Statistical Data Mining

    Wai-Yip Chan

    2005-06-01

    Full Text Available Measuring speech quality by machines overcomes two major drawbacks of subjective listening tests, their low speed and high cost. Real-time, accurate, and economical objective measurement of speech quality opens up a wide range of applications that cannot be supported with subjective listening tests. In this paper, we propose a statistical data mining approach to design objective speech quality measurement algorithms. A large pool of perceptual distortion features is extracted from the speech signal. We examine using classification and regression trees (CART) and multivariate adaptive regression splines (MARS), separately and jointly, to select the most salient features from the pool, and to construct good estimators of subjective listening quality based on the selected features. We show designs that use perceptually significant features and outperform the state-of-the-art objective measurement algorithm. The designed algorithms are computationally simple, making them suitable for real-time implementation. The proposed design method is scalable with the amount of learning data; thus, performance can be improved with more offline or online training.
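
    To make the feature-selection idea concrete, the sketch below uses scikit-learn's DecisionTreeRegressor as a stand-in for CART on synthetic data (the paper's MARS component and real distortion features are not reproduced): a tree is fit on a large feature pool, the most salient features are kept by impurity importance, and a compact quality estimator is refit on them.

        # A minimal sketch of CART-based feature selection for objective speech
        # quality estimation, assuming scikit-learn; data and labels are synthetic
        # stand-ins for perceptual distortion features and subjective MOS scores.
        import numpy as np
        from sklearn.model_selection import train_test_split
        from sklearn.tree import DecisionTreeRegressor

        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 40))        # pool of 40 candidate features
        y = 3.0 + 1.2 * X[:, 3] - 0.8 * X[:, 17] + 0.1 * rng.normal(size=500)

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

        # Fit CART on the full pool, then keep the most salient features.
        cart = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X_tr, y_tr)
        salient = np.argsort(cart.feature_importances_)[::-1][:5]
        print("selected features:", salient)  # should surface columns 3 and 17

        # Refit a compact estimator of subjective quality on those features only.
        compact = DecisionTreeRegressor(max_depth=5, random_state=0)
        compact.fit(X_tr[:, salient], y_tr)
        print("held-out R^2:", compact.score(X_te[:, salient], y_te))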

  8. Event-Synchronous Analysis for Connected-Speech Recognition.

    Morgan, David Peter

    The motivation for event-synchronous speech analysis originates from linear system theory where the speech-source transfer function is excited by an impulse-like driving function. In speech processing, the impulse response obtained from this linear system contains both semantic information and the vocal tract transfer function. Typically, an estimate of the transfer function is obtained via the spectrum by assuming a short-time stationary signal within some analysis window. However, this spectrum is often distorted by the periodic effects which occur when multiple (pitch) impulses are included in the analysis window. One method to remove these effects would be to deconvolve the excitation function from the speech signal to obtain the transfer function. The more attractive approach is to locate and identify the excitation function and synchronize the analysis frame with it. Event-synchronous analysis differs from pitch-synchronous analysis in that there are many events useful for speech recognition which are not pitch excited. In addition, event-synchronous analysis locates the important boundaries between speech events, such as voiced to unvoiced and silence to burst transitions. In asynchronous processing, an analysis frame which contains portions of two adjacent but dissimilar speech events is often so ambiguous as to distort or mask the important "phonetic" features of both events. Thus event-synchronous processing is employed to obtain an accurate spectral estimate and in turn enhance the estimate of the vocal-tract transfer function. Among the issues which have been addressed in implementing an event-synchronous recognition system are those of developing robust event (pitch, burst, etc.) detectors, synchronous-analysis methodologies, more meaningful feature sets, and dynamic programming algorithms for nonlinear time alignment. An advantage of event-synchronous processing is that the improved representation of the transfer function creates an opportunity for

  9. Do We Perceive Others Better than Ourselves? A Perceptual Benefit for Noise-Vocoded Speech Produced by an Average Speaker.

    William L Schuerman

    Full Text Available In different tasks involving action perception, performance has been found to be facilitated when the presented stimuli were produced by the participants themselves rather than by another participant. These results suggest that the same mental representations are accessed during both production and perception. However, with regard to spoken word perception, evidence also suggests that listeners' representations for speech reflect the input from their surrounding linguistic community rather than their own idiosyncratic productions. Furthermore, speech perception is heavily influenced by indexical cues that may lead listeners to frame their interpretations of incoming speech signals with regard to speaker identity. In order to determine whether word recognition evinces similar self-advantages as found in action perception, it was necessary to eliminate indexical cues from the speech signal. We therefore asked participants to identify noise-vocoded versions of Dutch words that were based on either their own recordings or those of a statistically average speaker. The majority of participants were more accurate for the average speaker than for themselves, even after taking into account differences in intelligibility. These results suggest that the speech representations accessed during perception of noise-vocoded speech are more reflective of the input of the speech community, and hence that speech perception is not necessarily based on representations of one's own speech.

  10. Audiovisual integration of speech falters under high attention demands.

    Alsius, Agnès; Navarra, Jordi; Campbell, Ruth; Soto-Faraco, Salvador

    2005-05-10

    One of the most commonly cited examples of human multisensory integration occurs during exposure to natural speech, when the vocal and the visual aspects of the signal are integrated in a unitary percept. Audiovisual association of facial gestures and vocal sounds has been demonstrated in nonhuman primates and in prelinguistic children, arguing for a general basis for this capacity. One critical question, however, concerns the role of attention in such multisensory integration. Although both behavioral and neurophysiological studies have converged on a preattentive conceptualization of audiovisual speech integration, this mechanism has rarely been measured under conditions of high attentional load, when the observers' attention resources are depleted. We tested the extent to which audiovisual integration was modulated by the amount of available attentional resources by measuring the observers' susceptibility to the classic McGurk illusion in a dual-task paradigm. The proportion of visually influenced responses was severely, and selectively, reduced if participants were concurrently performing an unrelated visual or auditory task. In contrast with the assumption that crossmodal speech integration is automatic, our results suggest that these multisensory binding processes are subject to attentional demands. PMID:15886102

  11. Merge-Weighted Dynamic Time Warping for Speech Recognition

    张湘莉兰; 骆志刚; 李明

    2014-01-01

    Obtaining training material for rarely used English words and common given names from countries where English is not spoken is difficult due to excessive time, storage and cost factors. By considering personal privacy, language-independent (LI) with lightweight speaker-dependent (SD) automatic speech recognition (ASR) is a convenient option to solve the problem. The dynamic time warping (DTW) algorithm is the state-of-the-art algorithm for small-footprint SD ASR for real-time applications with limited storage and small vocabularies. These applications include voice dialing on mobile devices, menu-driven recognition, and voice control on vehicles and robotics. However, traditional DTW has several limitations, such as high computational complexity, constraint-induced coarse approximation, and inaccuracy problems. In this paper, we introduce the merge-weighted dynamic time warping (MWDTW) algorithm. This method defines a template confidence index for measuring the similarity between merged training data and testing data, while following the core DTW process. MWDTW is simple, efficient, and easy to implement. With extensive experiments on three representative SD speech recognition datasets, we demonstrate that our method significantly outperforms DTW, DTW on merged speech data, and the hidden Markov model (HMM), and is also six times faster than DTW overall.
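
    For reference, the core DTW process that MWDTW builds on can be written in a few lines; the sketch below computes the classical DTW distance between two feature sequences (MWDTW's template merging and confidence weighting are not shown).

        # A minimal sketch of classical DTW between two sequences of per-frame
        # feature vectors (e.g., MFCCs); pure NumPy, no pruning constraints.
        import numpy as np

        def dtw_distance(a, b):
            """Dynamic time warping distance between feature-vector sequences."""
            n, m = len(a), len(b)
            D = np.full((n + 1, m + 1), np.inf)
            D[0, 0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    cost = np.linalg.norm(a[i - 1] - b[j - 1])  # local distance
                    D[i, j] = cost + min(D[i - 1, j],      # insertion
                                         D[i, j - 1],      # deletion
                                         D[i - 1, j - 1])  # match
            return D[n, m]

        # Toy usage: the same "word" spoken at two speeds scores closer than a
        # different "word", despite the length mismatch.
        t = np.linspace(0, 1, 50)
        word_a = np.column_stack([np.sin(2 * np.pi * t)])
        word_a_slow = np.column_stack([np.sin(2 * np.pi * np.linspace(0, 1, 80))])
        word_b = np.column_stack([np.cos(4 * np.pi * t)])
        print(dtw_distance(word_a, word_a_slow) < dtw_distance(word_a, word_b))  # True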

  12. Objective Gender and Age Recognition from Speech Sentences

    Fatima K. Faek

    2015-10-01

    Full Text Available In this work, an automatic gender and age recognizer from speech is investigated. The features relevant to gender recognition are selected from the first four formant frequencies and twelve MFCCs and feed an SVM classifier, while the features relevant to age are used with a k-NN classifier for the age recognizer model, using MATLAB as a simulation tool. A special selection of robust features is used in this work to improve the results of the gender and age classifiers, based on the frequency range that each feature represents. The gender and age classification algorithms are evaluated using 114 (clean and noisy) speech samples uttered in the Kurdish language. The two-class gender recognition model (adult males and adult females) reached 96% recognition accuracy, while for three-category classification (adult males, adult females, and children) the model achieved 94% recognition accuracy. For the age recognition model, seven groups are categorized according to age; the model performance after selecting the features relevant to age reached 75.3%. For further improvement, a de-noising technique is applied to the noisy speech signals, followed by selecting the proper features affected by the de-noising process, resulting in 81.44% recognition accuracy.
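
    A minimal sketch of the gender-recognition half of such a pipeline is shown below, assuming librosa and scikit-learn; the file names and labels are placeholders, and the formant features used in the paper are omitted for brevity.

        # A minimal sketch of MFCC + SVM gender classification, assuming librosa
        # and scikit-learn; the corpus entries below are placeholders.
        import numpy as np
        import librosa
        from sklearn.svm import SVC

        def mfcc_features(path, n_mfcc=12):
            """Load a speech file and return the mean of 12 MFCCs as one vector."""
            y, sr = librosa.load(path, sr=16000)
            mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
            return mfcc.mean(axis=1)  # utterance-level summary

        # Placeholder corpus: (file, label) pairs, label 0=male, 1=female.
        train_files = [("m_01.wav", 0), ("f_01.wav", 1)]  # 114 samples in the paper
        X = np.array([mfcc_features(f) for f, _ in train_files])
        y = np.array([lab for _, lab in train_files])

        clf = SVC(kernel="rbf").fit(X, y)
        print(clf.predict(mfcc_features("unknown.wav").reshape(1, -1)))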

  13. Automatic Program Development

    Automatic Program Development is a tribute to Robert Paige (1947-1999), our accomplished and respected colleague, and moreover our good friend, whose untimely passing was a loss to our academic and research community. We have collected the revised, updated versions of the papers published in his honor in the Higher-Order and Symbolic Computation Journal in the years 2003 and 2005. Among them there are two papers by Bob: (i) a retrospective view of his research lines, and (ii) a proposal for future studies in the area of the automatic program derivation. The book also includes some papers by members of the IFIP Working Group 2.1 of which Bob was an active member. All papers are related to some of the research interests of Bob and, in particular, to the transformational development of programs and their algorithmic derivation from formal specifications. Automatic Program Development offers a...

  14. Semi-automatic knee cartilage segmentation

    Dam, Erik B.; Folkesson, Jenny; Pettersen, Paola C.; Christiansen, Claus

    2006-03-01

    Osteo-Arthritis (OA) is a very common age-related cause of pain and reduced range of motion. A central effect of OA is wear-down of the articular cartilage that otherwise ensures smooth joint motion. Quantification of the cartilage breakdown is central in monitoring disease progression and therefore cartilage segmentation is required. Recent advances allow automatic cartilage segmentation with high accuracy in most cases. However, the automatic methods still fail in some problematic cases. For clinical studies, even if a few failing cases will be averaged out in the overall results, this reduces the mean accuracy and precision and thereby necessitates larger/longer studies. Since the severe OA cases are often most problematic for the automatic methods, there is even a risk that the quantification will introduce a bias in the results. Therefore, interactive inspection and correction of these problematic cases is desirable. For diagnosis on individuals, this is even more crucial since the diagnosis will otherwise simply fail. We introduce and evaluate a semi-automatic cartilage segmentation method combining an automatic pre-segmentation with an interactive step that allows inspection and correction. The automatic step consists of voxel classification based on supervised learning. The interactive step combines a watershed transformation of the original scan with the posterior probability map from the classification step at sub-voxel precision. We evaluate the method for the task of segmenting the tibial cartilage sheet from low-field magnetic resonance imaging (MRI) of knees. The evaluation shows that the combined method allows accurate and highly reproducible correction of the segmentation of even the worst cases in approximately ten minutes of interaction.
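
    The interactive step's core mechanism, a watershed transform guided by the classifier's posterior probability map, can be sketched as follows with scikit-image; the toy probability map and seed placement below stand in for the real voxel classifier output and user corrections.

        # A minimal sketch of running a watershed on a classifier's posterior
        # probability map, assuming scikit-image; the Gaussian blobs below are
        # a synthetic stand-in for the supervised classifier's output.
        import numpy as np
        from skimage.segmentation import watershed

        yy, xx = np.mgrid[0:100, 0:100]
        posterior = (np.exp(-((yy - 30) ** 2 + (xx - 30) ** 2) / 200.0)
                     + np.exp(-((yy - 70) ** 2 + (xx - 70) ** 2) / 200.0))

        # Seeds play the role of interactive corrections: one per structure.
        markers = np.zeros_like(posterior, dtype=int)
        markers[30, 30], markers[70, 70] = 1, 2

        # Flood the *negative* posterior so high-probability regions become
        # basins; the mask keeps labels inside plausible cartilage pixels.
        labels = watershed(-posterior, markers, mask=posterior > 0.1)
        print(np.unique(labels))  # [0 1 2]: background plus the two structures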

  15. Spoken language and the decision to move the eyes: To what extent are language-mediated eye movements automatic?

    Mishra, R.; Olivers, C.; Huettig, F.

    2013-01-01

    Recent eye-tracking research has revealed that spoken language can guide eye gaze very rapidly (and closely time-locked to the unfolding speech) toward referents in the visual world. We discuss whether, and to what extent, such language-mediated eye movements are automatic rather than subject to conscious and controlled decision-making. We consider whether language-mediated eye movements adhere to four main criteria of automatic behavior, namely, whether they are fast and efficient, unintenti...

  16. Auditory short-term memory trace formation for nonspeech and speech in SLI and dyslexia as indexed by the N100 and mismatch negativity electrophysiological responses

    Tuomainen, O. T.

    2015-01-01

    This study investigates nonspeech and speech processing in specific language impairment (SLI) and dyslexia. We used a passive mismatch negativity (MMN) task to tap automatic brain responses and an active behavioural task to tap attended discrimination of nonspeech and speech sounds. Using the roving standard MMN paradigm, we varied the number of standards ('few' vs. 'many') to investigate the effect of sound repetition on N100 and MMN responses. The results revealed that the SLI group needed ...

  17. Automatic utilities auditing

    Smith, Colin Boughton [Energy Metering Technology (United Kingdom)

    2000-08-01

    At present, energy audits represent only snapshot situations of the flow of energy. The normal pattern of energy audits as seen through the eyes of an experienced energy auditor is described. A brief history of energy auditing is given. It is claimed that the future of energy auditing lies in automatic meter reading with expert data analysis providing continuous automatic auditing thereby reducing the skill element. Ultimately, it will be feasible to carry out auditing at intervals of say 30 minutes rather than five years.

  18. Automatic Camera Control

    Burelli, Paolo; Preuss, Mike

    2014-01-01

    Automatically generating computer animations is a challenging and complex problem with applications in games and film production. In this paper, we investigate how to translate a shot list for a virtual scene into a series of virtual camera configurations — i.e. automatically controlling the virtual camera. We approach this problem by modelling it as a dynamic multi-objective optimisation problem and show how this metaphor allows a much richer expressiveness than a classical single objective approach. Finally, we showcase the application of a multi-objective evolutionary algorithm to generate a shot...

  19. Automatic text summarization

    Torres Moreno, Juan Manuel

    2014-01-01

    This new textbook examines the motivations and the different algorithms for automatic document summarization (ADS), presenting a recent state of the art. The book shows the main problems of ADS, the difficulties involved, and the solutions provided by the community. It presents recent advances in ADS, as well as current applications and trends. The approaches are statistical, linguistic and symbolic. Several examples are included in order to clarify the theoretical concepts. The books currently available in the area of Automatic Document Summarization are not recent. Powerful algorithms have been developed

  20. Lecturer’s Speech Competence

    Svetlana Viktorovna Panina

    2014-11-01

    Full Text Available An analysis of the issue of lecturers' speech competence is presented. Speech competence is a main component of a lecturer's professional image and an indicator of communicative culture, with a great impact on the quality of pedagogical activity. Research objective: to define the main drawbacks in the speech competence of lecturers of North-Eastern Federal University named after M. K. Ammosov (NEFU) (Russia, Yakutsk) and to suggest ways of correcting those drawbacks in the multilingual educational environment of a higher education institution. A questionnaire method was used in the research, in which NEFU students took part. The answers to the questionnaire allowed defining the most typical drawbacks of lecturers working in the multicultural educational environment of a regional higher education institution: word repetition, breaking of language rules, wrong vocabulary or pronunciation of foreign words, use of colloquial language, etc., all of which break speech standards and decrease the quality of lecture material presentation. The authors suggest improving lecturers' speech competence through the organization of special advanced training courses, business games, discussion platforms, teaching aids and handbooks, on-line tutorials, and the mastering of motivated dialogue as the most effective way of organizing the student training process.

  1. Recognition of Emotions in Mexican Spanish Speech: An Approach Based on Acoustic Modelling of Emotion-Specific Vowels

    Santiago-Omar Caballero-Morales

    2013-01-01

    Full Text Available An approach for the recognition of emotions in speech is presented. The target language is Mexican Spanish, and for this purpose a speech database was created. The approach consists in the phoneme acoustic modelling of emotion-specific vowels. For this, a standard phoneme-based Automatic Speech Recognition (ASR) system was built with Hidden Markov Models (HMMs), where different phoneme HMMs were built for the consonants and emotion-specific vowels associated with four emotional states (anger, happiness, neutral, sadness). Then, estimation of the emotional state from a spoken sentence is performed by counting the number of emotion-specific vowels found in the ASR’s output for the sentence. With this approach, accuracy of 87–100% was achieved for the recognition of emotional state of Mexican Spanish speech.
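
    The estimation step lends itself to a very short sketch: given an ASR transcription whose vowels carry emotion tags, pick the emotion whose vowels occur most often. The tag format below is an assumption for illustration, not the paper's notation.

        # A minimal sketch of emotion estimation by counting emotion-specific
        # vowels in an ASR phoneme transcription; the "_tag" suffix convention
        # is a hypothetical stand-in for the paper's emotion-specific vowel units.
        from collections import Counter

        EMOTIONS = ("anger", "happiness", "neutral", "sadness")

        def estimate_emotion(phoneme_sequence):
            """phoneme_sequence: ASR output such as ['k', 'a_anger', 's', 'a_anger']."""
            counts = Counter(
                phone.split("_", 1)[1]  # emotion tag carried by the vowel
                for phone in phoneme_sequence
                if "_" in phone and phone.split("_", 1)[1] in EMOTIONS
            )
            return counts.most_common(1)[0][0] if counts else "neutral"

        print(estimate_emotion(["k", "a_anger", "s", "a_anger", "o_neutral"]))  # anger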

  2. Effects of age on speech and voice quality ratings.

    Goy, Huiwen; Kathleen Pichora-Fuller, M; van Lieshout, Pascal

    2016-04-01

    The quality of communication may be affected by listeners' perception of talkers' characteristics. This study examined if there were effects of talker and listener age on the perception of speech and voice qualities. Younger and older listeners judged younger and older talkers' gender and age, then rated speech samples on pleasantness, naturalness, clarity, ease of understanding, loudness, and the talker's suitability to be an audiobook reader. For the same talkers, listeners also rated voice samples on pleasantness, roughness, and power. Younger and older talkers were perceived to be similar on most qualities except age. Younger and older listeners rated talkers similarly, except that younger listeners perceived younger voices to be more pleasant and less rough than older voices. For vowel samples, younger listeners were more accurate than older listeners at age estimation, while older listeners were more accurate than younger listeners at gender identification, suggesting that younger and older listeners differ in their evaluation of specific talker characteristics. Thus, the perception of quality was generally more affected by the age of the listener than the age of the talker, and age-related differences between listeners depended on whether voice or speech samples were used and the rating being made. PMID:27106312

  4. Predicting Speech Intelligibility with a Multiple Speech Subsystems Approach in Children with Cerebral Palsy

    Lee, Jimin; Hustad, Katherine C.; Weismer, Gary

    2014-01-01

    Purpose: Speech acoustic characteristics of children with cerebral palsy (CP) were examined with a multiple speech subsystems approach; speech intelligibility was evaluated using a prediction model in which acoustic measures were selected to represent three speech subsystems. Method: Nine acoustic variables reflecting different subsystems, and…

  5. Modeling speech intelligibility in adverse conditions

    Dau, Torsten

    the normal as well as impaired auditory system. Jørgensen and Dau [(2011). J. Acoust. Soc. Am. 130, 1475-1487] proposed the speech-based envelope power spectrum model (sEPSM) in an attempt to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII) in conditions with nonlinearly processed speech. Instead of considering the reduction of the temporal modulation energy as the intelligibility metric, as assumed in the STI, the sEPSM applies the signal-to-noise ratio in the envelope domain (SNRenv). This metric was shown to be the key for predicting the intelligibility of reverberant speech as well as noisy speech processed by spectral subtraction. However, the sEPSM cannot account for speech subjected to phase jitter, a condition in which the spectral structure of speech is destroyed, while the broadband temporal envelope is kept largely...
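
    A heavily simplified illustration of the SNRenv idea is sketched below using SciPy: the envelope power of noisy speech is compared against that of the noise alone. The real sEPSM applies a modulation filterbank within peripheral audio channels, which is omitted here.

        # A heavily simplified illustration of an envelope-domain SNR, assuming
        # SciPy; not the sEPSM itself (no auditory or modulation filterbanks).
        import numpy as np
        from scipy.signal import hilbert

        def envelope_power(x):
            """AC power of the Hilbert envelope of a signal."""
            env = np.abs(hilbert(x))
            return np.var(env)

        def snr_env(noisy_speech, noise):
            """Envelope-domain SNR in dB, using the noise envelope as the floor."""
            p_noise = envelope_power(noise)
            p_speech = max(envelope_power(noisy_speech) - p_noise, 1e-12)
            return 10.0 * np.log10(p_speech / p_noise)

        # Toy usage: a 4 Hz amplitude-modulated carrier (speech-like envelope).
        sr, dur = 8000, 2.0
        t = np.arange(int(sr * dur)) / sr
        speech_like = (1 + np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 500 * t)
        noise = 0.3 * np.random.randn(t.size)
        print(snr_env(speech_like + noise, noise))  # positive envelope SNR in dB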

  6. Speech Sound Disorders: Articulation and Phonological Processes

    ... an SLP to learn correct speech sounds. Some speech sound errors can result from physical problems, such as: developmental disorders (e.g., autism), genetic syndromes (e.g., Down syndrome), hearing loss ...

  7. Speech perception and production in severe environments

    Pisoni, David B.

    1990-09-01

    The goal was to acquire new knowledge about speech perception and production in severe environments such as high masking noise, increased cognitive load or sustained attentional demands. Changes were examined in speech production under these adverse conditions through acoustic analysis techniques. One set of studies focused on the effects of noise on speech production. The experiments in this group were designed to generate a database of speech obtained in noise and in quiet. A second set of experiments was designed to examine the effects of cognitive load on the acoustic-phonetic properties of speech. Talkers were required to carry out a demanding perceptual motor task while they read lists of test words. A final set of experiments explored the effects of vocal fatigue on the acoustic-phonetic properties of speech. Both cognitive load and vocal fatigue are present in many applications where speech recognition technology is used, yet their influence on speech production is poorly understood.

  8. Experimental study on phase perception in speech

    BU Fanliang; CHEN Yanpu

    2003-01-01

    As the human ear is dull to the phase in speech, little attention has been paid to phase information in speech coding. In fact, the speech perceptual quality may be degenerated if the phase distortion is very large. The perceptual effect of the STFT (Short time Fourier transform) phase spectrum is studied by auditory subjective hearing tests. Three main conclusions are: (1) If the phase information is neglected completely, the subjective quality of the reconstructed speech may be very poor; (2) Whether the neglected phase is in the low frequency band or the high frequency band, the difference from the original speech can be perceived by ear; (3) It is very difficult for the human ear to perceive the difference of speech quality between original speech and reconstructed speech while the phase quantization step size is shorter than π/7.

  9. STUDY ON PHASE PERCEPTION IN SPEECH

    Tong Ming; Bian Zhengzhong; Li Xiaohui; Dai Qijun; Chen Yanpu

    2003-01-01

    The perceptual effect of the phase information in speech has been studied by auditory subjective tests. On the condition that the phase spectrum in speech is changed while the amplitude spectrum is unchanged, the tests show that: (1) If the envelope of the reconstructed speech signal is unchanged, there is no distinctive auditory difference between the original speech and the reconstructed speech; (2) The auditory perception effect of the reconstructed speech mainly lies on the amplitude of the derivative of the additive phase; (3) td is the maximum relative time shift between different frequency components of the reconstructed speech signal. The speech quality is excellent while td < 10 ms; good while 10 ms < td < 20 ms; common while 20 ms < td < 35 ms; and poor while td > 35 ms.

  10. Speech and Language Problems in Children

    Children vary in their development of speech and language skills. Health professionals have milestones for what's normal. ... it may be due to a speech or language disorder. Language disorders can mean that the child ...

  11. Real-time automatic registration in optical surgical navigation

    Lin, Qinyong; Yang, Rongqian; Cai, Ken; Si, Xuan; Chen, Xiuwen; Wu, Xiaoming

    2016-05-01

    An image-guided surgical navigation system requires the improvement of the patient-to-image registration time to enhance the convenience of the registration procedure. A critical step in achieving this aim is performing a fully automatic patient-to-image registration. This study reports on a design of custom fiducial markers and the performance of a real-time automatic patient-to-image registration method using these markers on the basis of an optical tracking system for rigid anatomy. The custom fiducial markers are designed to be automatically localized in both patient and image spaces. An automatic localization method is performed by registering a point cloud sampled from the three dimensional (3D) pedestal model surface of a fiducial marker to each pedestal of fiducial markers searched in image space. A head phantom is constructed to estimate the performance of the real-time automatic registration method under four fiducial configurations. The head phantom experimental results demonstrate that the real-time automatic registration method is more convenient, rapid, and accurate than the manual method. The time required for each registration is approximately 0.1 s. The automatic localization method precisely localizes the fiducial markers in image space. The averaged target registration error for the four configurations is approximately 0.7 mm. The automatic registration performance is independent of the positions relative to the tracking system and the movement of the patient during the operation.
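
    Once fiducials have been paired in patient and image space, the rigid patient-to-image transform can be obtained in closed form; the sketch below shows the standard Kabsch/SVD solution for corresponding points, not the authors' implementation.

        # A minimal sketch of rigid point-set registration (Kabsch/SVD), the
        # standard closed-form step for paired fiducials; pure NumPy.
        import numpy as np

        def rigid_register(P, Q):
            """Return rotation R and translation t minimizing ||R @ p + t - q||."""
            cP, cQ = P.mean(axis=0), Q.mean(axis=0)
            H = (P - cP).T @ (Q - cQ)               # cross-covariance matrix
            U, _, Vt = np.linalg.svd(H)
            d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
            R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
            t = cQ - R @ cP
            return R, t

        # Toy usage: recover a known rotation/translation of four fiducials.
        rng = np.random.default_rng(1)
        P = rng.normal(size=(4, 3))
        theta = np.pi / 6
        R_true = np.array([[np.cos(theta), -np.sin(theta), 0],
                           [np.sin(theta),  np.cos(theta), 0],
                           [0, 0, 1]])
        Q = P @ R_true.T + np.array([5.0, -2.0, 0.5])
        R, t = rigid_register(P, Q)
        print(np.allclose(R, R_true), np.allclose(P @ R.T + t, Q))  # True True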

  12. Auditory plasticity and speech motor learning

    Nasir, Sazzad M.; Ostry, David J.

    2009-01-01

    Is plasticity in sensory and motor systems linked? Here, in the context of speech motor learning and perception, we test the idea that sensory function is modified by motor learning and, in particular, that speech motor learning affects a speaker's auditory map. We assessed speech motor learning by using a robotic device that displaced the jaw and selectively altered somatosensory feedback during speech. We found that with practice speakers progressively corrected for the mechanical perturbation a...

  13. Learning Representations of Affect from Speech

    Ghosh, Sayan; Laksana, Eugene; Morency, Louis-Philippe; Scherer, Stefan

    2015-01-01

    There has been a lot of prior work on representation learning for speech recognition applications, but not much emphasis has been given to an investigation of effective representations of affect from speech, where the paralinguistic elements of speech are separated out from the verbal content. In this paper, we explore denoising autoencoders for learning paralinguistic attributes i.e. categorical and dimensional affective traits from speech. We show that the representations learnt by the bott...
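
    The denoising-autoencoder idea described above can be sketched briefly in PyTorch: corrupt frame-level features, train the network to reconstruct the clean frames, and read candidate affect representations off the bottleneck. Dimensions and training details below are illustrative, not the paper's.

        # A minimal sketch of a denoising autoencoder over frame-level speech
        # features, assuming PyTorch; data and sizes are synthetic stand-ins.
        import torch
        import torch.nn as nn

        class DenoisingAutoencoder(nn.Module):
            def __init__(self, n_features=40, n_bottleneck=16):
                super().__init__()
                self.encoder = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(),
                                             nn.Linear(64, n_bottleneck))
                self.decoder = nn.Sequential(nn.Linear(n_bottleneck, 64), nn.ReLU(),
                                             nn.Linear(64, n_features))

            def forward(self, x):
                return self.decoder(self.encoder(x))

        model = DenoisingAutoencoder()
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        loss_fn = nn.MSELoss()

        clean = torch.randn(256, 40)  # stand-in for clean feature frames
        for _ in range(100):
            noisy = clean + 0.3 * torch.randn_like(clean)  # corrupt the input
            opt.zero_grad()
            loss = loss_fn(model(noisy), clean)  # reconstruct the clean frames
            loss.backward()
            opt.step()

        affect_codes = model.encoder(clean)  # bottleneck: candidate affect features
        print(affect_codes.shape)            # torch.Size([256, 16])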

  14. Review of Speech & Language Assessment Tests

    Mohadeseh Tarazani; Nasrin Keramati; Asma Sheikh Najdi; Nilofar Rastegarian; Shohreh Jalaei; Maryam Tarameshlo; Meisam Amid Far; Masomeh Radaei; Maryam Faghani Abokheili

    2010-01-01

    Background and Aim: Standard tests are tools for quantifying different aspects of speech & language abilities and communicative skills. Developing and applying standard tests is necessary for assessing, screening, and defining speech & language abilities, diagnosing speech & language disorders, and attending to the outcomes of the treatment process. Therefore, in this study, to provide a general view of speech & language assessment tests, we reviewed some of the tests related to some imp...

  15. Neural bases of accented speech perception

    Adank, Patti; Nuttall, Helen E.; Banks, Briony; Kennedy-Higgins, Daniel

    2015-01-01

    The recognition of unfamiliar regional and foreign accents represents a challenging task for the speech perception system (Floccia et al., 2006; Adank et al., 2009). Despite the frequency with which we encounter such accents, the neural mechanisms supporting successful perception of accented speech are poorly understood. Nonetheless, candidate neural substrates involved in processing speech in challenging listening conditions, including accented speech, are beginning to be identified. This re...

  16. Software Requirement Specification Using Reverse Speech Technology

    2014-01-01

    Speech analysis has been taken to a new level with the discovery of Reverse Speech (RS). RS is the discovery of hidden messages, referred to as reversals, in normal speech. Work is in progress on exploiting the relevance of RS in different real-world applications such as investigation, the medical field, etc. In this paper we present an innovative method for preparing a reliable Software Requirement Specification (SRS) document with the help of reverse speech. As SRS acts as the backbone for the ...

  17. Audio-Visual Tibetan Speech Recognition Based on a Deep Dynamic Bayesian Network for Natural Human Robot Interaction

    Yue Zhao

    2012-12-01

    Full Text Available Audio‐visual speech recognition is a natural and robust approach to improving human-robot interaction in noisy environments. Although multi‐stream Dynamic Bayesian Network and coupled HMM are widely used for audio‐visual speech recognition, they fail to learn the shared features between modalities and ignore the dependency of features among the frames within each discrete state. In this paper, we propose a Deep Dynamic Bayesian Network (DDBN) to perform unsupervised extraction of spatial‐temporal multimodal features from Tibetan audio‐visual speech data and build an accurate audio‐visual speech recognition model under a no frame‐independency assumption. The experiment results on Tibetan speech data from some real‐world environments showed the proposed DDBN outperforms the state‐of‐the‐art methods in word recognition accuracy.

  18. Human and automatic speaker recognition over telecommunication channels

    Fernández Gallardo, Laura

    2016-01-01

    This work addresses the evaluation of the human and the automatic speaker recognition performances under different channel distortions caused by bandwidth limitation, codecs, and electro-acoustic user interfaces, among other impairments. Its main contribution is the demonstration of the benefits of communication channels of extended bandwidth, together with an insight into how speaker-specific characteristics of speech are preserved through different transmissions. It provides sufficient motivation for considering speaker recognition as a criterion for the migration from narrowband to enhanced bandwidths, such as wideband and super-wideband.

  19. Automatic Dance Lesson Generation

    Yang, Yang; Leung, H.; Yue, Lihua; Deng, LiQun

    2012-01-01

    In this paper, an automatic lesson generation system is presented which is suitable in a learning-by-mimicking scenario where the learning objects can be represented as multiattribute time series data. The dance is used as an example in this paper to illustrate the idea. Given a dance motion sequence as the input, the proposed lesson generation…

  20. Automatic Complexity Analysis

    Rosendahl, Mads

    1989-01-01

    One way to analyse programs is to derive expressions for their computational behaviour. A time bound function (or worst-case complexity) gives an upper bound for the computation time as a function of the size of input. We describe a system to derive such time bounds automatically using abstract...

  1. Evaluating the benefit of recorded early reflections from a classroom for speech intelligibility

    Larsen, Jeffery B.

    Recent standards for classroom acoustics recommend achieving low levels of reverberation to provide suitable conditions for speech communication (ANSI, 2002; ASHA, 1995). Another viewpoint recommends optimizing classroom acoustics to emphasize early reflections and reduce later arriving reflections (Boothroyd, 2004; Bradley, Sato, & Picard, 2003). The idea of emphasizing early reflections is based on the useful-to-detrimental ratio (UDR) model of speech intelligibility in rooms (Lochner & Burger, 1964). The UDR model predicts that listeners integrate energy from early reflections to improve the signal-to-noise ratio (SNR) of the direct speech signal. However, both early and more recent studies of early reflections and speech intelligibility have used simulated reflections that may not accurately represent the effects of real early reflections on the speech intelligibility of listeners. Is speech intelligibility performance enhanced by the presence of real early reflections in noisy classroom environments? The effect of actual early reflections on speech intelligibility was evaluated by recording a binaural impulse response (BRIR) with a K.E.M.A.R. in a college classroom. From the BRIR, five listening conditions were created with varying amounts of early reflections. Young-adult listeners with normal hearing participated in a fixed SNR word intelligibility task and a variable SNR task to test if speech intelligibility was improved in competing noise when recorded early reflections were present as compared to direct speech alone. Mean speech intelligibility performance gains or SNR benefits were not observed with recorded early reflections. When simulated early reflections were included, improved speech understanding was observed for simulated reflections but not with real reflections. Spectral, temporal, and phonemic analyses were performed to investigate acoustic differences in recorded and simulated reflections. Spectral distortions in the recorded reflections may have

  2. An RBFN-based system for speaker-independent speech recognition

    Huliehel, Fakhralden A.

    1995-01-01

    A speaker-independent isolated-word small vocabulary system is developed for applications such as voice-driven menu systems. The design of a cascade of recognition layers is presented. Several feature sets are compared. Phone recognition is performed using a radial basis function network (RBFN). Dynamic time warping (DTW) is used for word recognition. The TIMIT database is used to design and test the automatic speech recognition (ASR) system. Several feature sets using mel-s...

  3. Degrees of freedom of facial movements in face-to-face conversational speech

    Bailly, Gérard; Elisei, Frédéric,; Badin, Pierre; Savariaux, Christophe

    2006-01-01

    In this paper we analyze the degrees of freedom (DoF) of facial movements in face-to-face conversation. We propose here a method for automatically selecting expressive frames in a large fine-grained motion capture corpus that best complement an initial shape model built using neutral speech. Using conversational data from one speaker, we extract 11 DoF that reconstruct facial deformations with an average precision of less than a millimeter. Gestural scores are then built...

  4. Automatic Lipreading in the Dutch Language

    Wojdel, J.C.

    2003-01-01

    This thesis deals with many aspects of bimodal speech processing research. It lays out the general framework of visually enhanced speech processing computer systems together with some insight into human bimodal speech perception. There are three main contributions to the field of bimodal...

  5. Hate Speech and the First Amendment.

    Rainey, Susan J.; Kinsler, Waren S.; Kannarr, Tina L.; Reaves, Asa E.

    This document is comprised of California state statutes, federal legislation, and court litigation pertaining to hate speech and the First Amendment. The document provides an overview of California education code sections relating to the regulation of speech; basic principles of the First Amendment; government efforts to regulate hate speech,…

  6. Liberalism, Speech Codes, and Related Problems.

    Sunstein, Cass R.

    1993-01-01

    It is argued that universities are pervasively and necessarily engaged in regulation of speech, which complicates many existing claims about hate speech codes on campus. The ultimate test is whether the restriction on speech is a legitimate part of the institution's mission, commitment to liberal education. (MSE)

  7. Hate Speech on Campus: A Practical Approach.

    Hogan, Patrick

    1997-01-01

    Looks at arguments concerning hate speech and speech codes on college campuses, arguing that speech codes are likely to be of limited value in achieving civil rights objectives, and that there are alternatives less harmful to civil liberties and more successful in promoting civil rights. Identifies specific goals, and considers how restriction of…

  8. Characteristics of Speech Motor Development in Children.

    Ostry, David J.; And Others

    1984-01-01

    Pulsed ultrasound was used to study tongue movements in the speech of children from 3 to 11 years of age. Speech data attained were characteristic of systems that can be described by second-order differential equations. Relationships observed in these systems may indicate that speech control involves tonic and phasic muscle inputs. (Author/RH)

  9. Interventions for Speech Sound Disorders in Children

    Williams, A. Lynn, Ed.; McLeod, Sharynne, Ed.; McCauley, Rebecca J., Ed.

    2010-01-01

    With detailed discussion and invaluable video footage of 23 treatment interventions for speech sound disorders (SSDs) in children, this textbook and DVD set should be part of every speech-language pathologist's professional preparation. Focusing on children with functional or motor-based speech disorders from early childhood through the early…

  10. Factors of Politeness and Indirect Speech Acts

    杨雪梅

    2016-01-01

    The politeness principle is deeply influenced by a nation's history, culture, customs, and so on; therefore different countries have different understandings and expressions of politeness and indirect speech acts. This paper shows some main factors influencing polite speech. Through this article, readers can gain a comprehensive understanding of politeness and indirect speech acts.

  11. Application of wavelets in speech processing

    Farouk, Mohamed Hesham

    2014-01-01

    This book provides a survey of the widespread employment of wavelet analysis in different applications of speech processing. The author examines development and research in different applications of speech processing. The book also summarizes the state of the art research on wavelets in speech processing.

  12. Acoustics of Clear Speech: Effect of Instruction

    Lam, Jennifer; Tjaden, Kris; Wilding, Greg

    2012-01-01

    Purpose: This study investigated how different instructions for eliciting clear speech affected selected acoustic measures of speech. Method: Twelve speakers were audio-recorded reading 18 different sentences from the Assessment of Intelligibility of Dysarthric Speech (Yorkston & Beukelman, 1984). Sentences were produced in habitual, clear,…

  13. Discriminative learning for speech recognition

    He, Xiaodong

    2008-01-01

    In this book, we introduce the background and mainstream methods of probabilistic modeling and discriminative parameter optimization for speech recognition. The specific models treated in depth include the widely used exponential-family distributions and the hidden Markov model. A detailed study is presented on unifying the common objective functions for discriminative learning in speech recognition, namely maximum mutual information (MMI), minimum classification error, and minimum phone/word error. The unification is presented, with rigorous mathematical analysis, in a common rational-function...

  14. Reproducible Research in Speech Sciences

    Kálmán Abari

    2012-11-01

    Full Text Available Reproducible research is the minimum standard of scientific claims in cases when independent replication proves to be difficult. With the special combination of available software tools, we provide a reproducibility recipe for the experimental research conducted in some fields of speech sciences. We have based our model on the triad of the R environment, the EMU-format speech database, and the executable publication. We present the use of three typesetting systems (LaTeX, Markdown, Org), with the help of a mini research.

  15. Assessing a speaker for fast speech in unit selection speech synthesis

    Moers, Donata; Wagner, Petra

    2009-01-01

    This paper describes work in progress concerning the adequate modeling of fast speech in unit selection speech synthesis systems, mostly having in mind blind and visually impaired users. Initially, a survey of the main characteristics of fast speech will be given. Subsequently, strategies for fast speech production will be discussed, and certain requirements concerning the abilities of a speaker for a fast-speech unit selection inventory are drawn. The following section deals with a perception ...

  16. The treatment of apraxia of speech: Speech and music therapy, an innovative joint effort

    Hurkmans, Josephus Johannes Stephanus

    2016-01-01

    Apraxia of Speech (AoS) is a neurogenic speech disorder. A wide variety of behavioural methods have been developed to treat AoS. Various therapy programmes use musical elements to improve speech production. A unique therapy programme combining elements of speech therapy and music therapy is called Speech-Music Therapy for Aphasia (SMTA). In clinical practice, patients with AoS have experienced positive outcomes of SMTA; however, there was no evidence of this treatment’s effectiveness. This th...

  17. ANPS - AUTOMATIC NETWORK PROGRAMMING SYSTEM

    Schroer, B. J.

    1994-01-01

    Development of some of the space program's large simulation projects -- like the project which involves simulating the countdown sequence prior to spacecraft liftoff -- requires the support of automated tools and techniques. The number of preconditions which must be met for a successful spacecraft launch and the complexity of their interrelationship account for the difficulty of creating an accurate model of the countdown sequence. Researchers developed ANPS for the Nasa Marshall Space Flight Center to assist programmers attempting to model the pre-launch countdown sequence. Incorporating the elements of automatic programming as its foundation, ANPS aids the user in defining the problem and then automatically writes the appropriate simulation program in GPSS/PC code. The program's interactive user dialogue interface creates an internal problem specification file from user responses which includes the time line for the countdown sequence, the attributes for the individual activities which are part of a launch, and the dependent relationships between the activities. The program's automatic simulation code generator receives the file as input and selects appropriate macros from the library of software modules to generate the simulation code in the target language GPSS/PC. The user can recall the problem specification file for modification to effect any desired changes in the source code. ANPS is designed to write simulations for problems concerning the pre-launch activities of space vehicles and the operation of ground support equipment and has potential for use in developing network reliability models for hardware systems and subsystems. ANPS was developed in 1988 for use on IBM PC or compatible machines. The program requires at least 640 KB memory and one 360 KB disk drive, PC DOS Version 2.0 or above, and GPSS/PC System Version 2.0 from Minuteman Software. The program is written in Turbo Prolog Version 2.0. GPSS/PC is a trademark of Minuteman Software. Turbo Prolog

  18. Speech intelligibility of native and non-native speech

    Wijngaarden, S.J. van

    1999-01-01

    The intelligibility of speech is known to be lower if the talker is non-native instead of native for the given language. This study is aimed at quantifying the overall degradation due to acoustic-phonetic limitations of non-native talkers of Dutch, specifically of Dutch-speaking Americans who have l

  19. Development of an automatic pipeline scanning system

    Kim, Jae H.; Lee, Jae C.; Moon, Soon S.; Eom, Heung S.; Choi, Yu R

    1999-11-01

    Pressure pipe inspection in nuclear power plants is one of the mandatory regulation items. Compared to manual ultrasonic inspection, automatic inspection offers more accurate and reliable inspection results and reduced radiation exposure. The final objective of this project is to develop an automatic pipeline inspection system for pressure pipe welds in nuclear power plants. We developed a pipeline scanning robot with four magnetic wheels and a 2-axis manipulator for controlling ultrasonic transducers, and developed the robot control computer which guides the robot precisely along the inspection path. We expect our system to contribute to reduction of inspection time, performance enhancement, and effective management of inspection results. The system developed by this project can be practically used for inspection works after field tests. (author)

  20. Automatic differentiation and lattice function matching

    Although popularized in accelerator physics for the calculation of the Taylor coefficients of phase space maps, automatic differentiation is a generic and powerful technique applicable to any problem where fast and accurate evaluation of (usually first order) derivatives is needed. Problems requiring the evaluation of sensitivities with respect to parameters of a model or the minimization of an error function permeate applications in accelerator physics. The advent of languages with built-in support for operator overloading, such as C++, results in greatly simplified syntax and highly maintainable software. To illustrate this point in practice, we apply automatic differentiation techniques to linear lattice function matching, an important problem encountered in the context of designing insertions or transfer lines between accelerators.
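
    As a concrete illustration of the operator-overloading approach mentioned above, here is a minimal forward-mode automatic differentiation sketch using dual numbers. It is written in Python rather than C++ for brevity, overloads only + and *, and the function being differentiated is an arbitrary stand-in, not an actual lattice function.

    ```python
    # Minimal forward-mode automatic differentiation via operator overloading.
    import math

    class Dual:
        """Dual number a + b*eps with eps**2 == 0; b carries the derivative."""
        def __init__(self, value, deriv=0.0):
            self.value, self.deriv = value, deriv

        def __add__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            return Dual(self.value + other.value, self.deriv + other.deriv)
        __radd__ = __add__

        def __mul__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            return Dual(self.value * other.value,
                        self.value * other.deriv + self.deriv * other.value)
        __rmul__ = __mul__

    def sin(x):
        # chain rule applied alongside the value computation
        return Dual(math.sin(x.value), math.cos(x.value) * x.deriv)

    # d/dk [k * sin(k)] at k = 1.2, exact to machine precision:
    k = Dual(1.2, 1.0)       # seed the input derivative with 1
    f = k * sin(k)
    print(f.value, f.deriv)  # f(1.2) and f'(1.2) = sin(k) + k*cos(k)
    ```

    The same pattern extends to the full set of arithmetic operators and elementary functions, which is exactly what makes the C++ operator-overloading formulation compact and maintainable.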

  1. Automatic modulation recognition of communication signals

    Azzouz, Elsayed Elsayed

    1996-01-01

    Automatic modulation recognition is a rapidly evolving area of signal analysis. In recent years, interest from the academic and military research institutes has focused around the research and development of modulation recognition algorithms. Any communication intelligence (COMINT) system comprises three main blocks: receiver front-end, modulation recogniser and output stage. Considerable work has been done in the area of receiver front-ends. The work at the output stage is concerned with information extraction, recording and exploitation, and begins with signal demodulation, which requires accurate knowledge about the signal modulation type. There are two main reasons for knowing the current modulation type of a signal: to preserve the signal information content and to decide upon the suitable counter action, such as jamming. Automatic Modulation Recognition of Communications Signals describes in depth this modulation recognition process. Drawing on several years of research, the authors provide a cr...

  2. Text To Speech System for Telugu Language

    Siva Kumar, M.; E. Prakash Babu

    2014-01-01

    Telugu is one of the oldest languages in India. This paper describes the development of a Telugu Text-to-Speech System (TTS). In Telugu TTS the input is Telugu text in Unicode. The voices are sampled from real recorded speech. The objective of a text-to-speech system is to convert an arbitrary text into its corresponding spoken waveform. Speech synthesis is a process of building machinery that can generate human-like speech from any text input to imitate human speakers. Text proc...

  3. Recent Advances in Robust Speech Recognition Technology

    Ramírez, Javier

    2011-01-01

    This E-book is a collection of articles that describe advances in speech recognition technology. Robustness in speech recognition refers to the need to maintain high speech recognition accuracy even when the quality of the input speech is degraded, or when the acoustical, articulatory, or phonetic characteristics of speech in the training and testing environments differ. Obstacles to robust recognition include acoustical degradations produced by additive noise, the effects of linear filtering, nonlinearities in transduction or transmission, as well as impulsive interfering sources, and diminishe

  4. Speech in Mobile and Pervasive Environments

    Rajput, Nitendra

    2012-01-01

    This book brings together the latest research in one comprehensive volume that deals with issues related to speech processing on resource-constrained, wireless, and mobile devices, such as speech recognition in noisy environments, specialized hardware for speech recognition and synthesis, the use of context to enhance recognition, the emerging and new standards required for interoperability, speech applications on mobile devices, distributed processing between the client and the server, and the relevance of Speech in Mobile and Pervasive Environments for developing regions--an area of explosiv

  5. Speech perception of noise with binary gains

    Wang, DeLiang; Kjems, Ulrik; Pedersen, Michael Syskind;

    2008-01-01

    For a given mixture of speech and noise, an ideal binary time-frequency mask is constructed by comparing speech energy and noise energy within local time-frequency units. It is observed that listeners achieve nearly perfect speech recognition from gated noise with binary gains prescribed by the ideal binary mask. Only 16 filter channels and a frame rate of 100 Hz are sufficient for high intelligibility. The results show that, despite a dramatic reduction of speech information, a pattern of binary gains provides an adequate basis for speech perception.
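
    The construction described above is easy to state in code. The sketch below builds an ideal binary mask from separate speech and noise signals by thresholding the local SNR of their short-time spectra; a plain STFT stands in for the 16-channel filterbank used in the experiments, and the random signals are stand-ins for real recordings.

    ```python
    # Minimal sketch of an ideal binary mask: compare speech and noise energy
    # per time-frequency unit. NumPy only; the frame/hop/LC values are illustrative.
    import numpy as np

    def stft(x, frame_len=512, hop=160):
        frames = [x[i:i + frame_len] * np.hanning(frame_len)
                  for i in range(0, len(x) - frame_len, hop)]
        return np.array([np.fft.rfft(f) for f in frames])

    def ideal_binary_mask(speech, noise, lc_db=0.0):
        """Mask unit = 1 where local SNR exceeds the criterion lc_db."""
        S, N = stft(speech), stft(noise)
        snr_db = 10 * np.log10((np.abs(S) ** 2 + 1e-12) /
                               (np.abs(N) ** 2 + 1e-12))
        return (snr_db > lc_db).astype(float)

    # Gating the mixture's T-F units with the mask (1 = keep, 0 = drop) is
    # what the listening experiments in the record evaluate.
    fs = 16000
    speech = np.random.randn(fs)   # stand-ins for real speech and noise
    noise = np.random.randn(fs)
    mask = ideal_binary_mask(speech, noise)
    print(mask.shape, mask.mean())
    ```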

  6. Perceived Speech Quality Estimation Using DTW Algorithm

    S. Arsenovski

    2009-06-01

    In this paper a method for speech quality estimation is evaluated by simulating the transfer of speech over packet-switched and mobile networks. The proposed system uses the Dynamic Time Warping algorithm for comparing the test and received speech. Several tests have been made on a test speech sample of a single speaker with simulated packet (frame) loss effects on the perceived speech. The achieved results have been compared with measured PESQ values on the used transmission channel and their correlation has been observed.
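
    A minimal version of the comparison step is sketched below: a textbook DTW distance between a reference sequence and a degraded received sequence, with the normalised alignment cost serving as a crude quality indicator. Feature extraction is omitted and the 1-D sequences are synthetic stand-ins; the mapping from DTW cost to a perceived-quality score is not part of this sketch.

    ```python
    # Textbook dynamic time warping distance between two 1-D sequences.
    import numpy as np

    def dtw_distance(ref, rec):
        n, m = len(ref), len(rec)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = abs(ref[i - 1] - rec[j - 1])
                # allow match, insertion, deletion
                D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
        return D[n, m] / (n + m)   # normalise by a path-length bound

    ref = np.sin(np.linspace(0, 6, 100))                               # test sample
    rec = np.sin(np.linspace(0, 6, 80)) + 0.05 * np.random.randn(80)   # "lossy" copy
    print(dtw_distance(ref, rec))   # larger cost -> lower estimated quality
    ```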

  7. Automatic indexing, compiling and classification

    A review of the principles of automatic indexing is followed by a comparison and summing-up of work by the authors and by a Soviet team from the Moscow INFORM-ELECTRO Institute. The mathematical and linguistic problems of the automatic building of thesauri and automatic classification are examined.

  8. Embedding speech into virtual realities

    Bohn, Christian-Arved; Krueger, Wolfgang

    1993-05-01

    In this work a speaker-independent speech recognition system is presented, which is suitable for implementation in Virtual Reality applications. The use of an artificial neural network in connection with a special compression of the acoustic input leads to a system which is robust, fast, easy to use, and needs no additional hardware besides common VR equipment.

  9. Paraconsistent semantics of speech acts

    Dunin-Kęplicz, Barbara; Strachocka, Alina; Szałas, Andrzej; Verbrugge, Rineke

    2015-01-01

    This paper discusses an implementation of four speech acts: assert, concede, request and challenge in a paraconsistent framework. A natural four-valued model of interaction yields multiple new cognitive situations. They are analyzed in the context of communicative relations, which partially replace

  10. The Ontogenesis of Speech Acts

    Bruner, Jerome S.

    1975-01-01

    A speech act approach to the transition from pre-linguistic to linguistic communication is adopted in order to consider language in relation to behavior and to allow for an emphasis on the use, rather than the form, of language. A pilot study of mothers and infants is discussed. (Author/RM)

  11. Prosodic Contrasts in Ironic Speech

    Bryant, Gregory A.

    2010-01-01

    Prosodic features in spontaneous speech help disambiguate implied meaning not explicit in linguistic surface structure, but little research has examined how these signals manifest themselves in real conversations. Spontaneously produced verbal irony utterances generated between familiar speakers in conversational dyads were acoustically analyzed…

  12. On speech recognition during anaesthesia

    Alapetite, Alexandre

    2007-01-01

    This PhD thesis in human-computer interfaces (informatics) studies the case of the anaesthesia record used during medical operations and the possibility to supplement it with speech recognition facilities. Problems and limitations have been identified with the traditional paper-based anaesthesia record, leading to inaccuracies in the anaesthesia record. Supplementing the electronic anaesthesia record interface with speech input facilities is proposed as one possible solution to a part of the problem. The testing of the various hypotheses has involved the development of a prototype of an electronic anaesthesia record ... accuracy. Finally, the last part of the thesis looks at the acceptance and success of a speech recognition system introduced in a Danish hospital to produce patient records.

  13. The Use of Artificial Neural Networks to Estimate Speech Intelligibility from Acoustic Variables: A Preliminary Analysis.

    Metz, Dale Evan; And Others

    1992-01-01

    A preliminary scheme for estimating the speech intelligibility of hearing-impaired speakers from acoustic parameters, using a computerized artificial neural network to process mathematically the acoustic input variables, is outlined. Tests with 60 hearing-impaired speakers found the scheme to be highly accurate in identifying speakers separated by…

  14. Signal-to-Signal Ratio Independent Speaker Identification for Co-channel Speech Signals

    Saeidi, Rahim; Mowlaee, Pejman; Kinnunen, Tomi;

    2010-01-01

    In this paper, we consider speaker identification for the co-channel scenario in which the speech mixture from two speakers is recorded by one microphone only. The goal is to identify both of the speakers from their mixed signal. High recognition accuracies have already been reported when an accurately e... target speakers enlisted as three and two most probable speakers, respectively.

  15. Effect of talker and speaking style on the Speech Transmission Index (L)

    Wijngaarden, S.J. van; Houtgast, T.

    2004-01-01

    The Speech Transmission Index (STI) is routinely applied for predicting the intelligibility of messages (sentences) in noise and reverberation. Despite clear evidence that the STI is capable of doing so accurately, recent results indicate that the STI sometimes underestimates the effect of reverbera

  16. Image Based Hair Segmentation Algorithm for the Application of Automatic Facial Caricature Synthesis

    2014-01-01

    Hair is a salient feature of the human face region and one of the important cues for face analysis. Accurate detection and presentation of the hair region is one of the key components for automatic synthesis of human facial caricature. In this paper, an automatic hair detection algorithm for the application of automatic synthesis of facial caricature based on a single image is proposed. Firstly, hair regions in training images are labeled manually and then the hair position prior distributions an...

  17. A Decision-Tree-Based Algorithm for Speech/Music Classification and Segmentation

    Lavner Yizhar

    2009-01-01

    We present an efficient algorithm for segmentation of audio signals into speech or music. The central motivation for our study is consumer audio applications, where various real-time enhancements are often applied. The algorithm consists of a learning phase and a classification phase. In the learning phase, predefined training data is used for computing various time-domain and frequency-domain features, for speech and music signals separately, and estimating the optimal speech/music thresholds, based on the probability density functions of the features. An automatic procedure is employed to select the best features for separation. In the classification phase, initial classification is performed for each segment of the audio signal, using a three-stage sieve-like approach, applying both Bayesian and rule-based methods. To avoid erroneous rapid alternations in the classification, a smoothing technique is applied, averaging the decision on each segment with past segment decisions. Extensive evaluation of the algorithm, on a database of more than 12 hours of speech and more than 22 hours of music, showed correct identification rates of 99.4% and 97.8%, respectively, and quick adjustment to alternating speech/music sections. In addition to its accuracy and robustness, the algorithm can be easily adapted to different audio types, and is suitable for real-time operation.
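
    The smoothing step lends itself to a short sketch. Below, per-segment speech probabilities (assumed to come from the Bayesian/rule-based stages, which are not reproduced here) are exponentially averaged with past decisions before thresholding, so isolated outliers do not flip the classification; the alpha and threshold values are illustrative.

    ```python
    # Sketch of decision smoothing: average each segment's score with past
    # decisions before thresholding, suppressing erroneous rapid alternations.
    def smooth_decisions(scores, alpha=0.8, threshold=0.5):
        """Exponential smoothing of per-segment speech probabilities."""
        smoothed, state = [], scores[0]
        for s in scores:
            state = alpha * state + (1 - alpha) * s
            smoothed.append(1 if state > threshold else 0)  # 1 = speech
        return smoothed

    scores = [0.9, 0.85, 0.2, 0.9, 0.88, 0.1, 0.15, 0.1]  # one spurious dip
    print(smooth_decisions(scores))  # the isolated dip at index 2 is absorbed
    ```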

  18. The automatic NMR gaussmeter

    The paper describes the automatic gaussmeter operating according to the principle of nuclear magnetic resonance. The operating principle, block diagram, and operating parameters of the meter are discussed. It can be applied to measurements of induction in electromagnets of wide-line radio-spectrometers EPR and NMR and in calibration stands of magnetic induction values. The frequency range of the autodyne oscillator, from 0.6 to 86 MHz for protons, corresponds to a field range from 0.016 to 2 T. Application of other nuclei, such as 7Li and 2D, is also foreseen. The induction measurement is carried out automatically, and the NMR signal and value of measured induction are displayed on a monitor screen. (author)
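
    For orientation, the resonance condition linking the measured frequency to the induction is B = 2πf/γ = f/γ', where γ' = γ/2π ≈ 42.58 MHz/T for protons. Thus 0.6 MHz corresponds to about 0.014 T and 86 MHz to about 2.02 T, consistent with the 0.016-2 T range quoted above; the small difference at the lower end presumably reflects the oscillator's usable limit rather than the resonance relation itself.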

  19. Automatic trend estimation

    Vamoş, Călin

    2013-01-01

    Our book introduces a method to evaluate the accuracy of trend estimation algorithms under conditions similar to those encountered in real time series processing. This method is based on Monte Carlo experiments with artificial time series numerically generated by an original algorithm. The second part of the book contains several automatic algorithms for trend estimation and time series partitioning. The source codes of the computer programs implementing these original automatic algorithms are given in the appendix and will be freely available on the web. The book contains a clear statement of the conditions and the approximations under which the algorithms work, as well as the proper interpretation of their results. We illustrate the functioning of the analyzed algorithms by processing time series from astrophysics, finance, biophysics, and paleoclimatology. The numerical experiment method extensively used in our book is already in common use in computational and statistical physics.
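
    The numerical experiment method is straightforward to reproduce in miniature. The sketch below generates artificial series with a known trend plus noise, applies a plain moving-average estimator, and reports the mean squared error over many realisations; both the generator and the estimator are deliberately basic stand-ins, not the book's original algorithms.

    ```python
    # Toy Monte Carlo evaluation of a trend estimator on artificial series.
    import numpy as np

    rng = np.random.default_rng(0)

    def moving_average(x, w=21):
        return np.convolve(x, np.ones(w) / w, mode="same")

    errors = []
    for _ in range(200):                      # 200 Monte Carlo realisations
        t = np.linspace(0, 1, 500)
        trend = 2.0 * t ** 2                  # known ground-truth trend
        series = trend + 0.3 * rng.standard_normal(500)
        est = moving_average(series)
        errors.append(np.mean((est[25:-25] - trend[25:-25]) ** 2))  # skip edges

    print(f"mean squared error: {np.mean(errors):.5f}")
    ```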

  20. Modelling speech intelligibility in adverse conditions

    Jørgensen, Søren; Dau, Torsten

    2013-01-01

    ...the spectro-temporal modulation index (STMI) (Elhilali et al., Speech Commun 41:331-348, 2003), which assumes an explicit analysis of the spectral "ripple" structure of the speech signal. However, since the STMI applies the same decision metric as the STI, it fails to account for spectral subtraction. Instead of considering the reduction of the temporal modulation energy as the intelligibility metric, as assumed in the STI, the sEPSM applies the signal-to-noise ratio in the envelope domain (SNRenv). This metric was shown to be the key for predicting the intelligibility of reverberant speech as well as noisy speech processed by spectral subtraction. The key role of the SNRenv metric is further supported here by the ability of a short-term version of the sEPSM to predict speech masking release for different speech materials and modulated interferers. However, the sEPSM cannot account for speech...

  1. Optimal Wavelets for Speech Signal Representations

    Shonda L. Walker

    2003-08-01

    It is well known that in many speech processing applications, speech signals are characterized by their voiced and unvoiced components. Voiced speech components contain a dense frequency spectrum with many harmonics. The periodic or semi-periodic nature of voiced signals lends itself to Fourier processing. Unvoiced speech contains many high-frequency components and thus resembles random noise. Several methods for voiced and unvoiced speech representation that utilize wavelet processing have been developed. These methods seek to improve the accuracy of wavelet-based speech signal representations using adaptive wavelet techniques, superwavelets (which use a linear combination of adaptive wavelets), Gaussian methods, and a multi-resolution sinusoidal transform approach, to mention a few. This paper addresses the relative performance of these wavelet methods and evaluates the usefulness of wavelet processing in speech signal representations. In addition, this paper will also address some of the hardware considerations for the wavelet methods presented.
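
    A baseline for the comparisons discussed above can be sketched with PyWavelets: decompose a frame, keep only the largest coefficients, reconstruct, and measure the error. This shows plain (non-adaptive) wavelet representation only; the wavelet choice, decomposition level, and synthetic frame are illustrative assumptions.

    ```python
    # Sparse wavelet representation of a signal frame with PyWavelets.
    import numpy as np
    import pywt

    fs = 8000
    t = np.arange(1024) / fs
    frame = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 410 * t)

    coeffs = pywt.wavedec(frame, "db8", level=4)            # analysis
    flat, slices = pywt.coeffs_to_array(coeffs)
    keep = np.abs(flat) >= np.quantile(np.abs(flat), 0.9)   # keep top 10%
    sparse = pywt.array_to_coeffs(np.where(keep, flat, 0.0), slices,
                                  output_format="wavedec")
    recon = pywt.waverec(sparse, "db8")                     # synthesis

    err = frame - recon[:len(frame)]
    snr = 10 * np.log10(np.sum(frame ** 2) / np.sum(err ** 2))
    print(f"reconstruction SNR from 10% of coefficients: {snr:.1f} dB")
    ```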

  2. Automatic Wall Painting Robot

    P. Keerthanaa, K. Jeevitha, V. Navina, G. Indira, S. Jayamani

    2013-01-01

    The primary aim of the project is to design, develop and implement an automatic wall painting robot which helps to achieve low-cost painting equipment. Despite the advances in robotics and its wide-spreading applications, interior wall painting has shared little in research activities. The painting chemicals can cause hazards to the human painters, such as eye and respiratory system problems. Also, the nature of the painting procedure, which requires repeated work and hand raising, makes it boring, time a...

  3. Automatic Program Reports

    Lígia Maria da Silva Ribeiro; Gabriel de Sousa Torcato David

    2007-01-01

    To profit from the data collected by the SIGARRA academic IS, a systematic set of graphs and statistics has been added to it and is available on-line. This analytic information can be automatically included in a flexible yearly report for each program as well as in a synthesis report for the whole school. Some difficulties in the interpretation of some graphs led to the definition of new key indicators and the development of a data warehouse across the university where effective data consolidation...

  4. Automatic Inductive Programming Tutorial

    Aler, Ricardo

    2006-01-01

    Computers that can program themselves are an old dream of Artificial Intelligence, but only recently has there been notable progress. In relation to Machine Learning, a computer program is the most powerful structure that can be learned, pushing the final goal well beyond neural networks or decision trees. There are currently many separate areas, working independently, related to automatic programming, both deductive and inductive. The first goal of this tutorial is to give the attendees ...

  5. Automatic food decisions

    Mueller Loose, Simone

    Consumers' food decisions are to a large extent shaped by automatic processes, which are either internally directed through learned habits and routines or externally influenced by context factors and visual information triggers. Innovative research methods such as eye tracking, choice experiments, and food diaries allow us to better understand the impact of unconscious processes on consumers' food choices. Simone Mueller Loose will provide an overview of recent research insights into the effects of habit and context on consumers' food choices.

  6. Automatic Differentiation Variational Inference

    Kucukelbir, Alp; Tran, Dustin; Ranganath, Rajesh; Gelman, Andrew; Blei, David M.

    2016-01-01

    Probabilistic modeling is iterative. A scientist posits a simple model, fits it to her data, refines it according to her analysis, and repeats. However, fitting complex models to large data is a bottleneck in this process. Deriving algorithms for new models can be both mathematically and computationally challenging, which makes it difficult to efficiently cycle through the steps. To this end, we develop automatic differentiation variational inference (ADVI). Using our method, the scientist on...

  7. Automaticity or active control

    Tudoran, Ana Alina; Olsen, Svein Ottar

    This study addresses the quasi-moderating role of habit strength in explaining action loyalty. A model of loyalty behaviour is proposed that extends the traditional satisfaction-intention-action loyalty network. Habit strength is conceptualised as a cognitive construct to refer to the psychological ... , respectively, between intended loyalty and action loyalty. At high levels of habit strength, consumers are more likely to free up cognitive resources and incline the balance from controlled to routine and automatic-like responses.

  8. Automatic digital image registration

    Goshtasby, A.; Jain, A. K.; Enslin, W. R.

    1982-01-01

    This paper introduces a general procedure for automatic registration of two images which may have translational, rotational, and scaling differences. This procedure involves (1) segmentation of the images, (2) isolation of dominant objects from the images, (3) determination of corresponding objects in the two images, and (4) estimation of transformation parameters using the center of gravities of objects as control points. An example is given which uses this technique to register two images which have translational, rotational, and scaling differences.
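
    Step (4) admits a compact sketch. Given corresponding object centroids from steps (1)-(3), which are assumed solved here, the translational, rotational, and scaling differences follow from a standard least-squares similarity-transform fit; the point sets below are synthetic.

    ```python
    # Least-squares similarity transform (scale s, rotation R, translation t)
    # from matched control points, dst ~ s*R*src + t (standard Procrustes fit).
    import numpy as np

    def fit_similarity(src, dst):
        mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
        A, B = src - mu_s, dst - mu_d
        U, S, Vt = np.linalg.svd(A.T @ B)
        R = (U @ Vt).T
        if np.linalg.det(R) < 0:        # guard against reflections
            Vt[-1] *= -1
            R = (U @ Vt).T
        s = S.sum() / (A ** 2).sum()
        t = mu_d - s * R @ mu_s
        return s, R, t

    src = np.array([[0., 0.], [4., 0.], [4., 3.], [0., 3.]])  # object centroids
    theta = np.deg2rad(10)
    R_true = np.array([[np.cos(theta), -np.sin(theta)],
                       [np.sin(theta),  np.cos(theta)]])
    dst = 1.5 * src @ R_true.T + np.array([2.0, -1.0])  # rotated, scaled, shifted
    print(fit_similarity(src, dst))  # recovers s = 1.5, R_true, t = (2, -1)
    ```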

  9. Automatic image classification for the urinoculture screening.

    Andreini, Paolo; Bonechi, Simone; Bianchini, Monica; Garzelli, Andrea; Mecocci, Alessandro

    2016-03-01

    Urinary tract infections (UTIs) are considered to be the most common bacterial infection; it is estimated that about 150 million UTIs occur worldwide yearly, giving rise to roughly $6 billion in healthcare expenditures and resulting in 100,000 hospitalizations. Nevertheless, it is difficult to carefully assess the incidence of UTIs, since an accurate diagnosis depends both on the presence of symptoms and on a positive urinoculture, whereas in most outpatient settings this diagnosis is made without an ad hoc analysis protocol. On the other hand, in the traditional urinoculture test, a sample of midstream urine is put onto a Petri dish, where a growth medium favors the proliferation of germ colonies. Then, the infection severity is evaluated by a visual inspection of a human expert, an error-prone and lengthy process. In this paper, we propose a fully automated system for the urinoculture screening that can provide quick and easily traceable results for UTIs. Based on advanced image processing and machine learning tools, the infection type recognition, together with the estimation of the bacterial load, can be automatically carried out, yielding accurate diagnoses. The proposed AID (Automatic Infection Detector) system provides support during the whole analysis process: first, digital color images of Petri dishes are automatically captured, then specific preprocessing and spatial clustering algorithms are applied to isolate the colonies from the culture ground and, finally, an accurate classification of the infections and their severity evaluation are performed. The AID system speeds up the analysis, contributes to the standardization of the process, allows result repeatability, and reduces the costs. Moreover, the continuous transition between sterile and external environments (typical of the standard analysis procedure) is completely avoided. PMID:26780249

  10. Speech Enhancement with Natural Sounding Residual Noise Based on Connected Time-Frequency Speech Presence Regions

    Sørensen Karsten Vandborg

    2005-01-01

    We propose time-frequency domain methods for noise estimation and speech enhancement. A speech presence detection method is used to find connected time-frequency regions of speech presence. These regions are used by a noise estimation method, and both the speech presence decisions and the noise estimate are used in the speech enhancement method. Different attenuation rules are applied to regions with and without speech presence to achieve enhanced speech with natural sounding attenuated background noise. The proposed speech enhancement method has a low computational complexity, which makes it feasible for application in hearing aids. An informal listening test shows that the proposed speech enhancement method has significantly higher mean opinion scores than minimum mean-square error log-spectral amplitude (MMSE-LSA) and decision-directed MMSE-LSA.
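
    The notion of connected speech presence regions can be sketched directly. Below, a naive energy threshold stands in for the paper's speech presence detection method; SciPy's connected-component labelling finds the regions, and different gains are applied inside and outside them, mirroring the two attenuation rules described above.

    ```python
    # Sketch: threshold a spectrogram-domain presence measure, label connected
    # time-frequency regions, and attenuate non-speech regions more strongly.
    import numpy as np
    from scipy.ndimage import label

    rng = np.random.default_rng(1)
    spec = rng.rayleigh(1.0, size=(64, 100))   # |STFT| stand-in (freq x time)
    spec[20:30, 40:70] += 6.0                  # a synthetic "speech" region

    presence = spec > 4.0                      # naive presence decision
    regions, n_regions = label(presence)       # connected T-F regions
    gain = np.where(presence, 1.0, 0.1)        # keep speech, attenuate rest ~20 dB
    enhanced = spec * gain

    print(f"{n_regions} connected speech-presence region(s) found")
    ```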

  11. Speech Evaluation with Special Focus on Children Suffering from Apraxia of Speech

    Manasi Dixit

    2013-07-01

    Speech disorders are very complicated in individuals suffering from Apraxia of Speech (AOS). In this paper, the pathological cases of speech-disabled children affected with AOS are analyzed. The speech signal samples of children of age between three to eight years are considered for the present study. These speech signals are digitized and enhanced using Speech Pause Index, Jitter, Skew, and Kurtosis analysis. This analysis is conducted on speech data samples which are concerned with both place of articulation and manner of articulation. The speech disability of the pathological subjects was estimated using the results of the above analysis.
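
    The signal statistics named above (jitter, skew, kurtosis) are simple to compute. The sketch below uses the standard relative-jitter definition on a series of pitch periods, and SciPy's skewness and kurtosis on a synthetic waveform; real use would operate on the recorded child speech, and the Speech Pause Index is not reproduced here.

    ```python
    # Jitter, skewness, and kurtosis of speech-like data with NumPy/SciPy.
    import numpy as np
    from scipy.stats import skew, kurtosis

    fs = 16000
    # 50 noisy pitch periods around 120 Hz (seconds); stand-in for pitch marks
    periods = 1.0 / (120 + 3 * np.random.randn(50))
    # relative jitter: mean absolute cycle-to-cycle period change / mean period
    jitter = np.mean(np.abs(np.diff(periods))) / np.mean(periods)

    t = np.arange(fs) / fs
    signal = np.sin(2 * np.pi * 120 * t) + 0.3 * np.sin(2 * np.pi * 240 * t)
    print(f"jitter: {100 * jitter:.2f}%  skew: {skew(signal):.2f}  "
          f"kurtosis: {kurtosis(signal):.2f}")
    ```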

  12. Automatic bootstrapping of a morphable face model using multiple components

    Haar, F.B. ter; Veltkamp, R.C.

    2009-01-01

    We present a new bootstrapping algorithm to automatically enhance a 3D morphable face model with new face data. Our algorithm is based on a morphable model fitting method that uses a set of predefined face components. This fitting method produces accurate model fits to 3D face data with noise and ho

  13. Automatic 3D Modeling of the Urban Landscape

    Esteban, I.; Dijk, J.; Groen, F.A.

    2010-01-01

    In this paper we present a fully automatic system for building 3D models of urban areas at the street level. We propose a novel approach for the accurate estimation of the scale consistent camera pose given two previous images. We employ a new method for global optimization and use a novel sampling

  14. Automatic classification of trees from laser scanning point clouds

    Sirmacek, B.; Lindenbergh, R.C.

    2015-01-01

    Development of laser scanning technologies has promoted tree monitoring studies to a new level, as the laser scanning point clouds enable accurate 3D measurements in a fast and environmentally friendly manner. In this paper, we introduce a probability matrix computation based algorithm for automatical

  15. Towards accurate emergency response behavior

    Nuclear reactor operator emergency response behavior has persisted as a training problem through lack of information. The industry needs an accurate definition of operator behavior in adverse stress conditions, and training methods which will produce the desired behavior. Newly assembled information from fifty years of research into human behavior in both high and low stress provides a more accurate definition of appropriate operator response, and supports training methods which will produce the needed control room behavior. The research indicates that operator response in emergencies is divided into two modes, conditioned behavior and knowledge based behavior. Methods which assure accurate conditioned behavior, and provide for the recovery of knowledge based behavior, are described in detail

  16. Automatic differentiation for reduced sequential quadratic programming

    Liao Liangcai; Li Jin; Tan Yuejin

    2007-01-01

    In order to solve large-scale nonlinear programming (NLP) problems efficiently, an optimization algorithm based on reduced sequential quadratic programming (rSQP) and automatic differentiation (AD) is presented in this paper. By exploiting sparseness, relatively low degrees of freedom, and equality constraints, the nonlinear programming problem is solved by an improved rSQP solver. In the solving process, AD technology is used to obtain accurate gradient information. The numerical results show that the combined algorithm, which is suitable for large-scale process optimization problems, can calculate more efficiently than rSQP itself.

  17. Accurate determination of antenna directivity

    Dich, Mikael

    1997-01-01

    The derivation of a formula for accurate estimation of the total radiated power from a transmitting antenna for which the radiated power density is known in a finite number of points on the far-field sphere is presented. The main application of the formula is determination of directivity from power...
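
    The underlying computation can be sketched numerically: sample the radiation intensity on the far-field sphere, integrate to get the total radiated power, and form D = 4*pi*U_max/P_total. The analytic pattern below (directivity exactly 3) is a stand-in; the paper's contribution is an accurate estimate from a finite number of far-field points, which the brute-force quadrature here only approximates.

    ```python
    # Directivity from a sampled far-field power-density pattern.
    import numpy as np

    n_theta, n_phi = 181, 360
    theta = np.linspace(0.0, np.pi, n_theta)                 # polar angle
    phi = np.linspace(0.0, 2 * np.pi, n_phi, endpoint=False)
    T = theta[:, None]                                       # broadcast over phi

    U = np.cos(T / 2) ** 4 * np.ones((1, n_phi))             # radiation intensity

    # total radiated power: integral of U * sin(theta) dtheta dphi
    d_theta = theta[1] - theta[0]
    d_phi = phi[1] - phi[0]
    P_total = np.sum(U * np.sin(T)) * d_theta * d_phi

    D = 4 * np.pi * U.max() / P_total
    print(f"directivity: {D:.3f} ({10 * np.log10(D):.2f} dBi)")  # expect ~3
    ```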

  18. Speech and language delay in children.

    McLaughlin, Maura R

    2011-05-15

    Speech and language delay in children is associated with increased difficulty with reading, writing, attention, and socialization. Although physicians should be alert to parental concerns and to whether children are meeting expected developmental milestones, there currently is insufficient evidence to recommend for or against routine use of formal screening instruments in primary care to detect speech and language delay. In children not meeting the expected milestones for speech and language, a comprehensive developmental evaluation is essential, because atypical language development can be a secondary characteristic of other physical and developmental problems that may first manifest as language problems. Types of primary speech and language delay include developmental speech and language delay, expressive language disorder, and receptive language disorder. Secondary speech and language delays are attributable to another condition such as hearing loss, intellectual disability, autism spectrum disorder, physical speech problems, or selective mutism. When speech and language delay is suspected, the primary care physician should discuss this concern with the parents and recommend referral to a speech-language pathologist and an audiologist. There is good evidence that speech-language therapy is helpful, particularly for children with expressive language disorder. PMID:21568252

  19. Sensorimotor influences on speech perception in infancy.

    Bruderer, Alison G; Danielson, D Kyle; Kandhadai, Padmapriya; Werker, Janet F

    2015-11-01

    The influence of speech production on speech perception is well established in adults. However, because adults have a long history of both perceiving and producing speech, the extent to which the perception-production linkage is due to experience is unknown. We addressed this issue by asking whether articulatory configurations can influence infants' speech perception performance. To eliminate influences from specific linguistic experience, we studied preverbal, 6-mo-old infants and tested the discrimination of a nonnative, and hence never-before-experienced, speech sound distinction. In three experimental studies, we used teething toys to control the position and movement of the tongue tip while the infants listened to the speech sounds. Using ultrasound imaging technology, we verified that the teething toys consistently and effectively constrained the movement and positioning of infants' tongues. With a looking-time procedure, we found that temporarily restraining infants' articulators impeded their discrimination of a nonnative consonant contrast but only when the relevant articulator was selectively restrained to prevent the movements associated with producing those sounds. Our results provide striking evidence that even before infants speak their first words and without specific listening experience, sensorimotor information from the articulators influences speech perception. These results transform theories of speech perception by suggesting that even at the initial stages of development, oral-motor movements influence speech sound discrimination. Moreover, an experimentally induced "impairment" in articulator movement can compromise speech perception performance, raising the question of whether long-term oral-motor impairments may impact perceptual development. PMID:26460030

  20. Speech graphs provide a quantitative measure of thought disorder in psychosis.

    Natalia B Mota

    BACKGROUND: Psychosis has various causes, including mania and schizophrenia. Since the differential diagnosis of psychosis is exclusively based on subjective assessments of oral interviews with patients, an objective quantification of the speech disturbances that characterize mania and schizophrenia is in order. In principle, such quantification could be achieved by the analysis of speech graphs. A graph represents a network with nodes connected by edges; in speech graphs, nodes correspond to words and edges correspond to semantic and grammatical relationships. METHODOLOGY/PRINCIPAL FINDINGS: To quantify speech differences related to psychosis, interviews with schizophrenics, manics and normal subjects were recorded and represented as graphs. Manics scored significantly higher than schizophrenics in ten graph measures. Psychopathological symptoms such as logorrhea, poor speech, and flight of thoughts were grasped by the analysis even when verbosity differences were discounted. Binary classifiers based on speech graph measures sorted schizophrenics from manics with up to 93.8% sensitivity and 93.7% specificity. In contrast, sorting based on the scores of two standard psychiatric scales (BPRS and PANSS) reached only 62.5% sensitivity and specificity. CONCLUSIONS/SIGNIFICANCE: The results demonstrate that alterations of the thought process manifested in the speech of psychotic patients can be objectively measured using graph-theoretical tools, developed to capture specific features of the normal and dysfunctional flow of thought, such as divergence and recurrence. The quantitative analysis of speech graphs is not redundant with standard psychometric scales but rather complementary, as it yields a very accurate sorting of schizophrenics and manics. Overall, the results point to automated psychiatric diagnosis based not on what is said, but on how it is said.
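
    The graph construction is simple enough to sketch. Below, a toy transcript stands in for a patient interview: words become nodes, transitions between consecutive words become directed edges, and networkx supplies basic measures. The study uses its own set of graph attributes (including recurrence- and divergence-related measures), of which the ones below are only loose stand-ins.

    ```python
    # Word-adjacency speech graph and a few basic measures with networkx.
    import networkx as nx

    transcript = ("i went to the market and then i went to see my friend "
                  "and my friend went to the market").split()

    G = nx.DiGraph()
    G.add_edges_from(zip(transcript, transcript[1:]))  # consecutive-word edges

    measures = {
        "nodes": G.number_of_nodes(),
        "edges": G.number_of_edges(),
        "density": nx.density(G),
        # recurrence shows up as cycles; divergence as high out-degree
        "avg_out_degree": sum(d for _, d in G.out_degree()) / G.number_of_nodes(),
    }
    print(measures)
    ```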